STATISTICS
Statistics is the study of the collection,
analysis, interpretation, presentation, and
organization of data. In other words, it is a
mathematical discipline for collecting and
summarizing data.
APPLICATIONS OF STATISTICS
• Statistics in economics
• Statistics in industry
• Statistics in insurance
• Statistics in astronomy
• Statistics in social sciences
• Statistics in biology and medical science
• Statistics in psychology and education
• Statistics in physical science
• Statistics in war
LIMITATIONS OF STATISTICS
• It is not concerned with qualitative phenomena
• It does not recognize individual items
• It does not reveal the entire story of a phenomenon
• Its results are true only on average
• Its laws are not exact
• It is likely to be misused
STATISTIC VS. PARAMETER
A statistic is a characteristic of a sample.
• It is a numerical or graphic way to summarize data obtained from a
sample
A parameter is a characteristic of a population.
• It is a numerical or graphic way to summarize data obtained from
the population
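The distinction can be illustrated with a minimal Python sketch. The population (the integers 1 to 100), the sample size of 10, and the random seed are all assumptions made for illustration; they do not come from the slides.

```python
import random

# Hypothetical population: the integers 1..100 (an assumption for illustration).
population = list(range(1, 101))

# Parameter: a characteristic of the whole population (here, its mean).
parameter = sum(population) / len(population)

# Draw a simple random sample of 10 items; the seed only makes the run repeatable.
rng = random.Random(0)
sample = rng.sample(population, 10)

# Statistic: the same characteristic computed from the sample only.
statistic = sum(sample) / len(sample)

print("population mean (parameter):", parameter)
print("sample mean (statistic):", statistic)
```

The statistic changes from sample to sample, while the parameter is fixed for a given population.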
TYPES OF NUMERICAL DATA
There are two fundamental types of numerical data:
Categorical data: obtained by determining the frequency of occurrences in each of several
categories
Quantitative data: obtained by determining placement on a scale that indicates amount or degree
TYPES OF ENQUIRY
• OFFICIAL, SEMI-OFFICIAL OR UN-OFFICIAL
• INITIAL OR REPETITIVE
• CONFIDENTIAL OR NON-CONFIDENTIAL
• DIRECT OR INDIRECT
• REGULAR OR AD-HOC
• CENSUS OR SAMPLE
• PRIMARY OR SECONDARY
CLASSIFICATION OF DATA
Classification is the process of arranging data into
sequences and groups according to their common
characteristics or separating them into different but
related parts.
FUNCTIONS OF CLASSIFICATION
• It condenses the data
• It facilitates comparisons
• It helps to study the relationships
• It facilitates the statistical treatment of the data
TECHNIQUES FOR SUMMARIZING QUANTITATIVE DATA
• Frequency Distributions
• Histograms
• Stem and Leaf Plots
• Distribution curves
• Averages
• Variability
DISCRETE VS CONTINUOUS FREQUENCY DISTRIBUTIONS
Raw Data (Marks): 76, 93, 39, 50, 76, 81, 66, 50, 39, 90, 32
Arranged Data (Marks): 32, 39, 39, 50, 50, 66, 76, 76, 81, 90, 93

Discrete Frequency Distribution
Marks      Frequency
32         1
39         2
50         2
66         1
76         2
81         1
90         1
93         1

Continuous Frequency Distribution, Exclusive Classes
Marks      Frequency
0 - 25     0
25 - 50    3
50 - 75    3
75 - 100   5

Continuous Frequency Distribution, Inclusive Classes
Marks      Frequency
0 - 25     0
26 - 50    5
51 - 75    1
76 - 100   5
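The three distributions can be rebuilt from the raw marks with a short Python sketch. The helper name `class_frequencies` and the rule that an exclusive class contains its lower limit but not its upper limit are assumptions; the slide does not spell the rule out, but its counts are consistent with it.

```python
from collections import Counter

# Raw marks as listed on the slide.
marks = [76, 93, 39, 50, 76, 81, 66, 50, 39, 90, 32]

# Arranged data and discrete frequency distribution (value -> count).
arranged = sorted(marks)
discrete = dict(sorted(Counter(marks).items()))

def class_frequencies(values, classes, exclusive):
    """Count how many values fall into each (lower, upper) class.

    exclusive=True  counts lower <= x <  upper (upper limit excluded);
    exclusive=False counts lower <= x <= upper (both limits included).
    Note: with exclusive classes a mark of exactly 100 would not be counted.
    """
    counts = {}
    for lower, upper in classes:
        if exclusive:
            counts[(lower, upper)] = sum(lower <= x < upper for x in values)
        else:
            counts[(lower, upper)] = sum(lower <= x <= upper for x in values)
    return counts

exclusive = class_frequencies(
    marks, [(0, 25), (25, 50), (50, 75), (75, 100)], exclusive=True)
inclusive = class_frequencies(
    marks, [(0, 25), (26, 50), (51, 75), (76, 100)], exclusive=False)

print(arranged)
print(discrete)
print(exclusive)
print(inclusive)
```

A mark of 50 lands in the class 50 - 75 under the exclusive scheme and in 26 - 50 under the inclusive scheme, which is why the two tables differ.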
Histogram, Stem and Leaf Diagram
[Figure: a histogram alongside a stem and leaf plot]
Stem | Leaf
6    | 7 8
9    | 0 1 8
SUMMARY MEASURES
• Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
MEASURES OF CENTRAL TENDENCY
• Average (Mean)
  Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
  Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
• Median
• Mode
MEAN (ARITHMETIC MEAN)
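Mean (arithmetic mean) of data values
• Sample mean, where n is the sample size:
  $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
• Population mean, where N is the population size:
  $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$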
The most common measure of central tendency
Affected by extreme values (outliers)
[Figure: two data sets plotted on number lines, one running from 0 to 10 with Mean = 5 and one extending to 14 with Mean = 6]
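The sensitivity of the mean to an extreme value can be sketched in Python. The two data sets below are assumptions chosen so that their means come out to 5 and 6; the actual points plotted on the slide are not recoverable from the text.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Hypothetical data sets (assumed, not taken from the slide).
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # same values, but the largest one is replaced by 14

print(mean(without_outlier))  # 5.0
print(mean(with_outlier))     # 6.0
```

Changing a single value from 9 to 14 moves the mean by a full unit, even though the other four values are untouched.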
MEAN
WEIGHTED MEAN
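A form of mean obtained from groups of data in
which the different sizes of the groups are
accounted for or weighted.
$\bar{x}_w = \frac{\sum f\,\bar{x}}{N_{total}}$
Here f is the size of each group, $\bar{x}$ is the mean of that group, and $N_{total}$ is the total number of observations across all groups.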
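As a minimal sketch, the weighted mean of group means can be computed as follows. The function name `weighted_mean` and the two groups (sizes 10 and 30 with means 70 and 80) are hypothetical values chosen for illustration.

```python
def weighted_mean(groups):
    """Weighted mean of group means.

    groups is a list of (size, group_mean) pairs; each group mean is
    weighted by the number of observations in that group.
    """
    total_size = sum(size for size, _ in groups)
    weighted_sum = sum(size * group_mean for size, group_mean in groups)
    return weighted_sum / total_size

# Hypothetical groups: (group size, group mean).
groups = [(10, 70.0), (30, 80.0)]

print(weighted_mean(groups))                  # 77.5
print(sum(m for _, m in groups) / len(groups))  # 75.0, the unweighted mean of means
```

The weighted result sits closer to the mean of the larger group, which is the point of accounting for group size.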
MEDIAN
Robust measure of central tendency
Not affected by extreme values
[Figure: two data sets plotted on number lines, one running from 0 to 10 and one extending to 14; Median = 5 in both]
In an ordered array, the median is the “middle” number
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the two middle numbers
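The odd and even rules can be written out as a short Python sketch. The example data sets are assumptions chosen so that the median is 5 in both cases; they are not the slide's plotted points.

```python
def median(values):
    """Median of a non-empty collection, using the odd/even rule."""
    data = sorted(values)          # the rule applies to an ordered array
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd n: the middle number
    return (data[mid - 1] + data[mid]) / 2    # even n: average of the two middle numbers

# Hypothetical data sets (assumed for illustration).
print(median([1, 3, 5, 7, 9]))    # 5
print(median([1, 3, 5, 7, 14]))   # 5, unchanged by the extreme value
print(median([1, 3, 5, 7]))       # 4.0
```

Replacing 9 with 14 leaves the median at 5, which is what makes it a robust measure.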
MODE
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
[Figure: two data sets plotted on number lines, one with Mode = 9 and one with no mode]
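A minimal Python sketch of finding the mode or modes follows. It adopts one common convention as an assumption: if no value occurs more than once, the data set is reported as having no mode. The example data are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), sorted; [] if there is no mode.

    Convention assumed here: when every value occurs exactly once,
    the data set has no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical examples.
print(modes([0, 1, 2, 3, 4, 5, 6]))        # [] (no mode)
print(modes([1, 3, 5, 9, 9, 9, 12]))       # [9]
print(modes([1, 1, 2, 2, 3]))              # [1, 2] (several modes)
print(modes(["red", "blue", "blue"]))      # ['blue'] (categorical data)
```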
QUARTILES
Split ordered data into 4 quarters: 25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Position of the i-th quartile: $(Q_i) = \frac{i(n+1)}{4}$
Data in ordered array: 11 12 13 16 16 17 18 21 22
Position of $Q_1 = \frac{1(9+1)}{4} = 2.5$, so $Q_1 = \frac{12 + 13}{2} = 12.5$
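The position rule can be sketched in Python using the ordered array from the slide. The slide only shows a position ending in .5, where the two neighbouring values are averaged; using linear interpolation for other fractional positions is an assumption made here, and the function name `quartile` is hypothetical.

```python
def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) located at 1-based position i*(n+1)/4.

    Fractional positions are resolved by linear interpolation between the
    two neighbouring values (an assumption; at .5 this is their average).
    """
    data = sorted(values)
    n = len(data)
    position = i * (n + 1) / 4
    position = min(max(position, 1), n)   # keep the position inside the array
    lower = int(position)                 # 1-based index of the value at or below
    fraction = position - lower
    if fraction == 0:
        return data[lower - 1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ordered array from the slide

print(quartile(data, 1))  # 12.5
print(quartile(data, 2))  # 16
print(quartile(data, 3))  # 19.5
```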
DIFFERENCES IN MEASURES OF CENTRAL TENDENCY
In asymmetrical distributions of data, the mode, median and mean can be three different numbers.
Every data set has exactly one mean and one median, but it may have several modes.
The median is less influenced by extreme values than the mean.
The mean often does not coincide with any observed value; the median is guaranteed to be an
observed value only when the number of observations is odd; the mode, when it exists, is always
an observed value.
MEASURES OF VARIABILITY
Measures of variability show how spread out the distribution of scores is from the mean,
or how much dispersion or scatter exists in the distribution. If there is a large degree of
dispersion, that is, if the scores are very dissimilar, we say the distribution has a large or high
variability, or variance. If the scores are very similar, there is a small degree of dispersion and
a small variance.
MEASURES OF VARIATION
• Range
  • Interquartile Range
• Variance
  • Population Variance (σ²)
  • Sample Variance (S²)
• Standard Deviation
  • Population Standard Deviation (σ)
  • Sample Standard Deviation (S)
• Coefficient of Variation
RANGE
The range is simply the numerical
difference between the highest and lowest
scores in the distribution.
INTERQUARTILE RANGE
Can eliminate some outlier problems by using the interquartile range
Eliminate high- and low-valued observations and calculate the range of
the middle 50% of the data
Interquartile range = 3rd quartile – 1st quartile
• IQR = Q3 – Q1
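Both the range and the interquartile range can be computed with Python's standard library, as in the sketch below. It reuses the ordered array from the quartiles slide and assumes that the "exclusive" method of `statistics.quantiles`, which places the i-th quartile at position i(n+1)/4, is the intended quartile definition.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ordered array from the quartiles slide

# Range: highest score minus lowest score.
value_range = max(data) - min(data)

# Quartiles via the (n + 1)-based position rule (method="exclusive").
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print(value_range)  # 11
print(q1, q3)       # 12.5 19.5
print(iqr)          # 7.0
```

Because the IQR ignores the lowest and highest quarters of the data, a single extreme value changes the range but usually leaves the IQR unchanged.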
STANDARD DEVIATION
The measure of variability used most often in
research is the standard deviation, a statistic
that indicates the average distance of the scores
from the mean of the distribution.
COMPARING STANDARD DEVIATIONS
[Figure: three data sets plotted on a common axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.9258
Data C: Mean = 15.5, S = 4.57
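The comparison can be reproduced with a short Python sketch. The individual data points are not recoverable from the slide text, so the three lists below are assumed values chosen because they yield the same means and sample standard deviations as the figures quoted for Data A, B and C.

```python
from statistics import mean, stdev

# Assumed data points (not given in the source); each set has 8 values.
data_a = [11, 12, 13, 16, 16, 17, 18, 21]   # moderately spread out
data_b = [14, 15, 15, 15, 16, 16, 16, 17]   # tightly clustered around the mean
data_c = [11, 11, 11, 12, 19, 20, 20, 20]   # concentrated at the two extremes

for name, data in [("A", data_a), ("B", data_b), ("C", data_c)]:
    # stdev() is the sample standard deviation (divides by n - 1).
    print(name, "mean =", mean(data), "S =", round(stdev(data), 4))
```

All three sets share the same mean, so the standard deviation is what distinguishes them: the more the values cluster near 15.5, the smaller S becomes.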
STANDARD DEVIATION
With a mean of 62 and an SD of 3, about 95% of scores
should fall between 62 - 2*3 and 62 + 2*3, i.e. between
56 and 68, provided the scores are approximately
normally distributed.
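The interval can be checked with a minimal Python sketch. It assumes the scores follow a normal distribution, which is the condition under which the two-standard-deviation rule gives roughly 95%.

```python
from statistics import NormalDist

mean_score, sd = 62, 3

# Interval of two standard deviations on either side of the mean.
lower = mean_score - 2 * sd
upper = mean_score + 2 * sd

# Share of a normal distribution that lies inside that interval (assumed model).
scores = NormalDist(mu=mean_score, sigma=sd)
coverage = scores.cdf(upper) - scores.cdf(lower)

print(lower, upper)           # 56 68
print(round(coverage, 4))     # 0.9545
```

The exact figure for two standard deviations is about 95.45%; the familiar "95%" is a rounded rule of thumb.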
VARIANCE
The Variance, $s^2$, represents the amount of variability of the data
relative to their mean
As shown below, the variance is the “average” of the squared
deviations of the observations about their mean
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
► The Variance, $s^2$, is the sample variance, and is used to
estimate the actual population variance, $\sigma^2$
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
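The two formulas differ only in the divisor, which the following Python sketch makes explicit. The data set is a hypothetical example, and the results are cross-checked against the standard library's `variance` and `pvariance`.

```python
import math
import statistics

def sample_variance(values):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """sigma^2: squared deviations from the population mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

# Hypothetical scores (assumed for illustration); their mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(scores))   # 4.0
print(sample_variance(scores))       # 32 / 7, roughly 4.571

# Cross-check against the standard library.
assert math.isclose(sample_variance(scores), statistics.variance(scores))
assert math.isclose(population_variance(scores), statistics.pvariance(scores))
```

Dividing by n - 1 rather than n makes the sample variance slightly larger, which compensates for measuring deviations from the sample mean instead of the unknown population mean.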
COEFFICIENT OF VARIATION
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Is used to compare two or more sets of data measured in different units
$CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\%$
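As a minimal sketch, the coefficient of variation can be computed directly from a standard deviation and a mean. The function name is hypothetical; the inputs are the summary figures quoted for Data A, B and C in the standard deviation comparison (mean 15.5 with S of 3.338, 0.9258 and 4.57).

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

cv_a = coefficient_of_variation(3.338, 15.5)
cv_b = coefficient_of_variation(0.9258, 15.5)
cv_c = coefficient_of_variation(4.57, 15.5)

print(round(cv_a, 2), round(cv_b, 2), round(cv_c, 2))  # 21.54 5.97 29.48
```

Because the units cancel in the ratio, the same function can compare, say, heights in centimetres with weights in kilograms.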
SKEWNESS
Roughly speaking, a distribution has positive skew
if its right tail is longer and negative skew if its
left tail is longer. The relative positions of the mean,
median and mode on a graph of the distribution reflect this.
SHAPE OF A DISTRIBUTION
Describes how data is distributed
Measures of shape
Symmetric or skewed
• Left-Skewed: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right-Skewed: Mode < Median < Mean
POSITIVELY SKEWED
This distribution has a positive skew. Note that the mean is larger than the median.
NEGATIVELY SKEWED
This distribution has a negative skew. The median is larger than the mean.
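The link between the sign of the skew and the order of the mean and median can be sketched in Python. The moment-based skewness coefficient used here (third central moment divided by the 1.5 power of the second) is one common definition and is an assumption, as are the two hypothetical data sets.

```python
from statistics import mean, median

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (undefined for constant data)."""
    n = len(values)
    centre = mean(values)
    m2 = sum((x - centre) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - centre) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data: a long right tail, and its mirror image with a long left tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]
left_skewed = [11 - x for x in right_skewed]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, "skewness =", round(skewness(data), 3),
          "mean =", mean(data), "median =", median(data))
```

In the right-skewed set the single large value pulls the mean above the median; mirroring the data reverses both the sign of the skewness and the order of the two measures.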