Descriptive &
Inferential Statistics
Prepared by Dr. Ali Bavik; adapted by
Shanshan
Learning Objectives
• What is SPSS?
• Basic Analysis
• Types of Statistics
– Descriptive analysis
• Central Tendency
– Mean; Median; Mode
• Normal Distribution
• Standard Deviation
SPSS
Statistical Package for the Social Sciences
We Can Analyse Data in 3 Basic Ways
1) Descriptive Statistics
• Frequencies
• Minimum
• Maximum
• Mean
• Median
• Mode
• Standard Deviation
We Can Analyse Data in 3 Basic Ways
2) Examine Relationships (Level of Association)
• Correlation
• Regression
3) Compare Groups / Cause & Effect
• T-Test
• One-Way ANOVA
Types of Statistics
Descriptive Statistics
• Characterize the attributes of a set of
measurements
• Used to summarize data
• Used to explore patterns of variation
• Used to describe changes over time
Central Tendency
Measures of central tendency represent the
“typical” attributes of the data.
Mean
The mean (M) is the arithmetic average of a
group of scores: the sum of the scores
divided by the number of scores.
Median
The median (Mdn) is the middle score of all the
scores in a distribution arranged from highest to
lowest.
It is the mid-point of the distribution when the
distribution has an odd number of scores, and the
number halfway between the two middle scores when
the distribution has an even number of scores.
Mode
The mode (Mo) is the value with the
greatest frequency in the distribution
Central Tendency
Mode
Most Frequently Occurring Score
Median
Middle Score
Mean
Arithmetic Average
Levels of Measurement & the Best Measure of Central
Tendency
• Nominal (e.g., Male / Female) → Mode
• Ordinal (e.g., Likert-scale items) → Median
• Interval (e.g., Likert-scale items, Temperature) → Mean
• Ratio (e.g., Weight) → Mean
Frequency Distribution
• The pattern of frequencies of the
observations, or a listing of case counts by
category
• 10 students' scores on a math test,
arranged in order from lowest to highest:
69, 77, 77, 77, 84, 85, 85, 87, 92, 98
Frequency Distribution
The frequency (f) of a particular value is the
number of times that observation
occurs in the data
Frequency Table
A chart presenting statistical data that categorizes the
values along with the number of times each value
appears in the data set
69, 77, 77, 77, 84, 85, 85, 87, 92, 98
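A frequency table like this can be sketched with Python's standard library; the `scores` list below is the slide's data set:

```python
from collections import Counter

# The 10 math test scores from the slide
scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

# Count how many times each value appears (its frequency, f)
freq = Counter(scores)

# Print a simple frequency table, lowest score first
for score in sorted(freq):
    print(f"score {score}: f = {freq[score]}")
```

Here 77 appears three times and 85 twice; every other score appears once.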
Mean
The mean (M) is the arithmetic average of a group of
scores: the sum of the scores divided by the number of scores.
For example, in our distribution of 10 test scores:
(69 + 77 + 77 + 77 + 84 + 85 + 85 + 87 + 92 + 98) / 10 = 83.1
Median
The median (Mdn) is the middle score of all the
scores in a distribution arranged from highest to
lowest
69, 77, 77, 77, 84, 85, 85, 87, 92, 98 → Mdn = (84 + 85) / 2 = 84.5
Mode
The mode (Mo) is the value with the greatest
frequency in the distribution
For example, in our distribution of 10 test scores, 77
is the mode because it is observed most frequently
69, 77, 77, 77, 84, 85, 85, 87, 92, 98
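The three measures of central tendency can be checked against the same data set with Python's built-in `statistics` module; a minimal sketch:

```python
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

mean = statistics.mean(scores)      # (sum of scores) / 10 = 83.1
median = statistics.median(scores)  # halfway between 84 and 85 = 84.5
mode = statistics.mode(scores)      # 77, the most frequent value
```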
Normal Distribution
• Bell-shaped curve
• Total area = 1
• Symmetrical
• 50% of the values < the mean & 50% > the mean
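These properties can be illustrated with `statistics.NormalDist`; the mean and standard deviation below reuse the slide's test-score figures and are purely illustrative:

```python
from statistics import NormalDist

# A normal curve with the slide's sample mean and standard deviation
nd = NormalDist(mu=83.1, sigma=8.39)

# Symmetry: exactly half of the total area (which is 1) lies below the mean
below_mean = nd.cdf(83.1)        # 0.5
above_mean = 1 - below_mean      # 0.5
```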
Standard deviation is a measure of the
spread of scores
That is, how spread out is the data set?
How much does the data
vary from the average?
EXAMPLE
Test Scores
69, 77, 77, 77, 84, 85, 85, 87, 92, 98
A low standard deviation indicates that the
data are closely clustered around the mean:
most students achieved close to the average
score, with few achieving very high or low scores
A high standard deviation indicates that the data
are dispersed over a wide range of values:
scores are widely spread around the mean, with
individuals achieving very high or very low scores on the test
Set of Data: student test scores
69, 77, 77, 77, 84, 85, 85, 87, 92, 98
Mean: 83.1
Standard Deviation: 8.39
Student Score Example
(69 + 77 + 77 + 77 + 84 + 85 + 85 + 87 + 92 + 98) / 10 = 83.1
Mean = 83.1; Standard Deviation = 8.39
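The 8.39 figure can be reproduced (to rounding) with the `statistics` module; note that `stdev` uses the sample formula, dividing by n − 1:

```python
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

# Sample standard deviation (divides by n - 1) -- about 8.4,
# which the slide reports as 8.39
sd = statistics.stdev(scores)

# Population standard deviation (divides by n), for comparison -- about 7.97
psd = statistics.pstdev(scores)
```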
Types of Statistics
Inferential Analysis
• Used to generate conclusions about the
population’s characteristics based on the sample
data
– Estimate population parameters
– Test hypotheses (e.g., a null hypothesis against an
alternative hypothesis)
– That is, results are generalisable to the population
– Only possible when using a random sample
Types of Statistics
Differences Analysis:
Used to compare the mean of the responses of one
group to that of another group
– Determine if differences exist between
groups
– Evaluate statistical significance of difference
in the means of two groups in a sample
– E.g., T-test, Paired Samples T-test, One-way
ANOVA
Types of Statistics
Associative Analysis
Determines the strength & direction of relationships
between two or more variables
–Chi-square Analysis (Cross-Tabulation)
–Correlation
–Regression Analysis
–Multiple Regression Analysis
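As a minimal sketch of correlation, Pearson's r can be computed with the standard library; the X and Y data here are hypothetical (say, advertising spend and sales):

```python
import statistics

# Hypothetical paired observations: X (advertising spend), Y (sales)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)

# Pearson's r: sample covariance divided by the product of the
# sample standard deviations; r always falls between -1 and +1
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
r = cov / (statistics.stdev(x) * statistics.stdev(y))
```

A positive r means the variables rise together; the sign gives the direction of the relationship and the magnitude its strength.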
Types of Statistics
Predictive Analysis
Allows one to make forecasts for future events based on a
statistical model
• Estimate the level of Y, given the amount of X
• For example:
Independent T-test
Paired Samples T-test
ANOVA
Regression Analysis
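A sketch of the prediction idea using simple least-squares regression; the data are hypothetical and chosen to be exactly linear so the forecast is easy to check:

```python
import statistics

# Hypothetical data: predict Y (e.g., sales) from X (e.g., advertising spend)
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # exactly y = 2x + 1

mx, my = statistics.mean(x), statistics.mean(y)

# Least-squares slope and intercept
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx

# Forecast: estimate the level of Y given a new amount of X
y_pred = slope * 6 + intercept   # 13.0
```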
Determining the Test
Parameter Estimation
Parameter estimation involves three values:
1. Sample statistic (mean or percentage generated from sample data)
2. Standard error (the standard deviation divided by the square root of the
sample size; one formula applies to the standard error of the mean and
another to the standard error of the percentage)
3. Confidence interval (gives us a range within which a sample
statistic will fall if we were to repeat the study many times over)
– E.g., 95%, 99%
Parameter Estimation – Confidence
Interval
• Confidence intervals: the degree of accuracy desired by
the researcher and stipulated as a level of confidence in
the form of a percentage
• Most commonly used level of confidence: 95%;
corresponding to 1.96 standard errors
• Other levels of confidence:
– 90% (1.64 standard errors)
– 99% (2.58 standard errors)
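These multipliers come from the standard normal distribution and can be recovered with `statistics.NormalDist`; for a two-tailed 95% level, 2.5% of the area sits in each tail:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Two-tailed multipliers: half the remaining area lies in each tail
z90 = z.inv_cdf(0.95)    # about 1.64
z95 = z.inv_cdf(0.975)   # about 1.96
z99 = z.inv_cdf(0.995)   # about 2.58
```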
Confidence Interval
What does this mean?
• It means that if we repeated our study 100 times, we could
determine a range within which the sample statistic would fall
95 times out of 100 (a 95% level of confidence)
• This gives us confidence that the real population
value falls within this range
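Putting the three values of parameter estimation together for the slide's test scores, a minimal sketch of a 95% confidence interval for the mean:

```python
import math
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]
n = len(scores)

mean = statistics.mean(scores)                # 1) sample statistic
se = statistics.stdev(scores) / math.sqrt(n)  # 2) standard error of the mean

# 3) 95% confidence interval: mean +/- 1.96 standard errors
lower = mean - 1.96 * se
upper = mean + 1.96 * se
```

With these scores the interval is roughly 77.9 to 88.3: if the study were repeated many times, about 95% of such intervals would contain the population mean.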
Why Are Differences Important?
• Market segmentation holds that within a market, there are different types
of consumers who have different requirements, and these differences can
be the bases of marketing strategies
• Some differences are obvious – differences between teens’ & baby
boomers’ music preferences
• Other differences are not so obvious, and marketers who “discover” these
subtle differences may realize huge gains in the marketplace.
Why Differences are Important
Market Segmentation
• Differences must be statistically significant
– Statistical significance of differences: the differences in the
sample(s) may be assumed to exist in the population(s) from which
the random samples are drawn
– Statistically significant differences should be demonstrated between
groups
• Differences must be meaningful
– Meaningful difference: one that the marketing manager can
potentially use as a basis for marketing decisions
– The outcome must be interpretable (reasonable)
• Makes sense to you
Why Differences are Important
Market Segmentation
• Differences should be stable
– Stable difference: one that will be in place for the foreseeable future
– The differences should not be short term or changed easily
• Differences must be actionable
– Actionable difference: the marketer can focus various marketing strategies
and tactics, such as advertising, on the market segments to accentuate the
differences between segments
– Example of segmentation bases that are actionable: demographics,
lifestyles, product benefits, usage, opinions, attitudes
Parametric Statistical Test Assumptions
(aside)
• Normality
– The populations from which samples are drawn are normally distributed
• Homogeneity of Variance
– In ANOVA and the T-test, the variances within the groups are statistically the same
• Continuity & Equal Intervals of Measures
– The dependent variables are continuous (interval or ratio scale) and the intervals
have equal distance
• Independence of Observations
– One observation does not influence the making of another observation (except in
repeated-measures analyses)
Determining Statistical Significance: The
‘p’ value
• Statistical tests generate a computed value, usually identified
by a letter, e.g., z, t, or F.
• Associated with that value is a p value: the probability of
obtaining the observed result if the null hypothesis (no difference or
no association) were true.
• If that probability is low, say
0.05 or less, we have significance!
Determining Statistical Significance:
The ‘p’ value
• p value = probability of committing a type I error (α)
– i.e., declaring a result significant when it is in fact due to chance
• p values are often identified in SPSS with abbreviations such as “Sig.”
or “Prob.”
• p values range from 0 to 1.0
• First, we MUST determine the amount of sampling error we are willing
to accept and still say the results are significant.
• Convention is 5% (α = 0.05), and this is known as the “alpha error”
– i.e., 1 − 0.05 = 0.95 (a 95% confidence level)
Testing Differences: Percentages or
Means?
• There are statistical tests for when a researcher wants to
compare the means or percentages of 2 different groups or
samples
• Percentages are calculated for questions with nominal or
ordinal level of measurement
• Means are calculated for questions with an interval or ratio
(metric) level of measurement
Testing the Difference Between Two
Groups (Mean Differences)
• Null hypothesis (H0): no difference between the
means being compared
– H0: µ1 = µ2 or µ1 − µ2 = 0
• Alternative hypothesis (H1): a true difference
between the compared means
– H1: µ1 ≠ µ2
Golden Rule!!!
We have to decide whether to ‘Do Not Reject’
or ‘Reject’ the Null Hypothesis
• If the p-value > 0.05, then Do Not Reject the Null
Hypothesis
• If the p-value ≤ 0.05, then Reject the Null Hypothesis
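The golden rule reduces to a one-line decision; this sketch treats a p-value exactly equal to 0.05 as significant, the usual convention:

```python
ALPHA = 0.05   # conventional significance level

def decide(p_value, alpha=ALPHA):
    """Apply the golden rule: reject H0 only when p <= alpha."""
    return "Reject H0" if p_value <= alpha else "Do Not Reject H0"

print(decide(0.03))   # Reject H0
print(decide(0.20))   # Do Not Reject H0
```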
How do you know when the results are
significant?
• If the null hypothesis is true, we would expect no
difference between the two means
• Yet we know that, in any given study, some difference may
appear due to sampling error
• If the null hypothesis were true, we would expect 95% of the
t-values computed from 100 samples to fall within
±1.96 standard errors
How do you know when the results are
significant?
• If the computed t value falls outside ±1.96, it is not
likely that the null hypothesis of no difference is true
• Rather, it is likely that there is a real statistical
difference between the two means
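A worked sketch with hypothetical data: a pooled-variance independent-samples t statistic, compared against the slide's ±1.96 cut-off (strictly, small samples should use the t distribution's critical value, which is somewhat larger):

```python
import math
import statistics

# Hypothetical scores for two independent groups
group1 = [5, 6, 7, 8, 9]
group2 = [1, 2, 3, 4, 5]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

# Pooled variance, then the independent-samples t statistic
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# The slide's rule: a computed t outside +/-1.96 is significant
significant = abs(t) > 1.96
```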