6. Samples From Populations; Statistical Significance for the Correlation Coefficient: A Practical Introduction to Statistical Inference
Behavioral Sciences I
Samples from populations / Statistical significance for the correlation
coefficient: A practical introduction to statistical inference
• The answer is quite a lot, if we are prepared to infer information from our sample. We have little choice other than to do so, since our sample is all that we know about.
Samples from populations
• In statistical inference, it is generally assumed that samples are
drawn at random from the population.
• Such samples are called random samples from the population.
• A random sample of scores from a population entails selecting
scores in such a way that each score in the population has an
equal chance of being selected.
• In other words, a random sample favors the selection of no
particular scores in the population.
• Although it is not difficult to draw a random sample, it does
require a systematic approach.
There are several ways of drawing a random sample:
• Put the information about each member of the population on a
slip of paper, put all the slips into a hat, close your eyes, give the
slips a long stir with your hand and finally bring one slip out of
the hat.
• This slip is the first member of the sample; repeat the process to
get the second, third and subsequent members of the sample.
• Technically, the slip of paper should be returned to the container after being selected so that it may be selected again. However, this is not usually done, largely because with a large population it would make little difference to the outcome.
Another way of drawing a random sample uses random numbers:
• Number each member of the population.
• Press the appropriate randomization button on your scientific calculator to generate a random number.
• The member with that number is the first member of the sample; repeat the process to select the second, third and subsequent members.
• Random-number apps for PCs, tablets and mobile phones do the same thing.
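The random-number method can be sketched in Python, with the standard library's random module standing in for the calculator button. This is a minimal illustration under stated assumptions, not a prescribed procedure: the helper name draw_random_sample and the population numbered 1 to 100 are hypothetical. random.sample draws without replacement (the usual practice described above), while random.choices draws with replacement, as if each slip went back into the hat.

```python
import random

def draw_random_sample(population, n, with_replacement=False, seed=None):
    """Draw n members of `population` at random (hypothetical helper).

    with_replacement=False: each member can be chosen at most once (usual practice).
    with_replacement=True: a chosen member goes 'back in the hat' and may be chosen again.
    """
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    if with_replacement:
        return rng.choices(population, k=n)
    return rng.sample(population, n)

# Hypothetical population: members numbered 1 to 100
population = list(range(1, 101))
print(draw_random_sample(population, 5, seed=1))
print(draw_random_sample(population, 5, with_replacement=True, seed=1))
```

Seeding is optional; leaving seed=None gives a fresh draw each time, like a new stir of the hat.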
Figure 10.1 Conceptual steps for understanding significance testing
Table 10.1 Population of 100 scores
There is a population of 100 scores – the mode is 2, the median is 6.00 and the mean is 5.52.
Table 10.2 Means of 40 samples, each of five scores, taken at random from the population in Table 10.1
Using SPSS, we can calculate the (estimated) standard deviation of these 40 sample means, which gives a value of 1.6. The standard deviation of sample means has a special name: the standard error. The concept is the same as that of an ordinary standard deviation, except that it deals with the means of samples rather than with individual scores.
So, in general, it would seem that sample means are a pretty good estimate of population
means.
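The idea of the standard error as the standard deviation of sample means can be sketched with a small simulation. The population here is a hypothetical set of 100 integer scores from 0 to 10, not the data in Table 10.1, and sample_means is an illustrative name of our own. statistics.stdev gives the estimated (n - 1) standard deviation, the same kind of estimate described above.

```python
import random
import statistics

def sample_means(population, n, n_samples, rng):
    """Means of n_samples random samples, each of n scores drawn without replacement."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(n_samples)]

rng = random.Random(42)
# Hypothetical population of 100 scores between 0 and 10 (not Table 10.1)
population = [rng.randint(0, 10) for _ in range(100)]

means = sample_means(population, n=5, n_samples=40, rng=rng)
standard_error = statistics.stdev(means)  # SD of the sample means

print("population mean:", round(statistics.fmean(population), 2))
print("mean of sample means:", round(statistics.fmean(means), 2))
print("standard error:", round(standard_error, 2))
```

The exact numbers depend on the seed; what matters is that the mean of the sample means sits near the population mean, while the standard error describes how widely individual sample means scatter around it.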
Table 10.3 Means of 40 samples, each of 20 scores, taken at random from the population in Table 10.1
Much the same trends appear with these larger samples, except for the following:
● The spread of the sample means is reduced, and they cluster more closely around the population mean. The minimum value is 4.25 and the maximum value is 6.85. The overall mean of these samples is 5.33, close to the population mean of 5.52.
● The standard deviation of the means of the larger samples (i.e. the standard error) is smaller: 0.60 for Table 10.3, compared with 1.6 for the samples of five in Table 10.2.
● The distribution of sample means forms a steeper, narrower curve than it does for the smaller samples.
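A minimal sketch of the effect of sample size, again using a hypothetical population of 100 scores rather than Table 10.1. It estimates the standard error for samples of 5 and of 20 (40 samples each, as in Tables 10.2 and 10.3) and compares each with the textbook approximation sigma / sqrt(n). The helper name simulated_standard_error is hypothetical, and the exact figures depend on the random seed; the point is only that the standard error for samples of 20 comes out smaller.

```python
import math
import random
import statistics

def simulated_standard_error(population, n, n_samples=40, seed=0):
    """SD of the means of n_samples random samples of size n (drawn without replacement)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(n_samples)]
    return statistics.stdev(means)

pop_rng = random.Random(7)
# Hypothetical population of 100 scores between 0 and 10 (not Table 10.1)
population = [pop_rng.randint(0, 10) for _ in range(100)]
sigma = statistics.pstdev(population)  # population standard deviation

for n in (5, 20):
    simulated = simulated_standard_error(population, n)
    # sigma / sqrt(n) ignores the small correction for sampling without replacement
    approx = sigma / math.sqrt(n)
    print(f"n = {n:2d}: simulated SE = {simulated:.2f}, sigma/sqrt(n) = {approx:.2f}")
```

Because the standard error shrinks roughly with the square root of the sample size, quadrupling the sample from 5 to 20 should roughly halve it.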
• Another idea is fundamental to some branches of statistics: the confidence interval of the mean.
• An estimate based on a sample comes with a margin of error. The smaller the margin of error, the more confident we can be in the estimate of the population value based on the sample.
• Confidence intervals (CIs) are similar: a 95% CI gives the range of values which is likely to contain the actual population mean (CIs can be calculated for other statistics too).
• That is, if we repeatedly drew random samples from a population and calculated a 95% confidence interval from each, about 95% of those intervals would contain the actual population mean.
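The repeated-sampling interpretation of a 95% confidence interval can be checked by simulation. This sketch assumes a hypothetical normal population with mean 50 and standard deviation 10, uses the large-sample formula mean plus or minus 1.96 standard errors (via statistics.NormalDist), and counts how often the interval captures the true mean. For small samples a critical value from the t distribution would be more appropriate, so the coverage here is only approximately 95%. The function name ci95 is our own.

```python
import random
import statistics
from statistics import NormalDist

def ci95(sample):
    """Approximate 95% CI for the population mean: mean +/- z * (s / sqrt(n)).

    Uses the normal critical value (about 1.96); a t critical value suits small samples better.
    """
    n = len(sample)
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5   # estimated standard error
    z = NormalDist().inv_cdf(0.975)            # about 1.96
    return m - z * se, m + z * se

# Check the 'repeated samples' interpretation with a hypothetical population
rng = random.Random(3)
true_mean = 50
reps = 1000
hits = 0
for _ in range(reps):
    sample = [rng.gauss(true_mean, 10) for _ in range(50)]
    low, high = ci95(sample)
    hits += low <= true_mean <= high   # True counts as 1
print(hits / reps)  # close to 0.95
```

Any single interval either contains the population mean or does not; the 95% describes the long-run success rate of the procedure.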
A little more jargon:
• The correct term for characteristics of samples such as their
means, standard deviations, ranges and so forth is statistics.
• The same characteristics of populations are called parameters.
• In other words, you use the statistics from samples to
estimate or infer the parameters of the population from
which the sample came.
Statistical significance for the correlation coefficient:
A practical introduction to statistical inference
• It is usual to report the statistical significance of correlation coefficients and of the results of many other statistical techniques.
• Statistical significance merely indicates whether your statistical findings are likely to be due to chance, that is, to the random variation between samples.
• Samples drawn randomly from a population usually have similar
characteristics to those of the population.
• However, some samples are unlike the population.
• The null hypothesis states that there is no relationship between two variables. Significance testing assesses whether our data are consistent with the null hypothesis.
• If our sample falls in the middle 95% of the samples expected when the null hypothesis is true, we say that our findings are not statistically significant at the 5% level, and we prefer the null hypothesis.
• However, if our sample falls in the extreme 5% of the samples expected when the null hypothesis is true, our sample does not seem to support the null hypothesis.
• In this case, we reject the null hypothesis and prefer the alternative hypothesis. We also say that our findings are statistically significant.
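The decision rule can be expressed as a short function, assuming we already have a collection of statistics computed from many samples drawn when the null hypothesis is true. The function name is_significant is hypothetical. It counts what proportion of those null samples are at least as far from zero as our own result, in either direction, and calls the result significant when that proportion is 5% or less. It assumes the null distribution is centred on zero, as it is for a correlation when there is no relationship.

```python
def is_significant(observed, null_values, alpha=0.05):
    """Two-tailed check of a statistic against a null sampling distribution.

    null_values: statistics from many samples drawn when the null hypothesis is true
    (assumed to be centred on zero). Returns (significant, proportion), where proportion
    is the share of null samples at least as far from zero as `observed`.
    """
    at_least_as_extreme = sum(1 for v in null_values if abs(v) >= abs(observed))
    proportion = at_least_as_extreme / len(null_values)
    return proportion <= alpha, proportion

# Hypothetical null distribution: 199 evenly spaced values from -.99 to .99
null_values = [i / 100 for i in range(-99, 100)]
print(is_significant(0.98, null_values))
print(is_significant(0.10, null_values))
```

The proportion returned is an empirical two-tailed p value: a result in the extreme 5% of the null distribution gives a proportion of .05 or less.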
• Researchers have correlated two variables for a sample of 20
people.
• They obtained a correlation coefficient of .56.
• The problem is that they wish to generalize beyond this sample
and make statements about the trends in the data which apply
more widely.
• However, their analyses are based on just a small sample which
might not be characteristic of the trends in the population.
• What do they do?
Table 11.1 Imaginary population of 60 pairs of scores with zero correlation between the pairs
• Table 11.1 contains the population of pairs of scores.
• Overall, the correlation between the two variables in this population is .00.
• That is, there is absolutely no relationship between variable X
and variable Y in the population.
• What happens, though, if we draw many samples of, say, eight
pairs of scores at random from this population and calculate the
correlation coefficients for each sample?
Table 11.2 Two hundred correlation coefficients obtained by repeatedly randomly sampling eight pairs of scores from Table 11.1
Some of the correlation coefficients are indeed more or less zero, but a few are substantially different from zero.
In the table are correlations as large as .81, which would delight most researchers. But this correlation is really due to chance and, in truth, there is no correlation in the population.
So even where there is zero relationship in the population, random samples can have correlations which depart from zero.
Figure 11.1 Distribution of correlation coefficients presented in Table 11.2
If the population correlation is zero (i.e. if the null hypothesis is true):
• Correlations in the middle 95% of the distribution of samples are likely.
• Correlations in the extreme 5% (usually the extreme 2.5% in each direction) are unlikely in these circumstances.
• The correlations .81, .76, .72, .68 and .68, and -.80, -.72, -.71, -.70 and -.69, make up the extreme 5% of correlations away from zero (10 of the 200 correlations).
• This extreme 5% is usually made up of the extreme 2.5% of positive correlations and the extreme 2.5% of negative correlations.
• Therefore, in our example, a correlation between .68 and 1.00 or between -.69 and -1.00 is in the extreme 5% of correlations.
• We describe correlations in this range as statistically significant.
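The cut-off points can be read directly from a set of sample correlations by sorting them and taking the fifth value from each end (200 x 2.5% = 5). In this sketch the ten extreme correlations are the ones listed above; the remaining 190 are hypothetical stand-ins set to zero, since only their position in the middle of the distribution matters. The function name empirical_cutoffs is our own.

```python
def empirical_cutoffs(correlations, alpha=0.05):
    """Return (lower, upper): the values marking off the extreme alpha/2 in each tail."""
    ordered = sorted(correlations)
    k = round(len(ordered) * alpha / 2)   # e.g. 200 * .025 = 5 values per tail
    return ordered[k - 1], ordered[-k]

extremes = [.81, .76, .72, .68, .68, -.80, -.72, -.71, -.70, -.69]  # from the text
middle = [0.0] * 190   # hypothetical stand-ins for the other 190 correlations
lower, upper = empirical_cutoffs(extremes + middle)
print(lower, upper)    # -0.69 0.68
```

A sample correlation at or below the lower cut-off, or at or above the upper one, falls in the statistically significant range.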
• Statistical significance simply means that our sample falls in the relatively extreme part of the distribution of samples that would be obtained if the null hypothesis of no relationship between the two variables were true.
• Hypotheses in psychological statistics are usually presented as
antithetical pairs – the null hypothesis and its corresponding
alternative hypothesis.
• The null hypothesis is essentially a statement that there is no
relationship between two variables.
The following are all examples of null hypotheses:
• There is no relationship between brain size and intelligence.
• There is no relationship between gender and income.
• There is no relationship between baldness and virility.
• There is no relationship between children’s self-esteem and that
of their parent of the same sex.
Correlations which are closer to zero than the critical value are described as being statistically non-significant.
Significance Table 11.1 5% significance values of the Pearson correlation coefficient (two-tailed test). An extended and conventional version of this table is given in Appendix C.
However, if the correlation is equal to or larger than the critical value (ignoring its sign), then it is in the extreme 5% of correlations. Such correlations are described as being statistically significant. In this case, we accept the alternative hypothesis (that there is a relationship between the two variables).
• Significance Table 11.1 indicates that, for a sample size of 10, a correlation has to be between -.63 and -1.00 or between .63 and 1.00 to be large enough to fall in the extreme 5% of correlations, which supports the alternative hypothesis.
• Correlations closer to .00 than these fall in the middle 95%, which supports the null hypothesis.
• So, our correlation of .94 based on a sample of 10 is clearly statistically significant.
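Critical values such as .63 can be derived rather than looked up: a Pearson r from n pairs converts to a t statistic with n - 2 degrees of freedom, t = r√(n - 2) / √(1 - r²), and rearranging gives the critical r as t / √(t² + df). The sketch below assumes the usual two-tailed 5% critical values of t for 8 and 18 degrees of freedom (2.306 and 2.101, from standard t tables); only those two are hard-coded, and the names T_CRIT_05, critical_r and t_from_r are hypothetical. It reproduces the .63 for n = 10 and confirms that .94 exceeds it comfortably.

```python
import math

# Two-tailed 5% critical values of t from standard t tables (df = n - 2).
# Only two entries are included, for illustration.
T_CRIT_05 = {8: 2.306, 18: 2.101}

def critical_r(n):
    """Smallest |r| significant at the 5% level (two-tailed) for a sample of n pairs."""
    df = n - 2
    t = T_CRIT_05[df]
    return t / math.sqrt(t ** 2 + df)

def t_from_r(r, n):
    """Convert a Pearson r from n pairs into a t statistic with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(critical_r(10), 2))       # 0.63, as in Significance Table 11.1
print(round(t_from_r(0.94, 10), 2))   # 7.79, far beyond 2.306
print(round(critical_r(20), 3))       # 0.444 for a sample of 20
```

By the same calculation, the correlation of .56 from the sample of 20 people mentioned earlier would exceed the critical value of about .444.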
Figure 11.3 Type I and Type II errors
Significance Table 11.2 5% significance values of Spearman’s rho correlation coefficient (two-tailed test). An extended and conventional version of this table is given in Appendix D.
Interpreting the results
Since our obtained value of the Spearman’s rho correlation
coefficient is in the range of significant correlations, we accept the
alternative hypothesis that mathematical and musical scores are
(inversely) related and reject the null hypothesis.
Reporting the results
We can report a significant correlation: ‘There is a negative
correlation of -.89 between mathematical and musical scores
which is statistically significant at the 5% level with a sample size
of 10.’ Alternatively, following the recommendations of the APA (2010) Publication Manual, we could write something like:
Mathematical scores were significantly negatively correlated with musical scores, rs(8) = -.89, p < .05.
The APA manual gives the degrees of freedom in brackets. For Spearman’s rho, the degrees of freedom are the sample size minus 2 (here, 10 - 2 = 8).
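A small helper can assemble the APA-style string. This is a hypothetical convenience function, not part of any statistics package: it computes the degrees of freedom as n - 2 and drops the leading zero from the coefficient and the significance level, as in the example above. The wording for a non-significant result (p > .05) is an assumed convention, not taken from the text.

```python
def apa_spearman(rho, n, significant, alpha=0.05):
    """Format a Spearman result as 'rs(df) = value, p < alpha' (hypothetical helper)."""
    df = n - 2  # degrees of freedom for Spearman's rho
    # APA style drops the leading zero for values that cannot exceed 1
    value = f"{rho:.2f}".replace("0.", ".", 1)
    level = f"{alpha:.2f}".replace("0.", ".", 1)
    # Assumed convention for reporting a non-significant result
    p_text = f"p < {level}" if significant else f"p > {level}"
    return f"rs({df}) = {value}, {p_text}"

print(apa_spearman(-0.89, 10, significant=True))  # rs(8) = -.89, p < .05
```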