Z-Test and T-Test
A z-test is a statistical test used to determine whether two population means are different when the variances are known and the sample size is large. The test statistic is assumed to have a normal distribution, and nuisance parameters such as the standard deviation should be known for an accurate z-test to be performed.
Z-test vs. T-test
Sometimes, measuring every single member of a population is simply not practical. That is why we develop and use statistical methods to solve problems: the most practical approach is to measure just a sample of the population. Some methods test hypotheses by comparison. Two of the better-known statistical hypothesis tests are the T-test and the Z-test. Let us try to break down the two.
A T-test is a statistical hypothesis test in which the test statistic follows a Student's T-distribution if the null hypothesis is true. The T-statistic was introduced by W. S. Gosset
under the pen name "Student", so the T-test is also referred to as the Student's T-test. It is
very likely that the T-test is the most commonly used statistical data analysis procedure for
hypothesis testing, since it is straightforward and easy to use. Additionally, it is flexible
and adaptable to a broad range of circumstances.
There are various T-tests; the two most commonly applied are the one-sample and
two-sample T-tests. One-sample T-tests are used to compare a sample mean with a
known population mean. Two-sample T-tests, on the other hand, are used to compare either
independent samples or dependent (paired) samples.
The T-test is best applied, at least in theory, when you have a limited sample size (n < 30), as long
as the variables are approximately normally distributed and the variation of scores in the
two groups is not reliably different. It is also a good choice when you do not know the population's
standard deviation. If the standard deviation is known, then it is best to use
another type of statistical test, the Z-test. The Z-test is also applied to compare sample
and population means to determine whether there is a significant difference between them. Z-tests
always use the normal distribution and are ideally applied when the standard deviation is
known. Z-tests are used when certain conditions are met; otherwise, other
statistical tests such as T-tests are applied as substitutes. Z-tests are usually applied to large
samples (n > 30). When a T-test is used on a large sample, it becomes very similar to
the Z-test. However, fluctuations can occur in a T-test's sample variance that do not
exist in a Z-test, so the two tests can give slightly different results.
Summary:
1. The Z-test is a statistical hypothesis test whose test statistic follows a normal distribution, while the T-test's statistic
follows a Student's T-distribution.
2. A T-test is appropriate when you are handling small samples (n < 30), while a Z-test is
appropriate when you are handling moderate to large samples (n > 30).
3. The T-test is more adaptable than the Z-test, since the Z-test often requires certain conditions to
be met in order to be reliable. Additionally, the T-test has many variants to suit different needs.
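The contrast above can be sketched in a few lines of Python. This is only an illustration: the sample values, the hypothesised mean mu0, and the "known" population standard deviation sigma are all invented, not taken from the text.

```python
import math
from statistics import NormalDist, stdev

sample = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2, 4.6, 5.0, 5.1]  # hypothetical measurements
mu0 = 5.0      # hypothesised population mean (invented)
sigma = 0.25   # population standard deviation, assumed *known* -> Z-test applies

n = len(sample)
xbar = sum(sample) / n

# T-statistic: population sd unknown, so use the sample sd;
# compare against Student's T-distribution with df = n - 1
t_stat = (xbar - mu0) / (stdev(sample) / math.sqrt(n))

# Z-statistic: population sd known, so compare against the standard normal
z_stat = (xbar - mu0) / (sigma / math.sqrt(n))
z_p = 2 * NormalDist().cdf(-abs(z_stat))  # two-sided p-value

print(f"t = {t_stat:.3f} (df = {n - 1}), z = {z_stat:.3f}, z p-value = {z_p:.3f}")
```

Note how the two statistics differ only in which standard deviation is used; with a large sample the sample sd approaches the population sd and the two tests converge, as the text describes.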
The F-statistic is a value resulting from a standard statistical test, used in ANOVA and
regression analysis, to determine whether the variances between the means of two populations
are significantly different. For practical purposes, it is important to know that this value
determines the P-value, but the F-statistic itself will not actually be used in the
interpretation here.
Significance, or the P-value, is the probability that an effect at least as extreme as the
current observation has occurred by chance. Therefore, in these particular examples, it is
unlikely that the drop in the prevalence of low WAZ from 38% to 26% in the better-roofing
groups, and from 40% to 16% in the higher-education groups, occurred by chance.
For the roofing example, P (or Sig) = 0.031: about 97 times out of every 100, this
difference would not occur by chance alone. For the education example, P (or Sig) = 0.000:
there is greater than 99.9% certainty that the difference did not occur by chance. In medical
research, if the P-value is less than or equal to 0.05, meaning that there is no more than a
5%, or 1 in 20, probability of observing a result as extreme as that observed solely due
to chance, then the association between the exposure and disease is considered
statistically significant.
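The "significant at 5%" rule can be sketched as a small function. The test statistic z = 2.16 below is invented, chosen only so that its two-sided p-value lands close to the roofing example's Sig = 0.031:

```python
from statistics import NormalDist

def two_sided_p(z):
    """Probability, under the null, of a result at least as extreme as z."""
    return 2 * NormalDist().cdf(-abs(z))

z = 2.16              # invented statistic, roughly matching Sig = 0.031
p = two_sided_p(z)
print(f"p = {p:.3f}, significant at the 5% level: {p <= 0.05}")
```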
THE RELATIONSHIP BETWEEN CONFIDENCE INTERVALS AND SIGNIFICANCE
TESTS
A confidence interval is a range of population values with which the sample data are
compatible. A significance test considers the likelihood that the sample data have come
from a particular hypothesised population.
The 95% confidence interval consists of all values less than 1.96 standard errors away
from the sample value; testing against any population value in this interval will lead to p
> 0.05. Testing against values outside the 95% confidence interval (which are more than
1.96 standard errors away) will lead to p-values < 0.05.
Similarly, the 99% confidence interval consists of all values less than 2.58 standard
errors away from the sample value; testing against any hypothesised population value in
this interval will give a p-value > 0.01. Testing against values outside the 99% confidence
interval (which are more than 2.58 standard errors away) will lead to p-values < 0.01. In
general, testing against values inside a confidence interval gives a p-value above the corresponding significance level, and testing against values outside it gives a p-value below it.
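This correspondence between confidence intervals and p-values can be checked numerically. The sample mean and standard error below are invented for illustration; the 1.96 multiplier is the one used in the text:

```python
from statistics import NormalDist

xbar, se = 100.0, 2.0  # invented sample mean and standard error

# 95% CI: all values within 1.96 standard errors of the sample mean
ci95 = (xbar - 1.96 * se, xbar + 1.96 * se)  # (96.08, 103.92)

def p_value(mu0):
    """Two-sided p-value for testing against a hypothesised mean mu0."""
    z = (xbar - mu0) / se
    return 2 * NormalDist().cdf(-abs(z))

# A hypothesised mean just inside the 95% CI gives p > 0.05 ...
print(p_value(103.9) > 0.05)  # True
# ... and one just outside gives p < 0.05
print(p_value(104.0) < 0.05)  # True
```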
Examples
1) The mean birthweight of 53 CMV-infected babies was 3060.75 g (standard deviation =
601.03 g, standard error = 82.57 g).
A 95% confidence interval for the population mean birthweight of CMV-infected babies
is therefore given by:
(3060.75 ± 1.96 × 82.57) g = (2898.91, 3222.59) g
Similarly, the 99% confidence interval for the mean is:
(3060.75 ± 2.58 × 82.57) g = (2847.72, 3273.78) g
We are 95% confident that the true mean is somewhere between 2898.91 g and 3222.59 g;
testing against values outside this range will lead to p-values < 0.05.
We are 99% confident that the true mean is between 2847.72 g and 3273.78 g (notice that
this is a wider interval: we are more confident that it contains the population mean).
Testing against values within this range will lead to p-values > 0.01.
The test given previously showed that the sample mean was significantly different from
a hypothesised population mean of 3263.57 g. The p-value for that test was 0.014, which
corresponds to the hypothesised population mean of 3263.57 g lying outside the 95%
confidence interval but inside the 99% confidence interval.
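The interval arithmetic in this example can be reproduced directly from the figures given in the text:

```python
# Mean birthweight and standard error of the 53 CMV-infected babies (grams)
mean, se = 3060.75, 82.57

ci95 = (mean - 1.96 * se, mean + 1.96 * se)
ci99 = (mean - 2.58 * se, mean + 2.58 * se)

print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f}) g")  # (2898.91, 3222.59) g
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f}) g")  # (2847.72, 3273.78) g
```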
2) A sample of 33 boys with recurrent infections have their diastolic blood pressures
measured. Their mean blood pressure is 62.5 mmHg, with a standard deviation of 8.2 mmHg.
Using the sample standard deviation to estimate the population standard deviation,
means of samples of size 33 will be distributed with standard error
8.2/√33 = 1.43 mmHg
Therefore, a 99% confidence interval for the mean diastolic blood pressure of boys with
recurrent infections is (62.5 ± 2.58 × 1.43) = (58.81, 66.19) mmHg.
We wish to know whether boys with recurrent infections are different from boys in
general, who are known to have pressures of, on average, 58.5 mmHg. The null hypothesis
to be tested is that the 33 boys come from a population with a mean diastolic blood pressure of 58.5 mmHg.
The observed sample mean is (62.5 − 58.5)/1.43 = 2.797 standard errors away from the
hypothesised mean of 58.5 mmHg. Consulting a table of the normal distribution, we
find 0.002 < p < 0.01. Using the linked spreadsheet, we get an exact p-value of 0.005: a
sample with a mean 4 mmHg away from the hypothesised value would occur by chance
one time in 200 (5 in 1000).
The 99% confidence interval does not contain the hypothesised mean and p < 0.01 as
expected.
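The arithmetic of this second example can be reproduced from the figures in the text (small differences in the last decimal place come from rounding the standard error):

```python
import math
from statistics import NormalDist

n, xbar, s = 33, 62.5, 8.2  # sample size, mean and sd of diastolic bp (mmHg)
mu0 = 58.5                  # hypothesised population mean (mmHg)

se = s / math.sqrt(n)       # standard error, ~1.43 mmHg
z = (xbar - mu0) / se       # ~2.80 standard errors from the hypothesised mean
p = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value, ~0.005

ci99 = (xbar - 2.58 * se, xbar + 2.58 * se)

print(f"se = {se:.2f} mmHg, z = {z:.3f}, p = {p:.3f}")
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f}) mmHg")  # excludes 58.5, as expected
```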