Quantitative Chemistry
Examples:
If a measurement is recorded as 1.12 and the true value is known to be 1.00, then the
absolute error is 1.12 − 1.00 = 0.12. If the mass of an object is measured three times, with
values recorded as 1.00 g, 0.95 g, and 1.05 g, then the absolute error can be expressed
as ± 0.05 g.
1. Absolute uncertainty (AU) is a measure of uncertainty with the same units as the
reported value. For example, the grape's width is 12.3 ± 0.2 mm, where 0.2 mm is
the AU.
2. Relative uncertainty (RU) represents AU as a fraction or percentage.
For example, 0.2 mm/12.3 mm ≈ 0.02 = 2%. The grape's width is 12.3 mm ± 2%,
where 2% is the RU.
Relative uncertainty is a fractional value. If you measure a pencil to be 10 cm ± 1 cm, then
the relative uncertainty is one tenth of its length (RU = 0.1 or 10%). RU is simply absolute
uncertainty divided by the measured value, reported as a fraction or percent:
RU = AU / measured value
Notes:
When you perform calculations on numbers whose uncertainties are known, you can
determine the uncertainty in the calculated answer using two simple rules. This is known as
propagation of uncertainty. Rules for uncertainty propagation are very different for
addition/subtraction operations as compared to multiplication/division operations. These
rules are not interchangeable. The rules presented here determine the maximum possible
uncertainty.
When calculating uncertainty for the sum or difference of measured values, AU of the
calculated value is the sum of the absolute uncertainties of the individual terms.
Example:
A = 19 ± 4 (AU = 4; RU = 0.2)
B = 28 ± 3 (AU = 3; RU = 0.1)
A + B = 47 ± 7 (AU = 7; RU = 0.1)
A - B = -9 ± 7 (AU = 7; RU = 0.8)
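As a quick check of the addition/subtraction rule, here is a minimal Python sketch (the function name is ours, for illustration) that reproduces the example above:

```python
# Minimal sketch of the addition/subtraction rule: the AU of a sum or
# difference is the sum of the AUs of the individual terms.

def propagate_sum(a, au_a, b, au_b, subtract=False):
    """Return (value, AU, RU) for a + b or a - b."""
    value = a - b if subtract else a + b
    au = au_a + au_b              # AUs add, even when the values are subtracted
    ru = au / abs(value)          # RU = AU / |value|
    return value, au, ru

print(propagate_sum(19, 4, 28, 3))                 # value 47, AU 7, RU ~0.15
print(propagate_sum(19, 4, 28, 3, subtract=True))  # value -9, AU 7, RU ~0.78
```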
Notes:
Calculate qtotal and its associated AU and RU values, using the equation:
qtotal = qsolution + qcal
qsolution = 1450 ± 20 J
qcal = 320 ± 50 J
Solution:
AU(qtotal) = 20 + 50 = 70 J and qtotal = 1450 + 320 = 1770 J, so qtotal = 1770 ± 70 J
(RU = 70/1770 ≈ 0.04).
When calculating uncertainty for the product or ratio of measured values, RU of the
calculated value is the sum of the relative uncertainties of the individual
terms. However, the AU of the calculated value cannot be obtained directly from the AUs of
the measured values; find the RU of the result first, then convert it (AU = RU × value).
Example:
A = 19 ± 4 (AU = 4; RU = 0.2)
B = 28 ± 3 (AU = 3; RU = 0.1)
A x B = 532 ± 160 (AU = 0.3 × 532 ≈ 160; RU = 0.2 + 0.1 = 0.3)
Notes:
Example:
Calculate qcal and its associated AU and RU values, using the equation:
qcal = CΔT
C = (54 ± 7) J/°C
ΔT = 6.0 ± 0.1 °C
Solution:
RU(qcal) = 7/54 + 0.1/6.0 ≈ 0.13 + 0.02 ≈ 0.15. qcal = 54 × 6.0 = 324 J, and
AU = 0.15 × 324 ≈ 50 J, so qcal = 320 ± 50 J (rounded to match the AU).
Example:
Calculate qtotal and its associated AU and RU values, using the equation:
qtotal = qsolution + CΔT
qsolution = 1450 ± 20 J
C = 54 ± 7 J/°C
ΔT = 6.0 ± 0.1 °C
Solution:
First the multiplication: qcal = CΔT = 320 ± 50 J (RU ≈ 0.15), as found above. Then the
addition: qtotal = 1450 + 320 = 1770 J with AU = 20 + 50 = 70 J, so qtotal = 1770 ± 70 J
(RU ≈ 0.04).
Note that you did the multiplication first, then the addition. We will approach
the uncertainty calculations in the same order.
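The whole two-step calculation can be scripted. Below is a Python sketch (variable names ours) that applies the multiplication rule first and the addition rule second, using the values from the example:

```python
# Step-by-step propagation for q_total = q_solution + C*dT (values from above).

q_solution, au_q_solution = 1450.0, 20.0   # J
C, au_C = 54.0, 7.0                        # J/degC
dT, au_dT = 6.0, 0.1                       # degC

# Multiplication rule: RU of a product is the sum of the RUs of the factors.
q_cal = C * dT                             # 324 J
ru_q_cal = au_C / C + au_dT / dT           # ~0.13 + ~0.02 = ~0.15
au_q_cal = ru_q_cal * q_cal                # ~50 J (AU comes from the RU, not input AUs)

# Addition rule: AU of a sum is the sum of the AUs of the terms.
q_total = q_cal + q_solution               # ~1770 J
au_q_total = au_q_cal + au_q_solution      # ~70 J
ru_q_total = au_q_total / q_total          # ~0.04

print(f"q_cal   = {q_cal:.0f} +/- {au_q_cal:.0f} J")
print(f"q_total = {q_total:.0f} +/- {au_q_total:.0f} J")
```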
The normal distribution, also called the Gaussian distribution, is the most common
distribution function for independent, randomly generated variables. Its familiar bell-shaped
curve is ubiquitous in statistical reports, from survey analysis and quality control to resource
allocation.
The graph of the normal distribution is characterized by two parameters: the mean, or
average, which is the maximum of the graph and about which the graph is always
symmetric; and the standard deviation, which determines the amount of dispersion away
from the mean. A small standard deviation (compared with the mean) produces a steep
graph, whereas a large standard deviation (again compared with the mean) produces a flat
graph.
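For reference, the bell-shaped curve is the graph of the normal density, written here in standard form with mean μ and standard deviation σ:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```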
The term “Gaussian distribution” refers to the German mathematician Carl Friedrich Gauss,
who first developed a two-parameter exponential function in 1809 in connection with studies
of astronomical observation errors. This study led Gauss to formulate his law of
observational error and to advance the theory of the method of least squares
approximation. Another famous early application of the normal distribution was by the
British physicist James Clerk Maxwell, who in 1859 formulated his law of distribution of
molecular velocities—later generalized as the Maxwell-Boltzmann distribution law.
The French mathematician Abraham de Moivre, in his Doctrine of Chances (1718), first noted
that probabilities associated with discretely generated random variables (such as are
obtained by flipping a coin or rolling a die) can be approximated by the area under the graph
of an exponential function. This result was extended and generalized by the French
scientist Pierre-Simon Laplace, in his Théorie analytique des probabilités (1812; “Analytic
Theory of Probability”), into the first central limit theorem, which proved that probabilities for
almost all independent and identically distributed random variables converge rapidly (with
sample size) to the area under an exponential function—that is, to a normal distribution. The
central limit theorem permitted hitherto intractable problems, particularly those involving
discrete variables, to be handled with calculus.
The common notation for the parameter in question is μ; often, this parameter is the
population mean. The level C of a confidence interval gives the probability that the interval
produced by the method contains the true value of the parameter.
Example
Suppose a student measuring the boiling temperature of a certain liquid observes the
readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different
samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the
standard deviation for this procedure is 1.2 degrees, what is the confidence interval for the
population mean at a 95% confidence level?
In other words, the student wishes to estimate the true mean boiling temperature of the
liquid using the results of his measurements. If the measurements follow a normal
distribution, then the sample mean will have the distribution N(μ, σ/sqrt(n)). Since the sample
size is 6, the standard deviation of the sample mean is equal to 1.2/sqrt(6) = 0.49. The 95%
confidence interval is therefore 101.82 ± (1.96)(0.49) = 101.82 ± 0.96, or (100.86, 102.78)
degrees Celsius.
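A short Python sketch (ours, not part of the original example) that carries the calculation through to the interval:

```python
# 95% confidence interval for the boiling-point example, with sigma known.
import math

readings = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]
sigma = 1.2                              # known procedure standard deviation, degC
z = 1.96                                 # two-sided 95% critical value, normal distribution

mean = sum(readings) / len(readings)     # 101.82
sem = sigma / math.sqrt(len(readings))   # 1.2/sqrt(6) = 0.49
half = z * sem                           # 0.96

print(f"95% CI: {mean:.2f} +/- {half:.2f} degC "
      f"= ({mean - half:.2f}, {mean + half:.2f})")
```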
Student's t-Tests
"Student" (real name: W. S. Gossett [1876-1937]) developed statistical methods to solve
problems stemming from his employment in a brewery. Student's t-test deals with the
problems associated with inference based on "small" samples: the calculated mean (Xavg)
and standard deviation (s) may by chance deviate from the "real" mean and standard
deviation (i.e., what you'd measure if you had many more data items: a "large" sample). For
example, it is likely that the true mean size of maple leaves is "close" to the mean calculated
from a sample of N randomly collected leaves. If N = 5, 95% of the time the actual mean
would be in the range Xavg ± 2.776·s/sqrt(N); if N = 10: Xavg ± 2.262·s/sqrt(N); if N = 20:
Xavg ± 2.093·s/sqrt(N); if N = 40: Xavg ± 2.023·s/sqrt(N); and for "large" N:
Xavg ± 1.960·s/sqrt(N). (These "small-sample" corrections are included in the descriptive
statistics report of the 95% confidence interval.)
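Those multipliers are the two-sided 95% critical values of the t distribution with N − 1 degrees of freedom; a short sketch (assuming SciPy is available) reproduces them:

```python
# Reproduce the small-sample multipliers quoted above.
from scipy import stats

for n in (5, 10, 20, 40):
    t_crit = stats.t.ppf(0.975, df=n - 1)   # 0.975 quantile = two-sided 95%
    print(f"N = {n:2d}: Xavg +/- {t_crit:.3f} * s / sqrt(N)")
# Prints 2.776, 2.262, 2.093, 2.023; for large N this approaches 1.960.
```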
Use this test to compare two small sets of quantitative data when samples are collected
independently of one another. When one randomly takes replicate measurements from a
population he/she is collecting an independent sample. Use of a paired t test, to which some
statistics programs unfortunately default, requires nonrandom sampling (see below).
Criteria
Only if there is a direct relationship between each specific data point in the first set
and one and only one specific data point in the second set, such as measurements on
the same subject 'before and after,' MAY the paired t test be appropriate.
If samples are collected from two different populations or from randomly selected
individuals from the same population at different times, use the test for independent
samples (unpaired).
Here's a simple check to determine if the paired t test can apply - if one sample can
have a different number of data points from the other, then the paired t
test cannot apply.
Examples
'Student's' t Test is one of the most commonly used techniques for testing a hypothesis on
the basis of a difference between sample means. Explained in layman's terms, the t test
determines a probability that two populations are the same with respect to the variable
tested.
For example, suppose you collected data on the heights of male basketball and football
players, and compared the sample means using the t test. A probability of 0.4 would mean
that there is a 40% likelihood that you cannot distinguish a group of basketball players from a
group of football players by height alone. That's about as far as the t test, or any statistical
test for that matter, can take you. If you calculate a probability of 0.05 or less, then you
can reject the null hypothesis (that is, you can conclude that the two groups of athletes can
be distinguished by height).
Because there is always a small probability that you are wrong, though, you haven't proven a
difference. There are differences among popular, mathematical, philosophical, legal,
and scientific definitions of proof. I will argue that there is no such thing as scientific proof.
Please see my essay on that subject. Don't make the error of reporting your results as proof
(or disproof) of a hypothesis. No experiment is perfect, and proof in the strictest sense
requires perfection.
Make sure you understand the concepts of experimental error and single variable
statistics before you go through this part. Leaves were collected from wax-leaf ligustrum
grown in shade and in full sun. The thickness in micrometers of the palisade layer was
recorded for each type of leaf. Thicknesses of 7 sun leaves were reported as: 150, 100, 210,
300, 200, 210, and 300, respectively. Thicknesses of 7 shade leaves were reported as 120,
125, 160, 130, 200, 170, and 200, respectively. The mean ± standard deviation for sun
leaves was 210 ± 73 micrometers and for shade leaves it was 158 ± 34 micrometers. Note
that since all data were rounded to the nearest micrometer, it is inappropriate to include
decimal places in either the mean or standard deviation.
For the t test for independent samples you do not have to have the same number of data
points in each group. We have to assume that the population follows a normal distribution
(small samples have more scatter and follow what is called a t distribution). Corrections can
be made for groups that do not show a normal distribution (skewed samples, for example -
note that the word 'skew' has a specific statistical meaning, so don't use it as a synonym for
'messed up').
The t test can be performed knowing just the means, standard deviations, and number of
data points; these summary statistics, however, must themselves be computed from the raw
data. If you record only means in your notebook, you lose a great deal of information and
usually render your work invalid. The two-sample t test yields the statistic

t = (X1avg − X2avg) / sqrt(A·B), where A = (n1 + n2)/(n1·n2) and
B = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2),

in which X-bar, of course, is the sample mean, and s is the sample standard deviation. Note that the
numerator of the formula is the difference between means. The denominator is a
measurement of experimental error in the two groups combined. The wider the difference
between means, the more confident you are in the data. The more experimental error you
have, the less confident you are in the data. Thus the higher the value of t, the greater the
confidence that there is a difference.
To understand how a precise probability value can be attached to that confidence you need
to study the mathematics behind the t distribution in a formal statistics course. The value t
is just an intermediate statistic. Probability tables have been prepared based on the t
distribution originally worked out by W.S. Gossett (see below). To use the table provided,
find the critical value that corresponds to the number of degrees of freedom you have
(degrees of freedom = number of data points in the two groups combined, minus 2). If t
exceeds the tabled value, the means are significantly different at the probability level that is
listed. When using tables report the lowest probability value for which t exceeds the critical
value. Report as 'p < (probability value).'
In the example, the difference between means is 52, A = 14/49 ≈ 0.286, and B = 3242.5, so
t = 52/sqrt(0.286 × 3242.5) ≈ 1.71. There are (7 + 7 − 2) = 12 degrees of freedom, so the critical value for p
= 0.05 is 2.18. 1.71 is less than 2.18, so we cannot reject the null hypothesis that the two
populations have the same palisade layer thickness. So now what? If the question is very
important to you, you might collect more data. With a well designed experiment, sufficient
data can overcome the uncertainty contributed by experimental error, and yield a significant
difference between samples, if one exists.
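For comparison, here is a sketch of the same leaf comparison using SciPy's independent-samples t test (equal variances assumed, matching the pooled formula above):

```python
# Independent two-sample t test on the palisade-layer data from the text.
from scipy import stats

sun = [150, 100, 210, 300, 200, 210, 300]      # micrometers
shade = [120, 125, 160, 130, 200, 170, 200]    # micrometers

t, p = stats.ttest_ind(sun, shade, equal_var=True)
print(f"t = {t:.2f}, p = {p:.3f}")   # t = 1.71, p > 0.05: cannot reject the null hypothesis
```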
If you have lots of data and the probability value becomes smaller but still does not reach
the 'magic' number 0.05, should you keep collecting data until it does? At this point,
consider the biological significance of the question. If you did find a difference of 0.1%
between palisade layers of sun and shade leaves respectively, just how important could it
be?
When reporting results of a statistical analysis, always identify what data sets you
compared, what test was used, and for most quantitative data report mean, standard
deviation, and the probability values. Make sure the outcome of the analysis is clearly
reported. Some spreadsheet programs include the t test for independent variables as a built-
in option. Even without a built-in option, it is so easy to set up a spreadsheet to do a paired t
test that it may not be worth the expense and effort to buy and learn a dedicated statistics
software program, unless more complicated statistics are needed.
The method of least squares assumes that the best-fit curve of a given type is the curve that
has the minimal sum of the deviations squared (least square error) from a given set of data.
Suppose that the data points are (x1, y1), (x2, y2), ..., (xn, yn), where x is the independent
variable and y is the dependent variable. The fitting curve f(x) has the deviation (error)
d = y − f(x) from each data point, and the best-fit curve minimizes the sum of the squared
deviations.
The Method of Least Squares is a procedure to determine the best-fit line to data; the
proof uses simple calculus and linear algebra. The basic problem is to find the best-fit
straight line y = ax + b given that, for n ∈ {1, . . . , N}, the pairs (xn, yn) are observed. The
method generalizes to fitting

y = a1 f1(x) + · · · + aK fK(x),    (0.1)

where it is not necessary for the functions fk to be linear in x; all that is needed is that y be
a linear combination of the fk (that is, linear in the coefficients ak).
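As an illustration, a minimal sketch of a straight-line least-squares fit (the data points here are made up for demonstration, not taken from the text):

```python
# Least-squares fit of y = a*x + b; polyfit minimizes the sum of squared deviations.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])        # illustrative data

a, b = np.polyfit(x, y, deg=1)                 # returns (slope, intercept)
residuals = y - (a * x + b)                    # the deviations y_i - f(x_i)

print(f"y = {a:.2f}x + {b:.2f}; sum of squared deviations = {np.sum(residuals**2):.3f}")
```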
Q test
In some groups of five replicates, one value can be rejected. There is a test for this called
the Q test, which is valid on samples of 3 to 10 replicates. To perform the Q test, calculate
the quantity Q, which is the ratio of [the difference between the value under suspicion and
the next closest value] to [the difference between the highest and lowest value in the
series]. Compare Q with the critical value for Q (below), which for 5 observations is 0.64. If
Q is greater than 0.64, the suspect measurement may be rejected. Otherwise, it must be
retained.
For example, for values 1,2,3,4,9, if we wanted to test "9", we would take (9-4)/(9-1)=5/8
=0.625, which is less than 0.64, so we'd have to keep the 9.
On the other hand, for the values, 3,3,4,4,9, if we wanted to test "9", we would take (9-4)/(9-
3)=5/6 =0.833, which is greater than 0.64, so we can throw the 9 out.
Only one value in a small (defined as 3-10 values) group can be removed by this test. If you
have more than one "wild" value, you just have a group of data with a lot of scatter, and you
need to keep them all.
We perform the Q test on the total amounts of each lipid class, then eliminate a sample from
our analysis of the lipid species in that class if it does not pass the test.
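A small Python sketch of the procedure just described (the helper function is ours), using the 90% critical values tabulated below:

```python
# Q test for one suspect value in a set of 3-10 replicates.
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(values, suspect):
    """Return (Q, may_reject) for the suspect value."""
    data = sorted(values)
    gap = min(abs(suspect - v) for v in data if v != suspect)  # distance to nearest other value
    spread = data[-1] - data[0]                                # range of the whole series
    q = gap / spread
    return q, q > Q_CRIT_90[len(data)]

print(q_test([1, 2, 3, 4, 9], 9))   # (0.625, False): keep the 9
print(q_test([3, 3, 4, 4, 9], 9))   # (~0.833, True): the 9 may be rejected
```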
Qc (90%) (Critical values of Q) for sample sizes of 3-10 (taken from Shoemaker, J.P., Garland,
C.W. and Steinfeld, J.I., "Experiments in Physical Chemistry", McGraw-Hill, Inc., 1974, pp. 34-
39; this is also the reference for information about the Q test):
N     Qc
3     0.94
4     0.76
5     0.64
6     0.56
7     0.51
8     0.47
9     0.44
10    0.41
F-Test
The F-test is designed to test if two population variances are equal. It does this by
comparing the ratio of two variances. So, if the variances are equal, the ratio of the
variances will be 1.
All hypothesis testing is done under the assumption that the null hypothesis is true. If the
null hypothesis is true, the F test statistic simplifies (dramatically) to the ratio of the two
sample variances, F = s1²/s2²; this ratio of sample variances is the test statistic used. If the
computed ratio is too far from 1, we reject the null hypothesis that the variances are equal.
There are several different F-tables. Each one has a different level of significance. So, find
the correct level of significance first, and then look up the numerator degrees of freedom
and the denominator degrees of freedom to find the critical value.
You will notice that all of the tables only give level of significance for right tail tests. Because
the F distribution is not symmetric, and there are no negative values, you may not simply
take the opposite of the right critical value to find the left critical value. The way to find a
left critical value is to reverse the degrees of freedom, look up the right critical value, and
then take the reciprocal of this value. For example, the critical value with 0.05 on the left
with 12 numerator and 15 denominator degrees of freedom is found by taking the reciprocal
of the critical value with 0.05 on the right with 15 numerator and 12 denominator degrees of
freedom.
Since the left critical values are a pain to calculate, they are often avoided altogether. This is
the procedure followed in the textbook. You can force the F test into a right tail test by
placing the sample with the large variance in the numerator and the smaller variance in the
denominator. It does not matter which sample has the larger sample size, only which sample
has the larger variance.
The numerator degrees of freedom will be the degrees of freedom for whichever sample has
the larger variance (since it is in the numerator) and the denominator degrees of freedom
will be the degrees of freedom for whichever sample has the smaller variance (since it is in
the denominator).
If a two-tail test is being conducted, you still have to divide alpha by 2, but you only look up
and compare the right critical value.
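A sketch of the right-tail procedure in Python (the function name is ours; the data reuse the leaf measurements from the t-test example):

```python
# F test for equality of two variances, forced into a right-tail test by
# putting the larger sample variance in the numerator.
import numpy as np
from scipy import stats

def f_test(sample1, sample2, alpha=0.05):
    v1, v2 = np.var(sample1, ddof=1), np.var(sample2, ddof=1)   # sample variances
    if v1 < v2:                                 # ensure the larger variance is on top
        sample1, sample2, v1, v2 = sample2, sample1, v2, v1
    f = v1 / v2
    df_num, df_den = len(sample1) - 1, len(sample2) - 1
    f_crit = stats.f.ppf(1 - alpha / 2, df_num, df_den)   # two-tail test: alpha/2 on the right
    return f, f_crit, f > f_crit                # reject equal variances if f > f_crit

sun = [150, 100, 210, 300, 200, 210, 300]
shade = [120, 125, 160, 130, 200, 170, 200]
print(f_test(sun, shade))   # F ~ 4.58 < critical ~ 5.82: do not reject equal variances
```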
Assumptions / Notes