Quantitative Chemistry

Definition: Relative uncertainty is a measure of the uncertainty of measurement

compared to the size of the measurement.

Also Known As: relative error

Examples:

Three weights are measured at 1.05 g, 1.00 g, and 0.95 g. The absolute error is ± 0.05 g.

The relative error is 0.05 g/1.00 g = 0.05 or 5%.

Definition: Absolute error or absolute uncertainty is the uncertainty in a measurement,


which is expressed using the relevant units. Also, absolute error may be used to express the
inaccuracy in a measurement.

Examples: If a measurement is recorded to be 1.12 and the true value is known to be 1.00,
then the absolute error is 1.12 - 1.00 = 0.12. If the mass of an object is measured three
times with values recorded to be 1.00 g, 0.95 g, and 1.05 g, then the absolute error could be
expressed as ± 0.05 g.

1. Absolute uncertainty (AU) is a measure of uncertainty with the same units as the
reported value. For example, the grape's width is 12.3 ± 0.2 mm, where 0.2 mm is
the AU.
2. Relative uncertainty (RU) represents AU as a fraction or percentage.

For example, 0.2 mm/12.3 mm = 0.02 = 2%. The grape's width is 12.3 mm ± 2%,
where 2% is the RU.

Relative uncertainty (RU)

Relative uncertainty is a fractional value. If you measure a pencil to be 10 cm ± 1 cm, then
the relative uncertainty is one tenth of its length (RU = 0.1 or 10%). RU is simply absolute
uncertainty divided by the measured value. It is reported as a fraction or percent:

RU = AU / |measured value|

For example, for the measurement

meas = (23.27 ± 0.01) g

AU = 0.01 g

RU = 0.01 g / 23.27 g = 0.0004 = 0.04%

Notes:

 RUs can be represented as either a fractional value or a percent.


 RUs have no units.
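
As a quick check of this arithmetic, here is a minimal Python sketch of the AU-to-RU conversion for the 23.27 g measurement above (the function name relative_uncertainty is ours, chosen for illustration):

def relative_uncertainty(au, value):
    """RU = AU / |measured value|; unitless, since the units cancel."""
    return au / abs(value)

ru = relative_uncertainty(0.01, 23.27)
print(f"RU = {ru:.4f} = {ru:.2%}")   # RU = 0.0004 = 0.04%
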
Propagation of Uncertainty

When you perform calculations on numbers whose uncertainties are known, you can
determine the uncertainty in the calculated answer using two simple rules. This is known as
propagation of uncertainty. Rules for uncertainty propagation are very different for
addition/subtraction operations as compared to multiplication/division operations. These
rules are not interchangeable. The rules presented here determine the maximum possible
uncertainty.

A. Addition and Subtraction: AU = ΣAU

When calculating uncertainty for the sum or difference of measured values, AU of the
calculated value is the sum of the absolute uncertainties of the individual terms.

Example:

A = 19 ± 4 (AU = 4; RU = 0.2)

B = 28 ± 3 (AU = 3; RU = 0.1)

A + B = 47 ± 7 (AU = 7; RU = 0.1)

A - B = -9 ± 7 (AU = 7; RU = 0.8)

Notes:

o RUA+B ≠ RUA + RUB.


o RU can be calculated using the equation RU = AU/|value|.
o Even if you are subtracting measured values, be sure to add AUs.

Example:

Calculate qtotal and its associated AU and RU values, using the equation:

qtotal = - (qsolution + qcal)

where qsolution and qcal are measured values:

qsolution = 1450 ± 20 J

qcal = 320 ± 50 J

Solution:

1. Calculate qtotal, ignoring uncertainties:

qtotal = - (1450 + 320) J = -1770 J

2. Add absolute uncertainties:

AUqtotal = AUqsolution + AUqcal = 20 + 50 J = 70 J

3. Calculate relative uncertainty from absolute uncertainty:

RUqtotal = AU/|qtotal| = 70 J/|-1770 J| = 0.04

4. Report your final answer:

qtotal = -1770 ± 70 J (RU = 4%)
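
The same bookkeeping is easy to script. Below is a minimal Python sketch of the addition/subtraction rule, reproducing the example above (all variable names are ours, chosen for illustration):

# measured values and their absolute uncertainties (J)
q_solution, au_solution = 1450, 20
q_cal, au_cal = 320, 50

q_total = -(q_solution + q_cal)      # -1770 J
au_total = au_solution + au_cal      # sum AUs for addition/subtraction: 70 J
ru_total = au_total / abs(q_total)   # 0.0395... -> report as 0.04

print(q_total, au_total, round(ru_total, 2))   # -1770 70 0.04
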

B. Multiplication and Division: RU = ΣRU

When calculating uncertainty for the product or ratio of measured values, RU of the
calculated value is the sum of the relative uncertainties of the individual
terms. However, the AU of the calculated value cannot be obtained by summing the AUs of
the measured values; it must be computed from the RU (AU = RU × |value|).

Example:

A = 19 ± 4 (AU = 4; RU = 0.2)

B = 28 ± 3 (AU = 3; RU = 0.1)

A x B = 532

RUAxB = RUA + RUB = 4/19 + 3/28 = 0.3177 = 0.3

AUAxB = RU x |532| = 0.3177 x 532 = 169 = 200

A x B = 532 ± 169 = 500 ± 200

Notes:

o To determine the correct number of significant figures in your answer,


calculate the absolute uncertainty, round it to one significant figure, and then
round the calculated value to the same digit.
o AUAxB ≠ AUA + AUB.
o AU can be calculated using the equation AU = RUx|value|.
o Relative uncertainties are always positive. Be sure to calculate RU using
absolute values.

Example:

Calculate qcal and its associated AU and RU values, using the equation:

qcal = CΔT

where C and ΔT are measured values:

C = (54 ± 7) J/°C
ΔT = 6.0 ± 0.1 °C

Solution:

1. Calculate qcal, ignoring uncertainties:

qcal = (54 J/°C) x (6.0 °C) = 324 J = 320 J

2. Add relative uncertainties:

RUqcal = RUC + RUΔT = 7/54 + 0.1/6.0 = 0.14630 = 0.1

3. Calculate absolute uncertainty from relative uncertainty:

AUqcal = RU x |qcal| = 0.14630 x 324 J = 47.4 J = 50 J

4. Report your final answer:

qcal = 320 ± 50 J (RU = 10%)
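
As a sketch in Python, the multiplication/division rule for this example (variable names ours):

# measured values and their absolute uncertainties
C, au_C = 54, 7        # J/degC
dT, au_dT = 6.0, 0.1   # degC

q_cal = C * dT                          # 324 J
ru = au_C / abs(C) + au_dT / abs(dT)    # sum RUs: 7/54 + 0.1/6.0 = 0.1463
au = ru * abs(q_cal)                    # convert back to AU: 47.4 J -> report 50 J

print(round(q_cal), round(ru, 4), round(au, 1))   # 324 0.1463 47.4
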

Combination of addition/subtraction with multiplication/division

For a combination of these operations we follow the standard order of operations.

Example:

Calculate qtotal and its associated AU and RU values, using the equation:

qtotal = - (qsolution + qcal) = - (qsolution + CΔT)

where qsolution = 1450 ± 20 J

C = 54 ± 7 J/°C

ΔT = 6.0 ± 0.1 °C

Solution:

1. Calculate qtotal, ignoring uncertainties:

qtotal = - (1450 J + (54 J/°C)(6.0 °C)) = - (1450 J + 324 J) = -1774 J = -1770 J

Note that you did the multiplication first, then the addition. We will approach
the uncertainty calculations in the same order.

2. Determine uncertainties for the multiplication:

RUqcal = RUC + RUΔT = 7/54 + 0.1/6.0 = 0.14630 = 0.1

AUqcal = RU x |qcal| = 0.14630 x 324 J = 47.4 J = 50 J

3. Determine uncertainties for the addition:

AUqtotal = AUqsolution + AUqcal = 20 + 47.4 = 67.4 J = 70 J

RUqtotal = AU/|qtotal| = 67.4 J/|-1774 J| = 0.04

4. Report your final answer:

qtotal = -1770 ± 70 J (RU = 4%)
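
A short Python sketch of the combined calculation, following the same order of operations (multiplication rule first, then addition rule); the names are ours:

# Step 1: multiplication -> sum relative uncertainties
C, au_C = 54, 7          # J/degC
dT, au_dT = 6.0, 0.1     # degC
q_cal = C * dT                                         # 324 J
au_q_cal = (au_C / abs(C) + au_dT / abs(dT)) * q_cal   # 47.4 J

# Step 2: addition -> sum absolute uncertainties
q_solution, au_q_solution = 1450, 20                   # J
q_total = -(q_solution + q_cal)                        # -1774 J -> -1770 J
au_q_total = au_q_solution + au_q_cal                  # 67.4 J -> 70 J
ru_q_total = au_q_total / abs(q_total)                 # 0.038 -> 0.04

print(round(q_total), round(au_q_total, 1), round(ru_q_total, 2))   # -1774 67.4 0.04
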

Normal Distribution

The normal distribution, also called the Gaussian distribution, is the most common
distribution function for independent, randomly generated variables. Its familiar bell-shaped
curve is ubiquitous in statistical reports, from survey analysis and quality control to resource
allocation.

The graph of the normal distribution is characterized by two parameters: the mean, or
average, which is the maximum of the graph and about which the graph is always
symmetric; and the standard deviation, which determines the amount of dispersion away
from the mean. A small standard deviation (compared with the mean) produces a steep
graph, whereas a large standard deviation (again compared with the mean) produces a flat
graph.


The normal distribution is produced by the normal density function

p(x) = [1/(σ√(2π))] e^(−(x − μ)²/(2σ²)).

In this exponential function e is the constant 2.71828…, μ is the mean, and σ is the standard
deviation. The probability of a random variable falling within any given range of values is
equal to the proportion of the area enclosed under the function’s graph between the given
values and above the x-axis. Because the denominator σ√(2π), known as the normalizing
coefficient, causes the total area enclosed by the graph to be exactly equal to unity,
probabilities can be obtained directly from the corresponding area—i.e., an area of 0.5
corresponds to a probability of 0.5. Although these areas can be determined with calculus,
tables were generated in the 19th century for the special case of μ = 0 and σ = 1, known as
the standard normal distribution, and these tables can be used for any normal distribution
after the variables are suitably rescaled by subtracting their mean and dividing by their
standard deviation: z = (x − μ)/σ. Calculators have now all but eliminated the use of such tables.
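
In the same spirit, those areas can be computed directly. A small Python sketch using the standard identity Φ(z) = ½[1 + erf(z/√2)] for the standard normal CDF (the function names are ours):

import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma), via the rescaling z = (x - mu)/sigma."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# about 68.3% of values lie within one standard deviation of the mean
print(round(normal_prob(-1.0, 1.0, 0.0, 1.0), 3))   # 0.683
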

The term “Gaussian distribution” refers to the German mathematician Carl Friedrich Gauss,
who first developed a two-parameter exponential function in 1809 in connection with studies
of astronomical observation errors. This study led Gauss to formulate his law of
observational error and to advance the theory of the method of least squares
approximation. Another famous early application of the normal distribution was by the
British physicist James Clerk Maxwell, who in 1859 formulated his law of distribution of
molecular velocities—later generalized as the Maxwell-Boltzmann distribution law.

The French mathematician Abraham de Moivre, in his Doctrine of Chances (1718), first noted
that probabilities associated with discretely generated random variables (such as are
obtained by flipping a coin or rolling a die) can be approximated by the area under the graph
of an exponential function. This result was extended and generalized by the French
scientist Pierre-Simon Laplace, in his Théorie analytique des probabilités (1812; “Analytic
Theory of Probability”), into the first central limit theorem, which proved that probabilities for
almost all independent and identically distributed random variables converge rapidly (with
sample size) to the area under an exponential function—that is, to a normal distribution. The
central limit theorem permitted hitherto intractable problems, particularly those involving
discrete variables, to be handled with calculus.

Confidence Intervals

A confidence interval gives an estimated range of values which is likely to include an
unknown population parameter, the estimated range being calculated from a given set of
sample data. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary
v1.1)

The common notation for the parameter in question is θ. Often, this parameter is the
population mean μ, which is estimated through the sample mean x̄.

The level C of a confidence interval gives the probability that the interval produced by the
method employed includes the true value of the parameter θ.

Example

Suppose a student measuring the boiling temperature of a certain liquid observes the
readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different
samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the
standard deviation for this procedure is 1.2 degrees, what is the confidence interval for the
population mean at a 95% confidence level?

In other words, the student wishes to estimate the true mean boiling temperature of the
liquid using the results of his measurements. If the measurements follow a normal
distribution, then the sample mean will have the distribution N(μ, σ/√n). Since the sample
size is 6, the standard deviation of the sample mean is equal to 1.2/sqrt(6) = 0.49. For a
95% confidence level the critical value is z* = 1.96, so the interval is 101.82 ± 1.96 × 0.49
= 101.82 ± 0.96, or (100.86, 102.78) degrees Celsius.
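
A sketch of that interval computation in Python (1.96 is the standard 95% critical value of the normal distribution):

import math

mean, sigma, n = 101.82, 1.2, 6
z = 1.96
se = sigma / math.sqrt(n)                  # 0.49
lo, hi = mean - z * se, mean + z * se
print(f"95% CI: ({lo:.2f}, {hi:.2f})")     # 95% CI: (100.86, 102.78)
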

Student's t-Tests
"Student" (real name: W. S. Gossett [1876-1937]) developed statistical methods to solve
problems stemming from his employment in a brewery. Student's t-test deals with the
problems associated with inference based on "small" samples: the calculated mean (Xavg)
and standard deviation (s) may by chance deviate from the "real" mean and standard
deviation (i.e., what you'd measure if you had many more data items: a "large" sample). For
example, it is likely that the true mean size of maple leaves is "close" to the mean calculated
from a sample of N randomly collected leaves. If N = 5, 95% of the time the actual mean
would be in the range Xavg ± 2.776 s/√N; if N = 10: Xavg ± 2.262 s/√N; if N = 20: Xavg ±
2.093 s/√N; if N = 40: Xavg ± 2.023 s/√N; and for "large" N: Xavg ± 1.960 s/√N. (These
"small-sample" corrections are included in the descriptive statistics report of the 95%
confidence interval.)

'Student's' t Test (For Independent Samples)

Use this test to compare two small sets of quantitative data when samples are collected
independently of one another. When one randomly takes replicate measurements from a
population he/she is collecting an independent sample. Use of a paired t test, to which some
statistics programs unfortunately default, requires nonrandom sampling (see below).

Criteria

 Only if there is a direct relationship between each specific data point in the first set
and one and only one specific data point in the second set, such as measurements on
the same subject 'before and after,' may the paired t test be appropriate.
 If samples are collected from two different populations or from randomly selected
individuals from the same population at different times, use the test for independent
samples (unpaired).
 Here's a simple check to determine if the paired t test can apply - if one sample can
have a different number of data points from the other, then the paired t
test cannot apply.

Examples

'Student's' t Test is one of the most commonly used techniques for testing a hypothesis on
the basis of a difference between sample means. Explained in layman's terms, the t test
determines a probability that two populations are the same with respect to the variable
tested.

For example, suppose you collected data on the heights of male basketball and football
players, and compared the sample means using the t test. A probability of 0.4 would mean
that there is a 40% likelihood that you cannot distinguish a group of basketball players from a
group of football players by height alone. That's about as far as the t test (or any statistical
test, for that matter) can take you. If you calculate a probability of 0.05 or less, then you
can reject the null hypothesis (that is, you can conclude that the two groups of athletes can
be distinguished by height).

Because there is still a small probability that you are wrong, you haven't proven a
difference, though. There are differences among popular, mathematical, philosophical, legal,
and scientific definitions of proof. I will argue that there is no such thing as scientific proof.
Please see my essay on that subject. Don't make the error of reporting your results as proof
(or disproof) of a hypothesis. No experiment is perfect, and proof in the strictest sense
requires perfection.
Make sure you understand the concepts of experimental error and single variable
statistics before you go through this part. Leaves were collected from wax-leaf ligustrum
grown in shade and in full sun. The thickness in micrometers of the palisade layer was
recorded for each type of leaf. Thicknesses of 7 sun leaves were reported as: 150, 100, 210,
300, 200, 210, and 300, respectively. Thicknesses of 7 shade leaves were reported as 120,
125, 160, 130, 200, 170, and 200, respectively. The mean ± standard deviation for sun
leaves was 210 ± 73 micrometers and for shade leaves it was 158 ± 34 micrometers. Note
that since all data were rounded to the nearest micrometer, it is inappropriate to include
decimal places in either the mean or standard deviation.

For the t test for independent samples you do not have to have the same number of data
points in each group. We have to assume that the population follows a normal distribution
(small samples have more scatter and follow what is called a t distribution). Corrections can
be made for groups that do not show a normal distribution (skewed samples, for example -
note that the word 'skew' has a specific statistical meaning, so don't use it as a synonym for
'messed up').

The t test can be performed knowing just the means, standard deviations, and number of
data points. Note, however, that the raw data must still be recorded: if you keep only means
in your notebook, you lose a great deal of information and usually render your work invalid.
The two sample t test yields a statistic t, in which

t = (Xavg1 − Xavg2) / sqrt(A × B)

where A = (n1 + n2)/(n1 × n2) and B is the pooled variance,
B = [(n1 − 1)s1^2 + (n2 − 1)s2^2]/(n1 + n2 − 2).

X-bar, of course, is the sample mean, and s is the sample standard deviation. Note that the
numerator of the formula is the difference between means. The denominator is a
measurement of experimental error in the two groups combined. The wider the difference
between means, the more confident you are in the data. The more experimental error you
have, the less confident you are in the data. Thus the higher the value of t, the greater the
confidence that there is a difference.

To understand how a precise probability value can be attached to that confidence you need
to study the mathematics behind the t distribution in a formal statistics course. The value t
is just an intermediate statistic. Probability tables have been prepared based on the t
distribution originally worked out by W.S. Gossett (see above). To use such a table,
find the critical value that corresponds to the number of degrees of freedom you have
(degrees of freedom = number of data points in the two groups combined, minus 2). If t
exceeds the tabled value, the means are significantly different at the probability level that is
listed. When using tables report the lowest probability value for which t exceeds the critical
value. Report as 'p < (probability value).'

In the example, the difference between means is 52, A = 14/49, and B = 3242.5. Then t =
1.71 (rounding up). There are (7 + 7 -2) = 12 degrees of freedom, so the critical value for p
= 0.05 is 2.18. 1.71 is less than 2.18, so we cannot reject the null hypothesis that the two
populations have the same palisade layer thickness. So now what? If the question is very
important to you, you might collect more data. With a well designed experiment, sufficient
data can overcome the uncertainty contributed by experimental error, and yield a significant
difference between samples, if one exists.
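
For reference, here is a minimal Python sketch of this two-sample t calculation on the leaf data, using the A and B quantities defined with the formula above (the helper names are ours):

import math

sun = [150, 100, 210, 300, 200, 210, 300]     # palisade thickness, micrometers
shade = [120, 125, 160, 130, 200, 170, 200]

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(sun), len(shade)
A = (n1 + n2) / (n1 * n2)                                    # 14/49
B = (sum_sq_dev(sun) + sum_sq_dev(shade)) / (n1 + n2 - 2)    # pooled variance
t = (mean(sun) - mean(shade)) / math.sqrt(A * B)
print(round(t, 2))   # about 1.71, below the 12-df critical value of 2.18
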

If you have lots of data and the probability value becomes smaller but still does not reach
the 'magic' number 0.05, should you keep collecting data until it does? At this point,
consider the biological significance of the question. If you did find a difference of 0.1%
between palisade layers of sun and shade leaves respectively, just how important could it
be?

When reporting results of a statistical analysis, always identify what data sets you
compared, what test was used, and for most quantitative data report mean, standard
deviation, and the probability values. Make sure the outcome of the analysis is clearly
reported. Some spreadsheet programs include the t test for independent variables as a built-
in option. Even without a built-in option, it is so easy to set up a spreadsheet to do a paired t
test that it may not be worth the expense and effort to buy and learn a dedicated statistics
software program, unless more complicated statistics are needed.

The Method of Least Squares

The method of least squares assumes that the best-fit curve of a given type is the curve that
has the minimal sum of the deviations squared (least square error) from a given set of data.

Suppose that the data points are (x1, y1), (x2, y2), ..., (xN, yN), where x is the independent
variable and y is the dependent variable. The fitting curve f(x) has the deviation (error) d
from each data point, i.e., d1 = y1 − f(x1), d2 = y2 − f(x2), ..., dN = yN − f(xN). According
to the method of least squares, the best fitting curve has the property that:

Π = d1^2 + d2^2 + ... + dN^2 = Σ [yi − f(xi)]^2 = a minimum.

The Q-test is a simple statistical test to determine if a data point that is very different from
the other data points in a set can be rejected. Only one data point may be discarded using
the Q-test.

Q = |outlier - value closest to the outlier| / |highest value - lowest value|

Table of Q critical values (90% confidence)

N QC
3 0.94
4 0.76
5 0.64
6 0.56
7 0.51
8 0.47
9 0.44
10 0.41

If Q is larger than QC the outlier can be discarded with 90% confidence.

The Method of Least Squares is a procedure to determine the best fit line to data; the

proof uses simple calculus and linear algebra. The basic problem is to find the best fit

straight line y = ax + b given that, for n ∈ {1, ..., N}, the pairs (xn, yn) are observed.

The method easily generalizes to finding the best fit of the form

y = a1 f1(x) + · · · + aK fK(x);

it is not necessary for the functions fk to be linear in x – all that is needed is that y is to
be a linear combination of these functions.
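
As a concrete sketch, the straight-line case y = ax + b has a closed-form solution via the standard normal equations; the data below are invented purely for illustration:

def least_squares_line(xs, ys):
    """Return (a, b) minimizing the sum of squared deviations for y = a*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

a, b = least_squares_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(round(a, 2), round(b, 2))   # 1.94 0.15
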

Q test

In a group of replicates, one suspiciously deviant value can sometimes be rejected. There is
a test for this called the Q test, which is valid on samples of 3 to 10 replicates. To perform
the Q test, calculate the quantity Q, which is the ratio of [the difference between the value
under suspicion and the next closest value] to [the difference between the highest and
lowest value in the series]. Compare Q with the critical value for Q (given in the table
above), which for 5 observations is 0.64. If Q is greater than 0.64, the suspect measurement
may be rejected. Otherwise, it must be retained.

For example, for values 1,2,3,4,9, if we wanted to test "9", we would take (9-4)/(9-1)=5/8
=0.625, which is less than 0.64, so we'd have to keep the 9.

On the other hand, for the values, 3,3,4,4,9, if we wanted to test "9", we would take (9-4)/(9-
3)=5/6 =0.833, which is greater than 0.64, so we can throw the 9 out.

Only one value in a small (defined as 3-10 values) group can be removed by this test. If you
have more than one "wild" value, you just have a group of data with a lot of scatter, and you
need to keep them all.

We perform the Q test on the total amounts of each lipid class, then eliminate a sample from
our analysis of the lipid species in that class if it doesn't meet the test.
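
A short Python sketch of the Q test at 90% confidence, reproducing the two examples above (the function name q_test is ours):

# critical values of Q (90% confidence) for N = 3..10, from the table above
QC = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(values):
    """Test the most extreme value; return (Q, True if it may be rejected)."""
    xs = sorted(values)
    spread = xs[-1] - xs[0]
    q_low = (xs[1] - xs[0]) / spread      # Q if the lowest value is suspect
    q_high = (xs[-1] - xs[-2]) / spread   # Q if the highest value is suspect
    q = max(q_low, q_high)
    return q, q > QC[len(xs)]

print(q_test([1, 2, 3, 4, 9]))   # (0.625, False) -> keep the 9
print(q_test([3, 3, 4, 4, 9]))   # (0.833..., True) -> reject the 9
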

The critical values Qc (90%) for sample sizes of 3-10 are those tabulated above (taken from
Shoemaker, J.P., Garland, C.W. and Steinfeld, J.I., "Experiments in Physical Chemistry",
McGraw-Hill, Inc., 1974, pp. 34-39; this is also the reference for information about the Q test).

Stats: F-Test

The F-distribution is formed by the ratio of two independent chi-square variables divided by
their respective degrees of freedom. Since F is formed by chi-square, many of the chi-square
properties carry over to the F distribution.

 The F-values are all non-negative
 The distribution is non-symmetric
 The mean is approximately 1
 There are two independent degrees of freedom, one for the numerator, and one for
the denominator.
 There are many different F distributions, one for each pair of degrees of freedom.

F-Test

The F-test is designed to test if two population variances are equal. It does this by
comparing the ratio of two variances. So, if the variances are equal, the ratio of the
variances will be 1.

All hypothesis testing is done under the assumption that the null hypothesis is true.

The general F statistic is F = (s1^2/σ1^2) / (s2^2/σ2^2). If the null hypothesis σ1^2 = σ2^2
is true, this simplifies (dramatically) to the ratio of sample variances, F = s1^2/s2^2, which
is the test statistic used. If the computed ratio is too far from 1, we reject the null hypothesis
and our assumption that the variances were equal.
There are several different F-tables. Each one has a different level of significance. So, find
the correct level of significance first, and then look up the numerator degrees of freedom
and the denominator degrees of freedom to find the critical value.

You will notice that all of the tables only give level of significance for right tail tests. Because
the F distribution is not symmetric, and there are no negative values, you may not simply
take the opposite of the right critical value to find the left critical value. The way to find a
left critical value is to reverse the degrees of freedom, look up the right critical value, and
then take the reciprocal of this value. For example, the critical value with 0.05 on the left
with 12 numerator and 15 denominator degrees of freedom is found by taking the reciprocal
of the critical value with 0.05 on the right with 15 numerator and 12 denominator degrees of
freedom.

Avoiding Left Critical Values

Since the left critical values are a pain to calculate, they are often avoided altogether. This is
the procedure followed in the textbook. You can force the F test into a right tail test by
placing the sample with the large variance in the numerator and the smaller variance in the
denominator. It does not matter which sample has the larger sample size, only which sample
has the larger variance.

The numerator degrees of freedom will be the degrees of freedom for whichever sample has
the larger variance (since it is in the numerator) and the denominator degrees of freedom
will be the degrees of freedom for whichever sample has the smaller variance (since it is in
the denominator).

If a two-tail test is being conducted, you still have to divide alpha by 2, but you only look up
and compare the right critical value.

Assumptions / Notes

 The larger variance should always be placed in the numerator


 The test statistic is F = s1^2 / s2^2 where s1^2 > s2^2
 Divide alpha by 2 for a two tail test and then find the right critical value
 If standard deviations are given instead of variances, they must be squared
 When the degrees of freedom aren't given in the table, go with the value with the
larger critical value (this happens to be the smaller degrees of freedom). This is so
that you are less likely to reject in error (type I error)
 The populations from which the samples were obtained must be normal.
 The samples must be independent
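
A brief Python sketch of this procedure. The p-value lookup assumes SciPy is available (scipy.stats.f is SciPy's F distribution); the sample variances below are invented for illustration:

from scipy import stats

s1_sq, n1 = 9.2, 16   # larger sample variance goes in the numerator
s2_sq, n2 = 4.1, 13

F = s1_sq / s2_sq                   # test statistic, about 2.24
dfn, dfd = n1 - 1, n2 - 1           # numerator, denominator degrees of freedom
p_right = stats.f.sf(F, dfn, dfd)   # right-tail probability
p_two_tail = 2 * p_right            # for a two-tailed test, compare to alpha
print(round(F, 2), round(p_two_tail, 3))
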
Regression Analysis

In statistics, regression analysis is a statistical technique for estimating the
relationships among variables. It includes many techniques for modeling and
analyzing several variables, when the focus is on the relationship between
a dependent variable and one or more independent variables. More specifically,
regression analysis helps one understand how the typical value of the dependent
variable changes when any one of the independent variables is varied, while the
other independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given the
independent variables — that is, the average value of the dependent variable when
the independent variables are fixed. Less commonly, the focus is on a quantile, or
other location parameter of the conditional distribution of the dependent variable
given the independent variables. In all cases, the estimation target is a function of
the independent variables called the regression function. In regression analysis, it is
also of interest to characterize the variation of the dependent variable around the
regression function, which can be described by a probability distribution.
Regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning. Regression analysis is also
used to understand which among the independent variables are related to the
dependent variable, and to explore the forms of these relationships. In restricted
circumstances, regression analysis can be used to infer causal relationships between
the independent and dependent variables. However, this can lead to illusions or false
relationships, so caution is advisable;[1] see correlation does not imply causation.
A large body of techniques for carrying out regression analysis has been developed.
Familiar methods such as linear regression and ordinary least squares regression
are parametric, in that the regression function is defined in terms of a finite number of
unknown parameters that are estimated from the data. Nonparametric
regression refers to techniques that allow the regression function to lie in a specified
set of functions, which may be infinite-dimensional.
The performance of regression analysis methods in practice depends on the form of
the data-generating process, and how it relates to the regression approach being
used. Since the true form of the data-generating process is generally not known,
regression analysis often depends to some extent on making assumptions about this
process. These assumptions are sometimes testable if a large amount of data is
available. Regression models for prediction are often useful even when the
assumptions are moderately violated, although they may not perform optimally.
However, in many applications, especially with small effects or questions
of causality based on observational data, regression methods can give misleading results.
