Data Handling, Statistics and Errors
Outline
Sample handling and management
QC and QA
Errors in analysis
Statistical analysis parameters
Descriptive statistics
Inferential statistics
Example questions
Chemical analysis
Subject
Set of instructions
Reliability in accuracy, reproducibility
Analytical problem
Method
Procedures
Validate
Based on purpose and intended quality
Techniques
Radiochemical
Electrochemical
Thermal
Chromatography
Mass spectrometry
AAS
Gravimetry
spectrometry
HPLC
GC
FTIR
Validation method
Performance characteristic of detector for single analyte calibration standards
Process repeated for mixed analyte calibration standards
Process repeated for analyte calibration standard with possible interfering substances and for reagent blank
Process repeated for analyte calibration standard with anticipated matrix component to evaluate matrix interference
Analysis of spiked simulated matrix (matrix with added known amount of analyte) to test recoveries
Field trials in routine lab with more junior personnel to test ruggedness
Analysis
Representative sample
Coning and quartering for solids
Grab sample, or composite of grab samples, for water/liquids
Random pick
QA is the managerial component/responsibility of an analytical lab, ensuring that all QC procedures are in place.
Build confidence through lab participation in interlaboratory studies.
Proficiency tests check the performance of the lab or analyst.
Confidence in validity
Cost effective
Determinate errors
Systematic errors lead to a bias in the measured value. They arise from the analyst, the equipment or
the procedure, and are addressed through record keeping, training or equipment maintenance.
Indeterminate error
Random errors arise from random fluctuations in measured quantities and occur even in a
closely controlled environment.
Minimize by careful experimental design and
control of the environmental factors.
Accumulated error
The aggregate of the errors in every measurement made in an analytical procedure,
all of which contribute to the final calculated result.
Indeterminate error
Accidental error or random error
Gaussian distribution
Random errors follow a Gaussian or normal distribution.
We are 95% certain that the true value falls within ±2σ (infinite population),
if there is no systematic error.
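The 95% figure can be checked numerically. The short sketch below assumes an ideal normal distribution with no systematic error and uses the error function from the Python standard library; the function name fraction_within is only illustrative.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal population lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
# within 2 sigma gives 0.9545, i.e. roughly 95%
```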
If Xm is based on the
average of several
measurements, the error
is called the mean error.
Relative error
Example 3.6
The result of an analysis is 36.97 g, compared with
the accepted value of 37.06 g.
What is the relative error in parts per thousand, ppt?
Absolute error = 36.97 g - 37.06 g = -0.09 g
Relative error = (-0.09 g / 37.06 g) x 1000 ppt = -2.4 ppt
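The same arithmetic can be written as a small Python sketch. The function names are illustrative and not part of the original example; the sign convention (measured minus accepted) follows the worked calculation.

```python
def absolute_error(measured: float, accepted: float) -> float:
    """Measured value minus accepted value (same units as the inputs)."""
    return measured - accepted

def relative_error_ppt(measured: float, accepted: float) -> float:
    """Absolute error relative to the accepted value, in parts per thousand."""
    return absolute_error(measured, accepted) / accepted * 1000

abs_err = absolute_error(36.97, 37.06)
rel_err = relative_error_ppt(36.97, 37.06)
print(f"absolute error = {abs_err:.2f} g")    # absolute error = -0.09 g
print(f"relative error = {rel_err:.1f} ppt")  # relative error = -2.4 ppt
```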
Statistical analysis
The statistical model used assumes that the data
follow a normal (Gaussian) distribution
If the data set is small, average or normalize the data
before applying the Gaussian distribution
This X chart requires results from a sample of known composition and is used to evaluate accuracy.
Statistical parameters
Software: Excel, SPSS, Minitab, SYSTAT
Descriptive statistics
Check the data for problems or non-normality (departure from a bell shape, or
outliers) using a frequency chart or normal plot
Mean,
standard deviation,
σ or s (data < 10),
Variance,
Skewness and kurtosis, for any trend in the data indicating clustering or a particular pattern
A skewed distribution is asymmetric, with high frequencies on one side and a long tail of low frequencies on the other side
A kurtotic distribution has a high peak and long tails on both sides
Confidence limit, CL
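As a rough illustration of these parameters, the sketch below computes them with the Python standard library. The replicate values are made up for the example, and the skewness and excess kurtosis use the simple moment-based definitions, which is one of several conventions that statistics packages offer.

```python
import statistics

def skewness(data):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(data)
    m = statistics.mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution gives about 0."""
    n = len(data)
    m = statistics.mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

# Hypothetical replicate results with one high value (long right tail).
replicates = [10.1, 10.3, 10.2, 10.4, 10.2, 11.0]

print("mean     :", statistics.mean(replicates))
print("s        :", statistics.stdev(replicates))     # sample standard deviation (N - 1)
print("variance :", statistics.variance(replicates))  # sample variance (N - 1)
print("skewness :", skewness(replicates))             # positive: tail on the high side
print("kurtosis :", excess_kurtosis(replicates))
```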
Data distribution
Confidence limit
An estimate of the range within which, at a given probability, the true value might fall, defined by
the experimental mean and standard deviation.
The range is called the confidence interval and its bounds are called the confidence limits.
The likelihood that the true value falls within the range is called the probability or
confidence level.
Select a confidence level (95% is good) and look up t for the number of samples analyzed, N
(= degrees of freedom + 1).
Confidence limit = x̄ ± ts/√N
It depends on the precision, s, and the confidence level you select.
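A minimal sketch of the confidence-limit calculation follows. The three replicate values are illustrative only, and the t value (4.303 for 2 degrees of freedom at the 95% level) is taken from a standard two-sided t table and passed in by hand, since the Python standard library has no t distribution.

```python
import math
import statistics

def confidence_limits(data, t_value):
    """Return (lower, upper) limits: mean +/- t * s / sqrt(N)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (N - 1)
    half_width = t_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical replicate results (percent analyte); N = 3, so 2 degrees of freedom.
results = [93.50, 93.58, 93.43]
T_95_DF2 = 4.303  # tabulated t, 95% confidence, 2 degrees of freedom

low, high = confidence_limits(results, T_95_DF2)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

A smaller s or a larger N narrows the interval; choosing a higher confidence level increases t and widens it.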
Inferential statistics
The researcher needs to make inferences about a population
from a sample
Types of inferential statistics
Significance tests: F-test and t-test
Analysis of variance (ANOVA)
Significance test
Compares the results of a method with those of an accepted method
to decide whether one set of data is significantly different from
another (in the mean, or in variability and spread)
Uses statistical tables, such as those for the F-test or t-test
The F-test indicates whether there is a significant difference between two methods based on
their standard deviations
F is defined in terms of the variances of the two methods, where the variance
is the square of the standard deviation
F = s₁²/s₂²
(Eq. 3.10)
where s₁² > s₂²
If the calculated F value from Eq. 3.10 exceeds the tabulated F value at
the selected confidence level (e.g. Table 3.2 at the 95% confidence level),
then there is a significant difference between the variances of the two
methods
F value
F = s₁²/s₂²
Example 3.16
You are developing a new colorimetric procedure for determining the
glucose content of blood serum. You have chosen the standard Folin-Wu
procedure with which to compare your results. From the following
two sets of replicate analyses on the same sample, determine whether
the variance of your method differs significantly from that of the
standard method using the F-test.
New method    Folin-Wu method
127           130
125           128
123           131
130           129
131           127
126           125
129
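The F-test for this example can be sketched in Python as below. Two assumptions are made: the thirteen values are split as seven replicates for the new method followed by six for the Folin-Wu method, and the tabulated F of 4.95 (95% level, 6 and 5 degrees of freedom) is copied from a standard F table rather than computed. Because unrounded means are used, the ratio may differ slightly from a hand calculation with rounded means.

```python
import statistics

def f_ratio(a, b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a = statistics.variance(a)  # sample variance, N - 1 in the denominator
    var_b = statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1

# Assumed split of the replicate results listed above.
new_method = [127, 125, 123, 130, 131, 126, 129]
folin_wu = [130, 128, 131, 129, 127, 125]

F_TABLE_95 = 4.95  # tabulated F, 95% confidence, df = 6 (numerator) and 5 (denominator)

f_calc, df1, df2 = f_ratio(new_method, folin_wu)
print(f"F = {f_calc:.2f} with {df1} and {df2} degrees of freedom")
if f_calc > F_TABLE_95:
    print("Variances differ significantly at the 95% level")
else:
    print("No significant difference between the variances at the 95% level")
```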
t-Test
Analysis of the difference between two means
Requires these assumptions to be checked before the test:
Does the sample follow a normal distribution? If the sample is small the test may be
unreliable; a moderate sample size of 40-100 is needed for accuracy
The variances of the two groups are about the same. Check the homogeneity-of-variance
assumption; violating it can lead to inaccurate results, particularly for small groups with unequal
sample sizes
Observations are assumed to be independent, such that one subject does not
influence another subject's score.
The t statistic is the difference between the sample means divided by a measure of their variability (the standard error); it is compared with the critical value
obtained from a probability table at the selected p value (0.05, 0.01 or 0.001)
If the t statistic equals or exceeds the critical value, then the difference between the two group
means is significant at the chosen level of alpha.
The test can be one-sided or two-sided. The former is used when the mean of a particular
group is hypothesized to be higher than the mean of the other group; the latter is used when the
means are expected to be different.
Example 3.18
A new gravimetric method is developed for iron (III) in which the iron
is precipitated in crystalline form with an organoboron cage
compound. The accuracy of the method is checked by analyzing the
iron in an ore sample and comparing with the results using the
standard precipitation with ammonia and weighing of Fe2O3. The
results, reported as % Fe for each analysis, were as follows:
Given:
Test method    Reference method
20.10          18.89
20.50          19.20
18.65          19.00
19.25          19.70
19.40          19.40
19.99
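A two-sample t-test on these results can be sketched as follows. The sketch assumes the eleven values are split as six results for the new (test) method followed by five for the reference method, uses the pooled standard deviation form of the t statistic (appropriate when the two variances are similar), and takes the tabulated t of 2.262 (95% level, two-sided, 9 degrees of freedom) from a standard t table.

```python
import math
import statistics

def pooled_t(a, b):
    """Return (t, degrees_of_freedom) for a two-sample t-test with pooled standard deviation."""
    na, nb = len(a), len(b)
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    s_pooled = math.sqrt((ss_a + ss_b) / df)
    t = (mean_a - mean_b) / s_pooled * math.sqrt(na * nb / (na + nb))
    return t, df

# Assumed split of the % Fe results listed above.
test_method = [20.10, 20.50, 18.65, 19.25, 19.40, 19.99]
reference_method = [18.89, 19.20, 19.00, 19.70, 19.40]

T_TABLE_95 = 2.262  # tabulated t, 95% confidence (two-sided), 9 degrees of freedom

t_calc, df = pooled_t(test_method, reference_method)
print(f"t = {t_calc:.2f} with {df} degrees of freedom")
if abs(t_calc) >= T_TABLE_95:
    print("Means differ significantly at the 95% level")
else:
    print("No significant difference between the means at the 95% level")
```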
ANOVA
Used instead of multiple t-tests when there are more than two groups
Subject scores within groups vary because of individual differences and random
error
ANOVA assumes the observations are independent and normally distributed, and that the group variances
are equal
The ANOVA test determines whether any group mean is significantly different from any other
group mean, using an overall F-test.
If there is no difference (i.e. the F-test is not significant), there is no point in comparing any
of the groups; retain the null hypothesis.
If the F-test is significant, at least one group mean is significantly different
from another group mean; investigate the hypothesis for the individual groups.
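The overall F-test of a one-way ANOVA can be sketched in a few lines. The three groups below are invented for illustration, and the critical value of 5.14 (95% level, 2 and 6 degrees of freedom) is taken from a standard F table; the function name is hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of the group means about the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value about its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical results from three groups of three replicates each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
F_TABLE_95 = 5.14  # tabulated F, 95% confidence, df = 2 and 6

f_calc, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_calc:.2f} with {df_b} and {df_w} degrees of freedom")
print("significant" if f_calc > F_TABLE_95 else "not significant")
```

A significant result only says that at least one group mean differs; it does not say which one.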
Q-test
QCalc = (difference between the suspected outlier and its nearest neighbor) / range.
If QCalc > QTable, then reject the outlier as due to a systematic error.
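A small sketch of the Q calculation is given below. The four measurements are hypothetical, the helper name q_calc is illustrative, and the tabulated Q of 0.829 (95% confidence, four observations) comes from a commonly used rejection-quotient table; use the table value that matches the number of observations and confidence level in your own work.

```python
def q_calc(data):
    """Return (suspect_value, Q) for the more extreme end of the sorted data."""
    if len(data) < 3:
        raise ValueError("Q-test needs at least three values")
    s = sorted(data)
    spread = s[-1] - s[0]  # range of the data
    if spread == 0:
        raise ValueError("all values are identical")
    gap_low = s[1] - s[0]     # gap between lowest value and its neighbor
    gap_high = s[-1] - s[-2]  # gap between highest value and its neighbor
    if gap_high >= gap_low:
        return s[-1], gap_high / spread
    return s[0], gap_low / spread

# Hypothetical replicate measurements with one suspiciously high value.
measurements = [10.0, 10.2, 10.1, 11.5]
Q_TABLE_95_N4 = 0.829  # tabulated Q, 95% confidence, 4 observations

suspect, q = q_calc(measurements)
print(f"suspect value {suspect}, Q = {q:.3f}")
print("reject" if q > Q_TABLE_95_N4 else "retain")
```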
Example of Q-test
Perform a Q-test to find any outlier in
the following measurements and state your
conclusion about the data.
Sydney: 10.2, 10.8, 11.6
Cherry: 9.9, 9.4, 7.8
Tien: 10.0, 9.2, 11.3
Dick: 9.5, 10.6, 11.6
Correlation
A measure of the association between two variables that takes a
value between +1.0 and -1.0
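One common measure of this kind is the Pearson correlation coefficient; the slide does not name a specific coefficient, so that choice is an assumption here. The sketch below computes it from first principles with made-up data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfectly increasing and a perfectly decreasing relationship.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # close to +1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))        # close to -1.0
```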
Regression
Regression treats a variable such as age as continuous, rather than dividing
it into groups
Regression creates a linear equation that predicts the score of a dependent
variable.
The equation represents the line of best fit through a scatter plot of points
describing the relationship between the dependent variable and one or more independent
variables
The beta weight, or coefficient, of each independent variable in the equation gives
information on the relationship between that independent variable and the dependent variable
For a single line of best fit on x-y axes, the slope represents the beta weight
and reflects the change in the value of the dependent variable associated with
each one-unit change in the independent variable.
Regression analysis assumes independence, normality, constant variance, and a
linear relationship between the independent and dependent variables.
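For a single independent variable, the best-fit line can be found by ordinary least squares. The sketch below is a minimal version with hypothetical calibration-style data; the function name is illustrative.

```python
def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
slope, intercept = least_squares(x, y)
print(f"y = {slope:.2f} x + {intercept:.2f}")
```

The slope is the change in y for each one-unit change in x, which is the beta weight described above.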
Regression
Simple linear regression: a single independent variable
is used to estimate the score of the dependent variable
Multiple linear regression: determines whether the amount of variance
in the dependent variable that a set of independent variables explains
is significantly different from zero.
The R² value indicates the proportion of the variance in the
dependent variable that is explained by the independent variables
(R² = 0.16 means 16%)
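R² can be computed from observed and predicted values as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses invented numbers and a hypothetical function name.

```python
def r_squared(observed, predicted):
    """Proportion of the variance in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total
    return 1 - ss_res / ss_tot

observed = [2.0, 4.0, 6.0, 8.0]

# A model that reproduces the data exactly explains all of the variance.
print(r_squared(observed, [2.0, 4.0, 6.0, 8.0]))  # 1.0
# A model that always predicts the mean explains none of it.
print(r_squared(observed, [5.0, 5.0, 5.0, 5.0]))  # 0.0
```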
A least-squares plot gives the best straight line through experimental points.
Excel will do this for you.
Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)
This Excel plot gives the same results for slope and intercept as calculated in
the example.
References
Gary D. Christian, Analytical Chemistry, 6th Ed., Wiley, 2003. QD101.2 C57 2003
Daniel C. Harris, Exploring Chemical Analysis, 2nd Ed., W.H. Freeman and Company, 2000. QD75.2 H368
Seamus P.J. Higson, Analytical Chemistry, Oxford University Press, 2004. QD101.2 H54