Infrared Multivariate Quantitative Analysis: Standard Practices For
Copyright © ASTM, 100 Barr Harbor Drive, West Conshohocken, PA 19428-2959, United States.
E 1655
of Dispersive Infrared Spectrometers7
E 1421 Practice for Describing and Measuring Performance of Fourier Transform Infrared (FT-IR) Spectrometers: Level Zero and Level One Tests7
E 1866 Guide for Establishing Spectrophotometer Performance Tests7
E 1944 Practice for Describing and Measuring Performance of Fourier Transform Near-Infrared (FT-NIR) Spectrometers: Level Zero and Level One Tests7
3. Terminology
3.1 Definitions—For terminology related to molecular spectroscopic methods, refer to Terminology E 131. For terminology relating to quality and statistics, refer to Terminology E 456.
3.2 Definitions of Terms Specific to This Standard:
3.2.1 analysis—in the context of this practice, the process of applying the calibration model to an absorption spectrum so as to estimate a component concentration value or property.
3.2.2 calibration—a process used to create a model relating two types of measured data. In the context of this practice, a process for creating a model that relates component concentrations or properties to absorbance spectra for a set of known reference samples.
3.2.3 calibration model—the mathematical expression that relates component concentrations or properties to absorbances for a set of reference samples.
3.2.4 calibration samples—the set of reference samples used for creating a calibration model. Reference component concentration or property values are known (measured by reference method) for the calibration samples and correlated to the absorbance spectra during the calibration.
3.2.5 estimate—the value for a component concentration or property obtained by applying the calibration model for the analysis of an absorption spectrum.
3.2.6 model validation—the process of testing a calibration model to determine bias between the estimates from the model and the reference method, and to test the expected agreement between estimates made with the model and the reference method.
3.2.7 multivariate calibration—a process for creating a model that relates component concentrations or properties to the absorbances of a set of known reference samples at more than one wavelength or frequency.
3.2.8 reference method—the analytical method that is used to estimate the reference component concentration or property value which is used in the calibration and validation procedures.
3.2.9 reference values—the component concentrations or property values for the calibration or validation samples which are measured by the reference analytical method.
3.2.10 spectrometer/spectrophotometer qualification, n—the procedures by which a user demonstrates that the performance of a specific spectrometer/spectrophotometer is adequate to conduct a multivariate analysis so as to obtain precision consistent with that specified in the method.
3.2.11 surrogate calibration, n—a multivariate calibration that is developed using a calibration set which consists of mixtures which contain substantially fewer chemical components than the samples which will ultimately be analyzed.
3.2.12 surrogate method, n—a standard test method that is based on a surrogate calibration.
3.2.13 validation samples—a set of samples used in validating the model. Validation samples are not part of the set of calibration samples. Reference component concentration or property values are known (measured by reference method), and are compared to those estimated using the model.
4. Summary of Practices
4.1 Multivariate mathematics is applied to correlate the absorbances measured for a set of calibration samples to reference component concentrations or property values for the set of samples. The resultant multivariate calibration model is applied to the analysis of spectra of unknown samples to provide an estimate of the component concentration or property values for the unknown sample.
4.2 Multilinear regression (MLR), principal components regression (PCR), and partial least squares (PLS) are examples of multivariate mathematical techniques that are commonly used for the development of the calibration model. Other mathematical techniques are also used, but may not detect outliers, and may not be validated by the procedure described in these practices.
4.3 Statistical tests are applied to detect outliers during the development of the calibration model. Outliers include high leverage samples (samples whose spectra contribute a statistically significant fraction of one or more of the spectral variables used in the model), and samples whose reference values are inconsistent with the model.
4.4 Validation of the calibration model is performed by using the model to analyze a set of validation samples and statistically comparing the estimates for the validation samples to reference values measured for these samples, so as to test for bias in the model and for agreement of the model with the reference method.
4.5 Statistical tests are applied to detect when values estimated using the model represent extrapolation of the calibration.
4.6 Statistical expressions for calculating the repeatability of the infrared analysis and the expected agreement between the infrared analysis and the reference method are given.
5. Significance and Use
5.1 These practices can be used to establish the validity of the results obtained by an infrared (IR) spectrometer at the time the calibration is developed. The ongoing validation of estimates produced by analysis of unknown samples using the calibration model should be covered separately (see for example, Practice D 6122).
5.2 These practices are intended for all users of infrared spectroscopy. Near-infrared spectroscopy is widely used for quantitative analysis. Many of the general principles described in these practices relate to the common modern practices of near-infrared spectroscopic analysis. While sampling methods and instrumentation may differ, the general calibration methodologies are equally applicable to mid-infrared spectroscopy. New techniques are under study that may enhance those discussed within these practices. Users will find these practices
to be applicable to basic aspects of the technique, to include sample selection and preparation, instrument operation, and data interpretation.
5.3 The calibration procedures define the range over which measurements are valid and demonstrate whether or not the sensitivity and linearity of the analysis outputs are adequate for providing meaningful estimates of the specific physical or chemical characteristics of the types of materials for which the calibration is developed.
6. Overview of Multivariate Calibration
6.1 The practice of infrared multivariate quantitative analysis involves the following steps:
6.1.1 Selecting the Calibration Set—This set is also termed the training set or spectral library set. This set is to represent all of the chemical and physical variation normally encountered for routine analysis for the desired application. Selection of the calibration set is discussed in Section 17, after the statistical terms necessary to define the selection criteria have been defined.
6.1.2 Determination of Concentrations or Properties, or Both, for Calibration Samples—The chemical or physical properties, or both, of samples in the calibration set must be accurately and precisely measured by the reference method in order to accurately calibrate the infrared model for prediction of the unknown samples. Reference measurements are discussed in Section 9.
6.1.3 The Collection of Infrared Spectra—The collection of optical data must be performed with care so as to present calibration samples, validation samples, and prediction (unknown) samples for analysis in a like manner. Variation in sample presentation technique among calibration, validation, and prediction samples will introduce variation and error which has not been modeled within the calibration. Infrared instrumentation is discussed in Section 7 and infrared spectral measurements in Section 8.
6.1.4 Calculating the Mathematical Model—The calculation of mathematical (calibration) models may involve a variety of data treatments and calibration algorithms. The more common linear techniques are discussed in Section 12. A variety of statistical techniques are used to evaluate and optimize the model. These techniques are described in Section 15. Statistics used to detect outliers in the calibration set are covered in Section 16.
6.1.5 Validation of the Calibration Model—Validation of the efficacy of a specific calibration model (equation) requires that the model be applied for the analysis of a separate set of test (validation) samples, and that the values predicted for these test samples be statistically compared to values obtained by the reference method. The statistical tests to be applied for validation of the model are discussed in Section 18.
6.1.6 Application of the Model for the Analysis of Unknowns—The mathematical model is applied to the spectra of unknown samples to estimate component concentrations or property values, or both (see Section 13). Outlier statistics are used to detect when the analysis involves extrapolation of the model (see Section 16).
6.1.7 Routine Analysis and Monitoring—Once the efficacy of calibration equations is established, the equations must be monitored for continued accuracy and precision. Simultaneously, the instrument performance must be monitored so as to trace any deterioration in performance to either the calibration model itself or to a failure in the instrumentation performance. Procedures for verifying the performance of the analysis are only outlined in Section 22 but are covered in detail in Practice D 6122. The use of this method requires that a model quality control material be established at the time the model is developed. The model QC material is discussed in Section 22. For practices to compare reference methods and analyzer methods, refer to Practices D 4855.
6.1.8 Transfer of Calibrations—Transferable calibrations are equations that can be transferred from the original instrument, where calibration data were collected, to other instruments where the calibrations are to be used to predict samples for routine analysis. In order for a calibration to be transferable, it must perform prediction after transfer without a significant decrease in performance, as indicated by established statistical tests. In addition, statistical tests that are used to detect extrapolation of the model must be preserved during the transfer. Bias or slope adjustments, or both, are to be made after transfer only when statistically warranted. Calibration transfer, which is sometimes referred to as instrument standardization, is discussed in Section 21.
7. Infrared Instrumentation
7.1 A complete description of all applicable types of infrared instrumentation is beyond the scope of these practices. Only a general outline is given here.
7.2 IR instrumentation falls into two categories: instruments that acquire continuous spectral data over wavelength or frequency ranges (spectrophotometers), and those that only examine one or several discrete wavelengths or frequencies (photometers).
7.2.1 Photometers may have one or a series of wavelength filters and a single detector. These filters are mounted on a turret wheel so that the individual wavelengths are presented to a single detector sequentially. Continuously variable filters may also be used in this fashion. These filters, either linear or circular, are moved past a slit to scan the wavelength being measured. Alternatively, photometers may have several monochromatic light sources, such as light-emitting diodes, that sequentially turn on and off.
7.3 Spectrophotometers can be classified based upon the procedure by which light is separated into component wavelengths. Dispersive instruments generally use a diffraction grating to spatially disperse light into a continuum of wavelengths. In scanning-grating systems, the grating is rotated so that only a narrow band of wavelengths is transmitted to a single detector at any given time. Dispersion can occur before the sample (pre-dispersed) or after the sample (post-dispersed).
7.3.1 Spectrophotometers are also available where the wavelength selection is accomplished without moving parts, using a photodiode array detector. Post-dispersion is utilized. A grating can again provide this function, although other methods, such as a linear variable filter (LVF), accomplish the same purpose (an LVF is a multilayer filter that has variable thickness along its length, such that different wavelengths are transmitted at different positions). The photodiode array detector is used to
acquire a continuous spectrum over wavelength without mechanical motion. The array detector is a compact aggregate of up to several thousand individual photodiode detectors. Each photodiode is located in a different spectral region of the dispersed light beam and detects a unique range of wavelengths.
7.3.2 The acousto-optical tunable filter is a continuous variant of the fixed filter photometer with no moving optical parts for wavelength selection. A birefringent crystal (for example, tellurium oxide) is used, in which acoustic waves at a selected frequency are applied to select the wavelength band of light transmitted through the crystal. Variations in the acoustic frequency cause the crystal lattice spacing to change, which in turn causes the crystal to act as a variable transmission diffraction grating for one wavelength (that is, a Bragg diffractor). A single detector is used to analyze the signal.
7.3.3 An additional category of spectrophotometers uses mathematical transformations to convert modulated light signals into spectral data. The most well-known example is the Fourier transform, which, when applied to infrared (IR), is known as FT-IR. Light is divided into two beams whose relative paths are varied by use of a moving optical element (for example, either a moving mirror, or a moving wedge of a high refractive index material). The beams are recombined to produce an interference pattern that contains all of the wavelengths of interest. The interference pattern is mathematically converted into spectral data using the Fourier transform. The FT method can operate in the mid-IR and near-IR spectral regions. The FT instruments use a single detector.
7.3.4 A second type of transformation spectrophotometer uses the Hadamard transformation. Light is initially dispersed with a grating. Light then passes through a mask mounted on or adjacent to a single detector. The mask generates a series of patterns. For example, these patterns may be formed by electronically opening and shutting various locations, such as in a liquid crystal display, or by moving an aperture or slit through the beam. These modulations alter the energy distribution incident upon the detector. A mathematical transformation is then used to convert the signal into spectral information.
7.4 Infrared instruments used in multivariate calibrations should be installed and operated in accordance with the instructions of the instrument manufacturer. Where applicable, the performance of the instrument should be tested at the time the calibration is conducted using procedures defined in the appropriate ASTM practice (see 2.1). The performance of the instrument should be monitored on a periodic basis using the same procedures. The monitoring procedure should detect changes in the performance of the instrument (relative to that seen during collection of the calibration spectra) that would affect the estimation made with the calibration model.
7.5 For most infrared quantitative applications involving complex matrices, it is a general consensus that scanning-type instruments (either dispersive or interferometer based) provide the greatest performance, due to the stability and reproducibility of modern instrumentation and to the greater amount of spectral data provided for computer interpretation. These data allow for greater calibration flexibility and additional options for selections of spectral areas less sensitive to band shifts and extraneous noise within the spectral signal. Scanning/interferometer-based systems also allow greater wavelength/frequency precision between instruments due to internal wavelength/frequency standardization techniques, and the possibilities of computer-generated spectral corrections. For example, scanning instruments have received approval for complex matrices, such as animal feed and forages (1, 2).9
7.6 Descriptions of instrumentation designs related to Refs (1) and (2) are found in Refs (3) and (4). Other instrumentation similar in performance to that described in these references is acceptable for all near-infrared techniques described in these practices.
7.7 For information describing the measurement of performance of ultraviolet, visible, and near infrared spectrophotometers, refer to Practice E 275. For information describing the measurement of performance of dispersive infrared spectrophotometers, refer to Practice E 932. For information describing the measurement performance of Fourier Transform mid-infrared spectrophotometers, refer to Practice E 1421. For information describing the measurement performance of Fourier Transform near-infrared spectrophotometers, refer to Practice E 1944. For spectrophotometers to which these practices do not apply, refer to Guide E 1866.
8. Infrared Spectral Measurements
8.1 Multivariate calibrations are based on Beer’s Law, namely, the absorbance of a homogeneous sample containing an absorbing substance is linearly proportional to the concentration of the absorbing species. The absorbance of a sample is defined as the logarithm to the base ten of the reciprocal of the transmittance, T:
A = log10(1/T)
The transmittance, T, is defined as the ratio of radiant power transmitted by the sample to the radiant power incident on the sample.
8.1.1 For measurements conducted by reflectance, the reflectance, R, is sometimes substituted for the transmittance T. The reflectance is defined as the ratio of the radiant power reflected by the sample to the radiant power incident on the sample.
NOTE 2—The relationship A = log10(1/R) is not a definition, but rather an approximation designed to linearize the relationship between the measured reflectance, R, and the concentration of the absorbing species. For some applications, other linearization functions (for example, Kubelka-Munk) may be more appropriate (5).
8.1.2 For most types of instrumentation, the radiant power incident on the sample cannot be measured directly. Instead, a reference (background) measurement of the radiant power is made without the sample being present in the light beam.
NOTE 3—To avoid confusion, the reference measurement of the radiant power will be referred to as a background measurement, and the word reference will only be used to refer to measurements made by the reference method against which the infrared is to be calibrated. (See Section 9.)
9 The boldface numbers in parentheses refer to a list of references at the end of the text.
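As an illustration of 8.1 through 8.1.2, the following Python sketch computes transmittance as the ratio of a sample reading to a background reading and converts it to absorbance with A = log10(1/T). It is a minimal sketch and not part of these practices: the function names and the detector readings are hypothetical, and the Kubelka-Munk form shown, (1 - R)^2 / (2R), is the commonly cited expression for the alternative linearization named in Note 2 rather than text taken from this standard.

```python
import math


def transmittance(sample_power, background_power):
    """Ratio of the radiant power measured with the sample in the beam
    to the background measurement made without the sample (8.1.2)."""
    if sample_power <= 0 or background_power <= 0:
        raise ValueError("radiant power readings must be positive")
    return sample_power / background_power


def absorbance(t):
    """Absorbance as defined in 8.1: A = log10(1/T)."""
    return math.log10(1.0 / t)


def kubelka_munk(r):
    """Kubelka-Munk function of reflectance R (assumed form, see lead-in)."""
    return (1.0 - r) ** 2 / (2.0 * r)


# Hypothetical detector readings at three wavelengths (arbitrary units).
background = [1000.0, 800.0, 500.0]
sample = [100.0, 400.0, 500.0]

spectrum = [absorbance(transmittance(s, b)) for s, b in zip(sample, background)]
print(spectrum)
```

Whatever background scheme is chosen, the same one would have to be used for calibration, validation, and unknown samples, so in practice the background list above would come from a single, fixed referencing procedure.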
8.1.3 A measurement is then conducted with the sample present, and the ratio, T, is calculated. The background measurement may be conducted in a variety of ways depending on the application and the instrumentation. The sample and its holder may be physically removed from the light beam and a background measurement made on the “empty beam”. The sample holder (cell) may be emptied, and a background measurement may be taken through the “empty cell”.
NOTE 4—For optically thin cells, care may be necessary to avoid optical interferences resulting from multiple internal reflections within the cell. For very thick cells, differences in the refractive index between the sample and the empty cell may change properties of the optical system, for example, shift focal points.
8.1.4 The sample holder (cell) may be filled with a liquid that has minimal absorption in the spectral range of interest, and the background measurement may be taken through the “background liquid.” Alternatively, the light beam may be split or alternately passed through the sample and through an “empty beam,” an “empty cell,” or a “background liquid.” For reflectance measurements, the reflectance of a material having minimal absorbance in the region of interest is generally used as the background measurement.
8.1.5 The particular background referencing scheme that is used may vary among instruments, and among applications. The same background referencing scheme must be employed for the measurement of all spectra of calibration samples, validation samples, and unknown samples to be analyzed.
8.2 Traditionally, a sample is manually brought to the instrument and placed in a suitable optical container (a cell or cuvette with windows that transmit in the region of interest). Alternatively, transfer pipes can continuously flow liquid through an optical cell in the instrument for continuous analysis. With optical fibers, the sample can be analyzed remotely from the instrument. Light is sent to the sample through an optical fiber or fibers and returned to the instrument by means of another fiber or group of fibers. Instruments have been developed that use single fibers to transmit and receive the light, as well as those using bundles of fibers for this purpose. Detectors and light sources external to the instrument can also be used, in which case only one fiber or bundle is needed. For spectral regions where transmitting fibers do not exist, the same function can be performed over limited distances using appropriate optical transfer optics.
NOTE 5—If the instrument uses predispersion of the light, some caution must be exercised to avoid introducing ambient light into the system at the sample position, since such light may be detected, giving rise to erroneous absorbance measurements.
8.3 Although most multivariate calibrations for liquids involve the direct measurement of transmitted light, alternative sampling technologies (for example, attenuated total reflectance) can also be employed. Transmittance measurements can be employed for some types of solids (for example, polymer films), whereas other solids (for example, powdered solids) are more commonly measured by diffuse reflectance techniques.
8.4 For most infrared instrumentation, a variety of adjustable parameters are available to control the collection and computation of the spectral data. These parameters control, for instance, the optical and digital resolution, and the rate of data acquisition (scan speed). A detailed description of the spectral acquisition parameters and their effect on multivariate calibrations is beyond the scope of these practices. However, it is essential that all adjustable parameters that control the collection and computation of spectral data be maintained constant for the collection of spectra of calibration samples, validation samples, and unknown samples for which estimates are to be made.
8.5 For definitions and further description of general infrared quantitative measurement techniques, refer to Practice E 168. For a description of general techniques of infrared microanalysis, refer to Practice E 334.
9. Reference Method and Reference Values
9.1 Infrared spectroscopy requires calibration to determine the proportionality relationship between the signals measured and the component concentrations or properties that are to be estimated. During the calibration, spectra are measured for samples for which these reference values are known, and the relationship between the sample absorbances and the reference values is determined. The proportionality relationship is then applied to the spectra of unknown samples to estimate the concentration or property values for the sample.
9.2 For simple mixtures containing only a few chemical components, it is generally possible to prepare mixtures that can serve as standards for the multivariate calibration of an infrared analysis. Because of potential interferences among the absorbances of the components, it is not sufficient to vary the concentration of only some of the mixture components, even when analyses for only one component are being developed. Instead, all components should be varied over a range representative of that expected for future unknown samples that are to be analyzed. Since infrared measurements are conducted on a fixed volume of sample (for example, a fixed cell pathlength), it is preferable that concentration reference values be expressed in volumetric terms, for example, in volume percentage, grams per millilitre, moles per cubic centimetre, and so forth. Developing multivariate calibrations for reference concentrations expressed in other terms (for example, weight percentage) can lead to models that are linear approximations to what is really a nonlinear relationship and can lead to less accurate estimates of the concentrations.
9.3 For complex mixtures, such as those obtained from petrochemical processes, preparation of reference standards is generally impractical, and the multivariate calibration of an infrared analysis must typically be performed on actual process samples. In this case, the reference values used to calibrate the infrared analysis are obtained by a reference analytical method. The accuracy of a component concentration or property value estimated by a multivariate infrared analysis is highly dependent on the accuracy and precision of the reference values used in the calibration. The expected agreement between the infrared estimated values and those obtained from a single reference measurement can never exceed the repeatability of the reference method, since, even if the infrared estimated the true value, the measurement of agreement is limited by the precision of the reference values. Knowledge of the precision (repeatability) of the reference method is critical in the
development of an infrared multivariate calibration. The precision of the reference data used in developing a model, and the accuracy of the model, can be improved by averaging repeated reference measurements.
NOTE 6—If the reference values used to calibrate a multivariate infrared analysis are generated in a single laboratory, it is essential that the measurement process used to generate these values be monitored for bias and precision using suitable quality assurance procedures (see for example, Practice D 6299). If primary standards are not available to allow the bias of the reference measurement process to be established, it is recommended that the laboratory participate in an interlaboratory cross-check program as a means of demonstrating accuracy.
NOTE 7—Samples like hydrocarbons from petrochemical process streams can degrade with time unless careful sampling and sample storage procedures are followed. It is critical that the composition of samples taken for laboratory or at-line infrared analysis, or for laboratory measurement of the reference data be representative of the process at the time the samples are taken, and that composition is maintained during storage and transport of the samples either to the analyzer or to the laboratory. Sampling should be done in accordance with methods like Practices D 1265 and D 4057, or Practice D 4177, whichever are applicable. Whenever possible, sample storage for extended time periods is not recommended because of the likelihood of samples degrading with time in spite of sampling precautions taken. Degradation of samples can cause changes in the spectra measured by the analyzer and thus in the values estimated, and in the property or quality measured by the reference method.
9.4 If the reference method used to obtain reference values for the multivariate calibration is an established ASTM method, then repeatability and reproducibility data are included in the method. In this case, it is only necessary to demonstrate that the reference measurement is being practiced in accordance with the procedure described in the method, and that the repeatability obtained is statistically comparable to that published in the method. Data from established quality control procedures can be used to demonstrate that the repeatability of the reference method is within ASTM specifications. If such data is not available, then repeatability data should be collected on at least three of the samples that are to be used in the calibration. These samples should be chosen to span the range of values over which the calibration is to be developed, one sample having a reference value in the bottom third of the range, one sample having a value in the middle third of the range, and one sample having a value in the upper third of the range. At least six reference measurements should be made on each sample. The standard deviation among the measurements should be calculated and compared to that expected based on the published repeatability.10
9.5 If the reference method to be used for the multivariate calibration is an established ASTM method, and the samples to be used in the calibration have been analyzed by a cooperative testing program (for example, octane values obtained from recognized exchange groups), then the reference values obtained by the cooperative testing program can be used directly, and the standard deviations established by the cooperative testing program can be used as the estimate of the precision of the reference data.
9.6 Reference methods that are not ASTM methods can be used for the multivariate calibration of infrared analyses, but in this case, it is the responsibility of the method developer to establish the precision of the reference method using procedures similar to those detailed in Practice E 691, in the Manual for Determining Precision for ASTM Methods on Petroleum Products and Lubricants10 and in Practice D 6300.
9.7 When multiple reference measurements are made on an individual calibration or validation sample, a Dixon’s Test (see A1.1) should be applied to the values to determine if all of the reference values came from the same population, or if one or more of the values is suspect and should be rejected.
10. Simple Procedure to Develop a Feasibility Calibration
10.1 For new applications, it is generally not known whether an adequate IR multivariate model can be developed. In this case, feasibility studies can be performed to determine if there is a relationship between the IR spectra and the component/property of interest, and whether a model of adequate precision could possibly be built. If the feasibility calibration is successful, then it can be expanded and validated. A feasibility calibration involves the following steps:
10.1.1 Approximately 30 to 50 samples are collected covering the entire range for the constituent/property of interest. Care should be exercised to avoid intercorrelations among major constituents unless such intercorrelations always exist in the materials being analyzed. The range in the concentration/property should be preferably five times, but not less than three times, the standard deviation of the reproducibility (reproducibility/2.77) of the reference analysis.
10.1.2 When collecting spectral data on these samples, variations in particle size, sample presentation, and process conditions which are expected during analysis must be reproduced. Multiple spectra of the same sample under different conditions can be employed if such variations in conditions are anticipated during analysis.
10.1.3 Reference analyses on these samples are conducted using the accepted reference method. If the range for the component/property is not at least five times the standard deviation of the reproducibility for the reference analysis, then r replicate analyses should be conducted on each sample such that √r times the range is preferably five times, but at least three times, the standard deviation of the reference analysis.
10.1.4 A calibration model is developed using one or more of the mathematical techniques described in Sections 11 and 12. The calibration model is preferably tested using cross-validation methods such as SECV or PRESS (see 15.3.6). Other statistics can also be used to judge the overall quality of the calibration.
10.1.5 If the SECV value obtained from the cross validation suggests that a model of adequate precision can be built, then additional samples are collected to round out the calibration set and to serve as a validation set. Spectra of these samples are collected, and a final model is developed and validated as described in Sections 13, 14, and 15.
10
Manual on Determining Precision Data for ASTM Methods on Petroleum 11. Data Preprocessing
Products and Lubricants, Available from ASTM Headquarters. Request Research
Report RR: D02-1007. 11.1 Various types of data preprocessing algorithms can be
applied to the spectral data prior to the development of a multivariate calibration model. For example, numerical derivatives of the spectra may be calculated using digital filtering algorithms to remove varying baselines. Such filtering generally causes a significant decrease in the spectral signal-to-noise. Digital filters may also be employed to smooth data, improving signal to noise at the expense of resolution. A complete description of all possible preprocessing methods is beyond the scope of these practices. For the purpose of these practices, preprocessing of the spectral data can be used if it produces a model which has acceptable precision and which passes the validation test described in Section 18. In addition, any spectral preprocessing method must be automated so as to provide an exactly reproducible result, and must be applied consistently to all calibration spectra, validation spectra, and to spectra of unknowns which are to be analyzed.
11.2 One type of preprocessing requires special mention. Mean-centering refers to a procedure in which the average of the calibration spectra (average absorption over the calibration spectra as a function of wavelength or frequency) is calculated and subtracted from the spectra of the individual calibration samples prior to the development of the model. The average reference value among the calibration samples is also calculated, and subtracted from the individual reference values for the calibration samples. The model is then built on the mean-centered data. If the spectral and reference value data are mean-centered prior to the development of the model, then:
11.2.1 When an unknown sample is analyzed, the average spectrum for the calibration set must be subtracted from the spectrum of the unknown prior to applying the mean-centered model, and the average reference value for the calibration set must be added to the estimate from the mean-centered model to obtain the final estimate; and
11.2.2 The degrees of freedom used in calculating the standard error of calibration must be diminished by one to account for the degree of freedom used in calculating the average (see 15.2).
12. Multivariate Calibration Mathematics
12.1 Multivariate mathematical techniques are used to relate the absorbances measured for a set of calibration samples to the reference values (property or component concentration values) obtained for this set of samples from a reference test. The object is to establish a multivariate calibration model that can be applied to the spectra of future, unknown, samples to estimate values (property or component concentration values). Only linear multivariate techniques are described in these practices; that is, it is assumed that the property or component concentration values can be modeled as a linear function of the sample absorptions. Various nonlinear multivariate techniques have been developed, but have generally not been as widely used as the following linear techniques. These practices are not intended to compare or contrast among these techniques. For the purpose of these practices, the suitability of any specific mathematical technique should be judged only on the following two criteria:
12.1.1 The technique should be capable of producing a calibration model that can be validated as described in Section 18; and
12.1.2 The technique should be capable of providing statistics suitable for identifying if samples being analyzed are outside the range for which the model was developed; that is, when the estimated values represent extrapolation of the model (see 16.3).
NOTE 8—In the following derivations, matrices are indicated using boldface capital letters, vectors are indicated using boldface lowercase letters, and scalars are indicated using lowercase letters. Vectors are column vectors, and their transposes are row vectors. Italicized lowercase letters indicate matrix or vector dimensions.
12.1.3 All linear, multivariate techniques are designed to solve the same generic problem. If n calibration spectra are measured at f discrete wavelengths (or frequencies), then X, the spectral data matrix, is defined as an f by n matrix containing the spectra (or some function of the spectra produced by preprocessing, as described in Section 11) as columns. Similarly y is a vector of dimension n by 1 containing the reference values for the calibration samples. The object of the linear, multivariate modeling is to calculate a prediction vector p of dimension f by 1 that solves Eq 1:
$$\mathbf{y} = \mathbf{X}^{t}\mathbf{p} + \mathbf{e} \tag{1}$$
where $\mathbf{X}^{t}$ is the transpose of the matrix X obtained by interchanging the rows and columns of X. The error vector, e, is a vector of dimension n by 1, that is the difference between the reference values y and their estimates, ŷ, where:
$$\hat{\mathbf{y}} = \mathbf{X}^{t}\mathbf{p} \tag{2}$$
12.1.4 The estimation of the prediction vector p is generally calculated so as to minimize the sum of squares of the errors,
$$\mathbf{e}^{t}\mathbf{e} = \lVert \mathbf{e} \rVert^{2} = (\mathbf{y} - \mathbf{X}^{t}\mathbf{p})^{t}(\mathbf{y} - \mathbf{X}^{t}\mathbf{p}) \tag{3}$$
Since X is generally not a square matrix, it cannot be directly inverted to solve Eq 3. Instead, the pseudo or generalized inverse of X, $\mathbf{X}^{+}$, is calculated as:
$$\mathbf{X}^{+}\mathbf{y} = (\mathbf{X}\mathbf{X}^{t})^{-1}\mathbf{X}\mathbf{y} = \mathbf{p} \tag{4}$$
where p is the least squares estimate of the prediction vector. It should be noted that, in applying Eq 1-4, it is assumed that the errors in the spectral data in X are negligible compared to the errors in the reference data, and that there is a linear relationship between the component concentration or property and the spectral data. If either of these assumptions is incorrect, then the linear models derived here will not yield an optimal estimate of p.
12.1.5 In calculating the least squares solution in Eq 4, it is assumed that the individual error values in e (see Eq 1) are normally distributed with common variance. This will be true if each of the individual reference values in y represents the result of a single reference measurement, and if the repeatability of the reference method is constant over the range of values in y. If the values in y represent averages of more than one reference method determination, then the least squares expression in Eq 4 is not applicable. If $r_i$ reference values $y_{i1}, y_{i2}, y_{i3}, \ldots, y_{ir}$ are measured for calibration sample i, then a weighted regression can be employed. If R is a diagonal matrix of dimension n by n containing the $r_i$ values for each of the calibration samples, then the weighted regression is given by:
$$\sqrt{\mathbf{R}}\,\bar{\mathbf{y}} = \sqrt{\mathbf{R}}\,\mathbf{X}^{t}\mathbf{p} + \mathbf{e} \tag{5}$$
$$(\mathbf{X}\mathbf{R}\mathbf{X}^{t})^{-1}\mathbf{X}\mathbf{R}\bar{\mathbf{y}} = \mathbf{p} \tag{6}$$
where $\sqrt{\mathbf{R}}$ indicates the diagonal matrix containing the square roots of the $r_i$ values, and ȳ is the vector containing the averages of the $r_i$ reference values for each sample. If averages of multiple reference values are used in y and a weighted regression is used, special care must be taken to add back the variance removed by calculating the average reference values (see Section 15) so that the statistics for the model can be compared to those for a single reference value determination. The specific method in which the weighting is applied depends on the specific multivariate mathematics that are employed.
12.1.6 For most cases, if the calibration spectra are collected over an extended wavelength (or frequency) range, the number of individual absorption values per spectrum, f, will exceed the number of calibration spectra, n. In this case, the matrices $(\mathbf{X}\mathbf{X}^{t})$ and $(\mathbf{X}\mathbf{R}\mathbf{X}^{t})$ are rank deficient and cannot be directly inverted. Even in cases where f < n, colinearity among the calibration spectra can cause $(\mathbf{X}\mathbf{X}^{t})$ and $(\mathbf{X}\mathbf{R}\mathbf{X}^{t})$ to be nearly singular (to have a determinant that is near zero), and the direct use of Eq 4 and Eq 6 can produce an unstable model, that is, a model for which changes on the order of the spectral noise level produce significant changes in the estimated values. In order to solve Eq 4 and Eq 6, it is therefore necessary to reduce the dimensionality of X so that a stable inverse can be calculated. The various linear, mathematical techniques used for multivariate calibration are different means of reducing the dimensionality of X so as to be able to calculate stable inverses of $(\mathbf{X}\mathbf{X}^{t})$ and $(\mathbf{X}\mathbf{R}\mathbf{X}^{t})$ and the estimate p.
12.2 Multilinear Regression Analysis:
12.2.1 In multilinear regression (MLR), a specific number of wavelengths (or frequencies), k, are chosen such that k << n. A new matrix M of dimension k by n is obtained from X by extracting the rows from X that correspond to the selected wavelengths (or frequencies). The calibration equation then becomes:
$$\mathbf{y} = \mathbf{M}^{t}\mathbf{b} + \mathbf{e} \tag{7}$$
where b is a vector of dimension k by 1 containing the set of regression coefficients defined at each of the chosen wavelengths (or frequencies). The solution for the regression coefficients is obtained as:
$$(\mathbf{M}\mathbf{M}^{t})^{-1}\mathbf{M}\mathbf{y} = \mathbf{b} \tag{8}$$
The estimate of the full prediction vector, p, is obtained from b by substituting the values from b into the corresponding positions in p (corresponding to the selected wavelengths or frequencies), and setting all other elements of p (corresponding to the wavelengths or frequencies that were eliminated in going from X to M) to zero.
12.2.2 If a weighted regression is used, the corresponding form for Eq 8 becomes:
$$(\mathbf{M}\mathbf{R}\mathbf{M}^{t})^{-1}\mathbf{M}\mathbf{R}\mathbf{y} = \mathbf{b} \tag{9}$$
12.2.3 Not all commercial software packages that implement MLR include options for weighted regressions. If MLR models are developed with such packages, averages of multiple reference values should still be included in the y vector if they are available. The use of the average values will lead to better estimates of the regression coefficients, but the model produced will not be the least squares minimum. Standard errors of calibration calculated by the software will generally not be meaningful in these cases since they are not expressed relative to a single reference measurement. Standard errors of calibration should be recalculated using the procedure described in Section 15.
12.2.4 The choice of the number of wavelengths (or frequencies), k, to use in multilinear regression is a critical factor in the model development. If too few wavelengths are used, a less precise model will be developed. If too many wavelengths are used, colinearity among the absorption values at these wavelengths may lead to an unstable model. The optimum number of wavelengths (or frequencies) for a model is related to the number of spectrally distinguishable components in the calibration spectra (see Section 15) and can generally only be determined by trial and error. As a rule, the number of wavelengths (or frequencies) used must be large enough to produce a model with the desired precision, but small enough to produce a stable model that passes validation.
12.2.5 The choice of specific wavelengths (or frequencies) to include in a multilinear regression model is also a critical factor in the model development. Several mathematical algorithms have been suggested for making this selection (6, 7, 8, 9). Alternatively, selection may be based on prior knowledge of a relationship between the absorptions measured and the property or component being modeled. It is beyond the scope of these practices to compare alternative selection methods. An adequate set of wavelengths (or frequencies) will, for the purpose of these practices, be defined as a set that produces a model with the desired precision that passes the validation test procedure described in Section 18.
12.3 Principal Components Regression (PCR):
12.3.1 Principal components regression (PCR) is based on the singular value decomposition of the spectral data matrix. The singular value decomposition takes the form:
$$\mathbf{X} = \mathbf{L}\boldsymbol{\Sigma}\mathbf{S}^{t} \tag{10}$$
12.3.1.1 The scores matrix, S, is a n by n matrix that satisfies the relationship:
$$\mathbf{S}^{t}\mathbf{S} = \mathbf{I} \tag{11}$$
$$\mathbf{S}^{t}(\mathbf{X}^{t}\mathbf{X})\mathbf{S} = \boldsymbol{\Lambda} \tag{12}$$
where I is a n by n identity matrix, and $\boldsymbol{\Lambda}$ is the matrix of eigenvalues of $\mathbf{X}^{t}\mathbf{X}$. The n by n matrix $\boldsymbol{\Sigma}$ is the matrix of singular values, that are the square roots of the eigenvalues, that is:
$$\boldsymbol{\Sigma}^{2} = \boldsymbol{\Lambda} \tag{13}$$
12.3.1.2 The loadings matrix, L, is a f by n matrix that satisfies the relationships:
$$\mathbf{L}^{t}\mathbf{L} = \mathbf{I} \tag{14}$$
$$\mathbf{L}^{t}(\mathbf{X}\mathbf{X}^{t})\mathbf{L} = \boldsymbol{\Lambda} \tag{15}$$
12.3.1.3 The column vectors that make up the matrices S and L are orthonormal, that is, the dot product of the vector with itself
is 1, and the dot product with any other vector in the matrix is 0.
NOTE 9—In some implementations of PCR, the data matrix X may be decomposed as the product of only two matrices, S and L. Either S or L is then orthogonal but not orthonormal, and either $\mathbf{S}^{t}\mathbf{S} = \boldsymbol{\Lambda}$ or $\mathbf{L}^{t}\mathbf{L} = \boldsymbol{\Lambda}$.
12.3.1.4 Using the singular value decomposition, the pseudo inverse of the matrix X can be calculated as:
$$\mathbf{X}^{+} = \mathbf{S}\boldsymbol{\Sigma}^{-1}\mathbf{L}^{t} \tag{16}$$
12.3.1.5 Using the pseudo inverse relationship in Eq 16, it is then possible to solve for the prediction vector p. In practice, however, the full inverse of X as given in Eq 16 is not used, since it contains information relating to the spectral noise in the calibration spectra.
12.3.2 When a principal components analysis is conducted on a matrix X containing the calibration spectra, the signals arising from the absorbances of the calibration sample components generally account for the majority of the variance in X, and are concentrated into the first k loading vectors, that correspond to the larger eigenvalues. While the separation of signal and noise is seldom perfect, it is preferable to use only the first k vectors in building a model. The singular value decomposition of X is then written as:
$$\mathbf{X} = \mathbf{L}_{a}\boldsymbol{\Sigma}_{a}\mathbf{S}_{a}^{t} + \mathbf{L}_{n}\boldsymbol{\Sigma}_{n}\mathbf{S}_{n}^{t} \tag{17}$$
where $\mathbf{S}_{a}$ is a n by k matrix containing the first k columns of S, $\mathbf{L}_{a}$ is a f by k matrix containing the first k columns of L, $\boldsymbol{\Sigma}_{a}$ is a k by k diagonal matrix containing the first k singular values, and $\mathbf{S}_{n}$, $\boldsymbol{\Sigma}_{n}$, and $\mathbf{L}_{n}$ are the corresponding matrices containing the last n-k elements of S, $\boldsymbol{\Sigma}$, and L. The pseudo inverse of X is then approximated as:
$$\mathbf{X}^{+} = \mathbf{S}_{a}\boldsymbol{\Sigma}_{a}^{-1}\mathbf{L}_{a}^{t} \tag{18}$$
12.3.2.1 The estimate for the prediction vector, p, is then given as:
$$\mathbf{p} = \mathbf{L}_{a}\boldsymbol{\Sigma}_{a}^{-1}\mathbf{S}_{a}^{t}\mathbf{y} \tag{19}$$
12.3.2.2 Alternatively, the scores, S, may be regressed against the reference values, y, to obtain a set of regression coefficients, b:
$$\mathbf{y} = \mathbf{S}_{a}\mathbf{b} + \mathbf{e} \tag{20}$$
$$\mathbf{b} = (\mathbf{S}_{a}^{t}\mathbf{S}_{a})^{-1}\mathbf{S}_{a}^{t}\mathbf{y} = \mathbf{S}_{a}^{t}\mathbf{y} \tag{21}$$
12.3.2.3 Various stepwise regression algorithms (10, 11, 12) may be used to test which of the principal components (which columns in the scores matrix, S) show a statistically significant correlation to the reference values in y. Coefficients (elements of b) for principal components that do not show a statistically significant correlation may be set to zero. The estimate for the prediction vector then becomes:
$$\mathbf{p} = \mathbf{L}_{a}\boldsymbol{\Sigma}_{a}^{-1}\mathbf{b} \tag{22}$$
12.3.3 If the average of multiple reference measurements is used in the y vector, then a weighted regression should be used in calculating the prediction vector. The weighting is preferably applied to the scores in Eq 20 and Eq 21, and the spectra in X are not weighted prior to the singular value decomposition.
12.3.3.1 If $r_i$ individual reference values are measured for the ith calibration sample, then entering $r_i$ copies of the spectrum $\mathbf{x}_i$ into the X matrix, or weighting the spectrum $\mathbf{x}_i$ by $\sqrt{r_i}$ will alter the loadings that are calculated. If the spectrum $\mathbf{x}_i$ is only measured once, the uncertainty in the spectral variables contributed by $\mathbf{x}_i$ is no different from that for the other n − 1 spectra. Weighting the spectrum $\mathbf{x}_i$ prior to the singular value decomposition will tend to force noise characteristics of $\mathbf{x}_i$ into the loadings, adversely affecting the model. Weighting the scores during the calculation of the regression coefficients will properly account for the differences in the variance among the components of the ȳ vector. The weighted regression equations become:
$$\sqrt{\mathbf{R}}\,\bar{\mathbf{y}} = \sqrt{\mathbf{R}}\,\mathbf{S}_{a}\mathbf{b} + \mathbf{e} \tag{23}$$
$$\mathbf{b} = (\mathbf{S}_{a}^{t}\mathbf{R}\mathbf{S}_{a})^{-1}\mathbf{S}_{a}^{t}\mathbf{R}\bar{\mathbf{y}} \tag{24}$$
12.3.4 Not all commercial software packages that implement PCR include options for weighted regressions. If PCR models are developed with such packages, averages of multiple reference values should still be included in the y vector if they are available. The use of the average values will lead to better estimates of the regression coefficients, but the model produced will not be the least squares minimum. Standard errors of calibration calculated by the software will generally not be meaningful in these cases since they are not expressed relative to a single reference measurement. Standard errors of calibration should be recalculated using the procedure described in 15.1.
12.3.5 As with wavelengths in multilinear regression, the choice of the number of principal components, k, to use in the regression is a critical factor in the model development. If too few principal components are used, a less precise model will be developed. If too many principal components are used, noise characteristics of the calibration samples will be incorporated into the model leading to unstable estimations. The optimum number of principal components for a model is related to the number of spectrally distinguishable components in the calibration spectra (see Section 15), and can generally only be determined by trial and error. As a rule, the number of principal components used must be large enough to produce a model with the desired precision, but small enough to produce a stable model that passes validation.
12.4 Partial Least Squares (PLS):
NOTE 10—The term PLS has been used to describe various mathematical algorithms. The version described here is a specific representation of the PLS-1 algorithm, and deals with only one set of reference values at a time. PLS-2 or multiblock PLS algorithms exist that can be used for the simultaneous calibration of multiple components or concentrations, or both, but these algorithms are less well established than PLS-1 and are not included in these practices. Various descriptions of the PLS-1 algorithm have been published (13, 14, 15, 16, 17, 18, 19, 20) many of which differ slightly in the actual computational steps. In implementing the PLS-1 algorithm, a choice must be made as to which, if either, of the scores or loadings vectors are to be normalized. In the following derivation, the scores vectors were normalized. If neither vector is normalized, or if the loadings vector is normalized instead of the scores vector, a different expression will be obtained for the prediction vector. Differences in the derivations should not result in differences in the numerical values obtained for the prediction vector, nor in estimates based on it.
12.4.1 Like PCR, PLS involves the decomposition of the spectral data matrix, X, into the product of matrices. Unlike
PCR where X is first decomposed, and then regressed versus the reference values, in PLS, the y vector is used in obtaining the decomposition of X. The PLS proceeds by means of a series of steps, which are repeated in a loop. Each time the steps are repeated, a weighting vector $\mathbf{w}_i$ (of dimension f by 1), a scores vector $\mathbf{s}_i$ (of dimension n by 1), a regression coefficient $b_i$ (a scalar), and a loadings vector $\mathbf{l}_i$ (of dimension f by 1) are calculated. The subscript i indicates the number of times the entire loop has been executed, and is initially 1.
12.4.1.1 Step 1—Calculation of a weighting vector of dimension f by 1, $\mathbf{w}_i$:
$$\mathbf{X}^{t} = \mathbf{y}\mathbf{w}_{i}^{t} + \mathbf{Z} \tag{25}$$
$$\hat{\mathbf{w}}_{i} = \mathbf{X}\mathbf{y} \tag{26}$$
12.4.1.2 Step 2—Scaling the weight vector $\hat{\mathbf{w}}_i$ and calculation of a normalized scores vector, $\mathbf{s}_i$, of dimension n by 1:
$$\mathbf{X}^{t} = \mathbf{s}_{i}\hat{\mathbf{w}}_{i}^{t} + \mathbf{Z} \tag{27}$$
$$\hat{\mathbf{s}} = \mathbf{X}^{t}\hat{\mathbf{w}}_{i} \tag{28}$$
$$\hat{\mathbf{w}}_{i} = \hat{\mathbf{w}}_{i} / \sqrt{\hat{\mathbf{s}}^{t}\hat{\mathbf{s}}} \tag{29}$$
$$\hat{\mathbf{s}}_{i} = \hat{\mathbf{s}} / \sqrt{\hat{\mathbf{s}}^{t}\hat{\mathbf{s}}} \tag{30}$$
12.4.1.3 Step 3—Regressing the scores vector against the reference values to obtain a regression coefficient, $b_i$:
$$\mathbf{y} = \hat{\mathbf{s}}_{i}b_{i} + \mathbf{e} \tag{31}$$
$$b_{i} = \hat{\mathbf{s}}_{i}^{t}\mathbf{y} \tag{32}$$
12.4.1.4 Step 4—Calculation of a loading vector, $\mathbf{l}_i$ of dimension f by 1:
$$\mathbf{X} = \mathbf{l}_{i}\hat{\mathbf{s}}_{i}^{t} + \mathbf{Z} \tag{33}$$
$$\mathbf{l}_{i} = \mathbf{X}\hat{\mathbf{s}}_{i} \tag{34}$$
12.4.1.5 Step 5—Calculation of the residuals:
$$\mathbf{Z}_{i} = \mathbf{X} - \mathbf{l}_{i}\hat{\mathbf{s}}_{i}^{t} \tag{35}$$
$$\mathbf{e}_{i} = \mathbf{y} - b_{i}\hat{\mathbf{s}}_{i} \tag{36}$$
12.4.1.6 For subsequent times through the loop, the matrix X is replaced with the residuals matrix $\mathbf{Z}_{i-1}$ from the previous loop, and the y vector is replaced with the residuals vector $\mathbf{e}_{i-1}$. The loop is repeated k times to obtain k weighting, scores, and loading vectors, and k regression coefficients. The overall expression for the results is then:
$$\mathbf{X} = \mathbf{L}\mathbf{S}^{t} + \mathbf{Z} \tag{37}$$
$$\mathbf{y} = \mathbf{S}\mathbf{b} + \mathbf{e} \tag{38}$$
where S is the n by k matrix containing the $\hat{\mathbf{s}}_i$ as columns, L is the f by k matrix containing the $\mathbf{l}_i$ as individual columns, Z is the residual from the spectral data matrix, and e is the residual from the estimation of the reference values. The estimate of the prediction vector is then given by:
values are measured, then weighting both X and y by $\sqrt{\mathbf{R}}$ in Step 1 of the PLS algorithm will overemphasize the spectral variables contributed by $\mathbf{x}_i$. Preferably, weighting is done only in the calculation of the regression coefficients in Step 3. Eq 31 and Eq 32 then become:
$$\sqrt{\mathbf{R}}\,\bar{\mathbf{y}} = \sqrt{\mathbf{R}}\,\hat{\mathbf{s}}_{i}b_{i} + \mathbf{e} \tag{40}$$
$$b_{i} = (\hat{\mathbf{s}}_{i}^{t}\mathbf{R}\hat{\mathbf{s}}_{i})^{-1}\hat{\mathbf{s}}_{i}^{t}\mathbf{R}\bar{\mathbf{y}} \tag{41}$$
12.4.2.1 The other steps in the algorithm proceed unchanged.
12.4.3 Not all commercial software packages that implement PLS include options for weighted regressions. If PLS models are developed with such packages, averages of multiple reference values should still be included in the ȳ vector if they are available. The use of the average values will lead to better estimates of the regression coefficients, but the model produced will not be the least squares minimum. Standard errors of calibration calculated by the software will generally not be meaningful in these cases since they are not expressed relative to a single reference measurement. Standard errors of calibration should be recalculated using the procedure described in 15.2.
13. Estimation of Values from Spectra
13.1 If x (an f by 1 vector) is the spectrum of a sample, then ŷ (a scalar), the estimated component concentration or property value, is given by:
$$\hat{y} = \mathbf{x}^{t}\mathbf{p} \tag{42}$$
where p is the prediction vector obtained from the multivariate calibration. The expression in Eq 42 involves only the dot product of two vectors to obtain the estimated value; it has the advantage of being computationally simple. However, alternative computations are often employed in obtaining ŷ, since they provide additional parameters required to calculate the uncertainty in the estimation as well as whether or not the estimation is being made by interpolation or extrapolation of the calibration model.
13.2 Estimations by MLR—For MLR, the absorbance values in x that correspond to the wavelengths (or frequencies) chosen in the calibration are extracted to form a vector m (of dimension k by 1). The estimate ŷ is then obtained as the dot product of the vector m with the vector of regression coefficients, b:
$$\hat{y} = \mathbf{m}^{t}\mathbf{b} \tag{43}$$
13.3 Estimations by PCR:
13.3.1 For PCR, the vector x is first decomposed:
$$\mathbf{x}^{t} = \mathbf{s}^{t}\boldsymbol{\Sigma}\mathbf{L}^{t} \tag{44}$$
$$\hat{\mathbf{s}}^{t} = \mathbf{x}^{t}\mathbf{L}\boldsymbol{\Sigma}^{-1} \tag{45}$$
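The PCR relationships in Eq 17 through Eq 22, together with the estimation step in Eq 44 and Eq 45, can be sketched numerically as follows. This is an illustrative sketch only and is not part of these practices. It assumes NumPy, noise-free synthetic spectra, and hypothetical names (`pcr_calibrate`, `pcr_estimate`, `pure`, `conc`). It also assumes that the final estimate is formed as the dot product of the scores from Eq 45 with the regression coefficients b, which is algebraically the same as ŷ = xᵗp in Eq 42.

```python
import numpy as np

def pcr_calibrate(X, y, k):
    """Truncated-SVD PCR for X (f by n, spectra as columns) and y (length n)."""
    # NumPy returns X = U @ diag(sig) @ Vt, so L = U, Sigma = diag(sig), S = Vt.T
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    L_a = U[:, :k]           # f by k loadings (first k columns of L)
    sig_a = sig[:k]          # first k singular values
    S_a = Vt[:k, :].T        # n by k scores (first k columns of S)
    b = S_a.T @ y            # Eq 21: S_a^t S_a = I, so b = S_a^t y
    p = L_a @ (b / sig_a)    # Eq 22: p = L_a Sigma_a^-1 b
    return L_a, sig_a, b, p

def pcr_estimate(x, L_a, sig_a, b):
    """Estimate a value for one spectrum x (length f)."""
    s_hat = (L_a.T @ x) / sig_a   # Eq 45: s^t = x^t L Sigma^-1
    return float(s_hat @ b)       # assumed final step: y_hat = s^t b

# Synthetic three-component mixtures (hypothetical data for illustration)
rng = np.random.default_rng(0)
f, n, k = 50, 20, 3
pure = rng.uniform(0.1, 1.0, size=(f, k))   # pure-component spectra
conc = rng.uniform(0.0, 1.0, size=(k, n))   # calibration concentrations
X = pure @ conc                             # f by n spectral data matrix
y = conc[0]                                 # reference values for component 1

L_a, sig_a, b, p = pcr_calibrate(X, y, k)
c_new = np.array([0.3, 0.5, 0.2])
x_new = pure @ c_new
y_hat = pcr_estimate(x_new, L_a, sig_a, b)
```

With noise-free data of rank three and k = 3, the estimate reproduces the first concentration of the new sample, and the scores-based route agrees with the direct dot product `x_new @ p`. With real spectra, the choice of k governs how much spectral noise enters the model.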
model. After the first cycle through Eq 47 and Eq 48, $\mathbf{x}^{t}$ and x are replaced with $\mathbf{z}_{i-1}^{t}$ and $\mathbf{z}_{i-1}$ from the previous cycle.
$$\hat{s}_{i}^{t} = \mathbf{x}^{t}\mathbf{w}_{i} \tag{47}$$
$$\mathbf{z}_{i} = \mathbf{x} - \mathbf{l}_{i}\hat{s}_{i}^{t} \tag{48}$$
13.4.2 The estimated scores, ŝ (a k by 1 vector), are then multiplied by the regression coefficients obtained from the calibration to obtain ŷ
$$\hat{y} = \hat{\mathbf{s}}^{t}\mathbf{b} \tag{49}$$
14. Post Processing
14.1 Several multivariate methods involve some post processing of the estimates from the multivariate model. The most common example is for mean-centered models (see Section 11), where the average reference value for the calibration set must be added to the initial estimate from the model to obtain the final estimate. A model can be developed to estimate changes in the pathlength of the cell used to contain the sample for analysis, and the estimated concentrations or property values can be scaled based on the results of the pathlength estimate.
14.2 A complete description of possible post-processing algorithms is beyond the scope of these practices. Post-processing can be employed if it provides a model with adequate precision, passes the validation test described in Section 18, and provided that the post-processing algorithm is automated, so as to provide exactly reproducible results, and is applied uniformly to the results from calibration, validation, and analyses.
15. Statistics Used in Evaluating and Optimizing Calibration Models
15.1 Various statistics are used to evaluate and optimize the performance of multivariate calibration models. These statistics are generally applied only to data in the calibration set; they should not be confused with the statistics that are used to validate the model (Section 18), that are calculated based on a
calibration is sometimes referred to as the standard error of estimate (SEE).
NOTE 11—If a constant term is included in a MLR regression, or in the regression of PCR scores against concentrations or properties, then d = n − k − 1, since one degree of freedom is associated with the constant. Care must be exercised in using a constant. In the case where neat samples are analyzed and the samples are run in fixed pathlength cells, the volume fractions of all components are constrained to sum to unity. Inclusion of a constant under these conditions can result in near singular matrices, and unstable models.
NOTE 12—For surrogate calibrations, there is no a priori relationship between the SEC calculated based on the simple gravimetric mixtures and the error level expected for application of the model for analysis of actual samples. It is recommended that such standard errors be subscripted as $\mathrm{SEC}_{\mathrm{surrogate}}$.
15.2.3 The standard error of calibration is used in estimating the expected agreement between values estimated using the calibration models and values that would be measured by the reference method (see Section 9). Some care must be applied in interpreting SEC if the values used in y are not single determinations by the reference method. If the values in ȳ for individual samples represent the average of multiple reference measurements, then the SEC calculated from Eq 51 is not on a per reference measurement basis. For example, if all values in ȳ are the average of three reference measurements, then the SEC calculated using Eq 51 can only be used to estimate the expected agreement between the infrared estimate and the average of three reference measurements.
15.2.3.1 If multiple reference values are used for some or all of the calibration samples, it is possible to calculate an SEC value that is on a per reference measurement basis. If $\mathbf{x}_i$ is the spectrum of the ith calibration sample, and $y_{i1}, y_{i2}, \ldots, y_{ir}$ are $r_i$ independently measured reference values for that sample, then the weighted regression Eq 9 for MLR, 23 and 24 for PCR, and 40 and 41 for PLS are preferably used in calculating the prediction vectors. Whether or not a weighted regression is employed, the variance removed by calculating the averages must be calculated as:
The SEC calculated in this fashion will be on a per reference measurement basis.
15.2.3.4 An alternative expression for SEC in the case where multiple reference values per sample are used is given by:
$$\mathrm{SEC} = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{r_i}(y_{ij} - \hat{y}_{i})^{2}}{d_{w}}} \tag{56}$$
NOTE 13—In Eq 53, the e vector represents the difference between the estimated value and the reference value, where the reference value may be the average of more than one reference measurement. The matrix notation implies the sum of the weighted squares of the differences, where the square of the difference is weighted by the number of reference values that were included in the average. Alternatively, the square of the difference between the estimated value and each individual reference value can be computed and summed as in Eq 56, in which case the variance term is zero since the average reference values are not used in the calculation.
15.2.4 The standard error of calibration (SEC) is the standard deviation for the differences between reference and IR estimated values for samples within the calibration set. It is an indication of the total residual error due to the particular regression equation to which it applies. The SEC will generally decrease when the number of independent variables used in the model increases, indicating that increasing the number of terms will allow more variation in the data to be explained, or “fitted”. The SEC statistic is a useful estimate of the theoretical “best” accuracy obtainable for a specified set of variables used to develop a calibration model.
15.3 Optimizing the Number of Variables in a Model:
15.3.1 Determining how many variables (wavelengths in MLR, principal components, or PLS latent variables) to use in a model is a critical step in the model development. Unfortunately, there are no hard and fast rules upon which to make this determination. In general, if too few variables are used, a less precise model will result. If too many variables are used, the estimates from the model may be unstable, that is, small changes in the spectrum on the order of the spectral noise may produce statistically significant changes in the estimates.
15.3.2 The maximum number of variables that should be used in developing a multivariate calibration model, k, is related to the number of detectable, spectrally distinguishable components (or functionalities) that are present in the calibration set. Components (or functionalities) are spectrally distinguishable if they give rise to absorptions which are not linearly correlated among the calibration samples, and if the change in the absorptions among the calibration spectra is larger than the spectral noise. If, within a calibration set, the concentrations of components are linearly correlated, then the absorptions due to these components will also be linearly correlated. Even if these components have isolated absorption features, they will not be spectrally distinguishable to the multivariate mathematics, and will contribute at most one variable to the multivariate model. If the concentrations of the components are nearly correlated, such that the absorptions due to the components are colinear to within the spectral noise, then the components are not spectrally distinguishable and cannot contribute a variable to the multivariate model. Clearly, for complex mixtures, the number of detectable, spectrally distinguishable components (or functionalities) is often less than the number of real chemical components.
15.3.3 Estimating the maximum number of detectable, spectrally distinguishable components among a set of calibration spectra requires knowledge of the spectral noise level. The spectral noise level can be estimated from replicate measurements conducted on a single sample. For instance, if replicate spectra are collected on one sample, a PCR analysis of the spectra can be conducted. Since the spectra all represent the same material, only one principal component should be present in the spectral data. The percentage of the variance due to the first principal component (the first eigenvalue divided by the sum of all the eigenvalues) can be calculated. This percentage of the variance can be used to estimate a cutoff point for determining how many principal components to include in a model, namely, the sum of the first k eigenvalues divided by the sum of all the eigenvalues should be of the same order as the cutoff. Similar calculations can be performed using PLS. For MLR, tests for colinearity among the absorbances at candidate wavelengths are generally conducted as part of the wavelength selection procedure. For instance, if a model is built using k wavelengths for which the absorbances are linearly independent, the linear dependence of all candidate wavelengths for inclusion in a model based on k + 1 wavelengths can be checked. If the absorbances at all candidate wavelengths can be fit as a linear combination of the k wavelengths already selected to within the spectral noise level, then k is the maximum number of linearly independent wavelengths upon which a model can be based.
15.3.4 Models can be built using fewer than k variables, provided that such models exhibit adequate precision and pass validation.
15.3.5 Knowledge of the precision of the reference method is also useful in determining how many variables to include in a multivariate model. As discussed, the agreement between infrared estimated values and reference values can never exceed the repeatability of the reference method, since, even if the infrared estimated the true value, the measure of the agreement would be limited by the repeatability of the reference method. Comparison of the standard error of calibration (calculated on the basis of a single reference measurement) against the standard deviation calculated from the reference method repeatability provides an indication of the maximum number of variables to include in a model. Standard errors of calibration that are lower than the standard deviation for the reference method indicate overfitting of the data.
15.3.6 Cross validation procedures are also used to estimate the optimum number of variables that should be included in a model. In cross validation, one or more sample spectra are removed from the data matrix, their corresponding reference values are removed from the reference value vector, and a model is built on the remaining samples. The model is then used to estimate the value for the samples that were left out.
trally distinguishable. If components are present at sufficiently This process is repeated until each sample has been left out
low levels so that the component absorption is below the once. The error from the cross validation, ecv, is then calculated
spectral noise, then the component is not spectrally detectable as
$$\mathbf{e}_{cv} = \hat{\mathbf{y}}_{cv} - \mathbf{y} \qquad (57)$$
where $\hat{\mathbf{y}}_{cv}$ is the vector containing the cross validation estimates. A PRESS value can then be calculated as:
$$\mathrm{PRESS} = \mathbf{e}_{cv}^{t}\,\mathbf{e}_{cv} = \sum_{i=1}^{n} \left( \hat{y}_{cv,i} - y_i \right)^2 \qquad (58)$$
15.3.6.1 A standard error of cross validation (SECV) is calculated as:
$$\mathrm{SECV} = \sqrt{\frac{\mathrm{PRESS}}{n}} \qquad (59)$$
15.3.6.2 PRESS or SECV values can be calculated as a function of the number of variables used in the model. The procedure would normally start by using one variable as a model while leaving a single sample out of the calibration set. After a calibration is developed using the remaining samples, the algorithm predicts the excluded sample and records the difference between the reference and estimated values. This procedure is iterated (repeated) for the entire sample set, and the PRESS (or SECV) value for the one variable model is reported. The procedure then adds another variable and repeats the process. The PRESS procedure will stop when the predesignated number of factors is reached (say 10 to 20 maximum). The calibration model with the smallest PRESS (SECV) can be selected as the optimum model for the calibration set used. If more than one model has similar PRESS values, the one with fewer variables will generally be chosen.
15.3.6.3 A plot of PRESS (SECV) values (y-axis) versus the number of variables (x-axis) is often used to determine the minimum PRESS corresponding with the optimum number of variables in the calibration model. A minimum in the function can be taken as an indication of the maximum number of variables to be used. If no minimum occurs, the first point at which the PRESS or SECV reaches a more or less constant level can provide an indication of the maximum number of variables to include. Comparisons of SECV against the standard deviation for the reference method repeatability are again useful, SECVs significantly lower than the standard deviation suggesting overfitting of the data.
15.3.6.4 An excellent description of the cross validation procedure (algorithm) is found on page 325 of Ref (21). Calculation of PRESS and SECV can be computationally intensive and can result in the use of substantial computer time.

NOTE 14—The exact values of PRESS and SECV calculated will depend on how many samples are left out during each cycle of the cross validation. If more than one sample is left out during a cycle, then the PRESS and SECV will depend on the combination of samples left out. Cross validation routines that leave out multiple spectra during each cycle require less computation time than routines that leave out one spectrum at a time. However, the results of such routines are less comparable and reproducible than those which leave out one spectrum at a time.

15.3.7 The above-mentioned methods for estimating the number of variables to use in a model are intended only as guidelines. None of the methods can be relied upon to always produce a stable model. The ultimate test for the number of variables is whether or not the model can be validated as described below. The number of variables used in a model must ultimately be chosen to produce a model with the desired precision that can be validated.
15.4 Confidence Limits for an Estimated Value:
15.4.1 The confidence limits for a value estimated by a multivariate model are given by:
$$t \cdot \mathrm{SEC} \cdot \sqrt{1 + h} \qquad (60)$$
where t is the Student's t value for the number of degrees of freedom in the model, and h is the leverage statistic defined in 16.2. If t values are chosen from Table A1.3 for the 95 % probability level, then for a validated model, a single value measured by the reference method is expected to fall within a range from $\hat{y} - t \cdot \mathrm{SEC} \cdot \sqrt{1 + h}$ to $\hat{y} + t \cdot \mathrm{SEC} \cdot \sqrt{1 + h}$ for 95 % of samples analyzed, provided that the analysis is an interpolation of the model. The confidence limits for an estimated value in Eq 60 are sometimes referred to as the confidence bands or confidence intervals for the estimate.
15.4.2 The use of Eq 60 to estimate the confidence limits is only an approximation since it ignores any uncertainty in x, the spectral data. The confidence limits in Eq 60 derive from the assumption that the errors in x are negligible compared to the errors in y, and that the spectrum x can be completely described by the variables used in the model. If the errors in the spectral data are not negligible, or if the spectrum x contains absorptions due to components that were not present in the calibration set, the confidence limits in Eq 60 underestimate the potential error in the estimate. Eq 60 is expected to give a reasonable approximation for the confidence limits on an estimated value for samples that are interpolations of the model (see 16.4).
15.5 Additional Statistics for Evaluating the Mathematical Models:
15.5.1 A variety of statistical tests are in use for evaluating calibration models. Some tests that are in common use include:
15.5.1.1 Coefficient of multiple determination,
15.5.1.2 Correlation coefficient,
15.5.1.3 F-test statistic (F for regression),
15.5.1.4 Partial F or $t^2$ test for a regression coefficient,
15.5.1.5 Standard error of calibration (standard error of estimate), and
15.5.1.6 Bias corrected standard error.
15.5.2 Although many of these tests have been more commonly applied to MLR models, some are equally applicable to PCR and PLS models. Details on these tests and related statistical terms are included in Annex A2. Further explanations for these statistical tests can be found in Annex A2 and several references (22, 23).

16. Outlier Statistics
16.1 During calibration, outlier statistics are applied to identify samples that have unusually high leverage on the multivariate regression. During analysis, outlier statistics are employed to detect samples which represent an extrapolation of the model.
16.2 Leverage Statistic—The leverage statistic, h, is a scalar measure of where the spectral vector x lies within the multivariate parameter space used in the model. The leverage statistic is used in detecting outliers during the calibration, in detecting extrapolation of the model during analyses, and in estimating the uncertainty on an estimated value.
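The statistics in Eq 58 to Eq 60, together with the MLR form of the leverage statistic (Eq 63 in 16.2.4), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the practice: the function names are hypothetical, M is assumed to be stored as k rows of n absorbances (one row per selected wavelength), no mean centering is applied, and every numerical value, including the t value, is made up for the example rather than taken from Table A1.3.

```python
import math


def solve(a, b):
    """Solve a z = b for a small k x k system (Gaussian elimination, partial pivoting)."""
    k = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        partial = sum(aug[r][c] * z[c] for c in range(r + 1, k))
        z[r] = (aug[r][k] - partial) / aug[r][r]
    return z


def leverage(m, M):
    """Eq 63: h = m^t (M M^t)^-1 m, with M given as k rows of n absorbances (assumed layout)."""
    k = len(M)
    mmt = [[sum(a * b for a, b in zip(M[i], M[j])) for j in range(k)] for i in range(k)]
    z = solve(mmt, m)  # z = (M M^t)^-1 m
    return sum(mi * zi for mi, zi in zip(m, z))


def confidence_half_width(t, sec, h):
    """Eq 60: t * SEC * sqrt(1 + h)."""
    return t * sec * math.sqrt(1.0 + h)


def press_and_secv(y_cv, y):
    """Eq 58 and Eq 59: PRESS = sum (yhat_cv_i - y_i)^2, SECV = sqrt(PRESS / n)."""
    press = sum((est - ref) ** 2 for est, ref in zip(y_cv, y))
    return press, math.sqrt(press / len(y))


if __name__ == "__main__":
    # Illustrative data only: k = 2 wavelengths, n = 5 calibration samples.
    M = [[0.10, 0.20, 0.30, 0.40, 0.50],
         [0.50, 0.30, 0.40, 0.10, 0.20]]
    h_cal = [leverage([M[0][i], M[1][i]], M) for i in range(5)]
    print("calibration leverages:", [round(h, 3) for h in h_cal])

    # Hypothetical unknown sample, SEC, and t value (not from Table A1.3).
    h_unknown = leverage([0.25, 0.35], M)
    print("half-width of the limits:", confidence_half_width(2.0, 0.5, h_unknown))

    # Hypothetical cross validation estimates and reference values.
    print("PRESS, SECV:", press_and_secv([1.1, 1.9, 3.2, 3.8], [1.0, 2.0, 3.0, 4.0]))
```

For the calibration samples themselves the leverages computed this way sum to k, which is the origin of the k/n average value used in 16.3.2.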
NOTE 15—Commercial software packages use numerous variations on the leverage statistic. The leverage statistic is sometimes referred to as the hat matrix (24), or as the Mahalanobis Distance, $D^2$ (although it is actually the square of the distance). Various commercial software packages may use D instead of $D^2$. Some software packages may scale h (or $D^2$) by n (or n − 1 if mean-centered), to obtain a statistic that is independent of the number of calibration samples. If this scaled statistic is further multiplied by (n − k − 1)/nk, a statistic that has an F distribution is obtained (25). The leverage statistic, h, is preferred here since it is easily related to the number of samples and variables. Model developers should attempt to verify exactly what is being calculated. Both mean-centered and not mean-centered definitions for h exist, with the mean-centered approach preferred. Regardless of whether mean centering of data is performed, the statistic designated h has valid utility for outlier detection.

16.2.1 If x is a spectral vector (dimension f by 1) and X is the matrix of calibration spectra (of dimension f by n, so that the product $XX^t$ in Eq 61 is f by f), then the leverage statistic is defined as:
$$h = \mathbf{x}^t \left( \mathbf{X}\mathbf{X}^t \right)^{-1} \mathbf{x} \qquad (61)$$
16.2.2 For a mean-centered calibration, x and X in Eq 61 are replaced by x − x̄ and X − X̄ respectively.
16.2.3 If a weighted regression is used, the expression for the leverage statistic becomes:
$$h = \mathbf{x}^t \left( \mathbf{X}\mathbf{R}\mathbf{X}^t \right)^{-1} \mathbf{x} \qquad (62)$$
16.2.4 In MLR, if m is the vector (dimension k by 1) of the selected absorbance values obtained from a spectral vector x, and M is the matrix of selected absorbance values for the calibration samples, then the leverage statistic is defined as:
$$h = \mathbf{m}^t \left( \mathbf{M}\mathbf{M}^t \right)^{-1} \mathbf{m} \qquad (63)$$
16.2.5 Similarly, if a weighted regression is used, the expression for the leverage statistic becomes:
$$h = \mathbf{m}^t \left( \mathbf{M}\mathbf{R}\mathbf{M}^t \right)^{-1} \mathbf{m} \qquad (64)$$
16.2.6 In PCR and PLS, the leverage statistic for a sample with spectrum x is obtained by substituting the decompositions for PCR, or for PLS, into Eq 61. The statistic is expressed as:
$$h = \mathbf{s}^t \mathbf{s} \qquad (65)$$
NOTE 16—If the scores from the PCR or PLS model are not normalized, then the form of Eq 65 becomes $D^2 = \mathbf{s}^t \left( \mathbf{S}^t\mathbf{S} \right)^{-1} \mathbf{s}$.
16.2.7 If a weighted PCR or PLS regression is used, the expression for the leverage statistic becomes:
$$h = \mathbf{s}^t \left( \mathbf{S}^t\mathbf{R}\mathbf{S} \right)^{-1} \mathbf{s} \qquad (66)$$
16.3 Outlier Detection During Calibration:
16.3.1 Two types of outliers can be identified during the calibration procedures. The first type of outlier is a sample that represents an extreme composition relative to the remainder of the calibration set. These samples have very high leverage on the regression results; that is, they are largely responsible for the determination of at least one of the regression coefficient values. Generally, there is insufficient data in the calibration set to statistically determine the accuracy of reference values associated with these high leverage samples. Their inclusion in the calibration may lead to erroneous estimations of similar samples if the reference value for the high leverage sample is in error. The second type of outlier is one for which the estimated value differs from the reference value by a statistically significant amount. Such outliers indicate either an error in the reference measurement or a failure of the model.
16.3.2 High-leverage samples are identified based on the leverage statistic, h. For all types of linear calibrations described above, the average leverage statistic for all of the calibration sample spectra has a value of k/n where k is the number of variables in the regression (the number of wavelengths in MLR, the number of principal components, or the number of PLS latent variables), and n is the number of calibration samples. On average, each sample contributes k/n of the spectral variables. For samples that have h > 3k/n, the sample spectrum is contributing a significant fraction to the definition of one of the spectral variables and to the regression coefficient associated with this variable. Samples with h > 3k/n should be eliminated from the calibration set in the development of the model.

NOTE 17—If the leverage statistic is scaled as described in (25), an F test can be employed for outlier detection.

16.3.3 If calibration spectra with h > 3k/n are eliminated from the calibration set, and the model is rebuilt, it is not uncommon for additional spectra with h > 3k/n to be identified for the new model. This occurrence is most likely if removal of samples reduces k, but can also be caused merely by scaling changes to the multivariate space induced by changes in n. When repetitive application of the 3k/n rule continues to identify outliers, the outlier test is said to “snowball.” If “snowballing” occurs, it may indicate some problem with the structure of the spectral data set. The variable space of the model should be examined for unusual distributions or clusterings.
16.3.3.1 If the following sequence occurs during the development of a model, the 3k/n outlier test can be relaxed: (1) a first model is built on an initial calibration set, (2) calibration spectra with h > 3k/n are eliminated from the calibration set, (3) a second model using the same number, k, of variables is built on the subset of calibration spectra, and (4) calibration spectra with h > 3k/n are identified for the second model. The second model should be used providing that no calibration samples have h greater than 0.5.
16.3.3.2 If (1) a first model is built on an initial calibration set, (2) calibration spectra with h > 3k/n are eliminated from the calibration set, and (3) a second model using fewer variables is built on the subset of calibration spectra, the 3k/n outlier test should not automatically be relaxed. Instead, the first model should be rebuilt using the lower number of variables and the sequence in 16.3.3.1 should be applied to the new model.
16.3.4 A second type of outlier is one for which the estimated value ŷ differs by a statistically significant amount from the value from the reference method, y. Such outliers can be detected based on studentized residuals. If $e_i$ is the difference between the estimated value $\hat{y}_i$ and the reference value $y_i$ for the ith sample in the calibration set, and $h_i$ is the leverage statistic for that sample, the studentized residuals for the ith sample are given by:
$$t_i = \frac{e_i}{\mathrm{SEC}\,\sqrt{1 - h_i}} \qquad (67)$$
16.3.4.1 The studentized residuals should be normally distributed with common variance. The studentized residuals value can be compared to a t distribution value for n − k (or
n − k − 1 if mean centered) degrees of freedom, to determine the probability that the error in the estimate fits the expected distribution. If not, the sample should be considered an outlier. A more detailed discussion of studentized residuals can be found in Refs (26, 27).
16.3.5 If a sample is identified as an outlier based on studentized residuals or other similar tests, then the reference value may be in error. When possible, the reference test should be repeated to determine a correct value for the sample (multiple tests are recommended). If the reference value is not in error, then the large studentized residuals may indicate a basic failure in the model. For estimation of component concentrations, there may be sufficient spectral interferences to preclude accurate estimation of the component for this class of samples. For property estimation, some component that has a significant effect on the property may not be detected. Removing outliers of this type without evidence of error in the reference value should be avoided whenever possible, since these samples may provide the only indication that the model is not applicable to a certain class of materials.
16.4 Interpolation and Extrapolation of the Model During Analysis:
16.4.1 The spectra of the calibration samples define a set of variables that are used in the calibration of the multivariate model. If, when unknown samples are analyzed, the variables calculated from the spectrum of the unknown sample lie within the range of the variables for the calibration, the estimated value for the unknown sample is obtained by interpolation of the model. If the variables for the unknown sample are outside the range of the variables in the calibration model, the estimate represents an extrapolation of the model.
16.4.2 Two types of extrapolation are possible. First, the sample may contain the same components as the calibration samples, but at concentration ranges that are outside the ranges in the calibration set. Second, the sample may contain components that were not present in the calibration samples.
16.4.3 The leverage statistic, h, provides a useful indication of the first type of extrapolation. For the calibration set, one sample will have a maximum leverage statistic, $h_{max}$. This is the most extreme sample in the calibration set, in that it is the farthest from the center of the space defined by the spectral variables. If the leverage statistic for an unknown sample is greater than $h_{max}$, then the estimate for the sample clearly represents an extrapolation of the model. Providing that outliers have been eliminated during the calibration, the distribution of h should be representative of the calibration model, and $h_{max}$ can be used as an indication of extrapolation.

NOTE 18—Comparison of the spectral variables for an unknown against the range of each spectral variable in the calibration model could be done, and extrapolation of any single variable could be taken as extrapolation of the model. The use of the leverage statistic as an indicator of extrapolation may not detect certain spectra which are slight extrapolations on one or more spectral variables; however, significant extrapolation of any one variable will result in a high leverage statistic, and thus detection of extrapolation. Use of individual variables in tests for extrapolation is not recommended since it can unduly restrict the range of samples to which the model is applicable.

16.4.4 The second type of extrapolation of the model, namely, the presence of a new component, can be detected by comparing an estimate of the unknown spectrum derived from the model to the measured spectrum of the unknown.
16.4.4.1 For PCR, an estimate of the spectrum of the unknown can be calculated as:
$$\hat{\mathbf{x}}^t = \hat{\mathbf{s}}^t\,\boldsymbol{\Sigma}\,\mathbf{L}^t \qquad (68)$$
where ŝ is the vector of scores. Similarly for PLS:
$$\hat{\mathbf{x}}^t = \hat{\mathbf{s}}^t\,\mathbf{L}^t \qquad (69)$$
where ŝ is the vector of scores. The difference between the estimated spectrum and the actual spectrum can be calculated as:
$$\mathbf{r} = \hat{\mathbf{x}} - \mathbf{x} \qquad (70)$$
16.4.4.2 The root mean square spectral residuals (RMSSR) for the spectrum can then be calculated as:
$$\mathrm{RMSSR} = \sqrt{\frac{\mathbf{r}^t\mathbf{r}}{f}} \qquad (71)$$

NOTE 19—Some commercial software packages may calculate other statistics related to RMSSR, or may call RMSSR by some other name. The model developer should verify what statistics are used in the software to indicate how well the model fits a spectrum being analyzed. The RMSSR is intended as an example of how such a calculation can be done. Other similar statistics can be used.

16.4.5 The RMSSR values can be calculated for each of the calibration samples. One of the calibration samples will exhibit a maximum RMSSR, $\mathrm{RMSSR}_{max}$. Assuming that outliers have been removed prior to the development of the calibration model, $\mathrm{RMSSR}_{max}$ can be used to calculate a cutoff above which RMSSR values for unknown spectra are to be taken as evidence of extrapolation of the model.
16.4.6 In general, the $\mathrm{RMSSR}_{max}$ cannot be used directly to set the cutoff for indicating extrapolation. For PCR and PLS models, some of the spectral noise characteristics of the calibration spectra are always incorporated into the spectral variables. The RMSSR values calculated for spectra used in the calibration will thus generally be lower than corresponding values calculated for spectra of the same samples which are not used in the model development. For estimating a suitable cutoff RMSSR value to serve as an indication of extrapolation, the following procedure is recommended.
16.4.6.1 Replicate spectral measurements (at least seven) of several (at least three) of the calibration samples should be made. The replicate measurements should include all steps in the measurement procedure (for example, background spectrum collection, loading of the sample, and measurement of the spectrum).
16.4.6.2 One spectrum from the set is to be used in the development of the calibration model. The RMSSR values for the spectra used in the calibration are calculated. The $\mathrm{RMSSR}_{cal}(i)$ is the value for the spectrum of Sample i.
16.4.6.3 The remaining replicate spectra are analyzed using the calibration model, and RMSSR values are calculated and averaged for each sample. The $\mathrm{RMSSR}_{anal}(i)$ is the average RMSSR for the replicate spectra of Sample i.
16.4.6.4 The ratios of the RMSSR values from the analyses to those from the calibration are calculated and averaged, and $\mathrm{RMSSR}_{max}$ is multiplied by the average ratio to obtain the cutoff:
$$\mathrm{RMSSR}_{limit} = \left( \frac{1}{m} \sum_{i=1}^{m} \left[ \frac{\mathrm{RMSSR}_{anal}(i)}{\mathrm{RMSSR}_{cal}(i)} \right] \right) \mathrm{RMSSR}_{max} \qquad (72)$$
where the sum runs over the m calibration samples for which replicate spectra were measured, so that the term in parentheses is the average ratio described in 16.4.6.4.
16.4.6.5 If the RMSSR value for an unknown sample being analyzed exceeds $\mathrm{RMSSR}_{limit}$, then the analysis of the sample represents an extrapolation of the model.
16.4.7 Statistics comparable to RMSSR cannot be calculated for multiple linear regression. The MLR is thus incapable of detecting the second type of extrapolation, namely, the presence of a new component that was not in the calibration samples. Care should be exercised when applying MLR in systems where the calibration set used in the development of the MLR model may not represent the total range of sample compositions that will be encountered during analyses. In such cases, MLR should be supplemented with other techniques to determine if the sample being analyzed falls within the scope of the calibration. For example, outlier statistics from PCR models developed on the same calibration set could be used for this purpose.

NOTE 20—For PLS models, residuals calculations such as RMSSR are not always a useful indicator of outliers. If, during calibration, a significant percentage of the spectral (X-block) variance due to signal is not used in the model, then the model residuals used to calculate $\mathrm{RMSSR}_{cal}$ may contain significant contributions due to calibration sample component absorptions. In such cases, $\mathrm{RMSSR}_{limit}$ values calculated on the basis of such $\mathrm{RMSSR}_{cal}$ values may be too large to detect model extrapolation due to new chemical components in samples being analyzed.
The procedure described in 15.3.3 can be used to estimate the percentage of the total X-block variance that is due to signal. If the variance included in the model is significantly less than the signal variance, then the modeler may wish to supplement the PLS model with a PCR model built on the same data. RMSSR statistics from the PCR model are then used for outlier detection. The number of variables used in the PCR model should be sufficient to account for the signal variance.

16.4.8 Nearest Neighbor Distance—If the calibration sample spectra form multiple clusters within the variable space, the spectrum of the unknown being analyzed can have a $D^2$ less than $D^2_{max}$ yet fall into a relatively unpopulated portion of the calibration space. In this case, the sample being analyzed contains the same components as the calibration samples (since the sample is not an RMSSR outlier), but at combinations that are not represented in the calibration set. The spectrum of the unknown does not belong to any of the calibration sample spectra clusters, and the results produced by application of the model may be invalid. Under these circumstances, it is desirable to employ a Nearest Neighbor Distance test to detect unknown samples that fall within voids in the calibration space.
16.4.8.1 Nearest Neighbor Distance, NND, measures the distance between the spectrum being analyzed, x, and individual spectra in the calibration set, $x_i$.
$$\mathrm{NND} = \min_i \left[ (\mathbf{x} - \mathbf{x}_i)^t \left( \mathbf{X}\mathbf{X}^t \right)^{-1} (\mathbf{x} - \mathbf{x}_i) \right] \qquad (73)$$
16.4.8.2 For MLR, NND is calculated as
$$\mathrm{NND} = \min_i \left[ (\mathbf{m} - \mathbf{m}_i)^t \left( \mathbf{M}\mathbf{M}^t \right)^{-1} (\mathbf{m} - \mathbf{m}_i) \right] \qquad (74)$$
16.4.8.3 For PCR and PLS (with orthogonal scores), NND is calculated as
$$\mathrm{NND} = \min_i \left[ (\mathbf{s} - \mathbf{s}_i)^t (\mathbf{s} - \mathbf{s}_i) \right] \qquad (75)$$
16.4.8.4 NND values are calculated for all the calibration sample spectra. A maximum NND value is determined. This value represents the largest distance between calibration sample spectra.
16.4.8.5 During analysis, the NND value is calculated for the unknown sample spectrum relative to the calibration spectra. If the calculated value is greater than the maximum NND from 16.4.8.4, then the minimum distance between the process sample spectrum and the calibration spectra is greater than the largest distance between calibration sample spectra, and the unknown sample spectrum falls within a sparsely populated region of the calibration space. Such samples are referred to as Nearest Neighbor Outliers.

17. Selection of Calibration Samples
17.1 For the development of a multivariate model, an ideal calibration sample set will:
17.1.1 Contain samples which provide examples of all chemical components which are expected to be present in the samples which are to be analyzed using the model, thereby ensuring that analyses involve interpolation of the model;
17.1.2 Contain samples for which the range of variation in the concentrations of the chemical components exceeds the range of variation expected for samples which are to be analyzed using the model, thereby ensuring that analyses involve interpolation of the model;
17.1.3 Contain samples for which the concentrations of chemical components are uniformly distributed over their total range of variation;
17.1.4 Contain a sufficient number of samples to statistically define the relationships between the spectral variables and the component concentrations or properties to be modeled.
17.2 For simple mixtures, calibration samples can generally be prepared to meet the criteria above. For complex mixtures, obtaining an ideal calibration set is difficult, if not impossible. The statistical tests that are used to detect outliers guard against non-ideal calibration sets. The RMSSR values detect when samples being analyzed contain components that are not represented in the calibration set (violation of criterion 1 above). Leverage statistics detect when samples being analyzed are outside the concentration ranges represented in the calibration set (violation of criterion 2). Outlier detection during model development identifies components for which the range of concentrations in the calibration set is not uniform (violation of criterion 3).
17.3 The number of samples that are required to calibrate an infrared multivariate model (see 17.1.4) depends on the complexity of the samples being analyzed. If the samples to be analyzed contain only a few components that vary in concentration, then there will be a small number of spectral variables, and a relatively small calibration set is adequate to define the relationship between the variables and the concentrations or properties. If a larger number of components vary in the samples to be analyzed, then a larger number of calibration samples is required for the model development. Determining whether or not a set of calibration samples is adequate can only be done after a model is developed and an estimate of the number of spectral variables required for the model is made.
17.4 If a multivariate model is developed using three or fewer variables, then the calibration set should contain a
minimum of 24 samples after elimination of outliers.
17.5 If a multivariate model is developed using k (>3) variables, then the calibration set should contain a minimum of 6k spectra after elimination of outliers. If the model is mean centered, a minimum of 6(k + 1) spectra should remain.

NOTE 21—6k is chosen to ensure at least 20 df in the model for statistical testing, and to ensure that there is an adequate number of samples to define the relationship between the spectral variables and the concentration or property values.

17.6 For some spectroscopic analyses, it is possible to calibrate using gravimetrically or volumetrically prepared mixtures which contain significantly fewer components than the samples which will ultimately be analyzed. For these surrogate methods, the outlier statistics described herein are not strictly appropriate since all actual samples are by definition outliers relative to the simplified calibrations. Thus, surrogate methods cannot strictly fulfill the requirements of this practice. Surrogate methods should, however, follow the requirements described herein for the number and range of calibration samples.

18. Validation of a Multivariate Model
18.1 Validation of an infrared multivariate model is accomplished by applying the model for the analysis of a set of v validation samples, and statistically comparing the estimates for these samples to known reference values. Validation requires thorough testing of the model to ensure that it performs up to the expectations derived from the calibration set statistics.
18.2 Validation Sample Set:
18.2.1 For the validation of a multivariate model, an ideal validation sample set will:
18.2.1.1 Contain samples that provide examples of all chemical components which are expected to be present in the samples which are to be analyzed using the model;
18.2.1.2 Contain samples for which the range of variation in the concentrations of the chemical components is comparable to the range of variation expected for samples that are to be analyzed using the model;
18.2.1.3 Contain samples for which the concentrations of chemical components are uniformly distributed over their total range of variation; and
18.2.1.4 Contain a sufficient number of samples to statistically test the relationships between the spectral variables and the component concentrations or properties that were modeled.
18.2.2 For simple mixtures, validation samples can generally be prepared to meet the criteria in 18.2.1.1-18.2.1.4. For complex mixtures, obtaining an ideal validation set is difficult, if not impossible.
18.2.3 The number of samples needed to validate an infrared multivariate model depends on the complexity of the model. Only samples whose analyses are found to be interpolations of the model should be used in the validation procedure. If five or fewer spectral variables are used in the model, then a minimum of 20 interpolation samples is recommended. If k > 5 spectral variables are used in the model, then a minimum of 4k interpolation samples should be used in the validation. In addition, the validation samples should:
18.2.3.1 Span the range of concentrations or property values for which the model was developed; that is, the span and the standard deviation of the range of concentrations or property values for the validation samples should be at least 95 % of the span and the standard deviation of the range of concentrations or property values in the model, and the concentration or property values for the validation samples should be distributed as uniformly as possible across the range; and
18.2.3.2 Span the range of spectral variables for which the model was developed; that is, if the range of a spectral variable in the calibration model is from a to b, and the standard deviation of the spectral variable is c, then the spectral variables estimated for the validation samples should cover at least 95 % of the range from a to b, and should be distributed as uniformly as possible across the range such that the standard deviation in the spectral variables estimated for the validation samples will be at least 95 % of c.
18.2.4 Determination of whether a validation set is adequate will generally require that the set be analyzed so that the spectral variables for the set can be determined. Samples whose analyses are extrapolations of the model should not be included in the validation set. If the validation set does not meet the criteria in 18.2.3.1 and 18.2.3.2, additional validation samples should be taken.
18.3 Validation Spectra Measurement and Analysis—Spectra of validation samples should be collected using exactly the same procedures as were used to collect spectra of the calibration samples. Reference values for the validation samples should be obtained using the same reference method as was used for the calibration samples. Spectra should be analyzed using the multivariate model to produce estimates of the component concentrations or properties, and the statistics described in Sections 18 and 19 should be calculated.
18.4 Validation Error:
18.4.1 If $\hat{\mathbf{v}}$ (a vector of dimension v by 1) contains the estimates obtained by analysis of the spectra of the v validation samples, and $\mathbf{v}$ contains the corresponding values measured by the reference method, then the validation error, e, is given by:
$$\mathbf{e} = \hat{\mathbf{v}} - \mathbf{v} \qquad (76)$$
18.4.2 If multiple reference values are available for some of the validation samples, then the average of the individual reference measurements can be used in $\mathbf{v}$, and the variance removed by calculating the averages should be calculated using Eq 52.
18.5 Variance of the Validation Error—The variance of the error of the validation measurements is calculated as:
$$\mathrm{VAR}_v = \mathbf{e}^t\mathbf{R}\,\mathbf{e} + s^2_{avg} = \sum_{i=1}^{v} \sum_{j=1}^{r_i} \left( v_{ij} - \hat{v}_i \right)^2 \qquad (77)$$
where $s^2_{avg}$ is zero and R is an identity matrix if individual reference measurements are used in $\mathbf{v}$.
18.6 Standard Error of Validation:
18.6.1 The standard error of validation (SEV) is given by:
$$\mathrm{SEV} = \sqrt{\frac{\mathrm{VAR}_v}{d_v}} = \sqrt{\frac{\sum_{i=1}^{v} \sum_{j=1}^{r_i} \left( v_{ij} - \hat{v}_i \right)^2}{\sum_{i=1}^{v} r_i}} \qquad (78)$$
$d_v$ is the total number of reference values available for all v
validation samples. SEV is the standard deviation in the differences between reference and IR estimated values for samples in the validation set. The standard error of validation is sometimes referred to as a standard error of prediction. A bias corrected version of this statistic has also been defined as the standard error of performance. To avoid confusion between two terms that are both abbreviated SEP, the use of SEV is preferred in these practices.
18.6.2 Studentized residuals testing can be applied to the estimates of the validation set to detect possible errors in the reference values.
18.7 Validation Bias—The average bias for the estimation of the validation set, ēv, is calculated as:
$$ \bar{e}_v = \frac{\sum_{i=1}^{v} r_i e_i}{d_v} = \frac{\sum_{i=1}^{v} \sum_{j=1}^{r_i} (\hat{v}_i - v_{ij})}{\sum_{i=1}^{v} r_i} \qquad (79) $$
where ri is 1 if individual reference values were used, or is the number of reference values that were averaged for the ith validation sample if averages are used. dv is the total number of reference values used in the calculation.
18.8 Standard Deviation of Validation Errors—The standard deviation of the validation errors, SDV, is calculated as
$$ SDV = \sqrt{\frac{\sum_{i=1}^{v} r_i (e_i - \bar{e}_v)^2 + s^2_{avg}}{d_v - 1}} = \sqrt{\frac{\sum_{i=1}^{v} \sum_{j=1}^{r_i} \left[ (\hat{v}_i - v_{ij}) - \bar{e}_v \right]^2}{\left( \sum_{i=1}^{v} r_i \right) - 1}} \qquad (80) $$
where ri is 1 and s²avg is 0 if individual reference measurements are used in calculating ŷ.
18.9 Significance of Validation Bias:
18.9.1 A t test is used to determine if the validation estimates show a statistically significant bias. A t value is calculated as:
$$ t = \frac{|\bar{e}_v| \sqrt{d_v}}{SDV} \qquad (81) $$
The t value is compared to critical t values from Table A1.3 for dv degrees of freedom.
18.9.2 If the t value is less than the critical t value, then analyses based on the multivariate model are expected to give essentially the same average result as measurements conducted by the reference method, provided that the analysis represents an interpolation of the model.
18.9.3 If the t value calculated is greater than the tabulated t value, there is a 95 % probability that the estimate from the multivariate model will not give the same average results as the reference method. Validity of the multivariate model is then suspect. Further investigation of the model is required to resolve the probable bias that is indicated.
18.10 Validation of Agreement Between Model and Reference Method:
18.10.1 The confidence limits on the estimates for the validation samples should be calculated, and a determination should be made as to whether the reference values for the validation samples lie within the range from ŷ − t × SEC × √(1 + D²) to ŷ + t × SEC × √(1 + D²). If more than 5 % of the reference values fall outside this range, then the confidence limit estimates based on SEC are questionable, and further testing is required to demonstrate the agreement between the model and the reference method.
18.10.2 An alternative method can be used to demonstrate agreement between the model and the reference method. This alternative method is preferred when the precision of the reference method is not constant across the range of reference values used in the calibration, but can be applied even when the precision is constant. If R(yi) is the reproducibility of the reference method at level yi, then the percentage of reference values for which:
$$ \hat{y}_i - R(\hat{y}_i) < y_{ij} < \hat{y}_i + R(\hat{y}_i) \qquad (82) $$
is calculated. If 95 % or more of the reference values fall within this interval, then estimates produced with the multivariate IR model agree with those produced by the reference method as well as a second laboratory repeating the reference measurement would agree.
18.11 For multivariate analyses employing surrogate calibrations, a procedure similar to that described here for validation is often performed for the purpose of verifying that the instrument is properly calibrated. This instrument qualification procedure typically involves the analysis of gravimetrically or volumetrically prepared mixtures that contain significantly fewer components than the samples which will ultimately be analyzed. There is no a priori relationship between the standard error that is calculated from this procedure and the error expected during application of the model to actual samples. To avoid confusion, it is recommended that the procedure be referred to as a spectrometer/spectrophotometer qualification, not validation. Additionally, it is recommended that the standard error calculated from this procedure be referred to as a Standard Error of Qualification (SEQsurrogate), not as a Standard Error of Validation.

19. Precision of Infrared Estimated Values
19.1 The precision of values estimated from an infrared multivariate model is calculated from repeated spectral measurements. The number of samples for which repeat measurements are made should be at least equal to the number of variables used in the model, and never less than three. The samples used for repeat spectral measurements should span at least 95 % of the range of concentration or property values used in the model. When possible, samples should be selected to ensure that some variation on each spectral variable is exhibited among the samples. At least six spectra should be collected for each sample. The spectra should be analyzed and values estimated. The average estimate for each sample should be calculated, and the standard deviation among the estimates should be obtained. If ŷij is the estimate for the jth spectrum of ri total spectra for the ith sample, then the average estimate for this sample is:
$$ \bar{y}_i = \frac{\sum_{j=1}^{r_i} \hat{y}_{ij}}{r_i} \qquad (83) $$
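As an illustration only, the validation statistics of Section 18 (Eq 76 and Eq 78 through Eq 81) can be sketched in Python as follows. The function name `validation_statistics` and its argument layout are assumptions made for this sketch and are not part of these practices. The sketch forms one difference per individual reference value, which corresponds to the double-sum forms of the equations, and it leaves the comparison of t against the critical value in Table A1.3 to the user.

```python
import math


def validation_statistics(estimates, references):
    """Compute SEV, bias, SDV, and t for a validation set.

    estimates  -- one model estimate per validation sample
    references -- one list per sample holding its r_i individual
                  reference values (hypothetical layout for this sketch)
    """
    # One difference (estimate minus reference) per reference value, Eq 76.
    diffs = [est - ref
             for est, refs in zip(estimates, references)
             for ref in refs]
    d_v = len(diffs)  # total number of reference values

    sev = math.sqrt(sum(d * d for d in diffs) / d_v)                   # Eq 78
    bias = sum(diffs) / d_v                                            # Eq 79
    sdv = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (d_v - 1))   # Eq 80
    t_value = abs(bias) * math.sqrt(d_v) / sdv                         # Eq 81
    return {"d_v": d_v, "SEV": sev, "bias": bias, "SDV": sdv, "t": t_value}


if __name__ == "__main__":
    stats = validation_statistics(
        [10.2, 11.9, 15.1, 8.0],
        [[10.0], [12.0], [15.0], [8.3]],
    )
    for name, value in stats.items():
        print(name, value)
```

A t value below the tabulated critical value would correspond to the situation described in 18.9.2; the example data above are invented solely to exercise the arithmetic.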
19.1.1 The standard deviation of the replicate estimates is calculated as:
$$ s_i = \sqrt{\frac{\sum_{j=1}^{r_i} (\hat{y}_{ij} - \bar{y}_i)^2}{r_i - 1}} \qquad (84) $$
19.2 A χ² value is calculated using the standard deviation values calculated in Eq 84:
$$ \chi^2 = \frac{2.3026}{c} \left( r \log \bar{s}^2 - \sum_{i=1}^{t} r_i \log s_i^2 \right) \qquad (85) $$
where:
$$ r = \sum_{i=1}^{t} r_i \qquad (86) $$
$$ \bar{s} = \sqrt{\frac{1}{r} \sum_{i=1}^{t} r_i s_i^2} \qquad (87) $$
$$ c = 1 + \frac{1}{3(z - 1)} \left( \sum_{i=1}^{z} \frac{1}{r_i} - \frac{1}{r} \right) \qquad (88) $$
and z is the number of samples for which replicate measurements were made.
19.3 The χ² value calculated in Eq 85 is compared with a critical value from a chi-squared table (see Table A1.4) for t − 1 degrees of freedom. If the calculated χ² value is less than the critical value, then all of the variances for the replicated measurements belong to the same population, and the average variance calculated in Eq 87 can be used as a measure of the repeatability of the infrared measurement. The infrared analysis is expected to have a repeatability on the order of t × √2 × s̄.
19.4 If the calculated χ² value is greater than the critical chi-squared value, then the repeatability of the infrared estimate may vary with sample composition. In this case, the infrared analysis is expected to have a repeatability that is no worse than t × √2 × smax, where smax is the maximum si value for the replicate measurements.

20. Major Sources of Calibration and Analysis Error
20.1 General Sources of Error in Spectral Measurements—Table 1 lists some possible sources of error that can occur during the spectral measurement and potential solutions for these problems.
20.2 Sampling Related Errors—Table 2 lists errors arising from sampling problems and possible solutions to these problems (28).
20.3 Sources of Calibration Error—Table 3 lists sources of error in the development of the calibration model and possible ways to minimize these errors.
20.4 Analysis Errors—Table 4 lists factors that can contribute to errors in the estimated values for unknown samples and possible ways to minimize such errors.

TABLE 1 General Sources of Error in Spectral Measurements
Source of Spectral Error | Possible Solution
Poor instrument performance | Conduct instrument performance tests regularly to monitor changes in instrument performance
 | Analyze QC (Quality Check) sample to determine if instrument performance changes affect analysis
Absorbance exceeds linear response range | Determine linear response range for instrument
 | Choose pathlengths to keep bands of interest in range
Optical polarization effects | Use depolarizing elements
Variable sample presentation | Improve sample presentation methods
 | Investigate commercially available sample presentation equipment
Optical component contamination | Inspect windows, etc., for contamination and clean as necessary

TABLE 2 Sampling Related Errors
Sampling Error | Possible Solution
Nonhomogeneity of sample | Improve mixing guidelines or grinding procedures, or both
 | For solids, average replicate repacks
 | For solids, rotate sample cups
 | Measure multiple aliquots from large sample volume
Physical variation in solid samples | Improved sample mixing during sample preparation
 | Diffuse light before it strikes the sample using a light diffusing plate
 | Pulverize sample to particle size of less than 40 µm (NIR) or 2 µm (MIR)
 | Average multiple repacks of each sample
 | Rotate sample or average five sample measurements
Chemical variation in sample with time | Freeze-dry sample for storage and measurement
 | Immediate data collection and analysis following sample preparation
 | Identification of kinetics of chemical change and avoidance of rapidly changing spectral regions
Bubbles in liquid samples | Check pressure requirements for single-phase sample
 | Check flow properties of cell for sample introduction

21. Wavelength (Frequency) Sensitivity of a Multivariate Model
21.1 Wavelength stability of spectrometers is often a critical factor in the performance of a multivariate calibration. The estimation of the sensitivity of a multivariate model to changes in the wavelength scale provides a useful parameter against which instrument performance can be judged. The wavelength sensitivity of a model can be roughly estimated by the following procedure:
21.1.1 Identify the samples in the calibration set that represent the extreme values of each of the spectral variables;
21.1.2 If the spectra are collected with a digital resolution of Δ, then shift each spectrum by +Δ and by −Δ.
21.1.3 Analyze the shifted spectra, using the calibration model, and calculate the change in the estimates between the +Δ and −Δ spectra, and
21.1.4 Identify the spectrum showing the largest change upon shifting. If the estimates are ŷ+Δ and ŷ−Δ respectively, then the wavelength (frequency) sensitivity of the model can be estimated as:
$$ \frac{0.1 \times \Delta \times SEC}{\hat{y}_{+\Delta} - \hat{y}_{-\Delta}} \qquad (89) $$
21.2 The value calculated in Eq 89 is the wavelength shift that, in the worst case (the most sensitive spectrum), will produce a change in the estimate that is on the order of 5 % of the standard error of calibration.
NOTE 22—The wavelength sensitivity of a model calculated in Section
21 will depend on a variety of factors, including the optical and digital resolution of the instrument relative to the bandwidths of the sample being measured. Calculation of a wavelength sensitivity is done to provide a useful diagnostic for analyses conducted on the same type of analyzer. The wavelength stability of the analyzer can be compared to the value in Eq 89 as a means of monitoring the performance of the analyzer. Because the value in Eq 89 is dependent on specific instrumental parameters, it should generally not be used to compare the suitability of analyzers for a particular application.

TABLE 3 Sources of Calibration Error
Source of Calibration Error | Possible Solution
Spectroscopy insensitive to component/property being modeled | Try alternative spectral region
 | Redefine requirement in terms of measurable components/properties
Inadequate sampling of population in calibration set | Review criteria for calibration set selection
 | Use sample selection techniques for selecting calibration set (29)
Outlier samples within calibration set | Employ outlier detection algorithms
 | Eliminate spectral outliers or find additional examples
 | Eliminate reference data outliers or remeasure
Reference data errors | Analyze blind replicates to test precision
 | Correct procedural errors, improve analytical procedures
 | Check and recalibrate reagents, equipment, etc. (30)
Non-Beer's Law relationship (nonlinearity due to component interactions) | Develop multiple calibrations over smaller concentration ranges
Non-Beer's Law relationship (nonlinearity due to instrument response) | Check dynamic range of instrument, try shorter pathlengths
Sensitivity to baseline shifts, etc. | Preprocessing of data to minimize effects of baseline
Transcription errors | Two people cross-check or one person triple-check all handscribed data

TABLE 4 Analysis Errors
Sources of Analysis Error | Possible Solution
Poor calibration model | Validate calibration model on representative validation set
Poor instrument performance | Check performance of instrument/model with QC samples
 | Diagnose instrument problems with instrument performance tests
Poor calibration transfer | Validate calibration transfer and instrument standardization procedures
 | Select calibrations with lowest noise, wavelength shift sensitivity, and offset sensitivity
Sample outside model range | Employ outlier statistics to test that sample is interpolation of model

22. Calibration Transfer and Instrument Standardization
22.1 Calibration transfer refers to a process by which a calibration model is developed using data from one spectrometer, is possibly modified, and is applied for the analysis of spectra collected on a second spectrometer. The calibration transfer may require that spectral data for a common sample or samples be collected on both instruments, and that some transfer function be developed and applied to the spectra or the model. A complete description of calibration transfer methodologies is beyond the scope of these practices.
22.2 Instrument standardization is a process where the spectra collected on a second instrument are mathematically adjusted in an attempt to match the spectra that would have been collected on the instrument on which the calibration was developed. Instrument standardization can also involve actual adjustment of the instrument hardware to achieve such agreement. Instrument standardization is one means of achieving calibration transfer.
22.3 Calibration transfer or instrument standardization may be required when maintenance is done to an instrument if such maintenance produces a change in the spectral response large enough to change the values estimated by the calibration model. The calibration can be thought of as being transferred from one instrument (before maintenance) to a second instrument (after maintenance).
22.4 When a calibration transfer or instrument standardization procedure is developed, it is necessary to demonstrate that the performance of the model is not degraded during the transfer. To demonstrate that a calibration transfer or instrument standardization procedure preserves the performance of a model, it is necessary to validate the model as described in Section 18. Each calibration transfer or instrument standardization procedure must be tested at least once by performing a full validation of the transferred model. Once the success of a particular calibration transfer or instrument standardization procedure has been demonstrated for a particular type of instrument, then quality control samples can be used to evaluate additional transfers and standardizations.

23. Calibration Quality Control
23.1 When an IR multivariate analysis is used to estimate component concentrations or properties, or both, it is desirable to periodically test the analysis (instrument and model) to ensure that the performance of the analysis is unchanged. To perform such tests, it is sometimes necessary to choose one or more quality control samples that will be used for this purpose. A complete discussion of methods used to validate the performance of an IR analyzer is beyond the scope of these practices. The user is referred to Practice D 6122 which discusses validation of IR analyzers for hydrocarbon analysis, and to Refs 30 and 31 which discuss methods that have gained acceptance within the agricultural community.
23.2 Control samples (materials for which reference values have been measured using the reference method) can be employed to monitor the performance of the analysis, provided that the analyses of the control samples involve interpolation of the model. The IR estimated values for the control samples are compared to the reference values using established ASTM procedures or alternative statistical tests (30, 32). These tests will generally require that the IR estimated values and the reference values agree to within the confidence intervals defined in 15.3. Since the confidence limits are based on SEC, and since SEC is often dominated by the error in the reference measurement, these procedures may not provide the most sensitive indication of changes in the performance of the analysis. Alternatively, quality control (QC) samples can be employed.
23.3 Quality control (QC) samples are used to monitor changes in the performance of an analysis (instrument and model), after the analysis has been validated. Quality control materials should be identified at the time the model is developed based on the following criteria:
23.3.1 QC materials must be chemically and physically
compatible with materials being analyzed, so as to not introduce contaminants into the samples being analyzed, and not to cause safety problems.
23.3.2 QC materials must be chemically stable when stored and sampled. If mixtures are used, the composition of the mixture must be known and methods for reproducing the mixture must be established.
23.3.3 The spectra of the QC material must be compatible with the model. Absorption bands for the QC material should not exceed the linear response range of the instrument in regions used in the calibration model. The spectra of the QC material should be as similar as possible to spectra of the calibration samples. However, analysis of the QC sample can be an extrapolation of the model.
23.4 Spectral data on the QC material is collected during the same time period that spectra of the calibration and validation samples are collected. The QC material should be treated in exactly the same fashion as other samples so that variations in the spectra are representative of the variations which will occur during the collection of spectra for unknowns. Separate samples should be used for each measurement. A minimum of 20 spectra should be collected.
NOTE 23—If the QC spectra are collected over too short a time interval, the variation seen in the spectra will be smaller than that typically encountered in application of the model to unknowns, and QC limits set based on these spectra will be excessively tight.
23.4.1 The spectra for the QC material are analyzed using the calibration model, and the average value, ȳqc, is calculated:
$$ \bar{y}_{qc} = \frac{\sum_{i=1}^{q} \hat{y}_i}{q} \qquad (90) $$
$$ s_{qc} = \sqrt{\frac{\sum_{i=1}^{q} (\hat{y}_i - \bar{y}_{qc})^2}{q - 1}} \qquad (91) $$
23.4.1.1 Dixon's test can be applied to the individual estimated values to identify outliers in the calculations in Eq 90 and Eq 91.
23.5 The QC material is analyzed periodically when the analysis (instrument and model) is in use for analyzing unknowns. The QC material is treated exactly the same as an unknown sample being estimated. The estimated value for the QC material is compared to ȳqc. The estimated value is expected to be within the range from ȳqc − t × sqc to ȳqc + t × sqc 95 % of the time, where t is the studentized t value for q − 1 df and the 95 % confidence level.
23.5.1 If the analysis of the QC material is an interpolation of the model, then sqc should be consistent with the repeatability of the IR analysis as defined in Section 19. If the analysis of the QC material is an extrapolation of the model, then sqc may be somewhat higher than the si calculated in Section 19. However, since the control limits are still based on the repeatability of the spectral measurement and do not depend on the reference method, they are expected generally to be tighter than those derived from control samples.
23.6 The use of bias and slope adjustments to improve calibration or prediction statistics for IR multivariate models is generally not recommended. Prediction errors requiring continued bias and slope corrections indicate drift in reference method or changes in the instrument photometric or wavelength stability. If a calibration model fails during the QC monitoring step, the performance of the instrument should be evaluated using the appropriate ASTM instrument performance test, and any instrument problem that is identified should be corrected. If control samples are used, checks should be performed on the reference method to ensure that reference values are correct. If instrument maintenance is performed, calibration transfer or instrument standardization procedures, or both, should be followed to reestablish the calibration.

24. Model Updating
24.1 It may sometimes be desirable to add additional calibration samples to an existing model to increase the range of applicability of the model. The new calibration samples may contain the same components as the original calibration samples but at more extreme concentrations, or new components not present in the original calibration samples. The new calibration samples may fill voids in the original calibration space.
24.1.1 When a model is updated, the matrix X containing the original calibration spectra is augmented with the spectra of the additional calibration samples, and the vector y containing the property or composition values for the calibration samples is augmented with the values for the additional calibration samples.
... rejected as outliers.
24.2 When a calibration model is updated, it must be revalidated. The requirements for validation samples for an updated model are the same as for the original model (see Section 18). The spectra used to validate the original model can be used to validate the updated model, but they must be supplemented to cover an adequate range as described in 18.2. The percentage of new samples added to the validation set for the updated model must be at least as large as the percentage of new samples added to the calibration set.

25. Multivariate Calibration Questionnaire
25.1 The following questionnaire is designed to assist the user in determining if a multivariate calibration conforms to the requirements set forth in these practices.
25.1.1 If all of the following questions in 25.1.3-25.1.7 are answered in the affirmative, then the calibration can be said to have been developed and validated according to E 1655.
25.1.2 If any of the following questions in 25.1.3-25.1.7 are answered in the negative, then the calibration cannot be said to have been developed and validated according to E 1655. If the calibration method is MLR, PCR or PLS-1, the calibration may be said to have been developed using mathematical techniques described in E 1655. ASTM methods that reference E 1655
should not claim calibration or validation via E 1655 unless all of the following questions would have been answered in the affirmative for the procedures followed during the collection of round robin data on which the method is based.
25.1.3 The following questions apply to the mathematical methodology used in the calibration:
25.1.3.1 Was the mathematical technique used in the calibration MLR, PCR or PLS-1? (Sections 12 and 13)
25.1.3.2 Did the calibration methodology include the capability of detecting high leverage outliers using a statistic such as the leverage statistic, h? (16.2)
25.1.3.3 Did the analysis methodology include the capability to detect outliers via a statistic such as those based on spectral residuals? (16.4.4-16.4.7)
25.1.4 The following questions apply to the calibration model where n is the number of samples in the calibration set, and k is the number of variables (MLR wavelengths, Principal Components, or PLS latent variables) in the model.
25.1.4.1 Was n > 6k if the model is not mean centered, or n > 6(k + 1) if the model is mean centered? (17.5)
25.1.4.2 Was the number of samples in the calibration set at least 24? (17.4)
25.1.5 The following questions apply to the validation of the model:
25.1.5.1 Was a separate set of validation samples used to test the calibration? (18.2)
25.1.5.2 Were validation spectra which were outliers based on either leverage (Mahalanobis Distance) or spectral residuals excluded from the validation set? (18.2.3)
25.1.5.3 Was the number of validation samples greater than 4k if the model was not mean centered, or greater than 4(k + 1) if the model was mean centered? (18.2.3)
25.1.5.4 Was the number of validation samples at least 20? (18.2.3)
25.1.5.5 Did the validation samples span 95 % of the range of the calibration samples? (18.2.3.1)
25.1.5.6 If SEC is the Standard Error of Calibration, do 95 % of the results for the validation samples fall within ± t·SEC·√(1 + h) of the reference values where t is the Studentized t value for n−k degrees of freedom (n−k−1 for mean centered models), and h is the leverage statistic? (18.10.1)
25.1.5.7 Do the validation results show a statistically insignificant bias? (18.9.1)
25.1.6 Was the precision of the model determined using t ≥ k ≥ 3 test samples and r ≥ 6 replicate measurements per sample? (Section 19)
25.1.7 If the calibration and analysis methodology includes preprocessing or postprocessing, are these calculations performed automatically? (Sections 11 and 14)

26. Keywords
26.1 infrared analysis; molecular spectroscopy; multivariate analysis; quantitative analysis
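The purely numerical questions of the questionnaire in Section 25 (25.1.4.1, 25.1.4.2, 25.1.5.3, 25.1.5.4, and 25.1.6) lend themselves to a mechanical check. The following Python sketch is one possible way to do this; the function name `sample_count_checks` and its parameters are hypothetical names chosen for this illustration, and the reading of "t ≥ k ≥ 3" as "at least k test samples and never fewer than three" is taken from 19.1. The remaining questions require judgment and are not covered.

```python
def sample_count_checks(n_cal, n_val, k, mean_centered,
                        n_precision_samples, n_replicates):
    """Return a dict mapping questionnaire item to True (met) or False.

    n_cal               -- number of calibration samples (n)
    n_val               -- number of validation samples
    k                   -- number of variables in the model
    mean_centered       -- True if the model is mean centered
    n_precision_samples -- samples used for repeat measurements (t)
    n_replicates        -- replicate spectra per precision sample (r)
    """
    # Mean centering uses one additional degree of freedom.
    p = k + 1 if mean_centered else k
    return {
        "25.1.4.1": n_cal > 6 * p,
        "25.1.4.2": n_cal >= 24,
        "25.1.5.3": n_val > 4 * p,
        "25.1.5.4": n_val >= 20,
        "25.1.6": n_precision_samples >= max(k, 3) and n_replicates >= 6,
    }


if __name__ == "__main__":
    results = sample_count_checks(60, 30, 5, True, 5, 6)
    for item, passed in results.items():
        print(item, "yes" if passed else "no")
```

A "no" on any item would mean, per 25.1.2, that the calibration cannot be said to have been developed and validated according to these practices, regardless of the answers to the other questions.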
ANNEXES
(Mandatory Information)
A1.1 Dixon's Test Functions for Rejection of Outliers
A1.1.1 This test provides a simple and highly efficient method for determining whether all data obtained came from the same population (with unknown mean and standard deviation) and if one or more of the data points are suspect and should be rejected.
A1.1.2 In applying this test the N determinations are tabulated in increasing order of magnitude and designated as X1, X2, X3, . . . Xn.
A1.1.3 The values at the extremes of the tabulation X1 and Xn are tested in turn in accordance with the number of values in the tabulation.
A1.2 Select the proper expression shown as follows in accordance with the number (N) of the values in the tabulation and the upper or lower limit to be tested:

Outliers Under Test | X1 | Xn
For N = 3 to 7 | $r = \frac{X_2 - X_1}{X_n - X_1}$ | $r = \frac{X_n - X_{(n-1)}}{X_n - X_1}$

TABLE A1.1 Critical Values for Rejection of a Discordant Measurement (31)
Statistic | N | α = 0.05 | α = 0.01
r10 | 3 | 0.941 | 0.988
r10 | 4 | 0.765 | 0.889
r10 | 5 | 0.642 | 0.780
r10 | 6 | 0.560 | 0.698
r10 | 7 | 0.507 | 0.637
r11 | 8 | 0.554 | 0.683
r11 | 9 | 0.512 | 0.635
r11 | 10 | 0.477 | 0.597
r21 | 11 | 0.576 | 0.679
r21 | 12 | 0.546 | 0.642
r21 | 13 | 0.521 | 0.615
r22 | 14 | 0.546 | 0.641
r22 | 15 | 0.525 | 0.616
r22 | 16 | 0.507 | 0.595
r22 | 17 | 0.490 | 0.577
r22 | 18 | 0.475 | 0.561
r22 | 19 | 0.462 | 0.547
r22 | 20 | 0.450 | 0.535
r22 | 21 | 0.440 | 0.524
r22 | 22 | 0.430 | 0.514
r22 | 23 | 0.421 | 0.505
r22 | 24 | 0.413 | 0.497
r22 | 25 | 0.406 | 0.489
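For N = 3 to 7, the r10 expressions above and the critical values in Table A1.1 can be combined as in the following Python sketch. It is offered as an illustration under stated assumptions: the function name `dixon_r10` and the returned structure are hypothetical, and the decision rule (flag a value when its ratio exceeds the tabulated critical value) is the usual convention for Dixon's test rather than wording taken from this annex.

```python
# Critical values for r10 copied from Table A1.1 (N = 3 to 7).
R10_CRITICAL = {
    3: {0.05: 0.941, 0.01: 0.988},
    4: {0.05: 0.765, 0.01: 0.889},
    5: {0.05: 0.642, 0.01: 0.780},
    6: {0.05: 0.560, 0.01: 0.698},
    7: {0.05: 0.507, 0.01: 0.637},
}


def dixon_r10(values, alpha=0.05):
    """Test the lowest (X1) and highest (Xn) values with the r10 ratio.

    Returns {"low": (ratio, flagged), "high": (ratio, flagged)}.
    """
    x = sorted(values)  # tabulate in increasing order (A1.1.2)
    n = len(x)
    if n not in R10_CRITICAL:
        raise ValueError("r10 is tabulated only for N = 3 to 7")
    critical = R10_CRITICAL[n][alpha]
    spread = x[-1] - x[0]
    if spread == 0:
        # All values identical: nothing can be discordant.
        return {"low": (0.0, False), "high": (0.0, False)}
    r_low = (x[1] - x[0]) / spread      # tests X1
    r_high = (x[-1] - x[-2]) / spread   # tests Xn
    return {"low": (r_low, r_low > critical),
            "high": (r_high, r_high > critical)}


if __name__ == "__main__":
    print(dixon_r10([10.1, 10.2, 10.0, 10.3, 12.5]))
```

In the example data, which are invented for illustration, only the highest value produces a ratio above the N = 5 critical value at α = 0.05.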
TABLE A1.2 F-Distribution: Degrees of Freedom for Numerator
1 2 3 4 5 6 7 8 9 10 12 15 20
1 161 200 216 225 230 234 237 239 241 242 244 246 248
2 18.5 19.0 19.2 19.2 19.3 19.3 19.4 19.4 19.4 19.4 19.4 19.4 19.4
3 10.1 9.55 9.28 9.12 9.01 8.94 8.87 8.85 8.81 8.79 8.74 8.70 8.66
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.91 5.86 5.80
5 6.61 5.79 5.41 5.19 5.06 4.95 4.88 4.81 4.77 4.74 4.68 4.62 4.56
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.00 3.94 3.87
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.57 3.51 3.44
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.28 3.22 3.15
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.07 3.01 2.94
10 4.96 4.10 3.70 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.91 2.85 2.77
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.79 2.72 2.65
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.69 2.62 2.54
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.60 2.53 2.46
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.53 2.46 2.39
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.48 2.40 2.33
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.42 2.35 2.28
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.38 2.31 2.23
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.34 2.27 2.19
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.31 2.23 2.16
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.28 2.20 2.12
∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 1.75 1.67 1.57
FIG. A1.1 Nomograph for Number of Determinations to Obtain Desired Confidence Limits
A2. STATISTICAL TESTS COMMON TO NIRS METHODS (18, 19) (SUPPLEMENTAL INFORMATION)
A2.1 Common Symbols
A2.1.1 Throughout these practices, lowercase letters are used to represent scalar quantities. Lowercase bold letters are used to represent vectors, and uppercase bold letters are used to represent matrices. Italicized letters are used to represent dimensions of vectors and matrices. Italicized subscripts are sample or wavelength indices. For example:
yi = Scalar reference value for the ith sample.
$\hat{y}_i$ = The estimated y-value for the ith sample based on a regression model.
$\bar{y}$ = The mean y value for all samples.
$\mathbf{y}$ = Vector of reference values for n samples.
$\mathbf{x}_i$ = Spectral vector of length f for the ith sample.
$\mathbf{X}$ = Matrix of spectra; the n rows of $\mathbf{X}$ contain the spectra of length f for n samples.
n = Number of samples used in a calibration model.
f = Number of frequencies or wavelengths used in a calibration model.
k = Number of variables used in a calibration model.
$\Sigma$ = Capital sigma represents summation of all values within parentheses.
$R^2$ = Coefficient of multiple determination (R-squared).
R = The simple correlation coefficient for a linear regression for any set of data points; this is equal to the square root of the R-squared value.
$b_0$ = The bias or y-intercept value for any calibration function fit to x, y data. For bias-corrected standard error calculations the bias is equal to the difference between the average reference analytical values and the IR predicted values.

A2.2 Statistical Terms
A2.2.1 Sum of squares for regression:
$$SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \tag{A2.1}$$
A2.2.2 Sum of squares for residual:
$$SS_{res} = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 \tag{A2.2}$$
A2.2.3 Mean square for regression:
$$MS_{reg} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{k - 1} \tag{A2.3}$$
A2.2.4 Mean square for residual:
$$MS_{res} = \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n - k - 1} \tag{A2.4}$$
A2.2.5 Total sum of squares:
$$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \tag{A2.5}$$

A2.3 Test Statistics
A2.3.1 The statistics discussed below have most commonly been applied to MLR models. The statistics assume that the data have been mean centered in developing the model. Similar statistics can be derived for PCR and PLS models, and for models that are not mean centered.
A2.3.2 Coefficient of Multiple Determination:
The coefficient of multiple determination is also termed the R-squared statistic, or total explained variation. This statistic allows determination of the amount of variation in the data that is adequately modeled by the calibration equation as a total fraction of 1.0. Thus $R^2$ = 1.00 indicates the calibration equation models 100 % of the variation within the data. An $R^2$ = 0.50 indicates that 50 % of the variation in the differences between the actual values for the data points and the predicted or estimated values for these points is explained by the calibration equation (mathematical model), and 50 % is not explained. R-squared values approaching 1.0 are sought when developing calibrations. R-squared can be estimated using a simple method as outlined below.
A2.3.2.1 The $R^2$ is determined using the equation:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - k - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)} = \frac{SS_{reg}}{SS_{tot}} \tag{A2.6}$$
A2.3.2.2 If $s_R$ is the standard deviation of the errors in the reference method measurement, and $s_Y$ is the standard deviation in the reference values used in the calibration (a measure of the range spanned by the reference data), then $R^2$ values that exceed $1 - s_R^2/s_Y^2$ are probable indications of overfitting of the data.
A2.3.3 F-Test Statistic for the Regression:
A2.3.3.1 This statistic is also termed F for regression, or t-squared. F increases as the equation begins to model, or fit, more of the variation within the data. With R-squared held constant, the F value increases as the number of samples increases. As the number of wavelengths used within the regression equation decreases, F tends to increase. Deleting an unimportant wavelength from an equation will cause the F for regression to increase.
A2.3.3.2 The F-statistic can also be useful in recognizing suspected outliers within a calibration sample set; if the F-value decreases when a sample is deleted, the sample was not an outlier. This situation is the result of the sample not affecting the overall fit of the calibration line to the data while at the same time decreasing the number of samples (n). Conversely, if deleting a single sample increases the overall F for regression, the sample is considered a suspected outlier. F is defined as the mean square for regression divided by the mean square for residual (see statistical terms in A2.2).
A2.3.3.3 The F for the regression is determined by the equation:
$$F = \frac{R^2 (n - k - 1)}{(1 - R^2)\,k} = \frac{MS_{reg}}{MS_{res}} \tag{A2.7}$$
A2.3.4 Student's t-Value (For a Regression):
A2.3.4.1 This statistic is equivalent to the F statistic in the determination of the correlation between X and y data. It can be used to determine whether there is a true correlation between an IR estimated value and the primary chemical analysis for that sample. It is used to test the hypothesis that the correlation really exists and has not happened only by chance. A large t value (generally greater than ten) indicates a real (statistically significant) correlation between X and y.
A2.3.4.2 The t for regression is calculated as:
$$t = \frac{R \sqrt{n - k - 1}}{\sqrt{1 - R^2}} \tag{A2.8}$$
A2.3.5 Partial F or t-Squared Test for a Regression Coefficient:
A2.3.5.1 This test indicates whether the addition of a
particular wavelength (independent variable) and its corresponding regression coefficient (multiplier) adds any significant improvement to an equation's ability to model the data (including the remaining unexplained variation). Small F or t values indicate no real improvement is given by adding the wavelength into the equation.
A2.3.5.2 If several wavelengths (variables) have low t or F values (less than 10 or 100, respectively), it may be necessary to delete each of the suspect wavelengths, singly or in combination, to determine which wavelengths are the most critical for predicting constituent values. In the case where an important wavelength is masked by intercorrelation with another wavelength, a sharp increase in the partial F will occur when an unimportant wavelength is deleted and where there is no longer high intercorrelation between the variables still within the regression equation.
A2.3.5.3 The t-statistic is sometimes referred to as the ratio of the actual regression coefficient for a particular wavelength to the standard deviation of that coefficient. The partial F value described is equal to this t value squared; note that the t value calculated this way retains the sign of the coefficient, whereas all F values are positive.
A2.3.5.4 The partial F for a regression coefficient is calculated as:
$$\frac{SS_{res}(\text{all variables except one}) - SS_{res}(\text{all variables})}{MS_{res}(\text{all variables})} \tag{A2.9}$$
A2.3.6 The Bias Corrected Standard Error:
A2.3.6.1 Bias corrected standard error measurements allow the characterization of the variance attributable to random unexplained error. The bias value, $b_0$, is calculated as the mean difference between reference and IR estimated values:
$$b_0 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i) \tag{A2.10}$$
A2.3.6.2 The bias corrected standard error is calculated as:
$$SE_c = \sqrt{\frac{\sum_{i=1}^{n} \left( (y_i - \hat{y}_i) - b_0 \right)^2}{n - 1}} \tag{A2.11}$$
Similar bias corrected values can be calculated for SECV.
A2.3.7 Standard Deviation of Repeatability (SDR):
A2.3.7.1 SDR is also referred to as the standard deviation of difference (SDD) or standard error of difference for replicate measurements (SD replicates). The SDR is calculated to allow accurate estimation of the variation in an analytical method due to sampling, sample presentation, and analysis errors. The SDR can be used as a measure of precision for the reference analytical method.
A2.3.7.2 The SDR is calculated using:
$$SDR = \sqrt{\frac{\sum_{j=1}^{r} (y_j - \bar{y}_j)^2}{r - 1}} \tag{A2.12}$$
A2.3.8 Offset Sensitivity:
A2.3.8.1 Also termed systematic variation or index of systematic variation (ISV), offset sensitivity is equal to the sum of all regression coefficients. The larger the value, the greater is the sensitivity to particle size differences between samples or to the isotropic (mirror-like) scattering properties of samples. The offset sensitivity is used to compare two or more equations for their "blindness" to offset variation between samples. Equations with large offset sensitivities indicate that particle size variations within a data set may cause wide variations in the analytical result.
A2.3.8.2 The ISV is calculated as:
$$ISV = \sum_{i=1}^{k} b_i \tag{A2.13}$$
A2.3.9 Random Variation Sensitivity:
A2.3.9.1 This statistic is also termed the index of random variation (IRV). Random variation sensitivity is calculated as the sum of the absolute values of all regression coefficients. The larger the value, the greater the sensitivity to factors such as poor wavelength precision, temperature variations within samples and instrument, and electronic noise. The higher the value, the less likely the equation can be transferred successfully to other instruments.
A2.3.9.2 The IRV is calculated using the expression:
$$IRV = \sum_{i=1}^{k} \sqrt{b_i^2} \tag{A2.14}$$
A2.3.10 Standard Error of the Laboratory (SEL) for Reference Chemical Methods:
A2.3.10.1 The SEL can be determined by using one or more samples properly aliquoted and analyzed in replicate by one or more laboratories. The average analytical value for the r replicates on a single sample is determined as:
$$\bar{y}_i = \frac{1}{r} \sum_{j=1}^{r} y_{ij} \tag{A2.15}$$
A2.3.10.2 SEL is given by:
$$SEL = \sqrt{\frac{\sum_{i=1}^{n} \sum_{j=1}^{r} (y_{ij} - \bar{y}_i)^2}{n (r - 1)}} \tag{A2.16}$$
where the i index represents different samples and the j index different measurements on the same sample.
A2.3.10.3 This can apply whether the replicates were performed in a single laboratory or whether a collaborative study was undertaken at multiple laboratories. Additional techniques for planning collaborative tests can be found in Ref 20. Some care must be taken in applying Eq A2.16. If all of the analytical results are from a single analyst in a single laboratory, then the repeatability of the analysis is defined as $\sqrt{2} \cdot t(n(r - 1), 95\,\%) \cdot SEL$, where $t(n(r - 1), 95\,\%)$ is the Student's t value for the 95 % confidence level and n(r − 1) degrees of freedom. If the analytical results are from multiple analysts and laboratories, the same calculation yields the reproducibility of the analysis. For many analytical tests, SEL may vary with the magnitude of y. SEL values calculated for samples having different $\bar{y}_i$ can be compared by an F-test to determine if the SEL values show a statistically significant variation as a function of $\bar{y}_i$.
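The statistics of A2.2 and A2.3 can be illustrated with a short calculation. The following Python sketch is not part of these practices; it is a minimal illustration using only the standard library. The function names, the dictionary keys, and the example numbers are hypothetical. As an assumption, R-squared is computed in the unadjusted form 1 − SSres/SStot, which equals SSreg/SStot for a least-squares fit that includes an intercept, and the SEL function assumes every sample has the same number of replicates r.

```python
import math


def regression_statistics(y, y_hat, k):
    """Summary statistics of A2.2 and A2.3 for one calibration.

    y     : reference values for the n calibration samples
    y_hat : values estimated by the model for the same samples
    k     : number of variables used in the model
    """
    n = len(y)
    y_bar = sum(y) / n
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)             # Eq A2.1
    ss_res = sum((yh - yi) ** 2 for yi, yh in zip(y, y_hat))    # Eq A2.2
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                 # Eq A2.5
    ms_res = ss_res / (n - k - 1)                               # Eq A2.4
    # Assumption: unadjusted R-squared (see the note above the code).
    r2 = 1.0 - ss_res / ss_tot
    f_reg = r2 * (n - k - 1) / ((1.0 - r2) * k)                 # Eq A2.7
    t_reg = math.sqrt(r2 * (n - k - 1) / (1.0 - r2))            # Eq A2.8
    bias = sum(yi - yh for yi, yh in zip(y, y_hat)) / n         # Eq A2.10
    se_c = math.sqrt(
        sum((yi - yh - bias) ** 2 for yi, yh in zip(y, y_hat)) / (n - 1)
    )                                                           # Eq A2.11
    return {
        "ss_reg": ss_reg, "ss_res": ss_res, "ss_tot": ss_tot,
        "ms_res": ms_res, "r2": r2, "f_reg": f_reg, "t_reg": t_reg,
        "bias": bias, "se_c": se_c,
    }


def coefficient_sensitivities(b):
    """Offset (ISV, Eq A2.13) and random variation (IRV, Eq A2.14) indices."""
    isv = sum(b)
    irv = sum(abs(bi) for bi in b)  # sqrt(b_i^2) is |b_i|
    return isv, irv


def standard_error_of_laboratory(replicates):
    """SEL (Eq A2.16) for n samples, each with the same r replicates."""
    n = len(replicates)
    r = len(replicates[0])
    total = 0.0
    for row in replicates:
        mean = sum(row) / r                                     # Eq A2.15
        total += sum((value - mean) ** 2 for value in row)
    return math.sqrt(total / (n * (r - 1)))


# Hypothetical example data, chosen only to exercise the functions.
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_est = [1.1, 1.9, 3.2, 3.9, 4.9]
stats = regression_statistics(y_ref, y_est, k=1)
isv, irv = coefficient_sensitivities([2.0, -1.5, 0.5])
sel = standard_error_of_laboratory([[10.0, 10.2], [20.1, 19.9], [30.0, 30.4]])

for name, value in stats.items():
    print(f"{name}: {value:.4f}")
print(f"ISV: {isv:.4f}  IRV: {irv:.4f}  SEL: {sel:.4f}")
```

With a single variable (k = 1), the square of the t value from Eq A2.8 equals the F value from Eq A2.7, which gives a quick consistency check on any implementation. The sketch does not guard against an R-squared of exactly 1.0 or below zero; a production routine would need to handle those cases.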
REFERENCES
(1) Association of Official Analytical Chemists, AOAC Official Methods of Analysis, Method 989.03, 1990, pp. 74–76.
(2) Journal of the Association of Official Analytical Chemists, Vol 71, 1988, p. 1162.
(3) Landa, I., Review of Scientific Instruments, Vol 50, 1979, pp. 34–40.
(4) Landa, I., and Norris, K. H., Applied Spectroscopy, Vol 23, 1979, pp. 105–107.
(5) Kortüm, G., Reflectance Spectroscopy, Springer-Verlag, New York, NY, 1969, p. 111.
(6) Honigs, D. E., Freelin, J. M., Hieftje, G. M., and Hirschfeld, T. B., Applied Spectroscopy, Vol 37, No. 6, 1983, pp. 491–497.
(7) Hrushka, W., "Data Analysis: Wavelength Selection Techniques," in Near Infrared Technology in the Agricultural and Food Industries, P. Williams and K. Norris, Eds., American Association of Cereal Chemists, St. Paul, MN, 1987.
(8) Mark, H., Applied Spectroscopy, Vol 42, No. 8, 1988, pp. 1427–1440.
(9) Brown, P. J., Journal of Chemometrics, Vol 6, 1992, pp. 151–161.
(10) Fredricks, P. M., Osborn, P. R., and Swinkels, P. R., Analytical Chemistry, Vol 57, 1985, pp. 1947–1950.
(11) Kennedy, W. J., and Gentle, J. E., Statistical Computing, Marcel Dekker, New York, NY, 1980.
(12) Allen, D. M., Technical Report Number 23, University of Kentucky Department of Statistics, August 1981.
(13) Lindberg, W., Persson, J., and Wold, S., Analytical Chemistry, Vol 55, 1983, p. 643.
(14) Martens, H. A., and Naes, T., Multivariate Calibration, John Wiley and Sons, New York, NY, 1989.
(15) Geladi, P., and Kowalski, B. R., Journal of Chemometrics, Vol 1, 1986, pp. 1 and 18.
(16) Haaland, D. M., and Thomas, E. V., Analytical Chemistry, Vol 60, 1988, pp. 1193–1202.
(17) Wold, S., Ruhe, A., Wold, H., and Dunn, W. J., SIAM Journal of Science and Statistical Computations, Vol 5, 1984, p. 735.
(18) Manne, R., Chemometrics and Intelligent Laboratory Systems, Vol 2, 1987, p. 187.
(19) Helland, I. S., Communications in Statistics (Simulation and Computation), Vol 17, 1988, p. 581.
(20) Helland, I. S., Scandinavian Journal of Statistics, Vol 17, 1990, p. 97.
(21) Draper, N. R., and Smith, A., Applied Regression Analysis, John Wiley and Sons, New York, NY, 1981.
(22) Workman, J., "NIR Spectroscopy Calibration Basics," in Near Infrared Analysis, Burns, D., and Ciurczak, E., Eds., Marcel-Dekker, Inc., New York, NY, 1992, pp. 247–280.
(23) Mark, H., and Workman, J., Statistics in Spectroscopy, Academic Press, Boston, MA, 1991.
(24) Hoaglin, D. C., and Welsch, R. E., Amer. Statist., Vol 32, 1978, p. 17.
(25) Whitfield, R. G., Gerger, M. E., and Sharp, R. L., Applied Spectroscopy, Vol 41, 1987, pp. 1204–1213.
(26) Geladi, P., and Kowalski, B. R., Analytica Chimica Acta, Vol 185, 1986, pp. 1–17.
(27) Miller, R., Simultaneous Inference, 2nd ed., Springer, New York, NY, 1981.
(28) Mark, H., and Workman, J., Analytical Chemistry, Vol 58, 1986, p. 1454.
(29) Honigs, D. E., Hieftje, G. M., Mark, H. L., and Hirschfeld, T. B., Analytical Chemistry, Vol 57, 1985, p. 2299.
(30) Youden, W. J., Statistical Manual of the Association of Official Analytical Chemists, AOAC, Arlington, VA, 1979.
(31) Martens, H., and Naes, T., in Williams, P., and Norris, K., Eds., Near Infrared Technology in the Agricultural and Food Industries, American Association of Cereal Chemists, St. Paul, MN, 1987, pp. 57–87.
(32) Hald, A., Statistical Theory with Engineering Applications, John Wiley and Sons, New York, NY, 1952.
(33) Dixon, W. J., Biometrics, Vol 9, 1953, pp. 74–89.
The American Society for Testing and Materials takes no position respecting the validity of any patent rights asserted in connection
with any item mentioned in this standard. Users of this standard are expressly advised that determination of the validity of any such
patent rights, and the risk of infringement of such rights, are entirely their own responsibility.
This standard is subject to revision at any time by the responsible technical committee and must be reviewed every five years and
if not revised, either reapproved or withdrawn. Your comments are invited either for revision of this standard or for additional standards
and should be addressed to ASTM Headquarters. Your comments will receive careful consideration at a meeting of the responsible
technical committee, which you may attend. If you feel that your comments have not received a fair hearing you should make your
views known to the ASTM Committee on Standards, at the address shown below.
This standard is copyrighted by ASTM, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
Individual reprints (single or multiple copies) of this standard may be obtained by contacting ASTM at the above address or at
610-832-9585 (phone), 610-832-9555 (fax), or [email protected] (e-mail); or through the ASTM website (www.astm.org).