calibration spectra and data, which are
based partially on random correlations and
not true chemical/spectral relationships, or
are very specific to the calibration sample.
This can occur because there are so many
wavelengths available. In theory, if one has
700 wavelengths and only 100 samples, then by
using enough wavelengths (100 unknowns can
be fitted exactly by 100 knowns) one gets a
perfect, but meaningless, fit.
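This perfect-but-meaningless fit is easy to demonstrate. The sketch below (an illustration on simulated data, not from the text) fits an ordinary least-squares model to 100 "spectra" of 100 wavelengths each, where both the spectra and the reference values are pure random noise, and still achieves an essentially exact fit:

```python
# Illustrative sketch: with as many wavelengths as samples, least squares
# fits even pure noise perfectly (100 unknowns, 100 knowns).
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 100, 100
X = rng.normal(size=(n_samples, n_wavelengths))   # random "spectra"
y = rng.normal(size=n_samples)                    # random "reference" values

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = np.max(np.abs(X @ coef - y))
print(residual)  # essentially zero: a perfect fit with no chemical meaning
```

Such an equation describes only the calibration samples themselves and has no predictive value for future samples.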
One solution has been to limit the number
of wavelengths selected (e.g. one for every
ten samples in the calibration to a maximum
of ten or so). The problem then is that there
is a lot of potentially useful spectral
information available which is not being
used. The solution used to address both
problems simultaneously has been to use
what are generally known as ‘whole
spectrum’-based procedures.
As a result, most of the chemometric
efforts over the last 10 years or so have
revolved around procedures such as factor
analysis, principal components regression (PCR) and
partial least squares (PLS) regression
(Sharaf et al., 1986). PCR and PLS, in
particular, have enjoyed great success. In
these two procedures, the entire spectrum
is used. The spectra are decomposed into a
series of factors which represent the
variance in the spectra. In such a manner,
the information in the spectra is com-
pressed into a reduced series of factors
which can then be used in a regression
process to determine the analyte of
interest. Other procedures used include
Neural Networks (McClure et al., 1992),
genetic algorithms (Goldberg, 1989, used to
reduce the number of wavelengths to be
considered) and just about any method
ever devised to extract predictive informa-
tion from data. At present, efforts based on
PLS, PCR and Neural Networks seem to be
the most popular, with genetic algorithms
used to select wavelengths. Considerable
theoretical work has been carried out using
factor analysis, but it has not found much
use because one needs to know what the
spectra of the factors (components in the
samples) are before starting and, in
complex samples, such as feedstuffs, that is
virtually impossible.
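The compression-then-regression idea behind PCR and PLS can be sketched in a few lines. The following is a minimal PCR example on simulated spectra (an assumed illustration, not data from the text): the spectra are decomposed into factors by singular value decomposition, and the analyte is regressed on the factor scores rather than on the 700 individual wavelengths.

```python
# Minimal PCR sketch on simulated spectra: decompose spectra into factors,
# then regress the analyte of interest on the compressed factor scores.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_wavelengths = 60, 700
# Simulated spectra: two latent "chemical" components plus a little noise
latent = rng.normal(size=(n_samples, 2))
loadings = rng.normal(size=(2, n_wavelengths))
X = latent @ loadings + 0.01 * rng.normal(size=(n_samples, n_wavelengths))
y = latent @ np.array([1.0, -0.5])               # analyte driven by the latents

Xc = X - X.mean(axis=0)                          # mean-centre the spectra
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                            # number of factors retained
scores = U[:, :k] * s[:k]                        # 700 wavelengths -> 2 factors
b, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
y_hat = scores @ b + y.mean()
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
print(rmse)
```

The whole spectrum contributes to each factor, so no spectral information is discarded, yet only two regression coefficients are estimated instead of 700.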
Calibration validation and testing
Regardless of what method is used to
develop the calibration, there are many
steps which need to be performed in deter-
mining the final calibration. For example,
how many wavelengths are needed in an
MLR? If five are enough, then using more
just results in over-fitting. The same
applies to PLS; the number of factors
possible is one less than the number of
samples, but rarely are more than a dozen
or so needed, even for large data sets. How
does one decide how many to use? For
each procedure, PLS or MLR, and even in
many cases for each software package, a
number of statistical tests are available,
which in essence determine when the
increase in accuracy obtained by adding an
additional factor reaches the point of
diminishing returns.
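One common way to locate that point of diminishing returns is to compare validation error as factors are added. The sketch below (a hypothetical illustration, not a test from the text) builds PCR models with 1 to 10 factors on simulated data with three underlying components and picks the factor count with the lowest error on held-out samples:

```python
# Hypothetical sketch: choose the number of PCR factors at the point where
# validation error stops improving (diminishing returns).
import numpy as np

rng = np.random.default_rng(2)
n, p, true_k = 80, 200, 3
L = rng.normal(size=(n, true_k))                 # three latent components
X = L @ rng.normal(size=(true_k, p)) + 0.05 * rng.normal(size=(n, p))
y = L @ np.array([1.0, -1.0, 0.5]) + 0.05 * rng.normal(size=n)

train, val = np.arange(0, 60), np.arange(60, 80)
Xm, ym = X[train].mean(axis=0), y[train].mean()
U, s, Vt = np.linalg.svd(X[train] - Xm, full_matrices=False)

errors = []
for k in range(1, 11):                           # try 1..10 factors
    b, *_ = np.linalg.lstsq(U[:, :k] * s[:k], y[train] - ym, rcond=None)
    val_scores = (X[val] - Xm) @ Vt[:k].T        # project onto same factors
    pred = val_scores @ b + ym
    errors.append(np.sqrt(np.mean((y[val] - pred) ** 2)))

best_k = 1 + int(np.argmin(errors))
print(best_k, errors)
```

Because the simulated spectra contain three real components, the error drops sharply up to three factors and then flattens; additional factors only model noise.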
One simple method used with MLR is
to divide the calibration set into two sets of
data, a calibration and validation set. The
equations are then developed using the
calibration samples and the validation
samples are predicted. In developing
calibrations, it is not uncommon to try
various data pre-treatments (e.g. derivatives,
scatter corrections, etc., which are
used to help extract the information from
the spectra); the result is that one often has
many different calibrations to examine and
choose from. Since the validation set is
involved in the development process, it
becomes likely that one will find a set of
terms which also randomly does well on
the validation set, but not on future
samples. By placing restrictions on the
criteria for selecting how many terms one
can use, experience has shown that one
improves the likelihood that the final
equation selected will be valid for future
samples. The final test comes when one
applies the selected equation to a new set
of samples (test set).
Very similar procedures are used for
Neural Net calibrations. For PLS and PCR a
slight variation, called one-out (leave-one-out)
cross-validation, is used. In this procedure, each
sample is removed from the data set and a
calibration developed using the remaining
samples. This is repeated N times (each