Sensory Difference Testing
used as a tool to answer the following question: 'How large is the sensory difference between my products?' These results will permit the scaling of sensory magnitudes and sensory differences, information thought until now to be provided only through the use of rating methods.

Traditional Protocols Used in Discrimination Testing

There is no limitation on the number of different discrimination tests that can be used to study sensory differences. The list which will be given here is not exhaustive, and alternative discrimination methods can easily be created. The issue is then to develop the proper model to conduct statistical analyses.
The discrimination protocols can be classified into two main categories: those that require the nature of the sensory difference to be specified in the instructions and those that do not.
The protocols described here will be illustrated by the comparison of two cookies A and B, B being slightly sweeter than A. The samples are presented blind, and no other difference (visual, textural, temperature) is assumed to be present. A subject will perform several tests in succession, or a group of subjects will perform one test each. From the results of the test, the similarity or the difference between the two samples will be concluded with a given level of confidence, depending on the sample size (number of tests) involved in the comparison.

Directional Difference Tests

These methods require the nature of the difference to be specified in the instructions. Table 1 gives details for each method.
The chance probability represents the probability of getting the answer right if the subject cannot discriminate between the products. For the sorting protocols, it is important to keep in mind the fatiguing nature of food samples and the memory requirements that might significantly hinder subjects' performance. These tasks are best suited to textural and visual investigations.

Nondirectional Difference Tests

Those are the most commonly used methods in sensory discrimination testing, even if their statistical power can be up to 100 times inferior to that of the directional difference tests (Table 2).

Traditional Data Analyses

The traditional way of analyzing data extracted from difference tests involves no consideration of the psychological processes occurring at the time a response is generated. This is what is called a response-based analysis. The information obtained from this rather limited statistical analysis is either 'Yes, the products are different,' or 'No, they are not different.' No information is obtained regarding the size of the sensory difference. This can yield inconsistent results and this issue will be discussed later.
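[Fragments of Tables 1 and 2; the remaining rows are not recoverable]
Method: (illegible). Instructions: Find the least sweet cookie.
Method: p in n. Instructions: Find the p sweetest samples. Chance probability: p!(n-p)!/n!
Method: p-out-of-n. Samples: n samples (A: n-p, B: p). Instructions: Make one group of n-p similar samples and one group of p similar samples. Chance probability: p!(n-p)!/n! if n is odd; 2 x p!(n-p)!/n! if n is even.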
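The chance probability expression p!(n-p)!/n! that appears in the table fragments above is the reciprocal of the number of ways of choosing p samples out of n. The short Python sketch below evaluates it; the function name `chance_probability` is hypothetical, and reading the expression as the chance of guessing all p target samples correctly is an assumption based on the definition of chance probability given earlier.

```python
from math import comb


def chance_probability(n: int, p: int) -> float:
    """Chance of guessing the p target samples among n.

    Evaluates p!(n-p)!/n!, which equals 1 / C(n, p).
    """
    if not 0 < p < n:
        raise ValueError("p must satisfy 0 < p < n")
    return 1 / comb(n, p)


# One target among two samples (as in a 2-AFC test)
print(chance_probability(2, 1))
# One target among three samples (as in a 3-AFC test)
print(chance_probability(3, 1))
# Two targets among four samples
print(chance_probability(4, 2))
```

The doubled value quoted for the sorting protocol is deliberately left out of this sketch, since the table context that defines when it applies did not survive.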
In discrimination testing, one is concerned with two types of error. The first one is the type I error: concluding that the products were significantly different while they were not. The second is the type II error: failing to detect an existing sensory difference. Statistical tests are conducted at a particular α level, usually 0.05, describing the type I error. An α level of 0.05 means that we have only a 5% chance of concluding that the products were different while they were not. For the type II error, described by the parameter β, the level used is usually 0.2 or 0.1. The statistical power of the test is defined by: Power = 1 − β. A power of 80% implies that a test has an 80% chance of detecting a difference of a given size. The issue of power will be discussed later.

Binomial Analysis

For a given protocol, the number of correct tests is compared to the number that would be expected by chance. This type of model is called a guessing model. It assumes that if the subject cannot discriminate between the samples, he or she will pick a sample or an answer randomly. We are testing the H0 hypothesis that the two products are identical. If rejected, we will accept the H1 hypothesis that the products are different. For instance, if a study with 20 tests was run using the two-alternative forced choice (2-AFC) protocol and 14 were performed correctly, this result would be compared to an expected chance result of 10. Tables have been published indicating the minimum number of tests correct for a given sample size to be significant at a given α level, usually 0.05. In our example, to be significant at the 5% level, we would need 15 tests correct. Therefore, we cannot conclude that the two cookies were significantly different. We cannot reject H0.
It is critical to be aware that we are not concluding that the cookies are the same. We can only conclude that they are not significantly different. Had we used a larger sample size, we might have obtained a significant result. In order to accept H0, it is necessary to consider the power of the test.

χ² Analysis

This analysis is traditionally used with tests yielding results in a contingency table, such as the A/Not A and same-different tests. An example of a contingency table is shown in Table 3. A study between
the two cookies A and B was conducted with 100 consumers, each getting either a different pair (AB or BA) or an identical pair (AA or BB).

Table 3 Results of a same-different test conducted with 100 consumers

Presentation    Response
                ‘Different’    ‘Same’
Same            20             30
Different       35             15

A χ² value can be calculated and we find that χ² = 9.1. Looking in a table for the significance of χ², we find that a χ² of 9.1 with 1 degree of freedom is significant at the 1% level.
Therefore, here we can reject the null hypothesis H0 and conclude that the products are different.

Taking into Account the Degree of Difference Between the Products: Thurstonian Models

The data analyses discussed above only provide one type of information from discrimination testing: ‘Are my products significantly different or not?’ We never mentioned the magnitude of the difference between the products. Was it a large difference or a small difference? How does it compare to results from previous studies? In order to answer these questions, it is necessary to introduce new notions, such as an index of perceptual difference, called d’. We are first going to show why it is essential to consider this index when comparing products.

Insufficiency of the Guessing Model: The Paradox of Discriminatory Nondiscriminators

It has been reported numerous times in the literature that, while using the same subjects and the same products, the null hypothesis could be rejected using some protocols but not others. The most illustrious example is the result discrepancy between the 3-AFC and triangle tests. While the two tests have the same design and only very slightly different instructions, subjects’ performance is significantly greater in the 3-AFC than in the triangle test. With a 3-AFC test, it might be concluded that the products are significantly different, while we would not be able to reach that conclusion with the triangle test. Which result is right? Or are the two methods giving two versions of the same information, i.e., the magnitude of the [...]

[...] power. This might result in the release of reformulated products in the marketplace that will be rejected by consumers.
These experimental results underscore the insufficiency of relying on the number of correct answers in a given test to determine how different two products are. It is essential to take into consideration two very important aspects of sensory evaluation: variability and behavior. These aspects are taken into account in Thurstonian modeling.

Thurstonian Models

These models are very similar to those encountered in the field of psychology and discussed in signal detection theory.

Thurstonian modeling assumptions

Variability. Let’s take our sweet cookie example again. The idea behind perception variability is that, when tasting the same cookie several times, the perceived sweetness will not always be the same. On average it will have a certain intensity, but at any given moment, the perceived intensity can be slightly stronger or slightly weaker (Figure 1). Several reasons can explain these variations: first, there is always some random noise in the nervous system due to the spontaneous firing of nerves. Second, the number of compounds binding with the sweet receptors can be slightly different each time. Third, there might be variation in the cookie itself: the sweet compounds might not be evenly distributed on the cookie, or the subject might take bites of slightly different sizes every time.
The likelihood of each sweetness magnitude occurring for the cookie can be represented by a distribution. This distribution is usually assumed to be normal, even though alternative models have been published using other types of distributions. Figure 2 illustrates this concept. The height of the distribution at each intensity level represents the likelihood of this intensity occurring in a given tasting. The intensity at the mean of the distribution is the most likely. The further away the intensity is from the mean, higher or lower, the less likely it is to occur.
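The variability assumption can be illustrated with a small simulation. The Python sketch below is not taken from the article: it assumes that the perceived sweetness of each cookie follows a normal distribution with unit variance, that the two distributions differ only in their means, and that d’ is the distance between those means in standard deviation units. The function name `simulate_2afc` and all parameter values are hypothetical. The closed-form value used for comparison, Φ(d’/√2), is the standard signal detection result for a 2-AFC task under these assumptions.

```python
import random
from statistics import NormalDist


def simulate_2afc(d_prime: float, n_trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the proportion of correct 2-AFC answers by simulation.

    On each trial the subject tastes A and B once and picks the sample
    that is perceived as sweeter. B is sweeter on average by d_prime.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        perceived_a = rng.gauss(0.0, 1.0)      # momentary sweetness of A
        perceived_b = rng.gauss(d_prime, 1.0)  # momentary sweetness of B
        if perceived_b > perceived_a:
            correct += 1
    return correct / n_trials


def expected_2afc(d_prime: float) -> float:
    """Closed-form proportion correct for 2-AFC: Phi(d' / sqrt(2))."""
    return NormalDist().cdf(d_prime / 2 ** 0.5)


for d in (0.0, 0.5, 1.0, 2.0):
    print(d, round(simulate_2afc(d), 3), round(expected_2afc(d), 3))
```

With d’ = 0 the two distributions coincide and the simulated proportion stays near the 50% chance level; as d’ grows, the overlap shrinks and the proportion of correct answers rises.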
[Figure: axis labelled ‘Sweetness intensity’, marked ‘Weaker’, ‘Mean’, and ‘Stronger’]
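Returning to the two response-based analyses described earlier (the 2-AFC binomial example and the same-different data of Table 3), the quoted figures can be checked numerically. The Python sketch below assumes a one-sided exact binomial test and a Pearson χ² without continuity correction; these choices are assumptions, made because they reproduce the values given in the text. All function names are hypothetical, and the `power` helper takes a true proportion correct that the reader must supply.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct(n: int, p_chance: float, alpha: float = 0.05):
    """Smallest number of correct tests that is significant at level alpha."""
    for k in range(n + 1):
        if binomial_tail(k, n, p_chance) <= alpha:
            return k
    return None  # no outcome reaches significance for this n


def power(n: int, p_chance: float, p_true: float, alpha: float = 0.05) -> float:
    """Power = 1 - beta: chance of reaching significance when the true
    proportion of correct answers is p_true."""
    k = min_correct(n, p_chance, alpha)
    return 0.0 if k is None else binomial_tail(k, n, p_true)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# 2-AFC example: 14 correct out of 20, chance probability 0.5
print(binomial_tail(14, 20, 0.5))  # about 0.058, above 0.05
print(min_correct(20, 0.5))        # 15

# Table 3: rows are the presented pairs, columns the responses
print(chi_square_2x2(20, 30, 35, 15))  # about 9.09
```

The χ² value exceeds 6.63, the 1% critical value for 1 degree of freedom, which agrees with the conclusion drawn from Table 3.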
The power of a test is also shown in the calculated variance of d’: the more powerful the test, the smaller the variance of d’. This variance is very useful when comparing several d’ values (for instance, when comparing a standard to several reformulations) in order to determine whether they are significantly different from each other.

Replicated testing. Considering the sample sizes required for the duo-trio and triangle tests, it would be more suitable to use the 2- or 3-AFC protocols. If it is impossible to use the directional difference tests, the power of the triangle and duo-trio tests can be somewhat increased by replicating the number of tests per subject. However, combining data from different subjects brings the issue of overdispersion. References in the Further Reading section at the end of this chapter should be consulted in order to ensure accurate data analysis.

Experimental variables. Under certain conditions, d’ values obtained from discrimination tests may not correspond exactly. This may be due to the effect of experimental variables such as memory and sequence/adaptation effects which are not considered in the models. Since a larger d’ value requires a smaller sample size to be detected, it may be beneficial to use protocols providing larger d’ values. Experimentally, it has been found that memory requirements can significantly hinder subjects’ performance and thus product discrimination. Under certain conditions, tests with only two samples (2-AFC, same-different) have been found more sensitive (higher d’ value) than those with more stimuli (3-AFC, triangle).
This aspect of difference testing should also be considered when selecting a protocol to conduct a discrimination study.

Conclusion

For an unaware scientist, the topic of discrimination testing can be perceived as deceptively simple and limited. However, we showed here that proper knowledge from psychology, physiology, and statistics is necessary in order to ensure proper data collection and analysis. Furthermore, discrimination testing can provide very valuable information by not limiting the information to a mere yes/no answer regarding the existence of a difference. This can be achieved by calculating d’ values and their variances from widely available tables, and by determining the magnitude of the difference between the samples.
This chapter summarized the scope of this topic. The reader is advised to refer to the review articles cited at the end of this section in order to build a more general knowledge about the topic of discrimination testing.
We still need to mention that the concepts of sensory magnitudes and d’ values are not limited to sensory difference tests and that they can easily be extended to ratings on category scales and consumer preference and hedonic data. This approach is extremely useful for sensory science since it allows the connection of all kinds of sensory measurements in a common structural framework.

See also: Carbohydrates: Sensory Properties; Sensory Evaluation: Sensory Characteristics of Human Foods; Food Acceptability and Sensory Evaluation; Practical Considerations; Sensory Rating and Scoring Methods; Descriptive Analysis; Appearance; Texture; Aroma; Taste

Further Reading

Bi J, Ennis DM and O’Mahony M (1997) How to estimate and use the variance of d’ from difference tests. Journal of Sensory Studies 12: 87-104.
Brockhoff PB and Schlich P (1998) Handling replications in discrimination tests. Food Quality and Preference 9: 303-312.
Ennis DM (1993) The power of sensory discrimination methods. Journal of Sensory Studies 8: 353-370.
Ennis DM and Bi J (1998) The beta-binomial model: accounting for inter-trial variation in replicated difference and preference tests. Journal of Sensory Studies 13: 389-412.
Green DM and Swets JA (1966) Signal Detection Theory and Psychophysics. New York: Wiley.
Kaplan HL, Macmillan NA and Creelman CD (1978) Tables of d’ for variable-standard discrimination paradigms. Behavior Research Methods and Instrumentation 10: 796-813.
Lawless HT and Heymann H (1998) Sensory Evaluation of Food: Principles and Practices. New York: Chapman & Hall.
Macmillan NA and Creelman CD (1991) Detection Theory: A User’s Guide. New York: Cambridge University Press.
Meilgaard M, Civille GV and Carr BT (1999) Sensory Evaluation Techniques, 3rd edn. Boca Raton: CRC Press.
O’Mahony M (1986) Sensory Evaluation of Food: Statistical Methods and Procedures. New York: Marcel Dekker.
O’Mahony M (1995) Who told you the triangle test was simple? Food Quality and Preference 6: 227-238.
O’Mahony M, Masuoka S and Ishii R (1994) A theoretical note on difference tests: models, paradoxes and cognitive strategies. Journal of Sensory Studies 9: 247-272.
Stone H and Sidel JL (1993) Sensory Evaluation Practices, 2nd edn. San Diego: Academic Press.

Sensory Rating and Scoring Methods

J A McEwan, MMR Food and Drink Research Worldwide, Wallingford, UK
D H Lyon, Campden Food and Drink Research Association, Chipping Campden, Gloucestershire, UK

Copyright 2003, Elsevier Science Ltd. All Rights Reserved.

Rating and scoring methods provide the basis for quantification of sensory information. Although these two terms are sometimes used interchangeably by sensory scientists, they have different meanings. Rating refers to the quantification of information by the use of ordinal categories, while scoring is a more defined form of rating as it uses a numerical interval or a ratio scale, of which the properties are known. A scale can be defined as a measurement continuum divided into successive units according to the properties associated with it.
There are many different rating and scoring methods used in sensory analysis, as illustrated both here and in other articles. In each case, these scales are physical measurement tools used to measure some sensory phenomenon perceived by individuals. Thus, implicit in using rating and scoring is that these scales provide meaningful representations of some psychological process or processes.
When considering rating and scoring methods, the reader should be aware that the experimental design considerations given to sensory analysis procedures should be observed. In this article, example forms are given with the illustrations of different scales to aid the reader in designing appropriate questionnaires.

Type of Scale

Four types of scale can be used to collect data: nominal scales, ordinal scales, interval scales, and ratio scales.

Nominal Scale

A nominal scale is one where data collected are categorized by a name or a number. Each observation collected using these scales must fall within one of the categories. For example, ‘canned,’ ‘frozen,’ ‘dried,’ ‘chilled,’ and ‘fresh’ are five categories used to describe methods of food preservation. These categories have no logical ordering and, thus, the key point about nominal scales is that the different categories have no quantitative relationship.

Ordinal Scale

An ordinal scale is one which allows observations to be ordered according to whether they have more or less of a particular attribute. Successive numbers or words are used to indicate more (or less) of the attribute being measured. Ordinal scales do not allow the amount of difference between observations to be quantified. The nine-point hedonic scale (described later) is ordinal, as are ranked data.

Interval Scale

An interval scale is one where the distance between points on the scale is quantifiable. In many instances, the distance or intervals between points on a scale will represent an equal perceptual distance. For example, if the perceptual distance between 1 and 2 on a seven-point scale of sweetness was the same perceptual distance as between 2 and 3, 3 and 4, and so on, then this scale would have interval properties.

Ratio Scale

A ratio scale is one where the observations collected can be expressed as a percentage or ratio of each other. For example, a person eating 100 g of chocolate a day eats twice as much as a person eating 50 g day⁻¹. An example of a ratio scale in sensory analysis is magnitude estimation, which will be discussed later. The main difference between interval and ratio scales is that the latter has a true zero, whereas the zero point of an interval scale is arbitrary.

Data Collection Methods and Data Analysis

Nominal Data

Nominal data can be collected in a number of ways (Figure 1), but common to all nominal data is that each observation can only fall into one category. A logical first step in analysis and interpretation of the data, therefore, is to produce a histogram indicating

Please taste the sample coded 457, and identify which of the four basic tastes you perceive.
    Sweet
    Sour
    Salt
    Bitter
    Unable to identify

Figure 1 Taste identification test using a nominal scale