A Perceptual Model of Pulse Salience and Metrical Accent in Musical Rhythms
RICHARD PARNCUTT
McGill University
Introduction
Imagine that you are walking down a quiet city alley and enter a jazz
club. The door opens and suddenly you hear the music. In just a second or
two you have a strong impression of the "feel" of the music (in particular,
the way it "swings"), its "beat" or "pulse." If instead you had been
confronted with some Ghanaian percussion music, or a disco number, or
Ravel's Bolero, or a movement of a Bach Motet driving inexorably for-
ward in 3/4 time toward a glorious climax, you would have received an
entirely different qualitative impression of the music's beat. But the experi-
ence would have been similar in two respects: the strength of the impres-
sion of beat (swing, pulse), and the remarkably short time required to
perceive it.
Yeston (1976) referred to an isochronous sequence of similar-sounding
into equivalence classes, such as all nth beats of a bar, by analogy to pitch
classes or chroma (Benjamin, 1984). Jones (1987b) avoided the semantic
ambiguity of the term grouping by distinguishing between horizontal (se-
rial) and vertical (metrical) components of rhythmic production. In the
present study, the terms serial and periodic temporal grouping are used,
and meter is regarded as a form of periodic grouping.
The relative importance of periodic and serial grouping depends on the
listener. For infants, serial grouping is generally more important than
periodic grouping (Bamberger, 1980; Drake, Dowling, & Palmer, 1991),
METHOD
Listeners
Twenty-two students (from the University of Stockholm) and researchers (from the
Department of Speech Communication and Music Acoustics, Royal Institute of Technol-
ogy, Stockholm) took part. Their musical experience (defined as the number of years
regularly practicing and/or performing a musical instrument, including voice) covered a
wide range: minimum 0 years, maximum 30 years, mean 12 years.
Apparatus
Stimuli
As shown in Figure 1, six different rhythmic patterns were crossed with six different
tempi, making 36 trials in all. The first four of the rhythmic patterns (pulse, waltz, march,
swing) were metrically relatively simple and unambiguous. The last two (skip, cross) were
more ambiguous: They may be perceived in either 3/2 or 6/4 meter (that is, as either 3
groups of 2 beats, or 2 groups of 3) depending on tempo (cf. Handel & Oshinsky, 1981).
One of the aims of the present experiment was to investigate this ambiguity.
Tempi were specified by the number of notes or events per unit time, as this allowed the
full range of musical tempi to be covered more satisfactorily for each pattern than specify-
ing the number of beats per unit time. The experimenter adjusted the note rates to cover as
wide a range as possible within the musical and physical constraints of the experiment, by
subjective evaluation of each of the six patterns when played at different rates. The note
rates that were adopted were logarithmically equally spaced and ranged from 50 to 400
notes per minute, or 0.83 to 6.7 notes per second. The corresponding IOIs of the nominal
1/4-note beats in each pattern are shown in Figure 1.
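
As a minimal sketch of the spacing just described, the six note rates can be computed from the stated endpoints (50 and 400 notes per minute) under the assumption of exactly equal logarithmic steps. The intermediate values below are calculated, not reported in the text, and the function name `log_spaced_rates` is hypothetical.

```python
def log_spaced_rates(lowest=50.0, highest=400.0, count=6):
    """Return `count` note rates (notes/min) equally spaced on a log scale."""
    # Constant ratio between successive rates: (400 / 50) ** (1 / 5).
    ratio = (highest / lowest) ** (1.0 / (count - 1))
    return [lowest * ratio ** k for k in range(count)]


rates_per_min = log_spaced_rates()
for number, rate in enumerate(rates_per_min, start=1):
    mean_ioi_ms = 60000.0 / rate  # mean interval between note onsets
    print(f"rate {number}: {rate:6.1f} notes/min, "
          f"{rate / 60.0:4.2f} notes/s, mean note IOI {mean_ioi_ms:6.0f} ms")
```

The endpoints reproduce the 0.83 and 6.7 notes per second quoted above; successive rates differ by a constant factor of about 1.5.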
Early events in a rhythmic sequence are more likely to be interpreted as downbeats than
are later events (cf. Longuet-Higgins & Lee, 1982). In the present experiment, a
starting-point effect was undesirable, as the aim was to investigate the perceptual properties of
cyclically repeating rhythms as a function only of rhythmic pattern and note rate. The
starting-point effect may be softened by starting a sequence at a subliminal intensity and
gradually increasing the level (Vos, 1973). But this does not eliminate the effect, as it is still
generally possible to define the first audible event. Alternatively, the sequence may be
started at a very fast tempo and then made to decelerate toward the intended tempo (Royer
& Garner, 1970). This was not a practical alternative in the present study, where tempo
was itself an important independent variable. Here, the starting-point effect was avoided
simply by starting rhythms at random temporal points in the cycle. For example, pattern
(b) waltz was presented in either of two forms: 121212 or 212121 (where 1
= 1/4-note, 2 = 1/2-note in Figure 1). The probability of a pattern starting on a certain
note was made proportional to the interonset interval that would otherwise have preceded
it. So waltz was twice as likely to start with a 1/4-note as with a 1/2-note.
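
The randomization rule above can be sketched as follows. This is an illustration under stated assumptions, not the original drum-machine procedure: the function names are hypothetical, and a pattern is represented simply as the list of IOIs that follow each note in one cycle.

```python
import random


def start_probabilities(iois):
    """iois[k] is the IOI following note k within one cycle (any unit).

    The probability of starting on note k is proportional to the IOI that
    would otherwise have preceded it, i.e. iois[k - 1] (wrapping around).
    """
    total = sum(iois)
    return [iois[k - 1] / total for k in range(len(iois))]


def choose_start(iois, rng=random):
    """Pick the index of the note on which the cyclic pattern starts."""
    weights = start_probabilities(iois)
    return rng.choices(range(len(iois)), weights=weights, k=1)[0]


# Waltz cycle: a 1/2-note (IOI 2) followed by a 1/4-note (IOI 1).
waltz = [2, 1]
probs = start_probabilities(waltz)  # [P(start on 1/2-note), P(start on 1/4-note)]
start_index = choose_start(waltz)
```

This weighting is equivalent to picking a uniformly random time point in the cycle and starting at the next note onset.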
The attention of the listener was maintained by randomly selecting the timbre (instru-
ment) to be used in each trial from sounds available on the drum machine (labeled low
conga, high conga, timbale, low cowbell, high cowbell, hand clap, ride cymbal, bass drum,
snare drum, low tom, mid tom, high tom, rim shot, closed high-hat, open high-hat).
Timbre remained constant throughout each trial. The loudness of the various instruments
had been equalized by ear before the experiment.
Procedure
In each trial, listeners were asked to press or tap the space bar of a computer keyboard
in time with the underlying beat of the rhythm. They were permitted to tap in any way they
wished; most used the index finger, middle finger, or both together, of their dominant
hand. A tap was deemed to have coincided with a particular 1/4-note (as notated in Figure
1) if it fell in a temporal category of width one 1/4-note, centered on the onset of the note.
In other words, temporal category boundaries for taps were set midway between 1/4-note
onsets. This procedure is consistent with the categorical perception of rhythmic patterns as
described by Clarke (1987) and Schulze (1989).
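
The categorization rule above (boundaries midway between 1/4-note onsets) amounts to rounding each tap to the nearest beat. The sketch below assumes tap times measured in milliseconds from the onset of beat 0 and assigns a tap falling exactly on a boundary to the later beat; both conventions, and the function name, are assumptions made for illustration.

```python
import math


def tap_category(tap_ms, beat_ioi_ms, cycle_beats):
    """Return the index (0 .. cycle_beats - 1) of the 1/4-note beat whose
    temporal category contains a tap at tap_ms."""
    # Each category is one beat wide and centered on the beat onset,
    # so the nearest beat is found by rounding tap_ms / beat_ioi_ms.
    nearest_beat = math.floor(tap_ms / beat_ioi_ms + 0.5)
    return nearest_beat % cycle_beats
```

For a three-beat cycle with 500-ms beats, a tap at 240 ms is assigned to beat 0 and a tap at 260 ms to beat 1.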
In each trial, tap times were recorded until such time as four consecutive tap times were
equally spaced, indicating a consistent pulse response. At this point, the stimulus stopped,
Figure 2 shows the number of times each pulse response was selected
(plain text). The results are compared with predictions according to the
model (italics). The comparison between the results and predictions will
be discussed later.
Fig. 2. Results of Experiment 1 (pulse salience) for all 161 selected pulses. In the table on
the right side of each panel, plain text denotes numbers of times each pulse was selected;
italics denote corresponding calculated values according to Equation 7 with free parameter
values τ = 550 ms, i = 1.6, μ = 760 ms, σ = 0.23, and j = 2.0. Each column corresponds
to a particular combination of rhythmic pattern and note rate, or to one of the 36 trials in
the experiment.
Calculations for each trial are normalized (multiplied by a constant) so
as to add up to 22, the number of listeners. The numbers 1-6 (top right) are rate numbers
(see Figure 1). The periods and phases shown on the far left are expressed in nominal 1/4-
note beats. Phases are shown only if necessary for the unambiguous specification of pulse
responses; otherwise, a dash (-) is marked.
Some listeners tapped in synchrony with the notated barlines, while others
tapped at the same rate but out of phase with the barlines. A similar effect
was found in the present results for the march rhythm (Figure 2c), for
responses with period 4 at slow to moderate note rates. For rates 1-4,
results were low for both phase 0 and phase 2. For rates 5 and 6, however,
phase 0 (the music-theoretical downbeat) predominated over phase 2. A
similar effect of tempo was observed in the swing (Figure 2d) and skip
(Figure 2e) rhythms at period 6, phases 0 and 3 (or 4). In general, the
results indicate that long events are more important than short events in
simple rhythmic patterns. The same set of six patterns was examined as in
Experiment 1.
In music theory, different beats in a measure are assigned different
accent strengths (see, e.g., Lerdahl & Jackendoff, 1983). In a 4/4 measure,
for example, the first 1/4-note beat is the strongest, the third beat is the
second strongest, and the second and fourth beats are the weakest. In a 6/4
measure, the second-strongest beat is the fourth of the six 1/4-note beats
(phase = 3); but in a 3/2 measure, the second-strongest beat is the fifth
1/4-note beat.
Palmer and Krumhansl (1990) measured the relative salience of the differ-
ent beats in standard musical meters by the following method. They first
presented a low-pitched pulse train with a period in the range 1.7-4.8 s and
asked their listeners to imagine each pulse as the first of two, three, four, or
six isochronous beats. After four pulses, a higher-pitched probe tone was
sounded between the pulses, corresponding to one of the imagined beats.
The listeners then rated how well the probe tone fit the imagined metrical
context. The results were consistent with music-theoretical notions of metrical
accent and confirmed that their listeners had quite detailed and stable
knowledge of typical Western meters.
The present experiment differed from that of Palmer and Krumhansl in
that it aimed to measure perceptual properties of real-time (rather than
METHOD
Listeners
Procedure
RESULTS AND DISCUSSION
(corresponding to the cycle length), four listeners tapped on the first beat
(phase 0), whereas seven tapped on the third (phase 2). It appears in this
case that the IOI preceding the event had a slightly greater effect on its
subjective accent than did the IOI following the event. According to Povel
and Essens (1985, p. 415), the initial and final events in a cluster of three
events have subjective accent; here, the accent on the initial event of the
group was stronger than the accent on the final event.
Results for swing (Figure 4d) at the first and fourth 1/4-note beats were
not significantly different from each other, that is, the downbeat was
about equally likely to fall on either beat. This is consistent with the results
of Experiment 1 for swing at rate 4, period 6, phases 0 and 3 (Figure 2d).
The downbeat of skip was similarly ambiguous, falling on either the first
or the fifth beat (cf. Figure 2e, rate 4, period 6, phases 0 and 4); and the
downbeat of cross was perceived with about equal likelihood on beats 1,
3, or 5 (cf. Figure 2f, rate 4, period 6, phases 0, 2, 4).
These results of Experiment 2 only apply for a note rate of 150 notes/
min. The results of Experiment 1 suggest that differences between event
saliences would have been greater, that is, the meter of the rhythms would
have been less ambiguous, at faster tempi.
Model
INPUT
For example, the waltz rhythm of Figure 1 had IOI(0) = 2, IOI(1) = 0,
and IOI(2) = 1. The IOIs add up to the cycle length: here, 2 + 0 + 1 = 3.
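
A minimal sketch of this input representation, assuming that a cyclic rhythm is given as a list of event positions on an integer time grid (the function name `ioi_vector` is hypothetical): each position T holds the IOI following the event at T, or 0 if no event occurs there.

```python
def ioi_vector(onsets, cycle_length):
    """onsets: sorted integer positions of events within one cycle.

    Returns a list of length cycle_length in which entry T is the IOI
    following an event at T, or 0 if there is no event at T.
    """
    iois = [0] * cycle_length
    for k, onset in enumerate(onsets):
        next_onset = onsets[(k + 1) % len(onsets)]
        # Distance to the next onset, wrapping around the cycle; a single
        # event per cycle yields an IOI equal to the full cycle length.
        iois[onset] = (next_onset - onset - 1) % cycle_length + 1
    return iois


# Waltz: events at positions 0 (1/2-note) and 2 (1/4-note) in a 3-unit cycle.
waltz = ioi_vector([0, 2], cycle_length=3)
```

For the waltz this yields the values quoted in the text, and the entries sum to the cycle length.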
PHENOMENAL ACCENT
assumption: Durational accent increases with IOI for small values of IOI
and saturates as IOI approaches and exceeds the duration of the echoic
store (auditory sensory memory).
In Experiment 1, it was observed that the downbeat of a cyclic pattern
(the event initiating the longest IOI) became less ambiguous as the rhythm
was played faster: At faster tempi, phenomenal accent was usually greater
for events preceding longer IOIs, whereas at the slower tempi, no consis-
tent relationship was evident between phenomenal accent and IOI. Mas-
saro's concept is consistent with these results, given that the IOIs in Experi-
ment 1 were predominantly greater than the duration of the echoic store at
very slow tempi and less at very fast tempi.
which other, quieter sounds may be masked. The sustain segment is typi-
cally prolonged by the acoustic resonance of the sound source and by
reflection of the radiated sound from objects and surfaces in the vicinity.
So perceived time intervals following loud sounds will tend to be longer
than perceived time intervals preceding loud sounds. Moreover, forward
masking is stronger than backward (Moore & Glasberg, 1983; Zwicker &
Fastl, 1972), further enhancing the positive correlation between the IOI
following an event and the loudness of the event.
The saturation of durational accent for IOIs exceeding roughly 1 s may
the effect of sharpening the perceptual difference between long and short
events, thereby sharpening the difference between the salience of different
pulse responses and making metrical interpretations less ambiguous. Ac-
cording to this explanation, i should increase with musical experience.
Does the IOI preceding an event influence its durational accent? Povel
and Essens (1985) proposed that an event in a nonisochronous sequence
of physically identical sounds is heard as accented, or perceptually
marked, if it satisfies one of the following three criteria: (1) it is relatively
isolated, (2) it is the second of a cluster of two events, or (3) it is first or
C'
m N V'
(1 - exp{-/?/r}P
where N is defined as in Equation 4, and p = Pb is the pulse period in
milliseconds. (The expression in parentheses is derived from Eq. 3.)
In each case, responses are limited to much the same range of periods.
Perusal of the experimental literature permits the following generaliza-
tions (cf. Fraisse, 1982).
METRICAL ACCENT
$K(T) = \sum S_p(P, Q)$ (8)
The above model was tested by comparing calculations with the results
of the experiments reported above. Values for the free parameters of the
model were estimated by an automatic procedure implemented in the
language Le_Lisp. The procedure adjusted parameter values by small steps
until the Pearson correlation coefficient r between measurements and cal-
culations was a maximum. The results of this procedure are shown in
Table 1. These results were independent of initial parameter estimates for
a range of feasible sets of initial values.
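
The kind of search described above can be sketched as a coordinate-wise hill climb on the Pearson correlation. This is not the original Le_Lisp procedure: the step rule, the stopping rule, and all names below are assumptions, and the toy model (a saturating exponential with one free parameter) stands in for the real model only to make the example runnable.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fit_by_small_steps(model, params, observed, step=0.01, max_rounds=1000):
    """Nudge one parameter at a time by a small relative step and keep the
    change only if r between model output and observations increases."""
    params = dict(params)
    best_r = pearson_r(model(params), observed)
    for _ in range(max_rounds):
        improved = False
        for name in list(params):
            for factor in (1.0 + step, 1.0 - step):
                trial = dict(params)
                trial[name] = params[name] * factor
                r = pearson_r(model(trial), observed)
                if r > best_r:
                    params, best_r, improved = trial, r, True
        if not improved:
            break  # no single small step improves r any further
    return params, best_r


# Toy stand-in for the model: saturating curve with one free parameter.
TOY_IOIS = [100.0, 200.0, 400.0, 800.0, 1600.0]


def toy_model(params):
    return [1.0 - math.exp(-ioi / params["tau"]) for ioi in TOY_IOIS]


observed = toy_model({"tau": 500.0})          # synthetic "measurements"
start_r = pearson_r(toy_model({"tau": 300.0}), observed)
fitted, fitted_r = fit_by_small_steps(toy_model, {"tau": 300.0}, observed)
```

Because r is insensitive to overall scale, a search of this kind constrains only parameters that change the shape of the predicted distribution.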
In Experiment 1 (pulse salience), measurements were compared with
calculations according to Equation 7. Computation time was reduced by
considering only those pulses (periods, phases) that had been selected in the
experiment (161 data points). Calculated values were first normalized so
that, for each trial of the experiment, their sum (like the sum of the experi-
mental data) was 22, the number of listeners in the experiment. Then all five
free parameters (τ, i, μ, σ, j) were gradually and independently varied until
the correlation between calculations and experimental data was optimal (r
= .88). This value is high in view of the relatively large number of degrees of
freedom in the comparison: 161 (data points) - 36 (normalization) - 5
(free parameters) - 2 (correlation) = 118. The calculated pulse saliences
from which this value was determined appear in italics in Figure 2.
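
The per-trial normalization and the degrees-of-freedom count above can be checked with a few lines. The function name and the example salience values are hypothetical; only the totals (22 listeners, 161 data points, 36 trials, 5 free parameters) come from the text.

```python
def normalize_trial(calculated, total=22.0):
    """Scale the calculated saliences of one trial by a constant so that
    they sum to `total`, the number of listeners."""
    scale = total / sum(calculated)
    return [scale * value for value in calculated]


# Hypothetical calculated saliences for the pulses selected in one trial.
example_trial = normalize_trial([0.8, 0.3, 0.1])

# Degrees of freedom as counted in the text: data points minus one
# normalization constant per trial, the free parameters, and 2 for r.
degrees_of_freedom = 161 - 36 - 5 - 2
```

The normalization removes one degree of freedom per trial, which is why 36 is subtracted from the 161 data points.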
TABLE 1
Optimal Values of the Free Parameters
Parameter                      τ (ms)   i     μ (ms)   σ      j     r
1. Pulse salience
   a. Individual pulses        550      1.6   760      0.23   2.0   0.88
   b. Distribution of periods  -        -     710      0.22   -     -
2. Accent
   a. Phenomenal               200      2.0   -        -      -     0.87
   b. Metrical                 180      1.7   660      0.14   1.9   0.95
3. "Typical"                   500      2     700      0.2    2     -
CATEGORICAL PERCEPTION
(Povel, 1981). As discussed earlier, the distinction between even and un-
even subdivisions of a timespan is also perceived categorically (Clarke,
1987; Schulze, 1989).
The assignment of rhythmic events to temporal categories is a complex
process. For example, Clarke (1987) investigated the position of the cate-
gory boundary between alternative interpretations of rhythmic figures
(such as dotted, ternary, and binary subdivisions of a beat) and found that
the boundary tended to shift so as to favor perceptual judgments that
conform to the prevailing metrical context. Desain and Honing (1989)
EXPRESSIVE TIMING
Important events are thus both delayed and lengthened in musical per-
formance. This is equivalent to saying that local tempo (otherwise known
as instantaneous, or micro, or linear tempo) slows in the vicinity of impor-
tant events.
A continuous function of local tempo against time may be formulated
mathematically by analogy to the velocity and momentum of moving
objects (Kronman & Sundberg, 1987; Sundberg & Verrillo, 1980; Todd,
1992). Such a function may possibly be used as a heuristic for producing
musically acceptable performances. Of course, local tempo depends on the
such as 3:1:2:2 occur in the same context). The temporal position of the
unaccented event is determined only by temporal proximity to its accented
neighbor. Musical experience suggests that the perceived relationship be-
tween the two events may be strengthened simply by reducing this dis-
tance. An interesting analogy in the area of musical intonation is the
sharpening of the leading tone in performance (Nickerson, 1949; Ward,
1970). Apparently, sharpening the leading tone strengthens its melodic or
voice-leading relationship with the tonic scale degree, thereby strengthen-
ing the overall (melodic and harmonic) relationship between tonic and
hypothesis: The precision with which an event can be timed, and the
precision with which a deviation in the timing of an event can be detected,
depend on the metrical accent of that event.
Two pieces of evidence support this hypothesis. First, experimental data
on performance timing suggest that the timing of metrically stronger beats
is more precise (i.e., less variable) than the timing of metrically weaker
beats. For example, the data of Shaffer (1981) and Shaffer et al. (1985)
imply that beats belonging to the underlying tactus are timed more accu-
rately than other beats. The second piece of evidence concerns the exis-
tence region of pulse sensation. This region corresponds quite well with
the region of greatest sensitivity in synchronization and durational dis-
crimination tasks. Fraisse (1967) presented listeners with two interpolated
pulse trains, each of which was isochronous, and asked them to adjust the
time interval between the two sequences to produce isochrony (at double
the tempo of each separate sequence). In another experiment, listeners
were asked whether the sequence was synchronous or not. Performance
was optimal for periods in the vicinity of 500-600 ms (typical range:
200-900 ms), with discrimination thresholds in the vicinity of 4-5%
(typical range: 2-8%). Similar experiments on duration discrimination
and isochrony were performed by Eisler (1975) and Michon (1964).
Fraisse (1982) concluded that synchronization is possible only within the
Real musical rhythms are not usually cyclic, but change as they unfold in
time. Consequently, the relative saliences of the various pulse sensations
evoked by a musical rhythm also vary with time. The model described above
is limited to cyclically repeating rhythms and so does not account for such
variations. The model may be extended to cover ordinary, noncyclic
rhythms by accounting for primacy and recency effects, according to which
the events near the start of a rhythm and near the observation time have a
greater influence on the salience of pulse sensations (and hence on perceived
meter) than intervening events. Both primacy and recency effects are as-
sumed here to involve the duration of the psychological present.
psychological present in the case of the primacy effect. Note that ψ and ψ₀
are not necessarily equal. They will each need to be determined experimen-
tally and can be expected to vary as a function of the perceived rate of the
music.
The word rhythm may be used to describe any form of temporal period-
icity observable in the physical universe, from molecular vibrations (pe-
not necessarily evoke pulse sensations (internal clocks) and so are not
necessarily rhythmic in the musical sense. In any case, a definition of
rhythm based on approximations to simple ratios of small integers would
be plagued by the arbitrariness of the concepts of approximation and
simplicity.
Periodic perceptual grouping appears to represent the most appropriate
basis for a definition of musical rhythm. Periodic grouping is typical in
music, but relatively unusual in speech, with the possible exception of
mechanical renditions of poetry. Periodic grouping, or pulse sensation, is
Grant, R. E., & LeCroy, S. Effects of sensory mode input on the performance of rhythmic
perception tasks by mentally retarded subjects. Journal of Music Therapy, 1986, 23,
2-9.
Guttman, N., & Julesz, B. Lower limits of auditory periodicity analysis. Journal of the
Acoustical Society of America, 1963, 35, 610.
Guttmann, A. Das Tempo und seine Variationsbreite [Tempo and its range of variation].
Zeitschrift für angewandte Psychologie, 1931, 40, 65.
Handel, S. Perceiving melodic and rhythmic auditory patterns. Journal of Experimental
Psychology, 1974, 103, 922-933.
Handel, S. Tempo in rhythm: Comments on Sidnell. Psychomusicology, 1986, 6, 19-23.
Handel, S., & Buffardi, L. Pattern perception: Integrating information presented in two
modalities. Science, 1968, 162, 1026-1028.
Morton, J., Marcus, S., & Frankish, C. Perceptual centers (P-centers). Psychological Re-
view, 1976, 83, 405-408.
Neisser, U. Cognitive psychology. New York: Appleton-Century-Crofts, 1967.
Nickerson, J. F. Intonation of solo and ensemble performance of the same melody. Journal
of the Acoustical Society of America, 1949, 21, 593-595.
Noorden, L. van. Temporal coherence in the perception of tone sequences. Unpublished
doctoral dissertation, Institute for Perception Research, Eindhoven, The Netherlands,
1975.
Palmer, C. Mapping musical thought to musical performance. Journal of Experimental
Psychology: Human Perception & Performance, 1989, 15, 331-346.
Palmer, C., & Krumhansl, C. L. Mental representations for musical meter. Journal of
Experimental Psychology: Human Perception & Performance, 1990, 16, 728-741.
Swain, D. The need for limits in hierarchical theories of music. Music Perception, 1986, 4,
121-148.
Temperley, N. M. Personal tempo and subjective accentuation. Journal of General Psychol-
ogy, 1963, 68, 267-287.
Terhardt, E., Stoll, G., & Seewann, M. Algorithm for extraction of pitch and pitch salience
from complex tonal signals. Journal of the Acoustical Society of America, 1982, 71,
679-688.
Thomassen, M. T. Melodic accent: experiments and a tentative model. Journal of the
Acoustical Society of America, 1982, 71, 1596-1605.
Thompson, W. F., Sundberg, J., Friberg, A., & Frydén, L. The use of rules for expression in
the performance of melodies. Psychology of Music, 1989, 17, 63-82.
Todd, N. P. McA. A model of expressive timing in tonal music. Music Perception, 1985, 3,