
Methods Used for Spoken Word Recognition
Faculty: Dr R Rajasudhakar
Presenter: Anu Prasad
Spoken Word Recognition

✔ Psycholinguists define spoken word recognition (SWR) as the processes intervening between speech perception and sentence processing, whereby a sequence of speech elements is mapped to a phonological word form.

✔ It is the interface between low-level perception and the cognitive processes of retrieval, parsing and interpretation.
Challenges
Background noise, rate of speech, dialect, and language.
The mapping proceeds through several levels: incoming acoustic signals → phonetic categories → words → phrases and syntactic structures → semantics.
• Despite all these problems, adults recognize spoken words correctly.
• Most current models include a mapping through these levels.
• Recognition of a unit occurs when its activation exceeds either a threshold or some activation state relative to all other units at its level.
• The simplest way to study spoken word recognition is to measure 'recognisability', i.e., identification of words in noise or of truncated or filtered speech stimuli.
• However, these tasks fail to provide a measure of reaction time due to variability.

• Some of the methods used in spoken word recognition research:

• Word Under Noise
• Continuous Speech
• Filtered, Truncated Words
• Tokens Embedded in Words and Non-Words
• Lexical Decision
• Rhyme Monitoring
• Word Spotting
• Word Monitoring
• Phoneme-Triggered Lexical Decision
• Cross-Modal Priming
• Speeded Repetition of Words
• fMRI
1. Word under Noise (Furui, 1992)

Key Challenges with Noise and Their Effects

1. Spectral Variation
• Flattened Spectral Envelope: the overall frequency range becomes less dynamic, making it harder to differentiate speech sounds.
• Disappearing Spectral Peaks: peaks in the frequency spectrum (important for distinguishing vowels and consonants) get lost in noise.
• Spurious Spectral Peaks: noise creates artificial peaks that confuse speech processing systems.
• Nonlinear Transformations: noise causes unpredictable distortions, making speech less natural.
• Changes in Spectral Inclination and Bandwidth: changes in how energy is distributed across frequencies alter the speech characteristics.
2. Nonlinear Time Expansion and Contraction:
Noise can distort the timing of speech.
Some parts may sound stretched (expanded) or compressed (contracted).
This creates difficulty in aligning speech with expected patterns.

3. Additive Noise and Speech Period Detection:

Speech periods (time cycles that define pitch or rhythm) are harder to
detect when masked by noise.
Results in challenges for tone analysis or pitch tracking.
The following methods have been used to deal with additive noise:

• Using special microphones
• Reducing and suppressing noise
• Using noise masking and adaptive models
• Using spectral distance measures that are robust against noise
• Compensating for spectral deviations caused by the special speaking manner used in noisy environments (the Lombard effect).
Using special microphones

• Use directional or noise-canceling microphones.
• These mics focus on capturing the desired sound while rejecting background noise.
• Example: microphones with multiple sensors to differentiate between speech and noise (a minimal two-channel sketch follows).
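The multi-sensor idea can be illustrated with a simple delay-and-sum beamformer. This is a minimal sketch, assuming two time-aligned NumPy channels and a known inter-microphone delay for the talker's direction; real arrays estimate this delay adaptively.

```python
import numpy as np

def delay_and_sum(mic1, mic2, delay_samples):
    """Steer a two-microphone array toward the talker.

    Speech from the steered direction arrives at mic2 `delay_samples`
    later than at mic1; after alignment it adds coherently, while
    diffuse background noise adds incoherently and is attenuated.
    """
    aligned = np.roll(mic2, -delay_samples)  # crude alignment (wraps at the edges)
    return 0.5 * (mic1 + aligned)
```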

Auditory Models for Speech Analysis:

•Mimic how humans process sounds using our ears and brain.
•Identify speech features (pitch, formants, etc.) while ignoring irrelevant noise.
Noise Reduction and Suppression:

• Algorithms that estimate and subtract noise from the speech signal.
• Example: noise gates or spectral subtraction (sketched below).
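Spectral subtraction, named above as an example, can be sketched as follows. This is a minimal illustration assuming a separate speech-free recording (`noise_est`) from which the noise spectrum is averaged; the frame and hop sizes are illustrative choices, and production systems track the noise estimate adaptively.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame=512, hop=256):
    """Subtract an average noise magnitude spectrum from each frame,
    keep the noisy phase, and overlap-add the cleaned frames."""
    win = np.hanning(frame)
    # Average magnitude spectrum of the speech-free noise recording.
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(win * noise_est[i:i + frame]))
         for i in range(0, len(noise_est) - frame, hop)], axis=0)
    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame, hop):
        spec = np.fft.rfft(win * noisy[i:i + frame])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # floor at zero
        out[i:i + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
    return out
```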

Noise Masking:

• Add specific types of noise (like white noise) to "cover" the unpleasant or interfering noise.
• Used when noise cannot be removed entirely.

Adaptive Models:

• Models that adjust in real time based on the noise environment.
• Example: machine learning models that learn the characteristics of the noise and adapt accordingly.

Spectral Distance Measures

• Compare speech features (like frequencies) even when noise is present.
• Robust methods ensure that small deviations caused by noise do not misclassify speech (an example measure is sketched below).
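As one concrete example of a spectral distance measure, here is the classic log-spectral distance between two equal-length frames; comparing log-magnitude spectra de-emphasizes small noise-induced level deviations relative to the overall spectral shape. A sketch, not any specific system's metric.

```python
import numpy as np

def log_spectral_distance(frame_a, frame_b, eps=1e-10):
    """RMS difference between two log-magnitude spectra, in dB.

    Both frames must have the same length; `eps` avoids log(0).
    """
    spec_a = 20 * np.log10(np.abs(np.fft.rfft(frame_a)) + eps)
    spec_b = 20 * np.log10(np.abs(np.fft.rfft(frame_b)) + eps)
    return np.sqrt(np.mean((spec_a - spec_b) ** 2))
```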

Compensation for the Lombard Effect

Why compensate for the Lombard effect? Speech recognition systems are often trained on "normal" speech. The Lombard effect causes:

✔ Increased Spectral Energy: more energy in higher frequencies as people emphasize certain sounds.

✔ Altered Speech Rhythm: different timing and stress patterns in speech.

✔ Distorted Acoustic Models: the system might misclassify words due to unexpected pitch or intensity changes.

Adaptive techniques adjust speech recognition systems to handle these variations. Without compensation, speech systems (e.g., voice assistants) struggle in noisy settings like a busy street or a crowded café.
2. Truncation/Filtering of Stimulus

What is Gating in Speech Perception?

Gating refers to presenting a speech signal incrementally, truncating it at different time intervals, to understand how listeners recognize words.

Purpose: to investigate how much speech information (e.g., vowels, consonants) is needed for accurate word recognition and how context influences recognition. A minimal sketch of gate construction follows.
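The sketch below shows how gated stimuli can be cut from a recorded word, assuming the audio is a NumPy array (or any sliceable sequence); the 50 ms gate size is illustrative, and the actual studies varied gate size and alignment to segment boundaries.

```python
def make_gates(word_audio, sample_rate, gate_ms=50):
    """Return progressively longer onsets of a recorded word:
    the first 50 ms, then 100 ms, and so on, up to the full word."""
    step = int(sample_rate * gate_ms / 1000)
    return [word_audio[:end]
            for end in range(step, len(word_audio) + step, step)]

# e.g., a 300 ms word at 16 kHz yields 6 gates of 50, 100, ..., 300 ms
```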
Study 1: Ellis et al. (1971)
Perception of Electronically Gated Speech

Method: Four similar-sounding words (cap, cat, cab, cash) were used.
• The vowel sound (/æ/) was gated electronically, so the duration of the vowel increased progressively across presentations.
• Stimuli were presented randomly to participants.
Findings:

• Recognition Improves with Longer Vowel Durations:

Participants progressively recognized cap, cat, cab more accurately as the vowel
duration increased.

• Partial Information is Enough for Certain Words:

At about halfway through the vowel, participants could correctly identify cat, cap,
and cab more often than by random chance.

• 'Cash' Requires More Information:

For cash, participants needed the final consonant (/ʃ/) to correctly identify it.
Study 2: Grosjean (1980)
Spoken Word Recognition Processes and Gating Paradigm

Method:
•Words of varying lengths and frequencies were presented to participants in three contexts:
• In isolation: No extra context, just the word.
• In short context: Minimal surrounding linguistic context.
• In long context: A sentence or phrase providing substantial contextual
information.

•The presentation of each word was incremental (word duration increased gradually).
•After each increment:
• Participants wrote down their guess of the word.
• Indicated their confidence level in the guess.
Findings:

Word Frequency and Length Matter:
High-frequency, short words (common words like "cat") were identified more quickly than low-frequency or longer words.

Context Helps:
Words presented in a long context were recognized much faster and more confidently than those in isolation or short contexts.

Lexical Access in Online Processing:
The study showed how listeners access and process words dynamically (online) as more acoustic information becomes available.
3. Lexical Decision

What is a Lexical Decision Task?

• A lexical decision task measures how participants process and recognize words in real time.
• Task: participants decide whether a given stimulus is a real word (e.g., "umbrella") or a non-word (e.g., "umbrellir").
• Purpose: to study lexical access, i.e., how quickly and efficiently the brain retrieves information about words from the mental lexicon (our internal "dictionary"). A toy reaction-time analysis is sketched below.
1.Online Processing:

•The task measures real-time processing, as participants must decide quickly.


•For auditory stimuli, the full word must be heard before a decision is made (e.g., "umbrellir" becomes a non-
word at the last syllable).

2.Factors Affecting Response Time:

•Word Length:
• Longer words generally take more time to process.

•Word Frequency:
• High-frequency words (e.g., "car") are recognized faster than low-frequency words (e.g., "vial").
•Non-Words: Non-words that closely resemble real words (e.g., "umbrellir") take longer
to reject than completely nonsensical ones (e.g., "flobber").

3.Auditory Closure Phenomenon:


•In unfavorable listening environments, participants might "guess" the word based on incomplete cues.
•This reflects the brain’s predictive ability in filling gaps when information is missing or distorted.

4.Lexical Access:
Latency (response time) indicates how quickly the brain retrieves a word from the mental lexicon.
Faster RTs suggest easier or more automatic access.
4. Word spotting

What is Word Spotting?

•Word spotting is a task in speech processing where participants or systems identify specific
target words embedded within a continuous stream of speech.

•Unlike full sentence recognition, word spotting focuses only on detecting whether a word exists
in the input.
Why Use Word Spotting?

To study how listeners or machines can identify key words in noisy or complex speech environments.

Helps in understanding:

• Lexical access: How specific words are retrieved from memory.


• Acoustic processing: How listeners or systems distinguish target words from surrounding
sounds.
Stimuli:

Speech samples containing:


• Target words (e.g., “apple”).
• Distractor words or non-speech sounds.

Example: A sentence like “I ate an apple pie” contains the target word “apple.”

Task

Participants or systems are instructed to:


• Detect or mark the presence of the target word.
• Ignore the surrounding speech or noise.

Measurement (a scoring sketch follows this list)

▪ Accuracy: Did they correctly identify the target word?
▪ Response Time (RT): How quickly did they spot the word?
▪ False Positives: Were non-targets mistakenly identified as the target?
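A hedged sketch of how these three measures might be scored from button presses, assuming response times and target-word offset times in milliseconds; the 500 ms matching window is an arbitrary illustrative choice, not a standard from the literature.

```python
def score_word_spotting(responses, targets, tolerance_ms=500):
    """Score spotting responses against target-word offsets.

    responses: button-press times (ms); targets: target-word offset times (ms).
    A press within `tolerance_ms` after a target offset counts as a hit;
    any other press counts as a false positive.
    """
    hits, rts = 0, []
    unmatched = list(responses)
    for t in targets:
        match = next((r for r in unmatched if 0 <= r - t <= tolerance_ms), None)
        if match is not None:
            hits += 1
            rts.append(match - t)
            unmatched.remove(match)
    return {
        "accuracy": hits / len(targets),
        "mean_rt": sum(rts) / len(rts) if rts else None,
        "false_positives": len(unmatched),
    }
```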
Findings
Context Matters:
•Target words are easier to spot when they are predictable from context.
•Example: In “I ate an apple pie”, the context of “ate an” primes the word “apple.”

Acoustic Salience:
•Words with distinct acoustic features (e.g., stress, intonation) are easier to spot.
•Example: “APPLE” in a loud, clear tone is easier to detect than “apple” in monotone speech.

Noise and Overlapping Speech:


•Word spotting becomes harder in noisy environments or when other speech overlaps with the
target.
•Noise masking reduces clarity and introduces false positives.
Lexical Characteristics:

•Frequency: High-frequency words (e.g., “dog”) are detected more easily than low-frequency words (e.g., “lichen”).
•Length: Shorter words (e.g., “cat”) are harder to spot than longer words due to potential overlaps with parts of other words.

Speech Rate:
•Faster speech reduces word spotting accuracy.
•Slower speech gives more time for processing and increases accuracy.
McQueen and Cutler (1998): Word Spotting in Contexts

Study Design:
1. Participants were given nonsense speech stimuli containing real words randomly embedded.
2. Words were presented in different contexts:
1. Syllabic context: e.g., "vuffapple" (contains vowels and likely word boundaries).
2. Consonantal context: e.g., "fapple" (contains only consonants and no clear word
boundaries).
Findings:

1. Syllabic Context is Better: Words were easier to spot in longer syllabic contexts (e.g.,
"vuffapple") than shorter consonantal ones (e.g., "fapple").

2. Phonotactic Probability:
1. Detection improves when the structure of the nonsense speech (e.g., its
phonotactics) predicts where a word boundary should occur.
2. Example: "venlip" makes "lip" easier to spot than "veglip," where phonotactic rules
do not suggest a clear boundary.
Phonotactic Probabilities

Definition: Phonotactic rules determine the likelihood of certain sound sequences in a language.
• E.g., in English, "lip" in "venlip" is segmented easily because English phonotactics favors a syllable break before "lip."
• Conversely, "lip" in "veglip" is harder to segment due to unnatural syllable boundaries.

Impact: When nonsense stimuli align with natural language phonotactics, the embedded word is recognized more easily. A toy estimate of phonotactic probability is sketched below.
Similarity Neighbourhoods

Definition: A similarity neighbourhood is a group of words differing by only one sound.
• Example: For the word "tweet," neighbours include "treat," "tweed," "twit," and "sweet."

Findings from Luce and Pisoni (1998):

Dense Neighbourhoods: Words with many similar-sounding neighbours (dense neighbourhoods) are:
• Recognized more slowly.
• Recognized with lower accuracy.
• Example: "Tweet" has many neighbours, making it harder to identify.

Sparse Neighbourhoods: Words with few neighbours (sparse neighbourhoods) are:
• Recognized faster.
• Recognized with higher accuracy.
• Example: "Judge" has fewer neighbours, making it easier to identify (a neighbour-counting sketch follows).


5. Phoneme-Triggered Lexical Decision (Blank, 1980)

A phoneme-triggered lexical decision task is a variant of the lexical decision task designed to investigate how phoneme-level and word-level information interact during sentence processing.

The task focuses on lexical access—how quickly and efficiently participants recognize real words based on a
specified phoneme.
Setup:
• Participants are presented with a set of sentences.
• Before hearing each sentence, they are given a target phoneme to listen for (e.g., /k/).

Task:
• Participants must:
• Identify real words beginning with the specified phoneme as they occur in the sentence.
• Ignore nonsense words (even if they contain the target phoneme).
• Example:
• Target phoneme: /k/.
• Sentence: "Bobby drove the car into the lake."
• Participant's response: press the button on hearing "car" (a toy detection sketch follows this list).
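A toy sketch of the detection rule the participant applies. The word list and onset dictionary are invented for illustration; a real implementation would draw initial phonemes from a pronouncing lexicon.

```python
REAL_WORDS = {"bobby", "drove", "the", "car", "into", "lake"}

# Invented onset dictionary; a real system would use a pronouncing lexicon.
ONSETS = {"bobby": "b", "drove": "d", "the": "dh",
          "car": "k", "into": "i", "lake": "l"}

def detect_targets(sentence_words, target_phoneme):
    """Flag real words whose initial phoneme matches the target,
    ignoring any non-word tokens."""
    return [w for w in sentence_words
            if w in REAL_WORDS and ONSETS.get(w) == target_phoneme]

print(detect_targets(["bobby", "drove", "the", "car", "into", "the", "lake"], "k"))
# ['car'] -- the button press would be timed to this word
```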
Manipulation

• The speed of lexical access is varied by altering the semantic predictability of the
target word.
• Semantically related context: The verb or preceding words strongly suggest the
target word (e.g., "drove the car").
• Semantically unrelated context: No strong association with the target word (e.g.,
"saw the car").
Findings (Blank, 1980)

1.Semantic Predictability Enhances Word Recognition:


1. Participants recognized words faster when they were preceded by semantically related verbs
or contexts compared to neutral or unrelated ones.
2. Example:
1. Related context: “Bobby drove the car…” → Faster response to "car."
2. Unrelated context: “Bobby saw the car…” → Slower response to "car."

2.Comparison to Phoneme or Word Monitoring:


1. The phoneme-triggered lexical decision task was found to be more suitable for studying
online lexical access because it integrates both phonemic cues and semantic processing.
2. Unlike simple phoneme or word monitoring tasks, this method reflects how listeners actively
process words in real-time sentences.
6. ‘Speeded repetition of words’ (auditory
naming task) by Whalen, 1991

Objective: The study aimed to explore how people perceive, process, and repeat words and non-words
when specific fricatives (/s/ or /sh/) are involved. It focused on the factors influencing naming times and
error rates during an auditory task.

Stimuli Used

Words: Real monosyllabic words like mess and mesh.


Non-words: Made-up monosyllabic sequences like ness or nesh.
Parameters Studied

Four main variables were tested:

Fricative: Whether the sound was /s/ (e.g., mess) or /sh/ (e.g., mesh).

Lexicality:

Word: A real word like mess.


Non-word: A meaningless sequence like ness.

Location:
Initial fricative: At the start of the word (e.g., sack).
Final fricative: At the end of the word (e.g., mess).

Changeability:
This describes whether changing the fricative identity changes the item’s lexical status:
Example: Mess → Mesh (both real words).
Example: Ness → Nesh (both non-words).
Experiment Design

•Two Versions of Each Stimulus:

• Original: The natural version (e.g., mess).


• Mismatched: A manipulated version where the fricative remained the same, but other
elements like vocalic quality were altered.

•Task:
• Participants listened to all versions of the stimuli via headphones.
• They repeated what they heard into a microphone.
Results

1.Naming Times (Reaction Times):


1. The time taken to repeat words and non-words was indistinguishable, meaning both
were processed at similar speeds.
2.Error Rates:
1. Non-words had significantly more errors than real words, indicating that lexical status
(word vs. non-word) affects accuracy.
2. This suggests that recognizing real words is easier than processing non-words.

When the mismatched version of a stimulus was presented, subjects perceived the correct form of the word, indicating that fricatives are important for word recognition.
7. Continuous Speech (Shadowing)
by Marslen-Wilson, 1985
What is Shadowing?

Definition:
Shadowing is a task where a subject listens to spoken language and repeats it back
immediately, word-for-word, with minimal delay.

Purpose:
The experiment is designed to study speech perception and how the brain
processes and repeats language in real time.
Chistovich's 1960 Findings
Two Types of Shadowers

1.Close Shadowers:
1. Delays: Very short, about 150–200 milliseconds (msec).
2. Characteristics: Speech is slurred and difficult to analyze for accuracy.

Demonstrates rapid and efficient speech perception, where the listener almost immediately
anticipates what they hear.

2.Distant Shadowers:
1. Delays: Longer, between 500–1500 msec.
2. Characteristics: Speech is clear and easy to understand.

Shows slower, more deliberate processing of speech.


Isolated Syllables Experiment:

•Chistovich tried using isolated syllables instead of connected speech.
•Result: Shadowing did not work as efficiently, suggesting connected speech is more natural and taps into the brain’s full potential for speech processing.

Conclusion:

•Close shadowing is a better tool for studying real-time speech perception.


•Limitation: Speech production processes (e.g., articulation errors) can interfere with measuring pure
reaction times (RT).
Marslen-Wilson’s 1985 Study
Marslen-Wilson extended these findings with a larger sample and connected speech instead of
isolated syllables.

Participants:
•65 participants, including men and women

Key Observations:

Close Shadowing Ability:


1. Only 25% of women could accurately shadow connected speech at delays of 250–300
msec.
2. The rest of the women and all the men shadowed at longer latencies (500 msec or
more), qualifying as distant shadowers.
Gender Differences:
1. A subset of women outperformed men in close shadowing, showing faster and more
accurate real-time speech perception.
2. Possible interpretation: Biological or cognitive factors might influence processing speed
and shadowing efficiency.
Types of Anomalous prose used in Shadowing
Experiments
Marslen-Wilson used three types of anomalous prose to study how syntax and semantics
influence speech perception during shadowing tasks:

1.Syntactic Prose:
1. Contains normal syntax (grammatical sentence structure), but is semantically
meaningless.
Example: The blue ideas sleep furiously.

2.Random Word Order Prose:


1. No syntax or semantic structure; words are jumbled randomly.
Example: Ideas sleep blue furiously the.

3.Jabberwocky:
1. Maintains basic syntax, but the words are replaced with nonsense words by modifying
sounds.
2. Inspired by Lewis Carroll’s Jabberwocky poem.
Example: ’Twas brillig, and the slithy toves.
Second Series of Experiments

Purpose: To determine if close and distant shadowers process syntactic and semantic information during
shadowing.

Findings:

•Both close and distant shadowers actively analyze the syntax and semantics of the material while
shadowing.

•Evidence:
• Spontaneous Errors: Their errors were constrained by the syntactic and semantic structure of
the prose, meaning their mistakes weren’t random but followed logical rules.
• Sensitivity to Disruptions: Both groups showed reduced performance when the syntactic or
semantic structure of the material was disrupted.
Third Series of Experiments

Purpose: To explore the difference between close and distant shadowers.

Findings:

Close Shadowers:
• Use on-line (real-time) speech analysis to drive their articulation.
• Begin repeating speech before they are fully aware of the material's meaning, relying on rapid
processing.
• Advantage: Close shadowing provides a more direct reflection of language comprehension,
with minimal interference from later (post-perceptual) processes.

Distant Shadowers:
• Use slower, more deliberate output strategies, requiring greater conscious awareness of the
material.
Vitevitch and Luce’s Shadowing Task
Studies (1998, 1999, 2005)
Research Focus:

They examined how phonotactic probabilities and neighborhood density influence the speed
of shadowing.

1.Phonotactic Probabilities:
1. Likelihood of a sound sequence occurring in a language (e.g., str in "street" is high
probability, whereas fsr is low probability).
2.Neighborhood Density:
1. Refers to how many words sound similar to a given word.
2. Example: Cat has a dense neighborhood (bat, mat, sat), whereas quirk has a sparse
neighborhood.
Interpretation

•Spoken word recognition operates at two distinct levels:


• Lexical Level: Where neighborhood density influences word recognition.
• Sublexical Level: Where phonotactic probabilities influence processing of sounds.

•Phonotactic Information operates at the sublexical level.


•Neighborhood Density influences recognition at the lexical level.
Findings

Non-words:
• Non-words with high phonotactic probabilities (common sound
sequences) and dense neighborhoods were repeated faster than those
with low probabilities and sparse neighborhoods.
Words:
• The opposite was true—words with low phonotactic probabilities and
sparse neighborhoods were repeated faster than those with high
probabilities and dense neighborhoods.
8. Tokens embedded in Words and
Non-Words
Study by Zhang & Samuel (2015): Tokens Embedded in Words and Non-Words

Objective:
•Investigated how listeners process English words containing shorter words embedded within them (e.g.,
ham in hamster).

•Used auditory priming experiments to assess when embedded words become activated under varying
listening conditions.
Experiment 1: Optimal Listening Conditions

•1a: Embedded words presented in isolation (ham).


•1b: Embedded words presented within carrier words (hamster).

Findings:
• Isolated embedded words primed targets (ham → pig) in all conditions.
• In carrier words: Priming occurred only if the embedded word was at the beginning or
comprised a large proportion of the carrier word.
Experiment 2: Duration Change

•2a: Primes with expanded/compressed embedded words (haam or hm).


•2b: Primes with expanded/compressed carrier words (haamster or hmster).

Findings:
• Significant priming for isolated embedded words, even under duration changes.
• No priming when carrier words were compressed or expanded.
Experiment 3: Segment Loss
•Method: Replaced a segment of carrier words with noise (e.g., h_noise_mster).
•Findings: Priming was eliminated, indicating embedded word activation relies on intact speech signals.

Experiment 4: Cognitive Load


•4a: Embedded words presented under cognitive load (ham while multitasking).
•4b: Carrier words presented under cognitive load (hamster while multitasking).

Findings

Priming for embedded words persisted in isolation (ham), but not when embedded in carrier words (hamster).
Overall Findings:

1.Activation Factors:
1. Embedded words are activated if they are at the beginning of the carrier word.
2. Activation is stronger when embedded words constitute a large proportion of the carrier
word.

2.Listening Conditions:
1. Embedded word activation occurs only under optimal conditions (e.g., clear speech,
minimal distortion).
2. Under suboptimal conditions (e.g., noise, duration changes, cognitive load), activation is
impaired, especially in carrier words.
Study by Vroomen & de Gelder (1997): Embedded
Monosyllables
Objective:

•Explored cross-modal priming in Dutch to study when monosyllables embedded within other words or
non-words are activated.
Key Findings:

Context

1.Two-Syllable Words:
1. Example: framboos (raspberry) contains the embedded word boos (angry).
2. Finding: Embedded words (boos) produced significant priming for related words, showing
activation in two-syllable contexts.
2.Monosyllabic Words:
1. Example: swine contains wine.
2. Finding: No priming was found for embedded words (wine) in monosyllabic carriers.
Position Effects:

1. Initial Position:
1. Example: vel (skin) in velk (non-word) or velg (word).
2. Finding: Priming occurred when the carrier was a non-word (velk) but not when it was a
word (velg).
3. Longer words inhibit activation of shorter, embedded words due to lexical
competition.

2. Final Position:
1. Example: wine in swine or twine.
2. Finding: No evidence of embedded word activation.
Conclusion:

1.Lexical Competition:

Embedded word activation is influenced by lexical competition, where longer words inhibit
shorter embedded words in word contexts.

2.Syllable Onset:

Activation is stronger when the embedded word has a matching syllable onset with the
lexical representation.
9. Rhyme Monitoring (Marslen-Wilson, 1980)

Subjects are presented auditorily with sentences. A cue word rhyming with the target word is presented.

Example: target word "lead", cue word "bread".

• The sentences may or may not be meaningful. The target word may be in any position in the sentence.
• Subjects indicate by pressing a switch when they hear the target word.
Key Findings

1. Semantic and Syntactic Sensitivity:
• Rhyme monitoring is influenced by semantic (meaning) and syntactic (sentence structure) information derived from the cue word (bread).
• Subjects rely on the context provided by the sentence and cue to identify the target word.

2. Early Attribute Matching:
• Subjects start processing the attributes of the target word (e.g., rhyme and meaning) before they hear all the words in the sentence.
• They often anticipate how the target word will end, based on the cue and sentence context.

3. Lexical Interpretation Dominates:
• The phonetic properties (sounds) of the spoken words are not processed independently of their lexical interpretation (word meaning and identity).
• This means:
• Subjects do not rely on just the sound to identify rhymes.
• They use top-down processing, combining phonetic input with lexical knowledge.
10. Word Monitoring
Similar to Rhyme Monitoring:

•Instead of a rhyming cue, the cue word is semantically related to the target word.
• Example:
• Target word: lead
• Cue word: metal

•Subjects listen to a sentence and press a switch when they hear the target word.

•Nature of Sentences:
•Sentences can be:
• Meaningful: Contextually coherent.
• Nonsense: Lacking semantic coherence.
Word Monitoring Paradigm:

1.What Happens in the Task?

1. Subjects monitor ongoing language (spoken or written) for a pre-designated target


word.

2. Independent Variables:
1. Nature of the target word (e.g., semantically related or unrelated).
2. Position of the target word in the sentence.
3. Context of the sentence (e.g., meaningful or nonsense).

3. Dependent Variables:
1. Response Latency: Time taken to press the switch.
2. Error Rate: Missed or incorrect responses.
3. Brain Imaging Data: Neural activity during the task.
Findings

1.Semantic Processing is Rapid:


1. Response time (RT) is faster for meaningful sentences than for nonsense
ones.
2. This suggests that semantic information is accessed early in word recognition
when the sentence provides context.

2.Role of Semantic Context:


1. Semantic context aids in:
1. Narrowing down potential word candidates.
2. Facilitating quicker recognition of the target word.
2. Word recognition involves interaction between semantic properties (meaning)
and the context.
11.Cross-Modal Priming Task
(CMPT)
The Cross-Modal Priming Task (CMPT) is an experimental paradigm
that combines both auditory and visual modalities, which is why it's
referred to as "cross-modal." This task is designed to study how
information from one modality (e.g., hearing) influences the processing
of stimuli in another modality (e.g., seeing).
How CMPT Works

1.Participants' Task:
1. Participants listen to a sentence played to them.
2. Before the sentence finishes, they are shown a visual stimulus
(either a word or a picture) on a screen.
3. The visual stimulus can either be:
1. Related (or identical) to a word they heard in the sentence
earlier (e.g., the word "dog" after hearing "animal").
2. Unrelated to the sentence they heard.
2.Response:
1. As soon as they see the word/picture, they are instructed to press a button as quickly as
possible.
1. For words: They decide whether the word is a valid word or a non-word (lexical
decision task).
2. For pictures: They classify the picture, such as determining if it depicts an animate
or inanimate object (e.g., animacy task).

3.Priming Effect:
1. When the visual stimulus is related (or identical) to a word heard earlier in the sentence, reaction times (RTs) are faster than when the visual stimulus is unrelated (the computation is sketched below).
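The priming effect itself is just a difference of mean RTs between the two conditions. The numbers below are invented solely to show the computation.

```python
from statistics import mean

# Invented lexical-decision RTs (ms) from a cross-modal priming run.
related_rts   = [512, 498, 530, 505]   # visual target related to a heard word
unrelated_rts = [588, 601, 575, 592]   # visual target unrelated to the sentence

priming_effect = mean(unrelated_rts) - mean(related_rts)
print(f"priming effect: {priming_effect:.0f} ms faster for related targets")
```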
Summary and Implications

•Cross-Modal Priming: This task demonstrates how auditory input (e.g., a


sentence) can influence the processing of visual information (e.g., words or
pictures), especially when the two are related.

•Facilitation by Context: The faster reaction times for related stimuli


highlight how semantic context plays a crucial role in word recognition and
processing.
12. Functional MRI
Zhuang et al. (2011). The interaction of lexical semantics and cohort competition in
spoken word recognition: an fMRI study. Journal of Cognitive Neuroscience

Spoken word recognition involves the activation of multiple word candidates on the basis
of the initial speech input—the “cohort”—and selection among these competitors.

Selection may be driven primarily by bottom-up acoustic-phonetic inputs, or it may be modulated by other aspects of lexical representation, such as a word's meaning [Marslen-Wilson, 1987].

They examined the potential interaction of bottom-up and top-down processes in an fMRI
study by presenting participants with words and pseudowords for lexical decision.
•In words with high competition cohorts, high imageability words generated
stronger activity than low imageability words, indicating a facilitatory role of
imageability in a highly competitive cohort context.

•For words in low competition cohorts, there was no effect of imageability.

•These results support the behavioral data in showing that selection processes do not rely solely on bottom-up acoustic-phonetic cues, but rather that the semantic properties of candidate words facilitate discrimination between competitors.
• They found greater activity in the left inferior frontal gyrus (BA 45, 47) and the right inferior frontal gyrus (BA 47) with increased cohort competition.
• An imageability effect in the left posterior middle temporal gyrus/angular gyrus (BA 39).
• A significant interaction between imageability and cohort competition in the left posterior superior temporal gyrus/middle temporal gyrus (BA 21, 22).
PREVIOUS YEAR QUESTIONS

Explain the different methods in SWR. (15 marks) - 2024

Write a note on the following methods in spoken word recognition:
a) Cross-modal priming. (5)
b) Continuous speech. (5)
c) Lexical decision. (5) - 2023

Describe any 5 methods used in spoken word research. (10) - 2022, 2019, 2018

Briefly describe the pros and cons of any one method used in SWR research. (5) - 2021

Write short notes on the phoneme-triggered lexical decision task and rhyme monitoring. (10) - 2019
What is used to suppress noise in speech processing?

What effect describes changes in speech in noisy environments?

What increases the difficulty of determining the speech period?

In which task is the stimulus truncated or filtered?

What phenomenon leads participants to guess words in unfavorable conditions?

Which shadower type repeats speech with a delay of 150–200 ms?

What kind of prose involves nonsense words replacing syntactic structure?
References
1. Dahan, D., & Magnuson, J. S. (2006). Spoken word recognition. In Handbook of Psycholinguistics (pp. 249-283).
2. Spivey, M. J., McRae, K., & Joanisse, M. F. (2012). Section 2: Spoken word recognition. The Cambridge Handbook of Psycholinguistics, 61-75.
3. Blank, M. A. (1980). Measuring lexical access during sentence processing. Perception & Psychophysics, 28(1), 1-8.
4. Strauß, A., Wu, T., McQueen, J. M., Scharenborg, O., & Hintz, F. (2022). The differential roles of lexical and sublexical processing during spoken-word recognition in clear and in noise. Cortex, 151, 70-88.
5. Amit, G., & Sandeep, M. (2003). Spoken word recognition: Lexical vs sublexical. In Workshop on Spoken Language Processing.
6. Holcomb, P. J., & Anderson, J. E. (1993). Cross-modal semantic priming: A time-course analysis using event-related brain potentials. Language and Cognitive Processes, 8(4), 379-411.
7. Zhuang, J., Randall, B., Stamatakis, E. A., Marslen-Wilson, W. D., & Tyler, L. K. (2011). The interaction of lexical semantics and cohort competition in spoken word recognition: An fMRI study. Journal of Cognitive Neuroscience, 23(12), 3778-3790.
8. Kilborn, K., & Moss, H. (1996). Word monitoring. Language and Cognitive Processes, 11(6), 689-694. DOI: 10.1080/016909696387105
Thank you
