9 Speech Recognition

This document summarizes a lecture on speech processing and speech recognition. It discusses speech recognition applications, the speech recognition process involving signal preprocessing, feature extraction, and matching training templates. It also covers parameters of speech recognition tasks like vocabulary size and fluency, as well as large vocabulary continuous speech recognition and speech recognition architecture. Finally, it introduces potential research areas in natural language processing.

Uploaded by Getnete degemu
Copyright © All Rights Reserved

Lecture 9: Speech Processing

Adama Science and Technology University


School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2022)
NLP Applications: Speech Processing

01/02/23 2
Outline

 Speech Processing
 Speech Recognition
 Speech Recognition Process
 Parameters of Speech Recognition Task
 Large-Vocabulary Continuous Speech Recognition
 Speech Recognition Architecture
 Feature Extraction

Speech Processing

Speech Processing …

 Speech Analysis/Synthesis:
 Speech analysis makes it possible to identify words and to analyze audio patterns in order to detect emotion and stress in a speaker's voice.
 Speech Synthesis is the artificial production of human speech. The
modern task of speech synthesis, also called text-to-speech or TTS, is to
produce speech (acoustic waveforms) from text input.

 Speech Recognition is the process by which a computer (or a machine) converts the voice signal into the corresponding text or command through identification and understanding.
 Speech Coding is the process of obtaining a compact representation of voice signals for efficient transmission over band-limited wired and wireless channels.
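To make "compact representation" concrete, here is a minimal sketch of μ-law companding. This is the logarithmic compression behind G.711 telephone coding, which stores each sample in 8 bits instead of 16. The sketch uses the continuous μ-law formula with μ = 255 and uniform 8-bit quantization. It illustrates the idea but is not a bit-exact G.711 codec, which uses a segmented piecewise-linear approximation. The function names are hypothetical.

```python
import math

MU = 255  # mu used in North American and Japanese mu-law telephony

def mu_law_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] logarithmically and quantize it to 8 bits (0..255)."""
    x = max(-1.0, min(1.0, x))  # clip to the valid range
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)  # y in [-1, 1]
    return int(round((y + 1.0) / 2.0 * 255))

def mu_law_decode(code: int, mu: int = MU) -> float:
    """Expand an 8-bit code back to an approximate sample in [-1, 1]."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(((1 + mu) ** abs(y) - 1.0) / mu, y)

if __name__ == "__main__":
    for x in (0.001, 0.01, 0.1, 0.5, -0.9):
        code = mu_law_encode(x)
        print(f"{x:+.3f} -> code {code:3d} -> {mu_law_decode(code):+.4f}")
```

Small amplitudes get finer quantization steps than large ones. The reconstruction error therefore stays roughly proportional to the signal level rather than constant.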
Speech Recognition

 Application areas:
 Human-computer interaction:
 While many tasks are better solved with visual or pointing interfaces,
speech has the potential to be a better interface than the keyboard for
tasks where full natural language communication is useful, or for which
keyboards are not appropriate.
 This includes hands-busy or eyes-busy applications, such as those where the user has objects to manipulate or equipment to control.
 Telephony:
 Speech recognition is already used in telephony, for example in spoken dialogue systems for entering digits, recognizing “yes” to accept collect calls, finding out airplane or train information, and call routing (“Accounting, please”, “Prof. Regier, please”).
 In some applications, a multimodal interface combining speech and pointing can be more efficient than a graphical user interface without speech.
Speech Recognition…

 Application areas…
 Dictation
Automatic Speech Recognition (ASR) can also be applied to
dictation, that is, transcription of extended monologue by a single
specific speaker.
Dictation is common in fields such as law and is also important as part of augmentative communication (interaction between computers and humans with some disability resulting in the inability to type or the inability to speak).

Speech Recognition…

 The general problem of automatically transcribing speech by any speaker in any environment is still far from solved.

 The SR process can be framed as a pattern recognition and matching problem.
 Speech features are extracted from the original speech signal after it has been pre-processed and analysed, and the SR templates are then constructed from these features.
 During recognition, the voice templates stored in the system are compared with the characteristics of the input voice signal, according to certain algorithms and strategies, to find the template that best matches the input voice; the system then outputs the recognition result.
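The lecture does not name a specific matching algorithm. One classic way to compare an input against stored templates when the two are spoken at different speeds is dynamic time warping (DTW), which aligns frames non-linearly in time before summing their distances. The following is a minimal sketch, assuming each utterance is already a sequence of feature vectors and that Euclidean distance between frames is adequate. `dtw_distance`, `recognize`, and the toy templates are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector, e.g. the MFCCs of one frame

def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Dynamic time warping distance between two feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j frames of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # reuse b[j-1] for another a frame
                                 cost[i][j - 1],      # reuse a[i-1] for another b frame
                                 cost[i - 1][j - 1])  # advance in both sequences
    return cost[n][m]

def recognize(features: Sequence[Frame], templates: Dict[str, List[Sequence[Frame]]]) -> str:
    """Return the label whose stored template has the smallest DTW distance to the input."""
    best_label, best_dist = "", math.inf
    for label, examples in templates.items():
        for template in examples:
            dist = dtw_distance(features, template)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

if __name__ == "__main__":
    # Toy 1-D "features"; the input is a time-stretched version of the "yes" template.
    templates = {"yes": [[(1.0,), (3.0,), (1.0,)]], "no": [[(5.0,), (5.0,), (0.0,)]]}
    spoken = [(1.0,), (1.0,), (3.0,), (3.0,), (1.0,)]
    print(recognize(spoken, templates))  # -> yes
```

Because the warping absorbs the repeated frames, the stretched input still matches its template with zero distance. This tolerance to speaking rate is what makes DTW suitable for template matching.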
Speech Recognition Process

 The speech recognition (SR) process generally involves the following key modules:
 Signal pre-processing,
 Speech feature extraction,
 Matching against the training-library templates, and
 Outputting the matching results.

Speech Recognition Process…

 The SR process generally involves the following key modules:
 Signal pre-processing module: samples the voice signal, removes the noise introduced by the equipment and the environment, and involves selecting the speech recognition unit.
 Speech feature-extraction module: extracts the acoustic parameters that reflect the essential characteristics of the voice, such as frequency and amplitude.
 Matching module: computes the likelihood between the input features and the stored templates according to certain criteria, such as word-formation rules, grammar rules, and semantic rules, and determines the semantic information of the input voice.
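Part of the pre-processing stage can be sketched as follows. The steps shown (pre-emphasis, splitting into overlapping frames, and Hamming windowing) are common choices assumed here; the lecture does not specify them. The values alpha = 0.97, 25 ms frames, a 10 ms hop, and a 16 kHz sampling rate are typical defaults, not requirements. Noise removal and recognition-unit selection are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def pre_emphasis(signal: Sequence[float], alpha: float = 0.97) -> List[float]:
    """Boost high frequencies with y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def frame_signal(signal: Sequence[float], sample_rate: int,
                 frame_ms: float = 25.0, hop_ms: float = 10.0) -> List[List[float]]:
    """Cut the signal into overlapping frames and apply a Hamming window to each."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * k / (frame_len - 1))
              for k in range(frame_len)]
    return [[signal[start + k] * window[k] for k in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]

if __name__ == "__main__":
    sr = 16000  # 16 kHz sampling rate (assumed)
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s test tone
    frames = frame_signal(pre_emphasis(tone), sr)
    print(len(frames), "frames of", len(frames[0]), "samples")  # 98 frames of 400 samples
```

Short overlapping frames are used because speech is roughly stationary over a few tens of milliseconds. The window tapers each frame's edges before the spectral analysis used in feature extraction.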
Parameters of Speech Recognition Task
 Vocabulary size:
 Digit Recognition,
 Large Vocabulary.
 How fluent, natural, or conversational the speech is:
 Isolated Word,
 Continuous Speech:
 Read Speech,
 Conversational Speech.

 Channel and Noise
 Accent or Speaker-class Characteristics
Parameters of Speech Recognition Task…
 One dimension of variation in speech recognition tasks is vocabulary size:
 Speech recognition is easier if the number of distinct words to be recognized is smaller.
 So tasks with a two-word vocabulary, like yes-versus-no detection, or an eleven-word vocabulary, like recognizing sequences of digits (the so-called digits task), are relatively easy.

 At the other end, tasks with large vocabularies of 64,000 words or more, like transcribing human-human telephone conversations or broadcast news, are much harder.

Parameters of Speech Recognition Task…
 A second dimension of variation is how fluent, natural, or
conversational the speech is.
 Isolated word recognition, in which each word is surrounded by
some sort of pause, is much easier than recognizing continuous
speech.

 In continuous speech, words run into each other and have to be segmented. Continuous speech tasks themselves vary greatly in difficulty.
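The pauses that make isolated-word recognition easier can be used directly to find word boundaries. Below is a minimal sketch assuming a simple energy threshold. Frames whose mean squared amplitude exceeds `threshold` count as speech, and a run of at least `min_gap` quiet frames closes a word. The frame length, threshold, and gap values are illustrative assumptions, and `find_words` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple

def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    return [sum(x * x for x in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def find_words(signal: Sequence[float], frame_len: int = 160,
               threshold: float = 1e-3, min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) spans of speech separated by pauses."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the word in progress
    silent = 0                   # quiet frames seen since the last loud frame
    idx = -1
    for idx, energy in enumerate(frame_energies(signal, frame_len)):
        if energy > threshold:
            if start is None:
                start = idx
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_gap:  # pause long enough: close the word
                spans.append((start, idx - silent))
                start, silent = None, 0
    if start is not None:          # signal ended in the middle of a word
        spans.append((start, idx - silent))
    return spans

if __name__ == "__main__":
    # Two bursts of "speech" (constant 0.5) separated by silence, 160-sample frames
    sig = [0.0] * 800 + [0.5] * 640 + [0.0] * 800 + [0.5] * 480 + [0.0] * 320
    print(find_words(sig))  # [(5, 8), (14, 16)]
```

In continuous speech, words run together without such pauses, so this kind of segmentation breaks down. That is one reason continuous recognition is harder.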

Parameters of Speech Recognition Task…
 A third dimension of variation is channel and noise.
 The dictation task (and much laboratory research in speech recognition) is done with high-quality, head-mounted microphones.

 Head-mounted microphones eliminate the distortion that occurs with a table microphone as the speaker's head moves around.

 Noise of any kind also makes recognition harder.

 Thus recognizing a speaker dictating in a quiet office is much easier than recognizing a speaker in a noisy car on the highway with the window open.
Parameters of Speech Recognition Task…
 A final dimension of variation is accent or speaker-class characteristics.
 Speech is easier to recognize if the speaker uses a standard dialect, or in general one that matches the data the system was trained on.

 Recognition is thus harder on foreign-accented speech or on children's speech (unless the system was specifically trained on exactly these kinds of speech).

Parameters of Speech Recognition Task…
 The table shows the rough percentage of incorrect words (the
word error rate, or WER) from state-of-the-art systems on
different ASR tasks.
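Word error rate is computed by aligning the recognizer's output with a reference transcript using a minimum edit distance over words. The number of substitutions, deletions, and insertions is then divided by the number of words in the reference. Here is a minimal sketch, assuming whitespace-tokenized transcripts and a cost of 1 for every edit type. `word_error_rate` is a hypothetical helper name.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of words in the reference."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution (or match)
    return d[len(ref)][len(hyp)] / len(ref)

if __name__ == "__main__":
    ref = "i want to fly to boston"
    hyp = "i want fly to austin"  # one deletion, one substitution
    print(f"WER = {word_error_rate(ref, hyp):.1%}")  # WER = 33.3%
```

Because insertions are counted too, WER can exceed 100% when the recognizer outputs many spurious words.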

Parameters of Speech Recognition Task…
 A final dimension of variation is accent or speaker-class characteristics…
 Variation due to noise and accent increases the error rates considerably.

 The word error rate on strongly Japanese-accented or Spanish-accented English has been reported to be about 3 to 4 times higher than for native speakers on the same task.

 Adding automobile noise at a 10 dB SNR (signal-to-noise ratio) can cause error rates to go up by a factor of 2 to 4.
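SNR in decibels is 10·log10(P_signal / P_noise). At 10 dB, the signal's average power is therefore only 10 times the noise power, since 10^(10/10) = 10. The sketch below shows how noise is typically mixed into clean speech at a chosen SNR for experiments like the one above. It assumes the noise clip is at least as long as the signal. Gaussian white noise stands in for recorded car noise, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

def power(x: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a signal."""
    return sum(v * v for v in x) / len(x)

def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))

def add_noise_at_snr(signal: Sequence[float], noise: Sequence[float],
                     target_db: float) -> List[float]:
    """Scale `noise` so the mixture has SNR `target_db`, then add it to `signal`."""
    # Required noise power is P_signal / 10^(target_db / 10).
    scale = math.sqrt(power(signal) / (power(noise) * 10 ** (target_db / 10.0)))
    return [s + scale * n for s, n in zip(signal, noise)]

if __name__ == "__main__":
    sr = 16000
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]  # white-noise stand-in for car noise
    noisy = add_noise_at_snr(clean, noise, 10.0)
    added = [y - s for y, s in zip(noisy, clean)]
    print(f"{snr_db(clean, added):.2f} dB")  # 10.00 dB
```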
Large-Vocabulary Continuous Speech Recognition (LVCSR)
 Large vocabulary generally means that the systems have a vocabulary of roughly 20,000 to 60,000 words.

 Continuous means that the words are run together naturally.

 Algorithms can be speaker-independent; that is, they are able to recognize speech from people whose speech the system has never been exposed to before.

Speech Recognition Architecture…
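Speech recognition architectures are commonly summarized by the standard probabilistic (noisy-channel) formulation, which the lecture does not spell out; it is given here as an assumption. Let $O$ be the sequence of acoustic observations (for example, one feature vector per frame). The recognizer searches the set of possible word sequences $\mathcal{L}$ for the most probable sequence $\hat{W}$:

```latex
\hat{W} = \operatorname*{arg\,max}_{W \in \mathcal{L}} P(W \mid O)
        = \operatorname*{arg\,max}_{W \in \mathcal{L}} \frac{P(O \mid W)\,P(W)}{P(O)}
        = \operatorname*{arg\,max}_{W \in \mathcal{L}} P(O \mid W)\,P(W)
```

Here $P(O \mid W)$ is supplied by the acoustic model and $P(W)$ by the language model. $P(O)$ is the same for every candidate $W$, so it can be dropped from the maximization.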

Feature Extraction

 MFCC = mel frequency cepstral coefficients
 DFT = Discrete Fourier Transform
 IDFT = Inverse Discrete Fourier Transform, used to compute the cepstrum
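The MFCC pipeline these abbreviations describe can be sketched in pure Python. It assumes a single frame that has already been pre-emphasized and windowed. The following are common choices assumed here, not values from the lecture:

- a 512-point DFT;
- 26 triangular mel filters;
- 13 output coefficients;
- the mel formula 2595·log10(1 + f/700).

For the final step, the sketch uses a DCT-II where the slide lists the IDFT. The DCT is the form most MFCC implementations use, because the log mel energies are real-valued. The naive DFT is for clarity only; practical code would use an FFT. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence

def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame: Sequence[float], n_fft: int) -> List[float]:
    """|DFT|^2 / n_fft for bins 0..n_fft/2 via a naive DFT (frame zero-padded or truncated)."""
    x = list(frame[:n_fft]) + [0.0] * max(0, n_fft - len(frame))
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft))) ** 2 / n_fft
            for k in range(n_fft // 2 + 1)]

def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters with edges spaced evenly on the mel scale, 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [math.floor((n_fft + 1) * mel_to_hz(top * i / (n_filters + 1)) / sample_rate)
             for i in range(n_filters + 2)]
    fbank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for f in range(n_filters):
        left, centre, right = edges[f], edges[f + 1], edges[f + 2]
        for k in range(left, centre):   # rising slope
            fbank[f][k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling slope
            fbank[f][k] = (right - k) / (right - centre)
    return fbank

def mfcc(frame: Sequence[float], sample_rate: int, n_fft: int = 512,
         n_filters: int = 26, n_ceps: int = 13) -> List[float]:
    """DFT -> mel filterbank -> log -> DCT-II, keeping the first n_ceps coefficients."""
    spec = power_spectrum(frame, n_fft)
    energies = [sum(w * p for w, p in zip(filt, spec))
                for filt in mel_filterbank(n_filters, n_fft, sample_rate)]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for c in range(n_ceps)]

if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(512)]  # 1 kHz test tone
    print([round(c, 2) for c in mfcc(tone, sr)])
```

The mel filterbank mimics the ear's coarser frequency resolution at high frequencies. The logarithm and DCT then turn the filter energies into a short, largely decorrelated feature vector per frame.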
Chapter Seven: Potential Research Areas in NLP

Potential Research Areas in NLP

 Research Areas in Natural Language Topics:


 Biomedical text mining,
 Computer-assisted reviewing,
 Computer-human dialogue systems,
 Computer vision and NLP,
 Controlled natural language,
 Deep linguistic processing,
 Efficient information extraction techniques,
 Events and the semantics of time,

Potential Research Areas in NLP

 Research Areas in Natural Language Topics:


 Extraction of actionable intelligence from social media,
 Fact recognition and spatiotemporal anchoring of events,
 Identification and text correction,
 Natural language understanding and generation,
 Language resources and architectures for NLP,
 Machine translation issues,
 Natural language user interfaces,
 NLP and artificial intelligence,
 POST problems and computational linguistics,
Potential Research Areas in NLP

 Research Areas in Natural Language Topics:


 Sentiment analysis and opinion mining,
 Speech processing using linguistic rules,
 Text processing chain enhancement applying semantic role labelling, co-reference resolution, and spatial expression recognition,
 Topic modelling in Web data,
 Use of rule-based or statistical approaches,
 Word sense disambiguation,
 Etc.

Question & Answer

Thank You!
