
9 Speech Recognition

This document summarizes a lecture on speech processing and speech recognition. It discusses speech recognition applications, the speech recognition process involving signal preprocessing, feature extraction, and matching training templates. It also covers parameters of speech recognition tasks like vocabulary size and fluency, as well as large vocabulary continuous speech recognition and speech recognition architecture. Finally, it introduces potential research areas in natural language processing.

Uploaded by

Getnete degemu


Lecture 9: Speech Processing

Adama Science and Technology University

School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2022)
NLP Applications: Speech Processing

01/02/23 2
Outline

• Speech Processing
• Speech Recognition
• Speech Recognition Process
• Parameters of Speech Recognition Task
• Large-Vocabulary Continuous Speech Recognition
• Speech Recognition Architecture
• Feature Extraction

Speech Processing

Speech Processing …

• Speech analysis/synthesis:
  • Speech analysis identifies words and analyzes audio patterns to detect emotion and stress in a speaker's voice.
  • Speech synthesis is the artificial production of human speech. The modern task of speech synthesis, also called text-to-speech (TTS), is to produce speech (acoustic waveforms) from text input.
• Speech recognition is the process by which a computer (or other machine) converts a voice signal into the corresponding text or command through identification and understanding.
• Speech coding is the process of obtaining a compact representation of voice signals for efficient transmission over band-limited wired and wireless channels.
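
As an illustration of what a "compact representation" can look like, the sketch below applies mu-law companding, a classic technique from telephony. The lecture does not name a specific coding scheme, so this choice, the function names, and the sample values are all assumptions. The code uses the continuous mu-law formula and is not a standards-conformant codec.

```python
import math

MU = 255  # companding constant commonly paired with 8-bit codes

def mu_law_encode(x):
    """Map a sample in [-1, 1] to an 8-bit code in 0..255."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1) / 2 * 255))

def mu_law_decode(code):
    """Invert mu_law_encode, up to quantization error."""
    y = code / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

samples = [-1.0, -0.5, -0.01, 0.0, 0.01, 0.5, 1.0]  # made-up test samples
codes = [mu_law_encode(x) for x in samples]
decoded = [mu_law_decode(c) for c in codes]
for x, c, d in zip(samples, codes, decoded):
    print(f"{x:+.3f} -> code {c:3d} -> {d:+.4f}")
```

Each sample is stored in one byte, and the logarithmic curve spends more of the 256 codes on quiet samples, where small errors are most audible.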
Speech Recognition

• Application areas:
  • Human-computer interaction:
    • While many tasks are better solved with visual or pointing interfaces, speech has the potential to be a better interface than the keyboard for tasks where full natural-language communication is useful, or for which keyboards are not appropriate.
    • This includes hands-busy or eyes-busy applications, such as where the user has objects to manipulate or equipment to control.
  • Telephony:
    • Speech recognition is already used in spoken dialogue systems for entering digits, recognizing “yes” to accept collect calls, finding out airplane or train information, and call routing (“Accounting, please”, “Prof. Regier, please”).
    • In some applications, a multimodal interface combining speech and pointing can be more efficient than a graphical user interface without speech.
Speech Recognition…

• Application areas…
  • Dictation:
    • Automatic speech recognition (ASR) can also be applied to dictation, that is, the transcription of an extended monologue by a single specific speaker.
    • Dictation is common in fields such as law and is also important as part of augmentative communication (interaction between computers and humans with a disability that leaves them unable to type or unable to speak).

Speech Recognition…

• The general problem of automatic transcription of speech by any speaker in any environment is still far from solved.
• The speech recognition (SR) process can be framed as pattern recognition and matching.
  • Speech features are extracted from the original speech signal after it has been pre-processed and analysed, and an SR template is then constructed from them.
  • During recognition, the voice templates stored in the system are compared with the characteristics of the input voice signal, according to certain algorithms and strategies, to identify the template that best matches the input and finally to output the recognition result.
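
The template comparison described above can be sketched with dynamic time warping (DTW), one classic matching strategy for sequences spoken at different speeds. The slide does not say which algorithm is meant, so DTW is an assumption here, and the one-dimensional "features" and template labels are invented for illustration.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(features, templates):
    """Return the label of the stored template closest to the input."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

# Hypothetical one-dimensional features, for illustration only
templates = {
    "yes": [(0.0,), (1.0,), (2.0,), (1.0,)],
    "no": [(2.0,), (2.0,), (0.0,), (0.0,)],
}
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]  # a slower "yes"
print(recognize(utterance, templates))
```

Because DTW may stretch or compress either sequence in time, the slower utterance still aligns with the shorter "yes" template at zero cost.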
Speech Recognition Process

• The speech recognition (SR) process generally involves the following key modules:
  • Signal pre-processing,
  • Speech feature extraction,
  • Matching against the training-library templates, and
  • Outputting the matching results.

Speech Recognition Process…

• The SR process generally involves the following key modules:
  • Signal pre-processing module: samples the voice signal, removes the noise introduced by the equipment and the environment, and selects the speech recognition unit.
  • Speech feature-extraction module: extracts the acoustic parameters that reflect the essential characteristics of the voice, such as frequency and amplitude.
  • Matching module: calculates the voice speed and the likelihood of the input characteristics according to certain criteria, such as word-formation rules, grammar rules, and semantic rules, and determines the semantic information of the input voice.
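
A minimal sketch of the first two modules, under stated assumptions: pre-emphasis stands in for signal pre-processing, and per-frame RMS amplitude and zero-crossing rate stand in for the "amplitude" and "frequency" parameters. The function names, the filter coefficient 0.97, the frame sizes, and the synthetic test tone are illustrative choices, not values from the lecture.

```python
import math

def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter often applied before feature extraction."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def frames(signal, size, step):
    """Split the signal into overlapping fixed-length frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def frame_features(frame):
    """Two simple per-frame features: RMS amplitude and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return rms, crossings / (len(frame) - 1)

# Synthetic input: 0.1 s of a 200 Hz tone sampled at 8 kHz (assumed values)
rate = 8000
signal = [math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 10)]
feats = [frame_features(f) for f in frames(pre_emphasize(signal), size=200, step=80)]
print(len(feats), feats[0])
```

A real front end would also handle noise reduction and produce richer features, but the shape is the same: one small feature vector per short frame, which the matching module then compares with stored templates.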
Parameters of Speech Recognition Task

• Vocabulary size:
  • Digit recognition,
  • Large vocabulary.
• How fluent, natural, or conversational the speech is:
  • Isolated word,
  • Continuous speech:
    • Read speech,
    • Conversational speech.
• Channel and noise
• Accent or speaker-class characteristics
Parameters of Speech Recognition Task…

• One dimension of variation in speech recognition tasks is vocabulary size:
  • Speech recognition is easier if the number of distinct words we need to recognize is smaller.
  • So tasks with a two-word vocabulary, like yes-versus-no detection, or an eleven-word vocabulary, like recognizing sequences of digits (zero through nine plus “oh”) in what is called the digits task, are relatively easy.
• At the other end, tasks with large vocabularies of 64,000 words or more, like transcribing human-human telephone conversations or broadcast news, are much harder.

Parameters of Speech Recognition Task…

• A second dimension of variation is how fluent, natural, or conversational the speech is.
  • Isolated word recognition, in which each word is surrounded by some sort of pause, is much easier than recognizing continuous speech.
  • In continuous speech, words run into each other and have to be segmented. Continuous speech tasks themselves vary greatly in difficulty.
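
One reason isolated words are easier is that the surrounding pauses make word boundaries easy to find. The sketch below segments a sequence of per-frame energies with a fixed threshold. The threshold, the minimum pause length, and the energy values are invented for illustration, and the function name is hypothetical.

```python
def segment_by_energy(energies, threshold, min_pause):
    """Return (start, end) frame ranges (end exclusive) of high-energy runs.

    A run of at least min_pause consecutive low-energy frames ends a word.
    """
    segments, start, quiet = [], None, 0
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i          # first loud frame of a new word
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause:
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:          # signal ended while still inside a word
        segments.append((start, len(energies) - quiet))
    return segments

# Made-up frame energies: two words separated by a pause
energies = [0.0, 0.1, 0.9, 0.8, 0.7, 0.0, 0.0, 0.0, 0.6, 0.9, 0.1, 0.0]
print(segment_by_energy(energies, threshold=0.5, min_pause=2))
```

In continuous speech there are often no such low-energy gaps between words, so this kind of simple segmentation no longer works.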

Parameters of Speech Recognition Task…

• A third dimension of variation is channel and noise.
  • The dictation task (and much laboratory research in speech recognition) is done with high-quality, head-mounted microphones.
  • Head-mounted microphones eliminate the distortion that occurs with a table microphone as the speaker’s head moves around.
  • Noise of any kind also makes recognition harder.
  • Thus recognizing a speaker dictating in a quiet office is much easier than recognizing a speaker in a noisy car on the highway with the window open.
Parameters of Speech Recognition Task…

• A final dimension of variation is accent or speaker-class characteristics.
  • Speech is easier to recognize if the speaker is speaking a standard dialect, or more generally one that matches the data the system was trained on.
  • Recognition is thus harder on foreign-accented speech or on the speech of children (unless the system was specifically trained on exactly these kinds of speech).

Parameters of Speech Recognition Task…

• The table shows the rough percentage of incorrect words (the word error rate, or WER) from state-of-the-art systems on different ASR tasks.
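
WER is usually computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch, with made-up example sentences and a hypothetical function name:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edits needed to turn the first i ref words into the first j hyp words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitute = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(substitute, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    return dist[len(ref)][len(hyp)] / len(ref)

reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"  # one substitution, one deletion
print(f"WER = {word_error_rate(reference, hypothesis):.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.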

Parameters of Speech Recognition Task…

• A final dimension of variation is accent or speaker-class characteristics…
  • Variation due to noise and accent increases error rates considerably.
  • The word error rate on strongly Japanese-accented or Spanish-accented English has been reported to be about 3 to 4 times higher than for native speakers on the same task.
  • Adding automobile noise with a 10 dB SNR (signal-to-noise ratio) can cause error rates to go up by 2 to 4 times.
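
To make the 10 dB figure concrete: SNR in decibels is 10·log10(P_signal / P_noise), so 10 dB means the signal carries ten times the power of the noise. The sketch below mixes noise into a clean signal at a chosen SNR. White Gaussian noise is used as a stand-in for automobile noise, and the function names and the test tone are assumptions.

```python
import math
import random

def mean_power(x):
    return sum(v * v for v in x) / len(x)

def add_noise_at_snr(signal, snr_db, rng):
    """Return (noisy_signal, noise) with the noise scaled to the target SNR in dB."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the required noise power
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    noise = [scale * v for v in noise]
    return [s + n for s, n in zip(signal, noise)], noise

def measured_snr_db(signal, noise):
    return 10 * math.log10(mean_power(signal) / mean_power(noise))

rng = random.Random(0)  # fixed seed so the run is repeatable
clean = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
noisy, noise = add_noise_at_snr(clean, snr_db=10, rng=rng)
print(round(measured_snr_db(clean, noise), 2))
```

Feeding recordings degraded this way to a recognizer is a common way to measure how error rates change with noise level.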
Large-Vocabulary Continuous Speech Recognition (LVCSR)

• Large vocabulary generally means that the system has a vocabulary of roughly 20,000 to 60,000 words.
• Continuous means that the words are run together naturally.
• Algorithms can be speaker-independent; that is, they can recognize speech from people whose speech the system has never been exposed to before.

Speech Recognition Architecture…

Feature Extraction

• MFCC = mel-frequency cepstral coefficients
• DFT = discrete Fourier transform
• IDFT = inverse discrete Fourier transform (used to compute the cepstrum)
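
The three terms fit together roughly as follows: window a frame, take its DFT, pool the power spectrum with mel-spaced filters, take logs, and transform again to get cepstral coefficients. The sketch below is a plain-Python illustration under stated assumptions. It uses a DCT-II for the final step, a common substitute for the IDFT on a real, symmetric log spectrum. The filter count, coefficient count, frame length, and test tone are illustrative choices.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(num_filters, n_fft, rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mels = [hz_to_mel(rate / 2) * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / rate) for m in mels]
    bank = []
    for lo, mid, hi in zip(bins, bins[1:], bins[2:]):
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):
            filt[k] = (k - lo) / (mid - lo)   # rising edge
        for k in range(mid, hi):
            filt[k] = (hi - k) / (hi - mid)   # falling edge
        bank.append(filt)
    return bank

def mfcc(frame, rate, num_filters=20, num_coeffs=12):
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), rate)
    # Floor the energies so that log() never sees zero
    log_energies = [math.log(max(sum(w * p for w, p in zip(f, spectrum)), 1e-10))
                    for f in bank]
    # DCT-II of the log mel energies; coefficient 0 is skipped
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(1, num_coeffs + 1)]

rate = 8000
frame = [math.sin(2 * math.pi * 440 * n / rate) for n in range(256)]  # synthetic tone
coeffs = mfcc(frame, rate)
print(len(coeffs))
```

The naive DFT here is quadratic in the frame length; a practical system would use an FFT routine instead.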
Chapter Seven: Potential Research Areas in NLP

Potential Research Areas in NLP

• Research areas in natural language topics:
  • Biomedical text mining,
  • Computer-assisted reviewing,
  • Computer-human dialogue systems,
  • Computer vision and NLP,
  • Controlled natural language,
  • Deep linguistic processing,
  • Efficient information extraction techniques,
  • Events and semantics of time,

Potential Research Areas in NLP

• Research areas in natural language topics:
  • Extraction of actionable intelligence from social media,
  • Fact recognition and spatiotemporal anchoring of events,
  • Identification and text correction,
  • Natural language understanding and generation issues,
  • Language resources and architectures for NLP,
  • Machine translation issues,
  • Natural language user interfaces,
  • NLP and artificial intelligence,
  • Part-of-speech tagging (POST) problems and computational linguistics,
Potential Research Areas in NLP

• Research areas in natural language topics:
  • Sentiment analysis and opinion mining,
  • Speech processing using linguistic rules,
  • Text processing chain enhancement applying semantic role labelling, co-reference resolution, and spatial expression recognition,
  • Topic modelling in Web data,
  • Use of rule-based or statistical approaches,
  • Word sense disambiguation,
  • Etc.

Question & Answer

Thank You !!!
