Dynamic Programming and Single Word Recognizers (Part 1)

This document discusses dynamic time warping (DTW) and its application to isolated word speech recognition. It introduces DTW as an algorithm for aligning and comparing sequences that may vary in time or speed. The document then describes how DTW can be used to build a simple isolated word recognizer by collecting reference patterns for words, computing DTW scores between input speech and references, and selecting the word with the best matching score.


Dynamic Programming and Single Word Recognizers (Part 1)

Comparing Complete Utterances
Endpoint Detection
Approaches to Sequence Alignment
Alignment of Speech Vectors May Be Non-Bijective
Time Warping
Distance Measure between two Utterances
The Minimal Editing Distance Problem
Dynamic Programming
The Dynamic Programming Matrix
Computing the Minimal Editing Distance
Utterance Comparison by Dynamic Time Warping
DTW-Steps
DTW-Applet
Constraints for the DTW-Path
Global Constraints for the DTW-Path
Java Source-Code for DTW
The DTW Searchspace

Dynamic Programming and Single Word Recognizers (Part 2)


DTW with Beam Search
The Principles of Building Speech Recognizers
What are We Trying to Do now?
Isolated Word Recognition with Template Matching

Comparing Complete Utterances


So far: we record the sound signal (ADC), compute a frequency representation, and quantize/classify the resulting vectors. We now have a sequence of pattern vectors, and we want a similarity measure between two such sequences. Obviously, the order of the vectors matters: similar vectors in a different order do not represent the same utterance.

Comparing Complete Utterances


Comparing speech vector sequences has to overcome three problems:
speaking rate: if the speaker speaks faster, we get fewer vectors for the same utterance
purposeful changes of speaking rate: e.g. for disambiguation (said vs. sad)
unintentional changes of speaking rate: speaking disfluencies


So we have to find a way to decide which vector to compare to which, and impose some constraints (not every vector can be compared to every other).

Endpoint Detection
When comparing two recorded utterances there are further complications: the utterances may be of different length, and one or both utterances can be preceded or followed by a period of (possibly unintentionally recorded) silence. Also, we might not have any mechanism to signal to the recognizer when it should listen.
Typical solution: compute the signal power p[i..j] = Σ_{k=i..j} s[k]², then apply a threshold to detect speech (see the sketch below).
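A minimal sketch of this idea in Java, assuming 16-bit samples in a short[] buffer; the frame length and the threshold are illustrative parameters that would have to be tuned to the recording conditions, not values from the lecture:

public static boolean[] detectSpeech(short[] s, int frameLen, double threshold) {
    int frames = s.length / frameLen;
    boolean[] isSpeech = new boolean[frames];
    for (int f = 0; f < frames; f++) {
        double p = 0.0;                                   // p = sum of s[k]^2 over the frame
        for (int k = f * frameLen; k < (f + 1) * frameLen; k++) {
            p += (double) s[k] * s[k];
        }
        isSpeech[f] = (p / frameLen) > threshold;         // average power above threshold -> speech
    }
    return isSpeech;
}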

Approaches to Sequence Alignment


First idea: normalize the lengths and make a linear alignment (a sketch follows below).

Linear alignment can handle the problem of different overall speaking rates, but it cannot handle the problem of varying speaking rate within the same utterance.
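A minimal sketch of linear alignment, assuming both utterances are given as arrays of feature vectors (double[][]); the Euclidean distance used as the local distance is one possible choice, not prescribed by the lecture:

public static double linearAlignmentDistance(double[][] x, double[][] y) {
    int n = x.length, m = y.length;
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        // map the time axis of x linearly onto that of y
        int j = (n == 1) ? 0 : (int) Math.round(i * (double) (m - 1) / (n - 1));
        total += dist(x[i], y[j]);
    }
    return total / n;
}

// Euclidean distance as one possible local distance measure
private static double dist(double[] a, double[] b) {
    double s = 0.0;
    for (int k = 0; k < a.length; k++) { double d = a[k] - b[k]; s += d * d; }
    return Math.sqrt(s);
}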

Alignment of Speech Vectors May Be Non-Bijective


Task: given two sequences x1,x2,...,xn and y1,y2,...,ym, we want an alignment relation R (not a function), where (i,j) is in R iff xi is aligned with yj.

It is possible that more than one x is aligned to the same y (or vice versa). It is also possible that an x or a y has no alignment partner at all.

Time Warping
Task: given two sequences x1,x2,...,xn and y1,y2,...,ym, we want an alignment relation R (not a function), where (i,j) is in R iff xi is aligned with yj. We are looking for a common time axis.

Distance Measure between two Utterances

For a given alignment path R, the distance between x and y is the sum of all local distances d(xi,yj) along the path. In our example: d(x1,y1) + d(x2,y2) + d(x3,y3) + d(x4,y3) + d(x5,y4) + d(x6,y5) + d(x7,y7) + ... Question: how can we find a path that gives the minimal overall distance? (A small sketch of summing the distances along a given path follows below.)
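For illustration, a small sketch that sums the local distances along a given alignment path, where the path is represented as an array of (i, j) index pairs and the Euclidean distance serves as the local distance (both the representation and the distance measure are assumptions made here):

public static double pathDistance(double[][] x, double[][] y, int[][] path) {
    double total = 0.0;
    for (int[] p : path) {
        double d = 0.0;
        for (int k = 0; k < x[p[0]].length; k++) {        // Euclidean local distance d(x_i, y_j)
            double diff = x[p[0]][k] - y[p[1]][k];
            d += diff * diff;
        }
        total += Math.sqrt(d);
    }
    return total;
}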

The Minimal Editing Distance Problem


Given: two character sequences (words) x1,x2,...,xn and y1,y2,...,ym.
Wanted: the minimal number (and sequence) of editing steps needed to convert x into y.
The editing cursor starts at x0; an editing step can be one of:
delete the character xi under the cursor
insert a character at the cursor position
replace the character xi at the cursor position with yj
Moving the cursor to the next character is not an editing step, and we cannot go back.
Example: convert x = "BAKERY" to y = "BRAKES". One possible solution:
B = B, move the cursor to the next character
insert character y2 = R
A = A, move the cursor to the next character
K = K, move the cursor to the next character
E = E, move the cursor to the next character
replace character x5 = R with character y6 = S
delete character x6 = Y
(The sequence is not necessarily unique.)

Dynamic Programming
How can we find the minimal editing distance? A greedy algorithm? Always perform the step that is currently the cheapest; if there is more than one cheapest step, take any one of them. Obviously, this cannot guarantee the optimal solution. Solution: Dynamic Programming (DP). DP is frequently used in operations research, where consecutive decisions depend on each other and their sequence must lead to an optimal result. The key idea of DP is: if we would like to take our system into a state s, and we know the costs c1,...,ck of the optimal ways to get from the start to all states q1,...,qk from which we can go to s, then the optimal way to s goes through the state ql where l = argmin_j cj.

The Dynamic Programming Matrix


For finding the minimal editing distance from x1,x2,...,xn to y1,y2,...,ym we can define the algorithm inductively. Let C(i,j) denote the minimal editing distance from x1,x2,...,xi to y1,y2,...,yj. Then we get:
C(0,0) = 0 (no characters, no editing)
C(i,j) is the smallest of:
C(i-1,j-1) plus the cost for replacing xi with yj
C(i-1,j) plus the cost for deleting xi
C(i,j-1) plus the cost for inserting yj
Usually, the cost for deleting or inserting a character is 1, and the cost for replacing xi with yj is 0 if xi = yj and 1 otherwise; it can be useful to define other costs for special purposes. Finally, remember for each state which predecessor was the best one (backpointer); to find the sequence of editing steps, backtrace the predecessor pointers from the final state. A sketch of the algorithm follows below.
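A minimal Java sketch of this recurrence with the usual costs; the base cases C(i,0) = i and C(0,j) = j (delete or insert everything) are the standard choice, and the backpointers are omitted for brevity:

public static int editDistance(String x, String y) {
    int n = x.length(), m = y.length();
    int[][] C = new int[n + 1][m + 1];
    for (int i = 0; i <= n; i++) C[i][0] = i;             // delete all of x[1..i]
    for (int j = 0; j <= m; j++) C[0][j] = j;             // insert all of y[1..j]
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            int sub = (x.charAt(i - 1) == y.charAt(j - 1)) ? 0 : 1;
            C[i][j] = Math.min(C[i - 1][j - 1] + sub,     // replace (or match)
                      Math.min(C[i - 1][j] + 1,           // delete x_i
                               C[i][j - 1] + 1));         // insert y_j
        }
    }
    return C[n][m];                                       // e.g. editDistance("BAKERY", "BRAKES") == 3
}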

Utterance Comparison by Dynamic Time Warping


How can we apply the DP algorithm for the minimal editing distance to the utterance comparison problem? Differences and questions: What do editing steps correspond to? We "never" really get two identical vectors. We are dealing with continuous, not discrete, signals here. Answers: We can delete/insert/substitute vectors; define the cost for deleting/inserting as we wish, and define the cost for substituting as the distance between the vectors. No two vectors are the same? So what. Continuous signals simply mean we get continuous distances (no big deal). The DTW algorithm works like the minimal editing distance algorithm with minor modifications: allow different kinds of steps (different predecessors of a state) and use a vector-to-vector distance measure as the cost function.

DTW-Steps
Many different warping steps are possible and have been used. Examples:
symmetric (editing distance), Bakis, Itakura, and weighted steps (shown as figures in the original slides)

The general rule is: the cumulative cost of the destination cell is the best (minimum) over all allowed predecessors of (cumulative cost of the source + cost of the step + local distance in the destination).
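Written as a formula (a sketch in LaTeX notation; D is the accumulated cost, d(x_i, y_j) the local distance, w the step cost, and P(i,j) the set of allowed predecessor cells — these symbols are chosen here for illustration, not taken from the slides):

D(i,j) = \min_{(i',j') \in P(i,j)} \Big[ D(i',j') + w\big((i',j') \to (i,j)\big) \Big] + d(x_i, y_j)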

Constraints for the DTW-Path


Different kinds of constraints:
endpoint constraints: we want the path not to skip a part at the beginning or end of an utterance
monotonicity conditions: we cannot go back in time (for either utterance)
local continuity: no jumps etc.
global path constraints: the path should stay close to the diagonal
slope weighting: we believe the DTW path should be somehow "smooth"

Global Constraints for the DTW-Path

(Figure: the globally constrained search region; the annotation marks where only one path is possible.)

Java Source-Code for DTW


// accu is a double[2][rowN] holding two columns of accumulated costs,
// back is an int[colN][rowN] of backpointers, and distance(c,r) is the
// local distance between column vector c and row vector r.
public double dtw() {
    for (int r = 0; r < rowN; r++) { accu[0][r] = 9.9e99; accu[1][r] = 9.9e99; }
    accu[0][0] = distance(0, 0);                      // only (0,0) is a legal starting cell
    int next = 0;
    for (int c = 1; c < colN; c++) {
        int curr = (c - 1) % 2;
        next = c % 2;
        for (int r = 0; r < rowN; r++) {
            accu[next][r] = 9.9e99;
            double d = distance(c, r);
            // three allowed predecessors: (c-1, r-2), (c-1, r-1), (c-1, r)
            if (r > 1 && accu[curr][r - 2] + d < accu[next][r]) { accu[next][r] = accu[curr][r - 2] + d; back[c][r] = r - 2; }
            if (r > 0 && accu[curr][r - 1] + d < accu[next][r]) { accu[next][r] = accu[curr][r - 1] + d; back[c][r] = r - 1; }
            if (accu[curr][r] + d < accu[next][r]) { accu[next][r] = accu[curr][r] + d; back[c][r] = r; }
        }
    }
    return accu[next][rowN - 1];                      // accumulated cost of the best path to the end cell
}
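The code above relies on a local distance function distance(c, r) that is not shown. One possible implementation (a sketch, assuming the two utterances are stored in fields cols and rows of type double[][]) is the Euclidean distance between the c-th and r-th feature vectors:

// Hedged sketch of the local distance used by dtw(); cols and rows are
// assumed fields holding the feature vectors of the two utterances.
private double distance(int c, int r) {
    double sum = 0.0;
    for (int k = 0; k < cols[c].length; k++) {
        double diff = cols[c][k] - rows[r][k];
        sum += diff * diff;
    }
    return Math.sqrt(sum);                            // Euclidean distance between the two vectors
}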

The DTW Searchspace


Already suggested: restrict the search space by a window around the diagonal. Caveats: a silence period in one utterance can cause an "edgy" path, and the search area becomes too restricted when the utterance durations differ a lot.

Another reason (besides global path constraints) for restricting the search space is to save time: a window of constant width reduces the search effort from O(n²) to O(n). To overcome the caveats of the "diagonal window" restriction, use beam search.

DTW with Beam Search


Idea: do not consider steps out of states that have a "too high" cumulative distance.

Approaches: "expand" only a fixed number of states per column of the DTW matrix, or expand only states whose cumulative distance is less than a factor (the beam) times the best distance so far. A sketch of the second variant follows below.
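A minimal sketch of the second variant: after a column of the DTW matrix has been filled, keep only the states whose accumulated cost lies within the beam (the parameter name and representation are chosen here for illustration):

public static boolean[] pruneColumn(double[] accuColumn, double beam) {
    double best = Double.POSITIVE_INFINITY;
    for (double a : accuColumn) best = Math.min(best, a);   // best accumulated cost in this column
    boolean[] active = new boolean[accuColumn.length];
    for (int r = 0; r < accuColumn.length; r++) {
        active[r] = accuColumn[r] <= best * beam;            // keep only states inside the beam
    }
    return active;
}

States marked inactive would then simply be skipped as predecessors when the next column is computed.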

The Principles of Building Speech Recognizers


The following approach is generally considered good research practice:
task specification (what must be recognized)
data collection
split the collected data into a training set (used for estimating the recognizer's parameters), a development set (used for evaluating the recognizer during optimization), and an evaluation set (used only once, and never again, to report results)
estimate the parameters of the recognizer on the training data
evaluate and optimize the recognizer on the development data
run the test on the evaluation data and report the results

What are We Trying to Do now?


We will build a first simple isolated-word recognizer using DTW. The recognizer will record speech and print the score for each of its reference patterns:

build a recognizer that can recognize two words w1 and w2
collect training examples (one per word in the demo; in real life: a lot more)
skip the optimization phase (so we do not need a development set)
collect evaluation data (a few examples per word)
run tests on the evaluation data and report the results

Isolated Word Recognition with Template Matching


For each word in the vocabulary, store at least one reference pattern; when multiple reference patterns are available, either use all of them or compute an average. During recognition: record a spoken word, perform pattern matching against all stored patterns (or at least against those that can be used in the current context), compute a DTW score for every vocabulary word (when using multiple references, combine the scores into one, e.g. by average or maximum), and recognize the word with the best DTW score. This approach works only for very small vocabularies and/or for speaker-dependent recognition. A sketch of the recognition loop follows below.
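A minimal sketch of that recognition loop, assuming a hypothetical dtwDistance(input, reference) method in the spirit of the dtw() code shown earlier and a map from word labels to lists of reference patterns; since the DTW score here is a distance, lower is better and multiple references are combined by taking the minimum:

import java.util.List;
import java.util.Map;

// Hedged sketch of isolated-word recognition by template matching.
public class TemplateMatcher {
    public static String recognize(double[][] input, Map<String, List<double[][]>> references) {
        String bestWord = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, List<double[][]>> entry : references.entrySet()) {
            double wordScore = Double.POSITIVE_INFINITY;
            for (double[][] ref : entry.getValue()) {
                wordScore = Math.min(wordScore, dtwDistance(input, ref));  // combine multiple references
            }
            if (wordScore < bestScore) { bestScore = wordScore; bestWord = entry.getKey(); }
        }
        return bestWord;                                                   // word with the best (lowest) score
    }

    // placeholder; see the dtw() source code earlier in the document
    private static double dtwDistance(double[][] a, double[][] b) { throw new UnsupportedOperationException(); }
}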

Computing the Minimal Editing Distance


(Applet screenshots: the editing steps and the dynamic programming matrix.)

In the applet you can see which editing steps were made (yellow lines). The number in a cell shows the actual cost of reaching that cell (insertion and deletion have cost 1; substitution has cost 0 if the corresponding characters are equal, else 1). At the end, a red line shows the optimal path.

