Speech Signal Processing
Lizy Abraham, Assistant Professor, Department of ECE, LBS Institute of Technology for Women (A Govt. of Kerala Undertaking), Poojappura, Trivandrum 695012, Kerala, India. [email protected], +919495123331
Speech Production: Acoustic theory of speech production (excitation, vocal tract model for speech analysis, formant structure, pitch). Articulatory phonetics (articulation, voicing, articulatory model). Acoustic phonetics (basic speech units and their classification).
Speech Analysis: Short-time speech analysis. Time-domain analysis (short-time energy, short-time zero-crossing rate, ACF). Frequency-domain analysis (filter banks, STFT, spectrogram, formant estimation and analysis). Cepstral analysis.
Parametric Representation of Speech: AR model, ARMA model. LPC analysis (LPC model, autocorrelation method, covariance method, Levinson-Durbin algorithm, lattice form). LSF, LAR, MFCC, sinusoidal model, GMM, HMM.
Speech Coding: Phase vocoder, LPC, sub-band coding, adaptive transform coding, harmonic coding, vector quantization based coders, CELP.
Speech Processing: Fundamentals of speech recognition, speech segmentation, text-to-speech conversion, speech enhancement, speaker verification, language identification, issues of voice transmission over the Internet.
REFERENCES
1. Douglas O'Shaughnessy, Speech Communications: Human and Machine, 2nd edition, IEEE Press, 1999. ISBN 0780334493.
2. Nelson Morgan and Ben Gold, Speech and Audio Signal Processing: Processing and Perception of Speech and Music, John Wiley & Sons, July 1999. ISBN 0471351547.
3. Rabiner and Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
4. Rabiner and Juang, Fundamentals of Speech Recognition, Prentice Hall, 1994.
5. Thomas F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, 1st edition, Prentice Hall. ISBN 013242942X.
6. Donald G. Childers, Speech Processing and Synthesis Toolboxes, John Wiley & Sons, September 1999. ISBN 0471349593.
For the end-semester exam (100 marks), the question paper shall have six questions of 20 marks each covering the entire syllabus, of which any five shall be answered; it shall have 75% problems and 25% theory. For the internal marks of 50: two tests of 20 marks each, and 10 marks for assignments (minimum two) / term project.
Algorithms (Programming)
Acoustics
Information Theory
Phonetics
Speech can be defined as an acoustic pressure signal that is articulated in the vocal tract.
Speech is produced when air is forced from the lungs through the vocal cords and along the vocal tract.
This air flow is referred to as the excitation signal. The excitation causes the vocal cords to vibrate and propagates energy to excite the oral and nasal openings, which play a major role in shaping the sound produced. Vocal tract components: the oral tract (from the lips to the vocal cords) and the nasal tract (from the velum to the nostrils).
Larynx: the source of speech. Vocal cords (folds): the two folds of tissue in the larynx; they can open and shut like a pair of fans. Glottis: the gap between the vocal cords. As air is forced through the glottis the vocal cords start to vibrate and modulate the air flow. The frequency of vibration determines the pitch of the voice (for a male, 50-200 Hz; for a female, up to 500 Hz).
Places of articulation
Unvoiced sounds
Produced by forcing air at high velocity through a constriction, giving noise-like turbulence. These sounds show little long-term periodicity, though short-term correlations are still present. E.g., /s/, /f/.
Plosive sounds
Produced by a complete closure of the vocal tract; air pressure builds up behind the closure and is released suddenly. E.g., /b/, /p/.
Speech Model
SPEECH SOUNDS
Coarse classification is done with phonemes. A phone is the acoustic realization of a phoneme; allophones are context-dependent realizations of a phoneme.
PHONEME HIERARCHY
Speech sounds are language dependent; there are about 50 in English.
Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw
Consonants:
Nasal: m, n, ng
Fricative: f, v, th, dh, s, z, sh, zh, h
Lateral liquid: l
Retroflex liquid: r
Sounds like /SH/ and /S/ look like (spectrally shaped) random noise, while the vowel sounds /UH/, /IY/, and /EY/ are highly structured and quasi-periodic. These differences result from the distinctively different ways that these sounds are produced.
Spectral envelope.
Formants.
FORMANTS
Formants can be recognized in the frequency content of a signal segment.
They are best described as high-energy peaks in the frequency spectrum of a speech sound.
The resonant frequencies of the vocal tract are called formant frequencies or simply formants. The peaks of the spectrum of the vocal tract response correspond approximately to its formants. Under the linear time-invariant all-pole assumption, each vocal tract shape is characterized by a collection of formants.
Because the vocal tract is assumed stable with poles inside the unit circle, the vocal tract transfer function can be expressed either in product or partial fraction expansion form:
A detailed acoustic theory must consider the effects of the following:
- Time variation of the vocal tract shape
- Losses due to heat conduction and viscous friction at the vocal tract walls
- Softness of the vocal tract walls
- Radiation of sound at the lips
- Nasal coupling
- Excitation of sound in the vocal tract
Let us begin by considering a simple case of a lossless tube.
28 December 2012
Consider an N-tube model as in the previous figure, where tube k has length l_k and cross-sectional area A_k. Assume: no losses, and planar wave propagation.
(Derivation in the Quatieri text, pages 122-125.) The poles of the transfer function T(jΩ) occur where cos(Ωl/c) = 0.
The transfer function of a tube with no side branches, excited at one end with the response measured at the other, has only poles. The formant frequencies have finite bandwidth when vocal tract losses are considered (e.g., radiation, walls, viscosity, heat). The length of the vocal tract, l, corresponds to λ1/4, 3λ2/4, 5λ3/4, ..., where λi is the wavelength of the ith natural frequency.
VOWELS
Modeled as a tube closed at one end and open at the other:
- the closure is a membrane with a slit in it; the membrane represents the source of energy (the vocal folds)
- the tube has uniform cross-sectional area
- the energy travels through the tube; the tube generates no energy of its own
- the tube represents an important class of resonators, with the odd-quarter-wavelength relationship Fn = (2n - 1) c / (4 l)
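The odd-quarter-wavelength relationship Fn = (2n - 1)c/(4l) above can be checked numerically. This is a minimal sketch; the 17 cm tract length and c = 340 m/s are illustrative values (not from the slides) that give the textbook neutral-vowel formants near 500, 1500, and 2500 Hz.

```python
# Resonances of a uniform lossless tube closed at one end (glottis) and
# open at the other (lips): Fn = (2n - 1) * c / (4 * l).
# Tube length 0.17 m and c = 340 m/s are illustrative assumptions.

def tube_formants(length_m, c=340.0, n_formants=3):
    """First n resonant frequencies (Hz) of a uniform quarter-wave tube."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]

print(tube_formants(0.17))  # approximately [500.0, 1500.0, 2500.0] Hz
```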
VOWELS
Filter characteristics for vowels:
- the vocal tract is a dynamic, frequency-dependent filter
- it has, theoretically, an infinite number of resonances
- each resonance has a center frequency, an amplitude, and a bandwidth
- for speech, these resonances are called formants
- formants are numbered in succession from the lowest: F1, F2, F3, etc.
Fricatives: modeled as a tube with a very severe constriction. The air exiting the constriction is turbulent; because of the turbulence there is no periodicity unless the sound is accompanied by voicing.
For all-pole linear systems, the input and output are related by a difference equation of the form:
s[n] = Σ a_k s[n-k] + G e[n]   (k = 1 to p)
The operator T{ } defines the nature of the short-time analysis function, and w[n - m] represents a time-shifted window sequence.
SHORT-TIME ENERGY
Short-time energy is simple to compute, and useful for estimating properties of the excitation function in the model.
Since |sgn{x[m]} - sgn{x[m-1]}| is nonzero only if x[m] and x[m-1] have different algebraic signs and 0 if they have the same sign, the short-time zero-crossing rate is a weighted sum of all the instances of alternating sign (zero-crossings) that fall within the support region of the shifted window w[n - m].
The figure shows an example of the short-time energy and zero-crossing rate for a segment of speech with a transition from unvoiced to voiced speech. In both cases the window is a Hamming window of duration 25 ms (equivalent to 401 samples at a 16 kHz sampling rate). Thus both the short-time energy and the short-time zero-crossing rate are outputs of a lowpass filter whose frequency response is as shown.
Short time energy and zero-crossing rate functions are slowly varying compared to the time variations of the speech signal, and therefore, they can be sampled at a much lower rate than that of the original speech signal. For finite-length windows like the Hamming window, this reduction of the sampling rate is accomplished by moving the window position n in jumps of more than one sample
During the unvoiced interval, the zero-crossing rate is relatively high compared to the zero-crossing rate in the voiced interval. Conversely, the energy is relatively low in the unvoiced region compared to the energy in the voiced region.
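The voiced/unvoiced contrast above can be sketched numerically. This is a minimal illustration, not the textbook implementation: a low-frequency tone stands in for voiced speech and scaled noise for unvoiced speech, analyzed with the 25 ms Hamming window at 16 kHz mentioned above; the hop size of 100 samples is an assumption.

```python
import numpy as np

fs = 16000
win = np.hamming(401)   # 25 ms window at 16 kHz, as in the figure

def short_time_energy(x, win, hop=100):
    """Energy at each window position: sum of squared windowed samples."""
    N = len(win)
    return np.array([np.sum((x[i:i + N] * win) ** 2)
                     for i in range(0, len(x) - N + 1, hop)])

def short_time_zcr(x, win, hop=100):
    """Weighted count of sign alternations under the shifted window."""
    s = np.where(x >= 0, 1.0, -1.0)
    alt = 0.5 * np.abs(np.diff(s))          # 1 at each sign change, else 0
    N = len(win)
    return np.array([np.sum(alt[i:i + N - 1] * win[:-1])
                     for i in range(0, len(x) - N + 1, hop)])

rng = np.random.default_rng(0)
t = np.arange(fs // 4) / fs
voiced = np.sin(2 * np.pi * 100 * t)            # stand-in for voiced speech
unvoiced = 0.1 * rng.standard_normal(fs // 4)   # stand-in for unvoiced speech

# Voiced: high energy, low ZCR. Unvoiced: low energy, high ZCR.
print(short_time_energy(voiced, win).mean(), short_time_energy(unvoiced, win).mean())
print(short_time_zcr(voiced, win).mean(), short_time_zcr(unvoiced, win).mean())
```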
e[n] is the excitation to the linear system with impulse response h[n]. A well-known, and easily proved, property of the autocorrelation function is that the autocorrelation of s[n] = e[n] * h[n] is the convolution of the autocorrelation functions of e[n] and h[n].
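Because the excitation's autocorrelation peaks at multiples of the pitch period, the short-time autocorrelation can be used for pitch estimation. A minimal sketch, with a crude 125 Hz square-wave frame standing in for voiced speech (the search range 50-500 Hz matches the pitch range stated earlier):

```python
import numpy as np

def acf_pitch(frame, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch from the autocorrelation peak in the lag range
    corresponding to plausible pitch periods (fs/fmax .. fs/fmin)."""
    frame = frame - np.mean(frame)
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(r[lo:hi + 1])
    return fs / lag

fs = 16000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sign(np.sin(2 * np.pi * 125 * t))   # crude 125 Hz "voiced" frame
print(acf_pitch(frame, fs))                    # close to 125 Hz
```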
FILTERING VIEW
OVERLAP-ADD METHOD
Just as the FBS method was motivated from the filtering view of the STFT, the OLA method is motivated from the Fourier transform view of the STFT. In this method, for each fixed time we take the inverse DFT of the corresponding frequency function. However, instead of dividing out the analysis window from each of the resulting short-time sections, we perform an overlap-add operation between the short-time sections.
Given a discrete STFT X(n, k), the OLA method synthesizes a sequence y[n] given by
Furthermore, if the discrete STFT has been decimated in time by a factor L, it can similarly be shown that perfect reconstruction is obtained when the analysis window satisfies the overlap-add constraint that its shifted copies, hopped by L samples, sum to a constant: Σ w[n - rL] = W(0)/L for all n (sum over all integers r).
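The constant-overlap-add property can be verified numerically. A minimal sketch under an assumed but standard choice: a periodic Hann window of length N with hop L = N/2, whose hopped copies sum exactly to 1.

```python
import numpy as np

# Check the OLA constraint: shifted copies of the analysis window,
# hopped by L samples, should sum to a constant.
N, L = 512, 256
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)   # periodic Hann

total = np.zeros(N * 4)
for r in range(0, len(total) - N + 1, L):
    total[r:r + N] += w

# Away from the edges (where fewer windows overlap) the sum is flat:
interior = total[N:-N]
print(interior.min(), interior.max())   # both 1.0 up to rounding
```

With this window and hop, dividing the OLA sum by the constant recovers the original signal exactly.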
PHASE VOCODER
The Fourier series is computed over a sliding window of a single pitch period in duration, and provides a measure of the amplitude and frequency trajectories of the musical tones.
This can be interpreted as a real sinewave that is amplitude- and phase-modulated by the STFT, the "carrier" being the kth filter's center frequency. We write the STFT of a continuous-time signal as,
where the additive constant is an initial phase condition. The corresponding magnitude is likewise referred to as the instantaneous amplitude for each channel. The resulting filter-bank output is a sinewave with, in general, time-varying amplitude and frequency modulation. An alternative expression is,
We can sample the continuous-time STFT with sampling interval T to obtain the discrete-time STFT.
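The phase-vocoder idea above (phase advance of a channel gives its instantaneous frequency) can be sketched numerically: take the STFT phase of one channel at two frames a hop apart, unwrap the difference around the channel's expected carrier advance, and convert to Hz. The 440 Hz tone, 8 kHz rate, 1024-point window, and 128-sample hop are illustrative assumptions.

```python
import numpy as np

fs, N, R = 8000, 1024, 128
f0 = 440.0
n = np.arange(N + R)
x = np.cos(2 * np.pi * f0 * n / fs)

w = np.hanning(N)
X1 = np.fft.rfft(w * x[:N])          # frame starting at sample 0
X2 = np.fft.rfft(w * x[R:R + N])     # frame one hop (R samples) later

k = int(round(f0 * N / fs))          # channel nearest the tone
expected = 2 * np.pi * k * R / N     # phase advance of the channel carrier
dphi = np.angle(X2[k]) - np.angle(X1[k]) - expected
dphi = (dphi + np.pi) % (2 * np.pi) - np.pi      # wrap to (-pi, pi]
f_inst = (k / N + dphi / (2 * np.pi * R)) * fs   # instantaneous frequency
print(f_inst)   # close to 440 Hz, even though the bin center is 437.5 Hz
```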
SPEECH MODIFICATION
That is, the complex cepstrum operator transforms convolution into addition. This property is what makes the cepstrum useful for speech analysis, since the model for speech production involves convolution of the excitation with the vocal tract impulse response, and our goal is often to separate the excitation signal from the vocal tract signal.
The key issue in the definition and computation of the complex cepstrum is the computation of the complex logarithm, i.e., the computation of the phase angle arg[X(e^jw)], which must be done so as to preserve an additive combination of phases for two signals combined by convolution.
RECURSIVE COMPUTATION OF THE COMPLEX CEPSTRUM
Another approach to computing the complex cepstrum applies only to minimum-phase signals, i.e., signals having a z-transform whose poles and zeros all lie inside the unit circle. An example would be the impulse response of an all-pole vocal tract model with system function
In this case, all the poles c_k must be inside the unit circle for stability of the system.
SHORT-TIME HOMOMORPHIC FILTERING OF SPEECH (Rabiner & Schafer, page 63)
The low quefrency part of the cepstrum is expected to be representative of the slow variations (with frequency) in the log spectrum, while the high quefrency components would correspond to the more rapid fluctuations of the log spectrum.
The spectrum of the voiced segment has a structure of periodic ripples due to the harmonic structure of the quasi-periodic voiced waveform. This periodic structure in the log spectrum manifests itself as a cepstrum peak at a quefrency of about 9 ms. The existence of this peak in the quefrency range of expected pitch periods strongly signals voiced speech; furthermore, the quefrency of the peak is an accurate estimate of the pitch period during the corresponding speech interval. The autocorrelation function also displays an indication of periodicity, but not nearly as unambiguously as does the cepstrum. The rapid variations of the unvoiced spectra, by contrast, appear random with no periodic structure, so there is no strong peak indicating periodicity as in the voiced case.
These slowly varying log spectra clearly retain the general spectral shape with peaks corresponding to the formant resonance structure for the segment of speech under analysis.
For positions 1 through 5, the window includes only unvoiced speech; for positions 6 and 7, the signal within the window is partly voiced and partly unvoiced; for positions 8 through 15, the window includes only voiced speech. The rapid variations of the unvoiced spectra appear random with no periodic structure, while the spectra for voiced segments have a structure of periodic ripples due to the harmonic structure of the quasi-periodic voiced waveform.
The cepstrum peak at a quefrency of about 11-12 ms strongly signals voiced speech, and the quefrency of the peak is an accurate estimate of the pitch period during the corresponding speech interval.
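This quefrency-peak pitch detector can be sketched in a few lines. A minimal illustration, with a synthetic 100 Hz pulse train standing in for a voiced frame; the 8 kHz rate and 100 ms frame are assumptions, and the search range again corresponds to 50-500 Hz pitch.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=50.0, fmax=500.0):
    """Pitch estimate from the real-cepstrum peak in the quefrency
    range of plausible pitch periods."""
    w = np.hamming(len(frame))
    spec = np.abs(np.fft.fft(frame * w))
    ceps = np.fft.ifft(np.log(spec + 1e-12)).real
    lo, hi = int(fs / fmax), int(fs / fmin)
    q = lo + np.argmax(ceps[lo:hi + 1])   # quefrency of the cepstral peak
    return fs / q

fs = 8000
frame = np.zeros(800)
frame[::80] = 1.0                 # pulse train with a 10 ms (100 Hz) period
print(cepstral_pitch(frame, fs))  # close to 100 Hz
```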
The bandwidths are constant for center frequencies below 1 kHz and then increase exponentially up to half the sampling rate of 4 kHz, resulting in a total of 22 filters. The mel-frequency spectrum at analysis time n is defined for r = 1, 2, ..., R as
is a normalizing factor for the rth mel-filter. For each frame, a discrete cosine transform of the log of the magnitude of the filter outputs is computed to form the function mfcc_n[m], i.e.,
The figure shows the result of mfcc analysis of a frame of voiced speech in comparison with the short-time Fourier spectrum, the LPC spectrum, and a homomorphically smoothed spectrum. All these spectra are different, but they have in common peaks at the formant resonances. At higher frequencies, the reconstructed mel-spectrum shows more smoothing due to the structure of the filter bank.
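The MFCC pipeline described above (power spectrum, triangular mel filterbank, log, DCT) can be sketched compactly. This is a minimal illustration, not a reference implementation: the filter-edge placement, 512-point FFT, and 13 output coefficients are common but assumed choices; only the 22 filters and 4 kHz Nyquist match the text.

```python
import numpy as np

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters with centers spaced uniformly on the mel scale."""
    pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0),
                                n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for r in range(1, n_filters + 1):
        l, c, h = bins[r - 1], bins[r], bins[r + 1]
        for k in range(l, c):
            fb[r - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, h):
            fb[r - 1, k] = (h - k) / max(h - c, 1)   # falling edge
    return fb

def mfcc(frame, fs, n_filters=22, n_coeffs=13, n_fft=512):
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    logmel = np.log(mel_filterbank(n_filters, n_fft, fs) @ power + 1e-12)
    # DCT-II of the log filter-bank outputs:
    m = np.arange(n_coeffs)[:, None]
    r = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * m * (2 * r + 1) / (2 * n_filters))
    return dct @ logmel

np.random.seed(0)
coeffs = mfcc(np.random.randn(400), 8000)
print(coeffs.shape)   # 13 cepstral coefficients per frame
```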
In order to keep the representation smooth, R is usually quite small compared to both the window length L and the number of samples in the frequency dimension, N, which may be much larger than the window length. Such a function of two variables can be plotted on a two-dimensional surface as either a grayscale or a color-mapped image; the bars on the right calibrate the color map (in dB).
If the analysis window is short, the spectrogram is called a wide-band spectrogram, characterized by good time resolution and poor frequency resolution. When the window length is long, the spectrogram is a narrow-band spectrogram, characterized by good frequency resolution and poor time resolution.
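The trade-off above can be illustrated numerically: two tones 100 Hz apart are resolved by a long (narrow-band) window but merge under a short (wide-band) window. The tone frequencies, window lengths, and sampling rate here are illustrative assumptions, not values from the slides.

```python
import numpy as np

fs = 8000
t = np.arange(2 * fs) / fs
x = np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 1100 * t)

def frame_spectrum(sig, n_win, n_fft=8192):
    """Magnitude spectrum of one Hamming-windowed frame (zero-padded)."""
    return np.abs(np.fft.rfft(sig[:n_win] * np.hamming(n_win), n_fft))

wide   = frame_spectrum(x, 160)    # 20 ms window: poor frequency resolution
narrow = frame_spectrum(x, 1600)   # 200 ms window: good frequency resolution

freqs = np.fft.rfftfreq(8192, 1 / fs)
k_tone = np.argmin(np.abs(freqs - 1000))
k_mid  = np.argmin(np.abs(freqs - 1050))
# Only the narrow-band analysis leaves a deep valley between the tones:
print(narrow[k_mid] / narrow[k_tone], wide[k_mid] / wide[k_tone])
```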
THE SPECTROGRAM
Note the three broad peaks in the spectrum slice at time tr = 430 ms, and observe that similar slices would be obtained at other times around tr = 430 ms. These large peaks are representative of the underlying resonances of the vocal tract at the corresponding time in the production of the speech signal.
The lower spectrogram is not as sensitive to rapid time variations, but the resolution in the frequency dimension is much better. This window length is on the order of several pitch periods of the waveform during voiced intervals. As a result, the spectrogram no longer displays vertically oriented striations since several periods are included in the window.
ACF
CEPSTRUM
SPEECH WAVE (S) = EXCITATION (E) . FILTER (H)
(H): the vocal tract filter
(E): the glottal excitation from the vocal cords (glottis)
http://home.hib.no/al/engelsk/seksjon/SOFF-MASTER/ill061.gif
CEPSTRAL ANALYSIS
Signal (s) = convolution (*) of glottal excitation (e) and vocal tract filter (h):
s(n) = e(n) * h(n), where n is the time index
In the frequency domain: S(w) = E(w) . H(w)
Taking the magnitude of the spectrum: |S(w)| = |E(w)| . |H(w)|
log10 |S(w)| = log10 |E(w)| + log10 |H(w)|
Ref: http://iitg.vlab.co.in/?sub=59&brch=164&sim=615&cnt=1
CEPSTRUM
c(n) = IDFT[ log10 |S(w)| ] = IDFT[ log10 |E(w)| + log10 |H(w)| ]
s(n) -> windowing -> DFT -> S(w) -> log10 |S(w)| -> IDFT -> c(n)
In c(n), the excitation component and the vocal tract component appear at two different quefrency positions. Applications: (i) glottal excitation analysis, (ii) vocal tract filter analysis.
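The central property above (magnitudes multiply, so log spectra and hence cepstra add) can be checked numerically. A minimal sketch with synthetic stand-ins for e and h; circular convolution is used so that the DFTs multiply exactly.

```python
import numpy as np

np.random.seed(1)
N = 256
e = np.random.randn(N)        # stand-in for the excitation
h = 0.9 ** np.arange(N)       # decaying stand-in for the vocal tract response

def real_cepstrum(x):
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)))).real

s = np.fft.ifft(np.fft.fft(e) * np.fft.fft(h)).real   # circular e (*) h
diff = real_cepstrum(s) - (real_cepstrum(e) + real_cepstrum(h))
print(np.max(np.abs(diff)))   # ~0: the cepstra add under convolution
```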
EXAMPLE OF CEPSTRUM
sampling frequency 22.05 kHz
The time-decimated subband outputs are quantized and encoded, then decoded at the receiver. In subband coding, a small number of filters with wide and overlapping bandwidths are chosen, and each bandpass filter output is quantized individually. Although the bandpass filters are wide and overlapping, careful design of the filters results in a cancellation of the quantization noise that leaks across bands.
Quadrature mirror filters are one such filter class; the figure shows an example of a two-band subband coder using two overlapping quadrature mirror filters. The splitting can be iterated from high to low by dividing the full band into two, then the resulting lower band into two, and so on.
This octave-band splitting, together with the iterated decimation, can be shown to yield a perfect reconstruction filter bank. Such octave-band filter banks, and their conditions for perfect reconstruction, are closely related to wavelet analysis/synthesis structures.
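A two-band analysis/synthesis split with exact perfect reconstruction can be sketched with the Haar filter pair, the simplest quadrature-mirror-style pair. This is an illustrative toy, not the coder design the slides describe; practical subband coders use longer filters.

```python
import numpy as np

def analyze(x):
    """Split into lowpass and highpass halves, each decimated by 2."""
    x = x[:len(x) // 2 * 2]
    low  = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

def synthesize(low, high):
    """Invert the Haar analysis step exactly."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

np.random.seed(0)
x = np.random.randn(256)
low, high = analyze(x)
print(np.max(np.abs(synthesize(low, high) - x)))   # ~0: perfect reconstruction
```

Iterating `analyze` on the lowpass branch gives the octave-band (wavelet-like) splitting described above.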
N(z) = Σ b(j) z^-j (j = 0 to q)  and  D(z) = Σ a(i) z^-i (i = 0 to p)
Thus the predictor coefficients give us immediate access to the poles and zeros of H(z).
The mixed pole-zero model is called the autoregressive moving-average (ARMA) model.
e(n) = y(n) - ŷ(n) = Σ a(i) y(n-i)  (i = 0 to p, with a(0) = 1)
To derive the predictor we use the orthogonality principle: the desired coefficients are those which make the error orthogonal to the samples y(n-1), y(n-2), ..., y(n-p).
⟨ y(n-j) Σ a(i) y(n-i) ⟩ = 0,  j = 1, 2, ..., p  (i = 0 to p)
Interchanging the operation of averaging and summing, and representing ⟨·⟩ by a sum over n, we have
E = Σ e²(n) = Σ y(n) e(n)
Or,
Σ a(i) r|i-j| = 0,  j = 1, 2, ..., p  (i = 0 to p)
Σ a(i) r_i = E  (i = 0 to p)
where
r_i = Σ y(n) y(n-i), the sum running over all n
H(z) = 1 / A(z),  where A(z) = 1 - Σ a_i z^-i  (i = 1 to p)
For a normal vocal tract there is an average of about one formant per kilohertz of bandwidth. One formant requires two complex-conjugate poles, so every formant requires two predictor coefficients, i.e., two coefficients per kilohertz of bandwidth.
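The normal equations above can be solved efficiently with the Levinson-Durbin recursion mentioned in the syllabus. A minimal sketch: the recursion below follows the standard algorithm (written in the prediction-coefficient convention ŷ(n) = Σ a(i) y(n-i)), and the AR(2) test signal and its coefficients are illustrative assumptions used to check that the recursion recovers them.

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the LPC normal equations for the predictor coefficients
    a[1..p] via the Levinson-Durbin recursion; returns (a, error E)."""
    a = np.zeros(p + 1)
    E = r[0]
    for i in range(1, p + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / E   # reflection coefficient
        a[i] = k
        a[1:i] = a[1:i] - k * a[i - 1:0:-1]              # update lower orders
        E *= (1.0 - k * k)                               # shrink the error
    return a[1:], E

# Synthetic AR(2) signal: s[n] = 0.75 s[n-1] - 0.5 s[n-2] + w[n]
np.random.seed(0)
n = 20000
w = np.random.randn(n)
s = np.zeros(n)
for i in range(2, n):
    s[i] = 0.75 * s[i - 1] - 0.5 * s[i - 2] + w[i]

r = np.array([np.dot(s[:n - k], s[k:]) for k in range(3)]) / n
a, E = levinson_durbin(r, 2)
print(a)   # close to [0.75, -0.5]
```

Each pass of the loop also yields the reflection coefficient k, which is exactly the parameter used in the lattice form of the LPC analysis filter.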
(figure: LPC synthesis model: a voiced/unvoiced (V/U) switch selects the excitation source, with an uncorrelated noise source used for unvoiced sounds; the excitation drives the LP filter H(z) to produce the speech signal)