
TS Lecture7

The document discusses digitization of speech through sampling, quantization, and binary coding. It explains that speech is sampled at a minimum rate of 8 kHz to avoid aliasing, and the samples are quantized into discrete levels and represented by binary code numbers for transmission as a digital pulse code modulated signal.

Uploaded by

Eayashen Arafat

Speech digitization

 The human ear is capable of perceiving frequencies in the range 16 Hz - 20 kHz, known as the audio range, whereas speech occupies a narrower band of about 100 Hz - 10 kHz within this audio range.
 A reduction in the bandwidth is desirable as it reduces the cost of the communication systems.
 An acceptable level of intelligibility of speech is obtained by transmitting frequencies in the range 300-3400 Hz. Such a band-limited (a bandwidth of 3.1 kHz) speech signal is often called 'toll' (telephone) quality speech.
 In this band-limited range of speech, the ear is most sensitive to frequencies that lie around 3 kHz. In a female voice, maximum energy is distributed around this frequency, whereas in a male voice the maximum energy occurs at a much lower frequency. That is why women are preferred as telephone operators and announcers.
Speech digitization
 A channel in a communication system has a finite
transmission loss and is subject to noise impairment.
 When the length of the transmission path increases,
the signal-to-noise ratio at the receiving end
decreases.
 In analog voice transmission, the effect of noise and
interference is most apparent during speech pauses
when the signal amplitude is near zero. Even
relatively low noise levels can be quite annoying to a
listener during speech pauses. The same levels of
noise may be unnoticeable when speech is present.
 Hence, it is the absolute noise level of an idle channel
that determines the analog speech quality.
Speech digitization
 In a digital system, speech and speech pauses alike are encoded as data patterns and transmitted at a constant power level.
 Signal regeneration at regular intervals, which restores the signal to its original level, virtually eliminates all noise due to the transmission medium. Thus, in a digital system the idle channel noise is determined by the encoding process and not by the transmission link.
 Besides, the ability of digital transmission to reject crosstalk is superior to that of an analog system. First, low-level crosstalk is eliminated because of the constant-amplitude signals. Second, high-amplitude crosstalk results in detection errors and is therefore unintelligible.
 Other advantages of digital systems include the ability to support nonvoice services, and easy data encryption and performance monitoring.
 Although digital systems require greater bandwidth than analog systems, and transmission media like wire pairs cause greater attenuation when larger-bandwidth signals are passed through them, the advantages offered by digital systems outweigh the bandwidth considerations.
Sampling
 The first step in digitizing speech is to establish a set of
discrete times at which the input waveform is sampled.
 The discrete sample instances may be spaced either at
regular or irregular intervals.
 The minimum sampling frequency required to reconstruct the original waveform from the sampled sequence is given by the Nyquist criterion:
fS ≥ 2H
where fS = sampling frequency and H = highest frequency component in the input analog waveform; the minimum rate 2H is called the Nyquist rate.
 H is the bandwidth of the input waveform if it is not band limited with a lower cut-off frequency. In this case, the original waveform is reconstructed by passing the sampled values through a low-pass filter, which smooths out or interpolates the signal between sampled values.
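The criterion above can be checked with a few lines of Python. This is an illustrative sketch (the helper name `nyquist_rate` is my own, not from the lecture):

```python
def nyquist_rate(highest_freq_hz: float) -> float:
    """Minimum sampling frequency (fS >= 2H) needed to avoid aliasing."""
    return 2.0 * highest_freq_hz

# Toll-quality speech is band limited to H = 3.4 kHz:
print(nyquist_rate(3400.0))   # 6800.0 Hz
```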
Sampling  Sampling is a process of
multiplying a constant
amplitude impulse train with
the input signal. It is an
amplitude modulation process,
where the pulse train acts as
the carrier.
 Since the amplitude of the
pulses is modulated, the
scheme is called pulse
amplitude modulation (PAM).
 The frequency spectrum of an
amplitude modulated signal,
when the carrier is a sine
wave, has frequencies ranging
from fC-H to fC+H, where fC is
the carrier frequency.
5
Sampling  If the carrier is a pulse train, as in
the case in PAM, the output
spectrum contains the
fundamental as well as the
harmonics of the fundamental.
 If the pulse train is a square wave
(50% duty cycle), only the
fundamental and odd harmonics
are present.
 The low pass filter at the receiver
end allows only the baseband
component 0-H Hz to pass. If fS is
less than twice H, portions of PAM
signal spectrum will overlap.
 This overlapping of the sidebands
produces beat frequencies that
interfere with the desired signal
and such an interference is
referred to as aliasing or foldover
distortion.
•The filter used for band limiting the input  To avoid aliasing effects, the
speech waveform is known as antialiasing minimum sampling frequency
Filter. required is 6.8 KHz though in
•8KHz sampling results in oversampling which digital telephone network, speech
Provides for the nonideal filter characteristics is sampled at 8 KHz rate.
such as lack of sharp cutoff. 6
Quantization & binary coding
 PAM systems are not generally useful over long
distances, owing to vulnerability of the individual
pulse amplitudes to noise, distortion & crosstalk.
 The amplitude susceptibility may be reduced or
eliminated by converting the PAM samples into a
digital format, thereby allowing the use of
regenerative repeaters to remove transmission
imperfections before errors result.
 With n bits, the number of sample values that can be represented is 2^n. But the PAM sample amplitudes can take on an infinite range of values. Therefore, it is necessary to quantize each PAM sample amplitude to the nearest of a range of discrete amplitude levels.
Quantization & binary coding Signal V is confined to a range
from VL to VH, and this range is
divided into M equal steps. The
step size S=(VH-VL)/M
 In the center of each of these
steps we locate the quantization
levels V0, V1,…,VM-1. The quantized
signal Vq takes on any one of the
quantized level values.
 A signal V is quantized to its
nearest quantization level.
 The boundary values between the
steps are equidistant from two
quantization levels and a
convention may be adopted to
quantize them to one of the levels.
 Thus, the signal Vq makes a
quantum jump of step size S and
at any instant of time the
quantization error V-Vq has a
Vq=V3 if (V3-S/2)≤V<(V3+S/2)
magnitude which is equal to or
less than S/2.
Vq= V4if (V4-S/2)≤V<(V4+S/2)
 When the step size is uniform, it is
known as linear or uniform 8
quantization.
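The step-size and level-placement formulas above can be sketched directly in Python. This is a minimal illustration (the factory name `make_uniform_quantizer` is my own):

```python
def make_uniform_quantizer(v_low, v_high, m):
    """Divide [VL, VH] into M equal steps of size S = (VH - VL)/M and
    place the quantization levels V0..V(M-1) at the step centers."""
    s = (v_high - v_low) / m
    levels = [v_low + (k + 0.5) * s for k in range(m)]

    def quantize(v):
        # Step index containing v; boundary values go to the upper step
        # by convention, and out-of-range inputs are clipped.
        k = max(0, min(m - 1, int((v - v_low) / s)))
        return levels[k]

    return quantize, s

quantize, s = make_uniform_quantizer(0.0, 1.0, 4)
print(s)               # 0.25
print(quantize(0.4))   # 0.375 -> error 0.025, within S/2 = 0.125
```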
Quantization & Binary Coding
 The process of quantization itself brings about a
certain amount of noise immunity to the signal.
 The quantized signal is an approximation to the original signal. The quality of the approximation may be improved by reducing the size of the steps and thereby increasing the number of allowable levels.
 However, reducing the step size makes the PAM signal more susceptible to noise.
 So, each quantized level is represented by a code number, which is transmitted instead of the quantized sample value itself. If binary arithmetic is used for coding, the code number is transmitted as a series of pulses. Hence, such a system of transmission is called pulse code modulation (PCM).
Quantization & Binary Coding
 The analog signal is confined to the range -4 V to +4 V.
 The step size is one volt.
 Eight quantization levels are used, located at -3.5 V, -2.5 V, ..., +3.5 V.
 Code number 0 is assigned to -3.5 V, code number 1 to -2.5 V, and so on.
 Each code number has its equivalent 3-bit binary representation.
A PCM system
 The analog input signal V is band limited to 3.4 kHz to prevent aliasing and sampled at 8 kHz.
 The quantizer and the encoder together perform the A-D conversion.
 The decoder performs the D-A conversion at the receiver.
 The quantized PAM levels are then passed through a filter which rejects the frequency components lying outside the baseband and produces a reconstructed waveform of the original band-limited signal.
Quantization noise
 The instantaneous error e = V - Vq is randomly distributed within the range ±S/2 and is called the quantization error or noise.
 With e uniformly distributed, p(e) = 1/S over (-S/2, S/2) and the mean µ = 0, so the average quantization noise power is given by the variance
σ² = ∫(e - µ)² p(e) de = S²/12
 The signal-to-quantization-noise ratio (SQR) is a good measure of the performance of a PCM system. For a full-scale sinusoidal input coded with n bits,
SQR = 1.76 + 6.02n dB
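The 1.76 + 6.02n rule can be verified by simulation: quantize a full-scale sinusoid with an n-bit uniform quantizer and measure the ratio of signal power to error power. A sketch (names and the 0.123456 cycles-per-sample test frequency are my own choices, picked only to sweep the phase densely):

```python
import math

def measured_sqr_db(n_bits, num_samples=50000):
    """Quantize a full-scale sinusoid with an n-bit uniform (mid-rise)
    quantizer and measure the signal-to-quantization-noise ratio."""
    m = 2 ** n_bits
    s = 2.0 / m                     # step size for a signal in [-1, 1]
    sig_power = noise_power = 0.0
    for i in range(num_samples):
        v = math.sin(2 * math.pi * 0.123456 * i)   # full-scale sine
        k = min(m - 1, max(0, int((v + 1.0) / s)))
        vq = -1.0 + (k + 0.5) * s                  # mid-step level
        sig_power += v * v
        noise_power += (v - vq) ** 2
    return 10 * math.log10(sig_power / noise_power)

# close to 1.76 + 6.02*8 = 49.92 dB for n = 8
print(round(measured_sqr_db(8), 1))
```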
Companding  In linear or uniform quantization,
the magnitude of quantization
noise is absolute for a particular
system and is independent of the
input signal amplitude.
 Therefore, comparatively, the
weak and low-level signals suffer
worse from quantization noise
than the loud and strong signals.
 The very high percentage error
at low input signal levels actually
represents idle channel noise.
 The effect of this is particularly
bothersome during speech
pauses and can be minimized by
ef=(S/2)/|V|
For sinusoidal input, S=2Vm/M,
choosing 0 volt level as a
Hence, ef=[Vm/(M|V|)]×100% quantization level and avoiding
the mid points of the first
intervals on either side of the
zero level as quantization levels.
13
Companding
 The scheme which uses the two first midpoints is known as the mid-riser scheme and the other as the mid-tread scheme.
 The mid-tread scheme uses an odd number of quantization levels, i.e., M = 2^n - 1.
 In the mid-tread scheme, very low signals are decoded into a constant, zero-level output.
 However, if a d.c. bias exists in the encoder, idle channel noise is still a problem with mid-tread quantization.
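The difference between the two schemes is visible in two one-line quantizers (an illustrative sketch; step size S = 1 is assumed):

```python
import math

def mid_tread(v, s=1.0):
    """Mid-tread quantizer: 0 V is itself a quantization level, so very
    low-level inputs (|v| < S/2) decode to a constant zero output."""
    return s * round(v / s)

def mid_riser(v, s=1.0):
    """Mid-riser quantizer: levels at +/-S/2, +/-3S/2, ...; even tiny
    idle-channel noise toggles the output between -S/2 and +S/2."""
    return s * (math.floor(v / s) + 0.5)

print(mid_tread(0.1))   # 0.0  -> silence stays silent
print(mid_riser(0.1))   # 0.5
print(mid_riser(-0.1))  # -0.5
```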
Companding  A more efficient method of minimizing
large variations in the percentage
quantization error over the signal range is
to use nonlinear or nonuniform
quantization.
 It is interesting to note that uniform
quantization intervals result in
nonuniform SQR over the signal range
and nonuniform intervals result in
uniform SQR.
 The effect of permitting larger
quantization intervals at higher signal
amplitudes is to compress the input signal
to achieve a uniform quantization level.
 The input signal is first compressed by
using a nonlinear functional device and
then a linear quantizer is used. At the
receiving end, the quantized signal is
expanded by a nonuniform device having
an inverse characteristic of the
compression at the sending end.
 The process of first compressing and then
expanding is referred to as companding.
15
Companding  A variety of nonlinear
compression-expansion
functions can be chosen to
implement a compandor. The
obvious one is a logarithmic
law.
 Unfortunately, the function
y=lnx does not pass through
the origin.
 So, it is necessary to substitute
a linear portion to the curve for
lower values of x.
 Most practical companding
systems are based on a law
For logarithmic section, suggested by K.W. Cattermole.
y=(1+lnAx)/(1+lnA) for 1/A≤x≤1  These equations are collectively
For linear section, known as A-law used by India
y=Ax/(1+lnA) for 0≤x≤1/A and other European countries.
A=compression coefficient
The expansion function is given by,  U.S.A & Japan follow a variation
x=ey(1+lnA)-1/A for 1/(1+lnA) ≤y≤1 of A-law known as µ-law.
x=y(1+lnA)/A for 0≤y≤1/(1+lnA) 16
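The A-law equations above translate directly into code. A sketch, assuming the standard compression coefficient A = 87.6 (the value used in telephony, though any A > 1 works with these formulas):

```python
import math

A = 87.6   # standard A-law compression coefficient

def compress(x):
    """A-law compression of a normalized sample x in [-1, 1]."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    if x < 1.0 / A:
        y = A * x / (1.0 + math.log(A))                      # linear section
    else:
        y = (1.0 + math.log(A * x)) / (1.0 + math.log(A))    # log section
    return sign * y

def expand(y):
    """Inverse (expansion) function."""
    sign = -1.0 if y < 0 else 1.0
    y = abs(y)
    if y < 1.0 / (1.0 + math.log(A)):
        x = y * (1.0 + math.log(A)) / A
    else:
        x = math.exp(y * (1.0 + math.log(A)) - 1.0) / A
    return sign * x

# Expansion undoes compression over the whole range:
for v in (0.004, 0.05, 0.5, -0.9):
    assert abs(expand(compress(v)) - v) < 1e-12
```

Note how the two sections meet continuously at x = 1/A, where both branches give y = 1/(1 + ln A).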
Companding  In practice, a piecewise linear
segment approximation is used.
 A-law companding consists of eight
linear segments for each polarity.
 The slope halves for each segment
except the lowest two segments
which have the same slope.
 The lowest two segments of positive
& negative polarities coalesce into
one straight line segment.
 As a result, there are 13 effective
segments in the curve and the law is
sometimes referred to as 13-
segment companding law.
 In µ-law, the slope halves in the
lowest two segments also, giving rise
to 15 effective segments.
 Each segment is divided into 16 linear steps. Eight bits are required to represent each sample value: a 1-bit sign, a 3-bit segment number and a 4-bit linear step number.
 There are in all 256 defined signal levels.
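The segment/step packing described above can be sketched for the positive half of the A-law curve. This is an illustrative simplification on a 12-bit magnitude scale; it follows the slide's 3-bit-segment, 4-bit-step layout but ignores G.711 transmission details such as alternate-bit inversion of the octet:

```python
def segment_encode(m):
    """Pack a 12-bit magnitude (0..4095) into a 3-bit segment number and
    a 4-bit step number. The two lowest segments share the same slope;
    each higher segment doubles the step size."""
    assert 0 <= m <= 4095
    if m < 32:
        seg, step = 0, m >> 1              # segment 0: step size 2
    else:
        seg = m.bit_length() - 5           # 32..63 -> 1, 64..127 -> 2, ...
        step = (m >> seg) & 0x0F
    return seg, step

def segment_decode(seg, step):
    """Reconstruct the magnitude at the middle of the quantization step."""
    if seg == 0:
        return (step << 1) + 1
    return ((step + 16) << seg) + (1 << (seg - 1))

seg, step = segment_encode(1000)
print(seg, step)                  # 5 15
print(segment_decode(seg, step))  # 1008 (within half a step of 1000)
```

A full codeword would then be packed as (sign << 7) | (seg << 4) | step, giving the 256 defined levels.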
Differential coding
 PCM is not specifically designed for digitizing speech
waveforms.
 Speech waveforms exhibit considerable redundancy which
can be usefully exploited in designing coding schemes.
 The following characteristics of speech signals contribute to
the redundancy:
 Nonuniform amplitude distributions
 Sample-to-sample correlations
 Periodicity or cycle-to-cycle correlations
 Pitch interval-to-pitch interval correlations
 Speech pauses or inactivity factors
 A sizeable fraction of human speech sounds is produced by the flow of puffs of air from the lungs into the vocal tract. The interval between these puffs of air is known as the pitch interval. There may be as many as 20 to 40 pitch intervals in a single sound.
 Typically, a party is active for only about 40% of a call's duration.
Differential coding
 Delta or differential coding systems are designed to take
advantage of the sample-to-sample redundancies in speech
waveforms.
 Because of the strong correlation between adjacent speech
samples, large abrupt changes in levels do not occur frequently
in speech waveforms.
 In such situations, it is more efficient to transmit or encode and
transmit only the signal changes instead of the absolute value
of the samples.
 Delta modulation (DM) is a scheme that transmits only the
signal changes and differential pulse code modulation (DPCM)
encodes the differences and transmits them.
 A delta modulator may be implemented by simply comparing
each new signal sample with the previous sample and
transmitting the resulting difference signal.
 At the receiver end, the difference signals are added up to
construct the absolute signal by using an integrator.
 However, such a system, being open loop, suffers from the
possibility of the receiver output diverging from the transmitter
input due to system errors or inaccuracies.
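The open-loop weakness can be seen in a few lines. In this sketch (function names and the step size `delta` are my own), the transmitter compares each sample only with the previous input sample, so the receiver's integrator output is free to drift away from the input:

```python
def open_loop_bits(samples):
    """Transmit +1/-1 according to whether each sample exceeds the
    previous INPUT sample; the transmitter never sees the receiver's
    reconstruction, so there is no feedback."""
    prev, bits = 0.0, []
    for v in samples:
        bits.append(1 if v >= prev else -1)
        prev = v
    return bits

def integrate(bits, delta=0.1):
    """Receiver: add up the difference signals with an integrator."""
    est, out = 0.0, []
    for b in bits:
        est += b * delta
        out.append(est)
    return out

ramp = [0.01 * i for i in range(100)]     # rises far slower than delta
rec = integrate(open_loop_bits(ramp))
# Every bit is +1, so the receiver climbs by 0.1 per sample and ends
# near 10.0 while the input ends at 0.99: the two have diverged.
assert abs(rec[-1] - ramp[-1]) > 5.0
```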
Differential coding  The system can be converted into
a closed loop system by setting up
a feedback path with an integrator
at the transmitting end.
 When the input is constant, the
output of the transmitter is an
alternating positive and negative
pulse train. This constitutes the
quantization noise in delta
modulators and is also known as
granular noise.
 If the transmitter input signal
changes too rapidly, the receiver
output is unable to keep up and
this phenomenon is known as
slope overload.
 This problem may be overcome by
using a variable slope integrator
whose output slope is increased or
decreased, depending on the rate
of change of the input signal.
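Both ideas on this slide, the feedback integrator and the variable slope, can be combined in a short sketch in the spirit of CVSD (continuously variable slope delta) coders. The adaptation rule here (double the step after three equal bits, otherwise halve it, within assumed bounds `d_min`/`d_max`) is one common heuristic, not the lecture's specific design:

```python
def cvsd_encode(samples, d_min=0.01, d_max=1.0):
    """Closed-loop delta modulator with a variable-slope integrator:
    three equal bits in a row indicate slope overload, so the step
    size grows; otherwise it shrinks to reduce granular noise."""
    estimate, delta, bits, track = 0.0, d_min, [], []
    for v in samples:
        bit = 1 if v >= estimate else -1
        bits.append(bit)
        if len(bits) >= 3 and bits[-1] == bits[-2] == bits[-3]:
            delta = min(d_max, delta * 2.0)    # speed up on overload
        else:
            delta = max(d_min, delta * 0.5)    # settle on granular noise
        estimate += bit * delta                # feedback integrator
        track.append(estimate)
    return bits, track

# A sudden jump to 5.0: the step size grows until the estimate
# catches up, then shrinks again to hover around the input.
bits, track = cvsd_encode([5.0] * 40)
```

The receiver runs the identical adaptation logic on the received bit stream, so its integrator stays in step with the transmitter's.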
Vocoders
 By considering some of the properties that are more or less
unique to speech, such as pitch interval and cycle
correlations, significant reductions can be achieved in bit
rates.
 Coding systems so specifically designed for voice signals are known as voice coders or vocoders & operate typically at bit rates in the range 1.2-2.4 kbps.
 Vocoders take into account the physiology of the vocal cords,
the larynx, the throat, the mouth, the nasal passages and the
ear in their design.
 The basic purpose of the vocoders is to encode only the
perceptually important aspects of speech and thereby reduce
the bit rate significantly.
 As a result, the reproduced voice is synthetic sounding and
unnatural with artificial quality.
 Main applications include recorded message announcements,
encrypted voice transmission, voice mail etc.
Vocoders  Human speech is generated in
two basic ways:
 Voiced sounds generated as

a result of vibrations in the


vocal cords.
 Unvoiced sounds formed by

expelling air through lips &


teeth ( in the pronunciation
Of s, p, t and f)
 Human speech can now be
modeled as a sequence of voiced
and unvoiced sounds passed
through a filter which represents
the effect of mouth, throat, etc.
on the generated sounds.
 Vocoders use frequency domain

techniques to analyze and process


redundancies in speech signal. 22
Vocoders
 There are three basic types of vocoders:
 Channel vocoders
 Formant vocoders
 Linear predictive coders
 The speech spectrum exhibits sound-specific structures, with energy peaks at some frequencies and energy valleys at others, over short periods. Channel vocoders attempt to determine these short-term signal spectra as a function of time and take advantage of them.
 In addition, a channel vocoder also determines the nature of the speech excitation and the pitch intervals. The excitation information is used at the receiver end to synthesize speech by switching in the appropriate signal source for the required duration. The filter at the receiver implements a vocal tract transfer function.
Vocoders
 The three or four energy peaks in the short-term
spectral density of speech are known as formants. A
formant vocoder determines the location and amplitude
of these spectral peaks and transmits this information
instead of the entire spectrum envelope.
 In addition to determining the nature of excitation and the pitch interval or period, linear predictive coders (LPC) also predict parameters for the vocal tract. The vocal tract filter is an adjustable one whose parameters are varied based on the prediction. The LPC functions as a feedback system similar to adaptive DM and ADPCM. As a result, the quality of speech synthesized by LPC is superior to that of speech synthesized by channel or formant vocoders.
