Speech and Audio Processing and Coding
A/D conversion (cont.)
In practice, different bit depths are usually used for different audio
signals
Digital speech
The dynamic range of clean speech is around 40 dB (between the
threshold of hearing and the loudest normal speech sound).
Background noise becomes obtrusive when the SNR is worse than
about 30 dB. Therefore, a dynamic range of about 70 dB (40 dB + 30 dB)
provides reasonable quality, which corresponds to 12-bit resolution
(roughly 6 dB per bit, so 12 bits give about 72 dB).
Commercial CD quality music
16 bits are usually used, i.e. 65536 levels, which corresponds to a
dynamic range of about 96 dB.
Digital mixing consoles, music effects units, and audio processing
software
It is common to use 24 bits or more.
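The "roughly 6 dB per bit" rule above can be checked numerically. The sketch below assumes an ideal uniform quantiser whose dynamic range is 20*log10(2^B) for B bits; the function names are illustrative and not part of the lecture material.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Dynamic range of an ideal uniform quantiser: 20*log10(2**bits)."""
    return 20.0 * math.log10(2 ** bits)


def bits_for_dynamic_range(target_db: float) -> int:
    """Smallest bit depth whose dynamic range reaches target_db."""
    db_per_bit = dynamic_range_db(1)  # about 6.02 dB
    return math.ceil(target_db / db_per_bit)


# Bit depths mentioned above: speech (12), CD music (16), studio processing (24).
for bits in (12, 16, 24):
    print(f"{bits:2d} bits: {2 ** bits:>8d} levels, "
          f"{dynamic_range_db(bits):6.1f} dB")

# Bit depth needed for the 70 dB speech requirement.
print(bits_for_dynamic_range(70.0))
```

Under this assumption, the 70 dB requirement for speech rounds up to 12 bits, and 16 bits give slightly more than 96 dB.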
A/D conversion (cont.)
Reconstruction conditions
Aliasing effect and Nyquist criterion
To allow perfect reconstruction of the original signal, the sampling rate
should be at least twice the highest frequency in the signal.
A lower sampling rate causes aliasing: components above half the sampling
rate fold back onto lower frequencies and cannot be separated afterwards.
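A small numerical illustration of aliasing, as a sketch only: the 8 kHz sampling rate and the 5 kHz and 3 kHz tones are assumed example values, and `alias_frequency` is a hypothetical helper name. A 5 kHz cosine sampled at 8 kHz violates the Nyquist criterion and yields exactly the same samples as a 3 kHz cosine.

```python
import math


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency in [0, fs/2] of a real sinusoid at f_hz
    after sampling at fs_hz."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


fs = 8000.0          # assumed sampling rate in Hz
n_samples = 16

# Sample a 5 kHz tone (above fs/2) and a 3 kHz tone (below fs/2).
tone_5k = [math.cos(2 * math.pi * 5000.0 * n / fs) for n in range(n_samples)]
tone_3k = [math.cos(2 * math.pi * 3000.0 * n / fs) for n in range(n_samples)]

# Largest sample-by-sample difference between the two sequences.
max_diff = max(abs(a - b) for a, b in zip(tone_5k, tone_3k))

print(alias_frequency(5000.0, fs))
print(max_diff)
```

Because the two sample sequences coincide up to rounding error, no processing after sampling can tell the two tones apart.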
Aliasing effect in the spectral domain
Anti-aliasing
To avoid aliasing, the A/D converter usually incorporates an anti-aliasing
filter (a low-pass filter) before sampling, with a cut-off frequency near
the Nyquist frequency (half of the sampling rate).
In practice, it is difficult to design a steep cut-off low-pass filter. A non-ideal
filter is used instead, and the sampling rate is usually chosen to be more than
twice the highest frequency in the signal. For some typical applications, the
sampling rates are usually chosen as follows:
Telecommunication networks: 8 kHz (the signal is band-limited to 300-3400 Hz)
Wideband speech coding: 16 kHz (natural-quality speech is band-limited to 50-7000 Hz)
Commercial CD music: 44.1 kHz (the audible frequency range reaches up to 20 kHz)
Oversampling can be advantageous in some applications because it relaxes the
requirement for a sharp cut-off in the anti-aliasing filter.
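To make the oversampling argument concrete, the sketch below computes the gap between the highest signal frequency and the Nyquist frequency, i.e. the room available for the filter's transition band. The band edges and sampling rates are those quoted above; the 4x oversampling factor and the helper name `transition_band_hz` are assumptions for illustration.

```python
def transition_band_hz(f_max_hz: float, fs_hz: float) -> float:
    """Gap between the highest signal frequency and the Nyquist frequency."""
    return fs_hz / 2.0 - f_max_hz


# (highest signal frequency in Hz, sampling rate in Hz)
applications = {
    "telephone speech": (3400.0, 8000.0),
    "wideband speech": (7000.0, 16000.0),
    "CD music": (20000.0, 44100.0),
}

for name, (f_max, fs) in applications.items():
    print(f"{name}: {transition_band_hz(f_max, fs):.0f} Hz")

# Assumed example: sampling CD-bandwidth audio at 4 x 44.1 kHz.
oversampled_gap = transition_band_hz(20000.0, 4 * 44100.0)
print(f"CD music, 4x oversampled: {oversampled_gap:.0f} Hz")
```

The wider the gap, the more gently the filter may roll off, which is why a higher sampling rate eases the analog filter design.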
D/A converter
D/A conversion consists of two stages:
Deglitching
A process to convert the digital speech represented by bits into a continuous
voltage signal, similar to the sample and hold operation in A/D conversion.
Interpolating filter
A low-pass filter is then used to remove the sharp edges (causing high-frequency
noise) in the output voltage.
According to the sampling theorem, the ideal low-pass filter with which the analog
signal can be perfectly recovered has a sinc impulse response:
$$ g(t) = \frac{\sin(\pi t / T)}{\pi t / T}, \qquad x(t) = \sum_{n} x[n]\, g\!\left(t - \frac{n}{f_s}\right), $$
where T = 1/f_s is the sampling period and f_s is the sampling rate.
In practice, the sinc function is truncated to a limited interval, so the sum is finite
rather than infinite.
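The reconstruction formula, and its truncation to a finite number of terms, can be sketched as follows. This is an illustration rather than a practical D/A design: the 1 kHz test tone, the 8 kHz sampling rate, and the `half_width` truncation parameter are all assumed values.

```python
import math


def sinc_kernel(t: float, T: float) -> float:
    """g(t) = sin(pi*t/T) / (pi*t/T), with g(0) = 1."""
    if t == 0.0:
        return 1.0
    u = math.pi * t / T
    return math.sin(u) / u


def reconstruct(samples, fs, t, half_width=None):
    """Evaluate x(t) = sum_n x[n] * g(t - n/fs).

    If half_width is given, only the samples within half_width of the
    sample nearest to t are used (a truncated sinc).
    """
    T = 1.0 / fs
    if half_width is None:
        indices = range(len(samples))
    else:
        centre = round(t * fs)
        lo = max(0, centre - half_width)
        hi = min(len(samples), centre + half_width + 1)
        indices = range(lo, hi)
    return sum(samples[n] * sinc_kernel(t - n * T, T) for n in indices)


fs = 8000.0                      # assumed sampling rate in Hz
f0 = 1000.0                      # assumed test tone, well below fs/2
samples = [math.cos(2 * math.pi * f0 * n / fs) for n in range(200)]

t_mid = 100.5 / fs               # halfway between samples 100 and 101
true_value = math.cos(2 * math.pi * f0 * t_mid)
estimate_full = reconstruct(samples, fs, t_mid)
estimate_short = reconstruct(samples, fs, t_mid, half_width=8)

print(true_value, estimate_full, estimate_short)
```

Using more terms generally brings the estimate closer to the true value, but the error shrinks only slowly because the sinc function decays as 1/t. This is one reason practical interpolating filters use windowed or otherwise modified kernels.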
Compressed sensing
Many signals in practice are redundant
Information rate versus Nyquist rate
Signals that are sparse in some representation can often be reconstructed, in many
cases exactly, from a small number of random (non-uniform) samples, fewer than the
Nyquist sampling theorem requires.
Sparse representation
This concept is based on the so-called sparse representation of signals, i.e. a signal can
be decomposed as a linear combination of a small number of atoms (the signal
components) selected from a dictionary (i.e. the collection of all the atoms).
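The idea of a sparse representation can be illustrated with a toy dictionary. The sketch below assumes an orthonormal DCT-II dictionary of 16 atoms and a signal built from two of them; the dictionary, the atom indices, and the weights are all illustrative choices. It shows only the decomposition into atoms and does not perform compressed-sensing recovery from random samples. Analysis by inner products works here only because the assumed dictionary is orthonormal.

```python
import math


def dct_dictionary(n: int):
    """Return n orthonormal DCT-II atoms, each a list of n samples."""
    atoms = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        atoms.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                      for i in range(n)])
    return atoms


def synthesise(atoms, coeffs):
    """x = sum_k coeffs[k] * atoms[k], with coeffs given as {index: weight}."""
    n = len(atoms[0])
    return [sum(w * atoms[k][i] for k, w in coeffs.items()) for i in range(n)]


def analyse(atoms, x):
    """Inner product of x with every atom (valid for an orthonormal dictionary)."""
    return [sum(a * v for a, v in zip(atom, x)) for atom in atoms]


atoms = dct_dictionary(16)
sparse_coeffs = {2: 1.5, 7: -0.8}      # only 2 of the 16 atoms are active
x = synthesise(atoms, sparse_coeffs)

recovered = analyse(atoms, x)
support = [k for k, c in enumerate(recovered) if abs(c) > 1e-9]
print(support)
```

All 16 samples of `x` are generally non-zero, yet the signal is fully described by two coefficients. This is the sense in which the information rate can be far below what the number of samples suggests.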