Audio Compression

Audio compression reduces the amount of data in recorded audio for efficient transmission and storage, utilizing techniques such as silence compression, DPCM, and psychoacoustic models. It employs methods like predictive encoding for speech and perceptual encoding for music, which take advantage of human hearing limitations to mask inaudible frequencies, leading to lossy compression. The process involves quantization, which introduces noise but is often imperceptible to listeners due to frequency masking.


AUDIO COMPRESSION

DEFINITION
• In MPEG Audio compression, the following techniques are used:
• Silence compression: detect the "silence" in the audio signal and
apply run-length encoding to the silent periods to achieve
compression.
• Differential Pulse Code Modulation (DPCM): the amplitude
difference between two successive samples can be stored using
fewer bits when the difference in amplitude between
successive samples is small.
• Adaptive Differential Pulse Code Modulation (ADPCM): encode the
difference between two or more consecutive samples; the difference
is then quantized, hence the loss. The loss in lossy compression
is due to the quantization process, which converts a continuous
range of values to discrete ones.
• It is necessary to predict where the waveform is headed. Apple
has a proprietary scheme called ACE/MACE, a lossy scheme that tries
to predict where the wave will go in the next sample. It gives
about 2:1 compression.
• Adaptive Predictive Coding (APC) is used on speech.

Please note: since the samples are quantized, there is loss in these
compression techniques. Quantization leads to lossy compression.
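The DPCM idea described above can be sketched in a few lines of Python. This is a toy illustration, not an actual MPEG coder: the step size QSTEP and the zero-initialised predictor are assumed for the example. Encoding stores quantized differences between successive samples; decoding accumulates them, and the mismatch with the input is the quantization loss.

```python
# Toy DPCM sketch (illustrative only; real codecs use adaptive predictors).
QSTEP = 4  # assumed quantization step for the differences

def dpcm_encode(samples):
    """Return quantized differences between successive samples."""
    diffs = []
    prev = 0  # assume the predictor starts at zero
    for s in samples:
        d = s - prev
        q = round(d / QSTEP)   # quantize: continuous range -> discrete levels
        diffs.append(q)
        prev += q * QSTEP      # track what the decoder will reconstruct
    return diffs

def dpcm_decode(diffs):
    """Rebuild an approximation of the original samples."""
    out, prev = [], 0
    for q in diffs:
        prev += q * QSTEP
        out.append(prev)
    return out

signal = [0, 3, 6, 8, 9, 9, 7, 4]
decoded = dpcm_decode(dpcm_encode(signal))
# decoded differs slightly from signal: that error is the quantization noise
```

Note how the encoder tracks the decoder's reconstruction (`prev`) rather than the true previous sample, so the quantization error does not accumulate.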
• APC, or Adaptive Predictive Coding, is used for
speech compression.
– The input signal is divided into fixed segments or windows.
– For each segment, some sample characteristics are
computed, e.g. pitch, period, loudness.
– These characteristics are used to predict the
signal.
– Speech synthesisers use such methods for computerised
talking, but at low bandwidth.
(What is quantization?)
Quantization is a lossy data compression technique by which
intervals of data are grouped, or binned, into a single value
(a quantum). In mathematics and digital signal processing,
quantization is the process of mapping input values from a large
set (often a continuous set) to output values in a (countable)
smaller set, often with a finite number of elements. Rounding
and truncation are typical examples of quantization processes. In
MPEG audio compression, some bits are allocated for
quantization. Quantization provides compression but introduces
noise, which makes the compression lossy. However, quantization
noise may not be perceived by the human ear if the quantization
noise frequencies are masked by the masking frequencies.
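The definition above (rounding as a typical quantization process) can be made concrete with a minimal sketch: map continuous values onto multiples of a step, then measure the noise this introduces. The sample values and the step of 0.25 are assumed for illustration.

```python
# Minimal uniform quantization sketch: map a continuous range of
# values onto a small discrete set, then measure the noise introduced.

def quantize(x, step):
    """Round x to the nearest multiple of `step` (one quantum)."""
    return round(x / step) * step

samples = [0.12, 0.47, 0.51, 0.88, 0.93]
step = 0.25                       # coarse step -> few output levels
quantized = [quantize(s, step) for s in samples]
noise = [s - q for s, q in zip(samples, quantized)]
# every sample lands on a multiple of 0.25, and the error (noise)
# never exceeds half the step size
```

The original values cannot be recovered from the quantized ones, which is exactly why quantization makes compression lossy.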

Audio compression uses psycho-acoustic models


We use this limited hearing property of the ear to compress audio
• If two frequencies are close and the amplitude of one is less than
that of the other, the softer frequency may not be
heard (it is masked)

THE FREQUENCIES THAT ARE MASKED ARE SUPPRESSED IN AUDIO COMPRESSION


AND ARE NOT TRANSMITTED. IF MORE FREQUENCIES CAN BE MASKED, A HIGHER
COMPRESSION RATIO CAN BE ACHIEVED. However, that degrades audio quality.
• In MPEG Audio compression, all the inaudible
frequencies (frequency masking of the signal in
the frequency domain) and inaudible tones
(audio masking of the signal in the time
domain, i.e. temporal masking) are masked.
• Explain audio compression with a neat
diagram (very important). Give any one
diagram.
1. The input audio signal is sampled and quantized
using PCM (Pulse Code Modulation).
2. The PCM samples are divided into frequency sub-
bands by analysis filters or critical-band filters
(filter banks), which break the signal into equal-width
sub-bands. These filters use the Discrete Cosine
Transform (DCT) to divide the signal into 32 sub-
bands (32 narrow-width frequency bands). The
scaling factors for these sub-bands are computed.
3. Bits for quantization and scaling are allocated by
the psychoacoustic model, which works on the data in
parallel with the subdivision of the input signal into
frequency bands.
4. So the PCM audio signal is converted to the
frequency domain, it is quantized, and scaling factors
are computed.
5. Among the 32 frequency bands, quantized using the
bit allocations from the psychoacoustic model, not
all frequencies are audible to the human ear.
Moreover, since these frequencies are quantized, the
quantization noise in them is also not perceptible
by the ear. So most quantization noise is suppressed.
6. The psychoacoustic modelling is applied in parallel with the
filtering process, i.e. converting the input to narrow frequency
bands. According to this model, there can be frequency masking
or temporal masking. Not all audio frequency bits are
transmitted, as many audio bits are masked; only certain
bits are allocated to be transmitted. The psychoacoustic
modeller uses the FFT (Fast Fourier Transform). The
quantization noise is minimised by minimising its
audibility through masking.
7. Then the quantized output, along with the bits for quantization
and scaling, is formatted. The formatted or encoded
bitstream is transmitted.
8. At the decoder end, the encoded bitstream is decoded back to
PCM audio.
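The subband steps above can be caricatured in Python. This is a drastically simplified, assumed sketch, not the real algorithm: an actual MPEG encoder uses a 32-band polyphase filterbank, an FFT-driven psychoacoustic model, and standardized bit-allocation tables, whereas here each "band" is just a slice of samples, the scale factor is the band's peak, and louder bands get more quantization bits.

```python
# Toy subband coder: split -> scale factor per band -> bit allocation
# -> quantize -> (transmit) -> dequantize. All numbers are illustrative.

NUM_BANDS = 4          # toy value; MPEG audio uses 32 subbands

def encode_bands(samples):
    size = len(samples) // NUM_BANDS
    bands = [samples[i * size:(i + 1) * size] for i in range(NUM_BANDS)]
    coded = []
    for band in bands:
        scale = max(abs(s) for s in band) or 1.0   # "scale factor"
        bits = 4 if scale > 0.5 else 2             # toy "bit allocation"
        half = 2 ** bits // 2
        q = [round(s / scale * half) for s in band]  # quantize per band
        coded.append((scale, bits, q))
    return coded

def decode_bands(coded):
    out = []
    for scale, bits, q in coded:
        half = 2 ** bits // 2
        out.extend(v / half * scale for v in q)    # dequantize
    return out

pcm = [0.9, -0.8, 0.1, -0.1, 0.05, 0.02, 0.01, -0.01]
reconstructed = decode_bands(encode_bands(pcm))
```

The design point this illustrates is that quantization error is controlled per band: bands carrying more energy receive more bits, so their relative error stays small.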
• In speech encoding, predictive encoding is
used. In music encoding, perceptual encoding
is used: low-frequency sounds that are
not heard by the ear are suppressed, and for
low-frequency sound the stereo mode is turned
off and only mono mode is kept. Temporal
masking and frequency masking are used to
mask those frequencies which the human ear
cannot hear.
• Q: What is audio compression? Why is it done?
(2) What techniques does it use for speech and music
compression? (2)
• Q: Explain MPEG Audio compression technique
with a neat diagram (5) (Already discussed)
• Q: What is psychoacoustic model? What features
of psychoacoustic model are used for
compression in MPEG Audio? Which frequencies
and tones are masked? Also give the features of
frequency masking and temporal masking.
(3+3+2+2) (Given in slides)
• Q: How many layers of compression are offered by
MPEG (there are 3 layers of compression in MPEG)?
What are the encoding complexities in each layer? /
Q: What is the compression/encoding complexity in
Layer 1, Layer 2 and Layer 3 of MPEG Audio
compression? (3)
• What are the audio features of MPEG? (3)
• Q: What is predictive encoding and perceptual
encoding? Which gives higher compression ratio?
(Perceptual)(2+1)
• Q: What is quantization? When quantization is applied
to compression, is it lossy or lossless and why?
• Quantization is a lossy data compression
technique by which intervals of data are grouped or
binned into a single value (a quantum). Quantization,
in mathematics and digital signal processing, is the
process of mapping input values from a large set (often
a continuous set) to output values in a (countable)
smaller set, often with a finite number of elements.
Rounding and truncation are typical examples of
quantization processes. In MPEG audio compression,
some bits are allocated for quantization. Quantization
provides compression but introduces noise and makes
the compression lossy. However, quantization noise
may not be perceived by the human ear if the
quantization noise frequencies are masked by the
masking frequencies.
• When quantization is applied, compression is
always lossy. This is because a continuous
range of values is discretised, i.e. mapped to
single values. This introduces noise, and the
original values can no longer be recovered
exactly, which is why compression is lossy
wherever quantization is applied. In audio
compression, most of the noise frequencies are
masked and not transmitted.
• Q: What is audio compression? Why is it done?
(2) What techniques does it use for speech and music
compression in MPEG? (2)
• Audio compression (data) is a type of lossy or
lossless compression in which the amount of
data in a recorded waveform is reduced to
differing extents for transmission, respectively
with or without some loss of quality. It is used
in CD and MP3 encoding, Internet radio, and the like.
• It is done to reduce the total quantity of data
contained in the audio file for transmission, so
that it does not require excess network
bandwidth. Moreover, small compressed files are
easy to store.
• MPEG uses predictive encoding for speech
compression and perceptual encoding for
music compression.
• Predictive encoding transmits the difference in
signal values between successive samples
instead of transmitting the absolute sample
values, and its CR is lower.
• Perceptual encoding takes the psychoacoustic
phenomenon into account. Human ears can
generally hear sound in the frequency range of
20 Hz-20 kHz but are most sensitive to
frequencies of 1-5 kHz, more specifically 2-4 kHz.
• Long silent periods of audio are
suppressed by applying run-length encoding
and are not transmitted.
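The silence suppression with run-length encoding mentioned above can be sketched as follows. The amplitude threshold and the `("SIL", n)` token format are assumed for the example; real codecs make this decision on frames, not individual samples.

```python
# Sketch of silence compression: samples whose magnitude stays below a
# threshold are treated as silence and replaced by a run-length token,
# so long quiet stretches cost almost nothing to store.

SILENCE_THRESHOLD = 2   # toy amplitude threshold (assumed)

def compress_silence(samples):
    """Return a list mixing raw samples and ('SIL', run_length) tokens."""
    out, run = [], 0
    for s in samples:
        if abs(s) < SILENCE_THRESHOLD:
            run += 1                       # extend the current silent run
        else:
            if run:
                out.append(("SIL", run))   # flush the silent run
                run = 0
            out.append(s)                  # keep audible samples as-is
    if run:
        out.append(("SIL", run))
    return out

audio = [9, 7, 0, 1, 0, 0, 1, 8, 5]
compact = compress_silence(audio)          # five quiet samples -> one token
```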
• Perceptual encoding uses frequency and
temporal masking. Two types of masking happen
in our ears: frequency masking and temporal
masking. It is seen that within nearby
frequencies, a louder sound at a lower frequency
can mask a softer sound at a higher frequency, but
the frequencies have to be in a small range. If
the frequencies are far apart, frequency
masking does not take place. For example, a tone of
1 kHz at 60 dB can mask (cover, make inaudible) a
tone of 1.1 kHz at 40 dB. This is called frequency
masking.
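The masking rule just described can be written as a small predicate. The 200 Hz "nearby" window and the plain level comparison are assumed toy values, not the actual MPEG psychoacoustic thresholds, which vary with frequency along the critical-band scale.

```python
# Sketch of the frequency-masking decision: a loud tone can mask a
# softer tone at a nearby frequency, and masked tones need not be
# transmitted. Thresholds here are illustrative assumptions.

MASKING_WINDOW_HZ = 200.0   # assumed "nearby frequency" range

def is_masked(masker, candidate):
    """masker and candidate are (freq_hz, level_db) tuples."""
    freq_close = abs(masker[0] - candidate[0]) <= MASKING_WINDOW_HZ
    louder = masker[1] > candidate[1]
    return freq_close and louder

# the slide's example: a 1 kHz tone at 60 dB masks 1.1 kHz at 40 dB
masked = is_masked((1000, 60), (1100, 40))
# a distant tone is not masked even if it is softer
not_masked = is_masked((1000, 60), (5000, 40))
```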
• THE MASKED FREQUENCIES ARE COMPRESSED; THEY ARE
NOT TRANSMITTED. Since these masked frequencies do
not contribute to the quality of sound (our ears
cannot perceive them), the loss in transmitted audio
quality cannot be noticed by the listener. This
phenomenon is made use of in perceptual encoding.
• Masking of sound in the frequency domain is called frequency
masking, and in the time domain, temporal masking.
When a loud or strong signal of lower frequency is heard
for some time, then when that sound is removed, for some
time we are unable to hear softer sounds at nearby
frequencies. This is called temporal masking. So softer
sounds near loud frequencies can be masked, or
covered, during transmission.
• The masked frequencies are not transmitted and
suppressed resulting in good Compression Ratio (CR)
• Q: What is the compression/encoding
complexity in Layer 1, Layer 2 and Layer 3 of
MPEG Audio compression? (3)
• MPEG compression is done in 3 layers.
• What are the audio features of MPEG? (3)
• Q: What is predictive encoding and perceptual
encoding? Which gives higher compression ratio?
(Perceptual)(2+1)
• (Already explained)
• Perceptual encoding gives a higher CR, as many of
the frequencies which our ear cannot hear or
perceive can be masked. The acoustically
irrelevant frequencies are removed and not
transmitted with the signal. Predictive encoding
transmits the difference in signal values between
successive samples instead of transmitting the
absolute sample values, and its CR is lower.
• Give the advantages and disadvantages of audio
compression.
