Audio Compression Standards: James Rodney P. Santiago
Dec. 1991: "Star Trek VI" released with AC-3 audio. AC-3 is now used
for movies, in ATSC and, worldwide, additionally in MPEG-2
transport streams and on DVD.
[Table row: sampling rate 96000 Hz; bit rates 768 kbit/s, 1,536 kbit/s, 2,304 kbit/s, 3,072 kbit/s, 6,144 kbit/s, 1,536 kbit/s, 2,304 kbit/s, 3,072 kbit/s, 6,144 kbit/s, 768 kbit/s]
Masking
[Figure: masking around an 800 Hz masker, illustrating auditory masking and temporal masking]
Audio Compression Systems Used
in MPEG-2 Transport Streams
• MPEG-1 layers 1, 2 & 3
• Only layer-II used in broadcast systems
• MPEG-2 audio (5.1 channels) possible, but rarely used
• All are ‘backward compatible’
• Dolby Digital (AC-3): used in US ATSC and also in DVB (Germany, Australia)
• 5.1 channels (0.1 = low-frequency effects)
[Figure: uncompressed stereo source. Each channel (left and right): A/D conversion at 16 bit, sampling frequency 32/44.1/48 kHz, audio bandwidth 15 to 20 kHz, up to 768 kbit/s. Both channels together: approx. 1.5 Mbit/s]
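The figures above follow from multiplying sampling frequency, word length and channel count. A minimal sketch of that arithmetic, assuming plain linear PCM with no framing overhead (the function name is illustrative, not from the slides):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw linear-PCM bit rate in bit/s, ignoring any container overhead."""
    return sample_rate_hz * bits_per_sample * channels


# One 16-bit channel sampled at 48 kHz: 768 kbit/s
per_channel = pcm_bit_rate(48_000, 16)

# Left and right together: 1,536 kbit/s, i.e. approx. 1.5 Mbit/s
stereo = pcm_bit_rate(48_000, 16, channels=2)

print(per_channel / 1000, "kbit/s per channel")
print(stereo / 1_000_000, "Mbit/s stereo")
```

The same function gives the lower rates for 32 kHz and 44.1 kHz sampling.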
Amplitude, Frequency & Time Masks
Auditory masking
• Two sounds of similar frequency which occur at the
same time: the louder sound masks the quieter one.
• Sounds at lower frequencies must be even closer
together in order to be masked by higher frequencies.
Temporal masking
• A loud sound drowns out softer sounds that occur immediately
before or after it.
Psychoacoustic Model
[Figure: anatomy of the human ear: outer ear, eardrum, middle ear, hammer, eustachian tube, inner ear, semicircular canals, cochlea, auditory nerves]
Mechanical Representation of the Human Ear
[Figure: outer ear, eardrum, middle ear, hammer, eustachian tube, inner ear membrane with receptors for high frequencies and receptors for low frequencies, auditory nerves]
Electrical Representation of the Human Ear
Frequency Masking
[Figure: three plots of level L [dB] (0 to 60) versus frequency f [kHz] (0 to 14); one marks the masking threshold]
Temporal Masking
[Figure: level L [dB] (10 to 50) around a masking tone, showing premasking before the tone and postmasking after it]
[Figure: perceptual coder block diagram: low-pass (LP) filter and A/D conversion at N bit resolution; spectrum analysis (time: coarse, frequency: fine) feeding a psychoacoustic model; compressed audio out]
Audio Subband Coding
[Figure: audio in → bank of bandpass filters (BP) splitting the signal into frequency subbands → one quantizer (Q) per subband → compressed audio out; an FFT feeds the psychoacoustic model]
Example: MPEG Layer I, II
FFT for the psychoacoustic model: 512 points at Layer I, 1024 points at Layer II; every 24 ms
Subband Filtering @ MPEG-2 Layer I,II
[Figure: two plots of level L [dB] (0 to 60) versus frequency f [kHz] (0 to 24)]
Quantization @ MPEG-2 Layer I,II
• Signal level in subband below masking threshold determined by a signal at 8 kHz:
subband completely suppressed
• Signal level in subband above masking threshold determined by a signal at 4 kHz:
quantization noise adjusted to below threshold
• Spectrum calculated by means of FFT; thresholds calculated after FFT;
quantizer controlled by psychoacoustic model
[Figure: level L [dB] (0 to 60) versus frequency f [kHz] (0 to 24)]
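The two cases above (suppress the subband, or quantize just finely enough to keep the noise under the threshold) can be sketched as a per-subband bit allocation. This is a simplified illustration, not the allocation procedure of the MPEG standard, which iterates under a fixed bit budget. The 6.02 dB-per-bit rule of thumb, the 15-bit cap, the function name and the example levels are all assumptions made for the sketch.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: each quantizer bit adds about 6 dB of SNR


def allocate_bits(signal_db, mask_db, max_bits=15):
    """Return a bit count per subband from signal levels and masking thresholds (dB)."""
    bits = []
    for level, threshold in zip(signal_db, mask_db):
        if level <= threshold:
            # Signal is below the masking threshold: subband completely suppressed.
            bits.append(0)
        else:
            # Enough bits to push quantization noise below the threshold.
            needed = math.ceil((level - threshold) / DB_PER_BIT)
            bits.append(min(needed, max_bits))
    return bits


# Hypothetical levels for three subbands (dB)
signal_levels = [50.0, 10.0, 30.0]
mask_levels = [20.0, 25.0, 29.0]
print(allocate_bits(signal_levels, mask_levels))  # [5, 0, 1]
```

The second subband receives no bits because its signal lies under the threshold. The first needs 5 bits to cover its 30 dB margin.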
MPEG2 Audio Data Structure
[Figure: for each subband (numbered from 0), the output of the subband filter and quantizer is grouped into blocks of 12 samples; three blocks of 12 samples are shown]
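The block structure above determines the frame length. A minimal sketch of the arithmetic, assuming the 32-subband filter bank of MPEG-1/2 Layer I and II (the subband count is not stated on the slide) and using illustrative names:

```python
SUBBANDS = 32           # assumption: Layer I/II polyphase filter bank size
SAMPLES_PER_BLOCK = 12  # from the slide: blocks of 12 samples per subband


def frame_samples(blocks_per_frame):
    """PCM samples covered by one frame."""
    return SUBBANDS * SAMPLES_PER_BLOCK * blocks_per_frame


def frame_duration_ms(blocks_per_frame, sample_rate_hz):
    """Frame duration in milliseconds at the given sampling frequency."""
    return 1000.0 * frame_samples(blocks_per_frame) / sample_rate_hz


# Layer I uses one block per subband, Layer II three blocks per subband.
print(frame_samples(1), frame_duration_ms(1, 48_000))  # 384 samples, 8 ms
print(frame_samples(3), frame_duration_ms(3, 48_000))  # 1152 samples, 24 ms
```

At 48 kHz the three-block frame lasts 24 ms, which matches the 24 ms interval quoted for the Layer II psychoacoustic model.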
Audio Transform Coding
[Figure: audio in → (M)DCT (Modified Discrete Cosine Transform) → quantizer → compressed audio out; an FFT feeds the psychoacoustic model]
Example: Dolby Digital AC-3
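To make the (M)DCT stage concrete, here is a minimal sketch of the textbook MDCT with a sine window and 50% overlap-add. It illustrates the transform only: the actual block sizes and window of AC-3 are not reproduced, the quantizer is omitted, and the direct O(N²) summation is chosen for readability. All names are illustrative.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coeffs)) for n in range(2 * n_coeffs)]


def _basis(n, k, n_coeffs):
    return math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))


def mdct(block, window):
    """Windowed MDCT: 2N time samples in, N coefficients out."""
    n_coeffs = len(block) // 2
    return [
        sum(window[n] * block[n] * _basis(n, k, n_coeffs) for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients in, 2N aliased time samples out."""
    n_coeffs = len(coeffs)
    return [
        window[n] * (2.0 / n_coeffs) * sum(coeffs[k] * _basis(n, k, n_coeffs) for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def roundtrip(signal, n_coeffs):
    """Transform overlapping blocks (hop N) and overlap-add the inverses."""
    window = sine_window(n_coeffs)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coeffs + 1, n_coeffs):
        coeffs = mdct(signal[start:start + 2 * n_coeffs], window)
        for n, value in enumerate(imdct(coeffs, window)):
            out[start + n] += value
    return out
```

Each block of 2N samples yields only N coefficients, so the overlapped transform is critically sampled. The time-domain aliasing of neighbouring blocks cancels in the overlap-add, so every sample covered by two blocks is reconstructed exactly when no quantization is applied.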
Audio Hybrid Subband & Transform Coding
[Figure: audio in → subband filter → (M)DCT → quantizer → compressed audio out; an FFT feeds the psychoacoustic model]
Example: MPEG Layer III
Multichannel Audio Coding
[Figure: multichannel audio in (e.g. left, right, rear) → filter process → detection and removal of interchannel redundancies/irrelevancies → quantizer → compressed audio out; an FFT feeds the psychoacoustic model]
Example: MPEG Layer III, AC-3
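One simple way to remove interchannel redundancy is mid/side (sum/difference) coding of a stereo pair. The sketch below shows only the lossless matrixing step and is an illustration of the idea, not the specific method drawn in the diagram. The function names and sample values are assumptions.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Hypothetical, highly correlated channels
left = [1.0, 2.0, 3.0]
right = [1.0, 2.0, 2.5]
mid, side = ms_encode(left, right)
print(side)  # [0.0, 0.0, 0.25]
```

When the channels are similar, the side signal stays close to zero and needs far fewer bits than a second full channel.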
Multichannel Audio 5.1
[Figure: 5.1 loudspeaker layout including the subwoofer]
[Figure: the AAC codec family and its building blocks]
• AAC-LC
• SBR (Spectral Band Replication), PS (Parametric Stereo)
• HE-AAC, HE-AAC v2
• AAC-LD, AAC-ELD
• HD-AAC: AAC-LC with the Scalable Lossless Codec
• AAC-LC/HE-AAC + MPEG Surround
Functional Block Diagram of an MPEG-2 AAC encoder
[Figure: input → encoder stages steered by the Perceptual Model and the Rate/Distortion Control → Bitstream Multiplex → output]
MPEG AAC-LC
• AAC-LC is the next-generation successor to the mp3 audio
codec, invented and developed by Fraunhofer IIS.
• AAC-LC delivers transparent quality at only 64 kbit/s per
channel: compressed audio that is virtually
indistinguishable from the original audio source.
• AAC-LC satisfies the requirements for broadcast quality as
defined by the EBU. It offers flexible sampling rates from
8 kHz up to 192 kHz, bit rates up to 256 kbit/s per channel,
and support for up to 48 channels.
• It can be used in applications that demand high quality and
unlimited bandwidth.
• It supports mono, stereo and all common multi-channel
configurations.
• It is an ideal codec for any low-bit-rate, high-quality audio
application on mobile devices.
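To put the 64 kbit/s per channel figure in perspective, it can be compared with uncompressed PCM. A minimal sketch, assuming a 16-bit, 48 kHz source as the reference (the reference format and the function name are assumptions, not stated on this slide):

```python
def compression_ratio(pcm_bits_per_second, coded_bits_per_second):
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return pcm_bits_per_second / coded_bits_per_second


pcm_rate = 48_000 * 16  # one 16-bit channel at 48 kHz: 768 kbit/s
aac_lc_rate = 64_000    # 64 kbit/s per channel, as quoted above

print(compression_ratio(pcm_rate, aac_lc_rate))  # 12.0
```

Under these assumptions the coded stream is one twelfth the size of the PCM source.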
MPEG HE-AAC
• High Efficiency Advanced Audio Coding, also known
as aacPlus.
MPEG HE-AACv2
• Also known as aacPlus v2.
MPEG AAC-ELD
• AAC-LD is the low-delay version of AAC.
• It combines the full-bandwidth, superior quality of AAC with a
low coding delay necessary for two-way audio communication.
• It features an algorithmic delay of only 20 ms, while offering
CD-like audio quality at 64 kbit/s per channel.
• With the integration of SBR technology and the feature set of
the LD codec, Fraunhofer’s AAC-ELD provides full audio
bandwidth at data rates down to 24 kbit/s per channel.
• Both the AAC-LD and AAC-ELD codecs are perfectly suited for
applications that require bi-directional communication, such as
Internet telephony and video conferencing.
HD-AAC
• The MPEG standard HD-AAC offers music encoding with quality
beyond CDs while being compatible with iPods and mobile phones.
• Audio CDs store uncompressed music in 16-bit, 44.1 kHz quality,
while most music is now produced in the improved 24-bit, 96 kHz
format.
• HD-AAC provides this high-quality sound experience to users,
online music distribution and the consumer electronics industry.
• Based on the MPEG standards Scalable Lossless Coding (SLS) and AAC,
HD-AAC provides scalable-to-lossless compression of 24-bit quality
music content, thereby ensuring a seamless migration to future
AAC-compliant standards.
SUMMARY
CODEC: AAC-LC (Low Complexity Advanced Audio Codec)
FEATURES: High-performance audio codec for excellent audio quality at low bit rates
TYPICAL APPLICATIONS: Apple iPod, iTunes, ISDB-T television broadcast (Japan)
TYPICAL BIT RATE: 128 kbit/s (stereo)