MPEG Audio: Multimedia Communications: Coding, Systems, and Networking
MPEG Audio
Outline
Basics
  Psychoacoustics
  Subband coding
MPEG-1 audio
  Layer I and II
  Layer III
  Frame structure and packetization
MPEG-2 audio
  Multichannel audio
  Backward-compatible coding
  Non-backward-compatible coding
18-796/Spring 1999/Chen
Digital Audio
Telephone speech
Wideband speech
Mediumband audio
Wideband audio
Psychoacoustics
Threshold in quiet
Frequency Masking
SMR (Signal-to-Mask Ratio)
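The three quantities above can be made concrete with a small numerical model. The sketch below is an illustration, not the MPEG psychoacoustic model: it assumes Terhardt's approximation for the threshold in quiet, Zwicker's Hz-to-Bark mapping, and a single masker with a triangular spreading function whose slopes and offset (LOWER_SLOPE_DB_PER_BARK, UPPER_SLOPE_DB_PER_BARK, MASKER_OFFSET_DB) are hypothetical values picked for illustration. SMR is then the signal level minus the masking threshold, both in dB.

```python
import math

# Hypothetical masking parameters, chosen only for illustration.
LOWER_SLOPE_DB_PER_BARK = 27.0  # masking falls off steeply below the masker
UPPER_SLOPE_DB_PER_BARK = 10.0  # and more gently above it
MASKER_OFFSET_DB = 10.0         # masked threshold sits this far below the masker


def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the threshold in quiet, in dB SPL (f_hz > 0)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def hz_to_bark(f_hz):
    """Zwicker's approximation of the critical-band rate (Bark scale)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def power_sum_db(a_db, b_db):
    """Add two levels as powers rather than adding the dB values."""
    return 10.0 * math.log10(10.0 ** (a_db / 10.0) + 10.0 ** (b_db / 10.0))


def masking_threshold_db(f_hz, masker_hz, masker_db):
    """Masked threshold at f_hz from one masker, combined with the threshold in quiet."""
    dz = hz_to_bark(f_hz) - hz_to_bark(masker_hz)
    slope = UPPER_SLOPE_DB_PER_BARK if dz >= 0 else LOWER_SLOPE_DB_PER_BARK
    spread_db = masker_db - MASKER_OFFSET_DB - slope * abs(dz)
    return power_sum_db(spread_db, threshold_in_quiet_db(f_hz))


def smr_db(signal_db, mask_db):
    """Signal-to-mask ratio: positive means the signal lies above the mask."""
    return signal_db - mask_db
```

Because the upper slope is shallower than the lower one, an 80 dB masker at 1 kHz raises the threshold at 2 kHz much more than at 500 Hz, which is the asymmetric spread of frequency masking.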
Temporal Masking
Post-masking: 50~200 ms
Subband Coding
[Figure: M-band subband coder. Analysis filterbank H1(z), H2(z), ..., HM(z); each band is downsampled by M, quantized (Q), and upsampled by M]
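Maximal (critical) downsampling: with M bands, each band is downsampled by M, so the total sample rate is unchanged
The quantizer Q for each band should be based on its signal-to-mask ratio (SMR)
The ear's critical bands are not uniform but roughly logarithmically spaced, so the filter bank should match the critical bands
Tree-structured filter bank (to be derived on board)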
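A tree-structured filter bank can be sketched by repeatedly splitting only the low band in two. This minimal sketch assumes the two-tap Haar filter pair as a stand-in for real QMF filters (MPEG-1 itself uses a uniform 32-band polyphase filterbank, not a tree), and the function names are hypothetical. Three levels give bands of nominal width fs/16, fs/16, fs/8, and fs/4: an octave split, which is closer to the logarithmic critical bands than a uniform split.

```python
import numpy as np


def haar_split(x):
    """One two-band stage: Haar lowpass and highpass, each downsampled by 2."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def haar_merge(low, high):
    """Inverse of haar_split: rebuild the even and odd samples and interleave them."""
    out = np.empty(2 * len(low))
    out[0::2] = (low + high) / np.sqrt(2.0)
    out[1::2] = (low - high) / np.sqrt(2.0)
    return out


def octave_analysis(x, levels):
    """Split the low band again at each level; bands ordered from low to high."""
    assert len(x) % (2 ** levels) == 0, "length must be divisible by 2**levels"
    bands, low = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        low, high = haar_split(low)
        bands.append(high)
    bands.append(low)
    return bands[::-1]


def octave_synthesis(bands):
    """Merge from the lowest band upward to reconstruct the input."""
    low = bands[0]
    for high in bands[1:]:
        low = haar_merge(low, high)
    return low
```

Each stage is critically sampled, so the total number of subband samples equals the number of input samples, and without quantization the tree reconstructs its input exactly.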
M-Band Polyphase Representation
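The polyphase idea: write H(z) = sum_{r=0}^{M-1} z^{-r} E_r(z^M), where the r-th polyphase component has taps e_r[p] = h[pM + r]. Filtering and then keeping every M-th output is then equivalent to delaying the input by r, downsampling it by M, and filtering with E_r at the low rate, so no output that will be discarded is ever computed. The sketch below checks this equivalence numerically; it borrows the MPEG-1 sizes (M = 32, 512 taps) but uses random taps, not the standard's prototype filter, and the function names are hypothetical.

```python
import numpy as np


def decimate_direct(x, h, M):
    """Filter at the full rate, then keep every M-th output sample."""
    return np.convolve(x, h)[::M]


def decimate_polyphase(x, h, M):
    """Same result via H(z) = sum_r z^-r E_r(z^M); every branch runs at rate 1/M."""
    n_out = -(-(len(x) + len(h) - 1) // M)  # ceiling division
    y = np.zeros(n_out)
    for r in range(M):
        e_r = h[r::M]  # polyphase component: e_r[p] = h[pM + r]
        if len(e_r) == 0:
            continue
        u_r = np.concatenate([np.zeros(r), x])[::M]  # x delayed by r, then downsampled
        branch = np.convolve(e_r, u_r)[:n_out]
        y[: len(branch)] += branch
    return y
```

The direct form spends M times as many multiplies, most of them on outputs that downsampling throws away.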
MPEG-1 Audio
ISO/IEC 11172-3 (1988~1991)
First high-quality audio compression standard
Sampling rates: 32, 44.1, 48 kHz
CD-quality two-channel audio at ~256 kbits/s
CD: 44.1 kHz × 16 bits × 2 channels = 1.411 Mbits/s
[Figure: ISO/IEC 11172-3 encoder block diagram, including the psychoacoustic model and the ancillary data input]
[Figure: ISO/IEC 11172-3 decoder block diagram: frame unpacking, reconstruction, and ancillary data]
Layers
Increasing complexity, delay, and quality
Layer I: ~384 kbits/s for perceptually lossless quality (4:1)
Layer II: ~192 kbits/s for perceptually lossless quality (8:1)
Layer III: ~128 kbits/s for perceptually lossless quality (12:1)
(all rates for two channels)
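The compression ratios above follow from dividing the PCM bitrate by the coded bitrate. This short sketch assumes 16-bit, two-channel PCM as the reference (the function name is hypothetical).

```python
def compression_ratio(bitrate_kbps, fs_hz=44100, bits_per_sample=16, channels=2):
    """PCM bitrate divided by the coded bitrate."""
    pcm_kbps = fs_hz * bits_per_sample * channels / 1000.0
    return pcm_kbps / bitrate_kbps


for layer, rate in (("I", 384), ("II", 192), ("III", 128)):
    print(f"Layer {layer}: {compression_ratio(rate):.2f}:1 at 44.1 kHz, "
          f"{compression_ratio(rate, fs_hz=48000):.2f}:1 at 48 kHz")
```

At 48 kHz (1.536 Mbits/s PCM) the ratios come out as exactly 4, 8, and 12; at 44.1 kHz they are roughly 3.7, 7.4, and 11, so the round figures are best read as approximate.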
[Figure: Layer I/II encoder. 32-band, 512-tap analysis filterbank; FFT (512-point for Layer I, 1024-point for Layer II/III) feeding the masking threshold generator; dynamic bit allocator; coder]
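The dynamic bit allocator can be sketched as a greedy loop: repeatedly give one more bit per sample to the subband whose estimated noise-to-mask ratio (NMR = SMR - SNR) is largest, until the bit pool runs out. This is a simplified sketch under stated assumptions, not the normative procedure: it assumes about 6.02 dB of SNR per bit, charges 12 bits per step (one extra bit for each sample of a Layer I block), and ignores the side-information cost of scalefactors and allocation fields as well as Layer I's rule that a subband never receives exactly 1 bit. The function name and parameters are hypothetical.

```python
def allocate_bits(smr_db, bit_pool, samples_per_block=12, max_bits=15, db_per_bit=6.02):
    """Greedy allocation: keep giving one more bit per sample to the subband
    with the largest estimated noise-to-mask ratio (NMR = SMR - SNR)."""
    bits = [0] * len(smr_db)
    while True:
        best, best_nmr = None, None
        for i, smr in enumerate(smr_db):
            if bits[i] >= max_bits or samples_per_block > bit_pool:
                continue
            nmr = smr - db_per_bit * bits[i]  # assumed SNR of about 6.02 dB per bit
            if best is None or nmr > best_nmr:
                best, best_nmr = i, nmr
        if best is None:
            return bits
        bits[best] += 1
        bit_pool -= samples_per_block  # one extra bit for each of the block's samples
```

For example, with SMRs of 20, 5, and -3 dB and a pool of 60 bits, the loop ends with 4, 1, and 0 bits per sample: the band with the highest SMR gets most of the bits, and the band already below its mask gets none.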
Block-Based Coding
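[Figure: each of the 32 analysis filterbank outputs grouped into blocks of 12 samples; a block is the coding unit for Layer I, a superblock of three blocks for Layer II/III]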
12 samples per block for Layer I, 36 samples per superblock for Layer II/III
Block companding: each block is normalized by a scalefactor
For Layer II, up to 3 scalefactors per superblock, with a 2-bit scalefactor select
Each block/superblock receives one bit allocation
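Block companding and quantization of one 12-sample block can be sketched as follows. The sketch assumes a Layer I/II-style table of 63 scalefactors in 2 dB steps (2.0 · 2^(-i/3), so the index fits in 6 bits), input samples with magnitude at most 2.0, and a midtread uniform quantizer with 2^b - 1 levels on the normalized samples; the function names are hypothetical, and the standard's exact quantizer details are not reproduced.

```python
import numpy as np

# Assumed Layer I/II-style table: 63 scalefactors in 2 dB steps, 2.0 * 2**(-i/3).
SCALEFACTORS = 2.0 * 2.0 ** (-np.arange(63) / 3.0)


def compand_block(block, n_bits):
    """Normalize a block by its scalefactor, then quantize it uniformly."""
    block = np.asarray(block, dtype=float)
    assert 2 <= n_bits <= 15
    peak = np.max(np.abs(block))
    # Smallest tabulated scalefactor that still covers the peak (assumes peak <= 2.0).
    sf_index = int(np.nonzero(SCALEFACTORS >= peak)[0][-1])
    normalized = block / SCALEFACTORS[sf_index]  # now within [-1, 1]
    levels = 2 ** n_bits - 1                     # odd number of levels: midtread
    step = 2.0 / (levels - 1)
    codes = np.round(normalized / step).astype(int)
    return sf_index, codes


def expand_block(sf_index, codes, n_bits):
    """Decoder side: undo the quantizer step and the scalefactor normalization."""
    step = 2.0 / (2 ** n_bits - 2)
    return codes * step * SCALEFACTORS[sf_index]
```

Because the block is normalized first, the same quantizer serves loud and quiet blocks, and the reconstruction error is at most half a step times the block's scalefactor.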
[Figure: Layer III encoder. Analysis filterbank followed by MDCT, Huffman coding, and mux; an FFT-based masking threshold generator controls the coding]
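The MDCT in the diagram maps 2N windowed input samples to N coefficients, with consecutive blocks overlapping by 50%; the aliasing each block introduces cancels when the inverse transforms are overlap-added (time-domain aliasing cancellation). This minimal sketch assumes a sine window, which satisfies w[n]^2 + w[n+N]^2 = 1, and an inverse scaled by 2/N; it uses N = 18 (the long-block size applied to each Layer III subband) and N = 128 in the check. The block switching and window shapes of the actual standards are not modeled, and the function names are hypothetical.

```python
import numpy as np


def mdct_basis(N):
    """N x 2N matrix with entries cos[pi/N (n + 1/2 + N/2)(k + 1/2)]."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))


def sine_window(N):
    """Length-2N sine window; satisfies w[n]^2 + w[n+N]^2 = 1."""
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))


def mdct_roundtrip(x, N):
    """Windowed MDCT with 50% overlap, then IMDCT and overlap-add.
    Returns (coefficients per block, reconstructed signal)."""
    x = np.asarray(x, dtype=float)
    assert len(x) % N == 0, "signal length must be a multiple of N"
    C, w = mdct_basis(N), sine_window(N)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])  # every sample in 2 blocks
    out = np.zeros_like(padded)
    coeffs = []
    for start in range(0, len(padded) - N, N):
        X = C @ (w * padded[start:start + 2 * N])  # 2N windowed samples -> N coefficients
        coeffs.append(X)
        out[start:start + 2 * N] += w * (2.0 / N) * (C.T @ X)  # IMDCT, window, overlap-add
    return np.array(coeffs), out[N:N + len(x)]
```

Each block yields only N coefficients for N new samples, so the transform stays critically sampled despite the 50% overlap.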
Frame Structure
Header Info | Side Info | Subband Samples | Aux Data
Header info: sync bits, system info, CRC (cyclic redundancy code)
Side info: bit allocation, scalefactors (and scalefactor select for Layer II and III)
Subband samples: 32 × 12 = 384 per frame for Layer I, 32 × 36 = 1152 for Layer II and III
Packetization: 4-byte header, 184-byte payload (188-byte packets, as in the MPEG-2 transport stream)
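Frame sizes and durations follow from the sample counts above. This sketch assumes the MPEG-1 frame-length formulas (Layer I: 4-byte slots, 12 × bitrate / fs slots; Layer II/III: 144 × bitrate / fs bytes; plus an optional padding slot) and counts how many 184-byte payloads one frame fills, ignoring any PES or other packet-layer overhead. MPEG-2 low-sampling-rate Layer III uses a different constant, so treat this as MPEG-1 only; the function names are hypothetical.

```python
import math


def frame_bytes(layer, bitrate_bps, fs_hz, padding=0):
    """MPEG-1 audio frame length in bytes (integer bitrate and sample rate)."""
    if layer == 1:
        return (12 * bitrate_bps // fs_hz + padding) * 4  # 4-byte slots
    return 144 * bitrate_bps // fs_hz + padding           # 1-byte slots


def frame_duration_ms(layer, fs_hz):
    samples = 384 if layer == 1 else 1152  # 32 x 12 or 32 x 36
    return 1000.0 * samples / fs_hz


def packets_needed(n_bytes, payload_bytes=184):
    """Number of 188-byte packets whose 184-byte payloads carry n_bytes."""
    return math.ceil(n_bytes / payload_bytes)
```

For example, a Layer III frame at 128 kbits/s and 44.1 kHz is 417 bytes (418 with padding) and spans three packet payloads.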
MPEG-2 Audio
ISO/IEC 13818-3
Allows lower sampling rates
16, 22.05, and 24 kHz: half of the MPEG-1 rates
From wideband speech to mediumband audio
Higher frequency resolution
Layer I, II, and III
Multichannel coding
2~5 channels: surround sound, multilingual audio, and audio for the visually or hearing impaired
Multichannel Audio
Configurations (front/rear channels):
2/0 (stereo)
3/0
3/1, 3/2 (surround)
Compatibility
Forward compatibility
  A new decoder can decode an old bitstream
  Usually simple to achieve
Backward compatibility
  An old decoder can decode a new bitstream, at least partially
  Usually limits the coding efficiency
MPEG-1/2 Frame
[Figure: MPEG-1/2 frame layout, showing the MPEG-2 header and MPEG-2 data fields]
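[Figure: matrixing maps the five channels L, C, R, LS, RS to L0, R0, T3, T4, T5; dematrixing recovers L, C, R, LS, RS]
Matrixing (L0 and R0 form the MPEG-1-compatible stereo pair):
$L_0 = \alpha\,(L + \beta C + \gamma L_S)$
$R_0 = \alpha\,(R + \beta C + \gamma R_S)$
with $\alpha = 1/(1+\sqrt{2})$, $\beta = \gamma = 1/\sqrt{2}$, or $\alpha = 1$, $\beta = \gamma = 0$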
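Matrixing and dematrixing can be sketched directly from the equations above. The sketch assumes, for illustration only, that T3, T4, and T5 carry C, LS, and RS unchanged; the function names are hypothetical. With the first parameter set, α(1 + β + γ) = 1, so L0 and R0 stay within full scale whenever the input channels do.

```python
import numpy as np

ALPHA = 1.0 / (1.0 + np.sqrt(2.0))
BETA = GAMMA = 1.0 / np.sqrt(2.0)


def matrix(L, C, R, LS, RS, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Five channels -> compatible stereo pair L0, R0 plus T3, T4, T5."""
    L0 = alpha * (L + beta * C + gamma * LS)
    R0 = alpha * (R + beta * C + gamma * RS)
    # Assumption for this sketch: T3, T4, T5 carry C, LS, RS unchanged.
    return L0, R0, C, LS, RS


def dematrix(L0, R0, T3, T4, T5, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Invert matrix() under the same T3, T4, T5 assumption."""
    C, LS, RS = T3, T4, T5
    L = L0 / alpha - beta * C - gamma * LS
    R = R0 / alpha - beta * C - gamma * RS
    return L, C, R, LS, RS
```

An MPEG-1 decoder plays only L0 and R0, a downmix of all five channels, while an MPEG-2 decoder also reads T3, T4, T5 and recovers the five channels.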
MPEG-2 AAC
Enhancements
Preprocessing
High-resolution filterbank: 1024-line MDCT, switchable to 128 lines
Prediction: backward-adaptive prediction in subbands
M/S stereo coding
Noiseless (entropy) coding: Huffman coding
[Figure: AAC decoder: noiseless decoding, inverse quantizer, scale factors, prediction, intensity/coupling, TNS, gain control; legend distinguishes data and control paths]
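M/S stereo coding sends the sum and difference of the left and right channels instead of the channels themselves; when L and R are strongly correlated, most of the energy lands in M and the small S signal is cheap to code. This minimal sketch assumes the scaling M = (L + R)/2, S = (L - R)/2, so the decoder computes L = M + S, R = M - S; the function names are hypothetical.

```python
import numpy as np


def ms_encode(left, right):
    """Mid (sum) and side (difference) signals."""
    return (left + right) / 2.0, (left - right) / 2.0


def ms_decode(mid, side):
    """Recover left and right exactly."""
    return mid + side, mid - side


# Nearly identical channels: the side signal carries very little energy.
t = np.arange(1024)
left = np.sin(2 * np.pi * 0.01 * t)
right = 0.95 * left
mid, side = ms_encode(left, right)
print(np.sum(side ** 2) / np.sum(mid ** 2))  # small energy ratio
```

For uncorrelated channels the transform gains nothing, so an encoder would normally decide per band whether to use M/S or plain L/R.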
Encoder
[Figure: MPEG-2 AAC encoder: perceptual model, gain control, filter bank, TNS, intensity/coupling, prediction (using the quantized spectrum of the previous frame), M/S, iteration loops (scale factors, quantizer, noiseless coding), and bitstream multiplex; legend distinguishes data and control paths]
Main profile
Best quality, highest complexity
1024- or 128-line MDCT
Low-complexity profile
No prediction; temporal noise shaping (TNS) limited to a lower filter order
Simulcast
Achieves backward compatibility at the cost of a higher total bitrate: a separate MPEG-1 stereo stream (L0, R0) is sent alongside the AAC multichannel stream
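[Figure: simulcast system. Encoder side: L, C, R, LS, RS feed an MPEG-2 AAC encoder and L0, R0 feed an MPEG-1 encoder; both streams are multiplexed. Decoder side: after demultiplexing, the MPEG-2 AAC decoder outputs L, C, R, LS, RS and the MPEG-1 decoder outputs L0, R0]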