
ELEC1200: A System View of

Communications: from Signals to Packets


Lecture 16
• Time-Frequency Analysis
– Analyzing sounds as a sequence of frames
– Spectrogram

• Source Coding/Data Compression


– Lossless vs. Lossy compression
– MP3 encoding

ELEC1200
Time-Frequency Analysis
• For many complex signals (like speech, music, and other sounds), short segments are well described by a sinusoidal representation with a few important frequency components.

• Time-frequency analysis refers to the analysis of how short-term frequency content changes over time.

• The spectrogram of a signal is a picture of how its amplitude spectrum changes over time.
– Vertical axis represents frequency
– Horizontal axis represents time
– Image color represents amplitude
• Red = large amplitude, Blue = small amplitude

Spectrogram Example
[Figure: three amplitude-spectrum panels — a loud high-frequency sound, a quiet lower-frequency sound, and a loud sound with many frequencies (amplitude vs. frequency, 0–4000 Hz) — above the resulting spectrogram, plotted as frequency (Hz) vs. frame number with red = loud and blue = quiet.]
Computation of the Spectrogram
• Divide the signal into a set of frames, typically about 20–50 ms long.

• Compute the amplitude spectrum of each frame.

• This gives you a two-dimensional array of real numbers, indexed by frame number and frequency index.

• Plot this as an image.
– It is generally more informative to plot the logarithm of the amplitude, as this compresses large amplitudes, allowing the smaller details to show up.
– To avoid problems at zero, apply a small positive floor to the values (i.e., replace each amplitude by P if it is smaller than P, where P is small).
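The steps above can be sketched in Python with NumPy. The 30 ms frame length and floor value are illustrative choices, and the frames here are simply non-overlapping rectangular chunks (practical spectrograms usually also window and overlap the frames):

```python
import numpy as np

def spectrogram(x, fs, frame_ms=30, floor=1e-6):
    """Log-amplitude spectrogram: one row per frame, one column per frequency bin."""
    frame_len = int(fs * frame_ms / 1000)            # samples per frame
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Amplitude spectrum of each frame (one-sided FFT).
    amp = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression, with a small positive floor so log(0) never occurs.
    return np.log10(np.maximum(amp, floor))

fs = 8000
t = np.arange(fs) / fs                  # 1 second of signal
x = np.sin(2 * np.pi * 440 * t)         # a 440 Hz tone
S = spectrogram(x, fs)
print(S.shape)                          # (frames, frequency bins)
```

Each row of `S` is one frame's log-amplitude spectrum; plotting `S` transposed as an image (frequency vertical, frame number horizontal) gives the spectrogram pictures shown on these slides.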

Speech Spectrogram

spectrogram of “she” (left) — can you guess what word the right spectrogram is?

[Figure: two speech spectrograms, frequency (Hz, 0–4000) vs. frame number.]
Train Whistle vs. Bird Chirps
Can you figure out which is which?

[Figure: two spectrograms, frequency (Hz, 0–4000) vs. frame number.]
Speech Data

[Figure: speech spectrogram; red = high amplitude, blue = low amplitude.]

• Characteristics
– Recent measurements are more informative in predicting the future than those in the distant past.
– At each point in time, different sounds (phonemes) may be pronounced.
– Different phonemes have different spectral content.

Phonemes: any distinct unit of sound that distinguishes one word from another (e.g., p, b)
ELEC1200: A System View of
Communications: from Signals to Packets
Lecture 16
• Time-Frequency Analysis
– Analyzing sounds as a sequence of frames
– Spectrogram

• Source Coding/Data Compression


– Lossless vs. Lossy compression
– MP3 encoding

Source Coding/Data Compression
We have a way to reliably send bits across a complex communications network.

Key Question: Can we re-code or compress the message bit stream to send the information it contains in as few bits as possible?

Source Encoding & Decoding

INPUT → Source Encoding → Store/Retrieve, Transmit/Receive → Source Decoding → OUTPUT

• Encoding (Compression)
– INPUT information is converted to a bit stream with as few bits as possible.
– Example INPUTs: text, music, images, video

• Decoding (Decompression)
– The bit stream is converted to OUTPUT, which is similar or identical to INPUT.

Lossless vs. Lossy Compression
INPUT → Source Encoding → Store/Retrieve, Transmit/Receive → Source Decoding → OUTPUT

• For lossless compression
– OUTPUT is exactly the same as INPUT
– Usually used for “naturally digital” bit streams, e.g. documents, messages, datasets, …
– Examples: Huffman encoding, LZW, zip files, rar files

• For lossy compression
– OUTPUT is “close” or “similar” to INPUT
– Appropriate for data streams (audio, video) intended for human consumption via imperfect sensors (ears, eyes).
– Examples: MP3, MPEG, WMV
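The defining property of lossless compression can be checked directly. This sketch uses Python's zlib module (a DEFLATE implementation, the same family of codecs used in zip files) rather than any particular codec named above:

```python
import zlib

# A lossless codec must reproduce the input bit-for-bit.
data = b"abracadabra " * 100          # redundant data compresses well
packed = zlib.compress(data)
restored = zlib.decompress(packed)

assert restored == data               # lossless: OUTPUT exactly the same as INPUT
print(len(data), len(packed))         # compressed size is much smaller
```

For a lossy codec such as MP3 the analogous round trip would not return the original samples, only a perceptually similar signal.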
Why Lossy Compression?
Data Rate = Sampling rate × Quantization bits × Channels

• Sampling rate = number of samples per second (Hz)
• Quantization bits = bits per sample; each sample's value is rounded to one of a finite number of levels (e.g., 8 bits → 256 levels).
• Channels: e.g., need to support one or two audio channels
1. Monophonic – single audio channel
2. Stereo – 2 channels

• Digital Audio Example: 44100 Hz with 16-bit quantization and 2 channels
– Generates about 1.4 Mbit of data per second (84 Mbit/minute or 5 Gbit/hour)
– A 3-minute song will require 1.411 Mbit/s × 180 s = 254 Mbit = 31.75 MB
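The arithmetic in the example above can be checked directly:

```python
# Worked check of: Data Rate = Sampling rate * Quantization bits * Channels
fs, bits, channels = 44100, 16, 2
rate_bps = fs * bits * channels          # bits per second
print(rate_bps / 1e6)                    # about 1.411 Mbit/s

song_bits = rate_bps * 180               # a 3-minute song
print(song_bits / 1e6)                   # about 254 Mbit
print(song_bits / 8 / 1e6)               # about 31.75 MB (8 bits per byte)
```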

Poor ways to compress an audio file
• Reduce the total number of bits per sample
– e.g., 16 to 8 bit
– Gives you a factor of 2 in compression
– However, introduces noticeable distortion
• Reduce the sampling rate
– e.g., 44 kHz to 22 kHz
– Again only a gain of a factor of 2 in size
– However, leads to a noticeable loss of high frequency information

• These schemes result in highly perceptible changes in the signal, but only a relatively small reduction in bit rate.

MP3
• MPEG = Moving Picture Experts Group
– set up by ISO (the International Organization for Standardization)
– issues a standard every few years
• MPEG-1 (1992)
• MPEG-2 (1994), …

• MP3 stands for MPEG audio layer III
– MP3 achieves about a 10:1 compression ratio!
– This enables
• bit-streaming
• compact audio storage
– It uses concepts from psycho-acoustics and human perception

Psycho-acoustics
• Psycho-acoustics ~ principles of human perception of sound
• Human hearing covers frequencies of roughly 20 Hz to 20 kHz
– Most sensitive at 2 to 4 kHz
• The normal voice range is about 500 Hz to 2 kHz
– Low frequencies carry vowels such as A, E, I, O, …
– High frequencies carry consonants (which can be combined with a vowel to form a syllable) such as B, C, D, F, G, K, L, …
• Hearing is more sensitive to loudness at mid frequencies than at other frequencies (e.g., two tones of equal power but different frequencies will not sound equally loud)
– Intermediate frequencies: roughly 500 Hz to 5000 Hz
– Sensitivity decreases at low and high frequencies

Perceptual Coding
• What matters is how the consumer (e.g. human ears or eyes)
perceives the input.
– Frequency range, amplitude sensitivity, color response, …
– Masking effects
• Identify information that can be removed from bit stream without
perceived effect, e.g.,
– Sounds outside frequency range
– Masked sounds
• Encode remaining information efficiently
– Use frequency-based transformations
– Quantize the frequency coefficients (loss occurs here)
– Add lossless coding (e.g., Huffman encoding)

Masking
• Masking: if a dominant tone is present, then sounds at frequencies near it will be harder to hear.

• Definitions
– Auditory threshold – minimum signal level at which a pure tone can be heard
– Masking threshold – minimum signal level at which a tone can be heard when a dominant tone is present

• Coding consequences
– Less precision is required to store information about nearby frequencies
– Less precision = coarser quantization

Quantization
• Real signals typically assume continuous values, e.g., between 0 and +1 or −1 and +1.
• However, for digital storage, we use binary numbers, which have a limited number of values
– An n-bit number has 2^n different values.
• Thus, we divide the expected signal range R into 2^n different levels, and quantize the original signal by recording the closest level.

[Figure: original signal and its quantized version; 3 bits → 8 levels across the range R.]
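A minimal numeric sketch of this rounding (placing the outermost levels at the two endpoints of the range is one common convention; others exist):

```python
import numpy as np

def quantize(x, n_bits, lo=-1.0, hi=1.0):
    """Round each sample in [lo, hi] to the closest of 2**n_bits levels."""
    levels = 2 ** n_bits
    step = (hi - lo) / (levels - 1)       # spacing between adjacent levels
    return lo + np.round((x - lo) / step) * step

x = np.array([-1.0, -0.30, 0.0, 0.55, 1.0])
xq = quantize(x, 3)                       # 3 bits -> 8 levels across [-1, 1]
print(xq)
```

Every quantized sample differs from the original by at most half a step, which is exactly the quantization error discussed on the next slide.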
Resolution
• Quantization or rounding error is the difference between the actual signal and the quantized signal
• We reduce quantization error by using more levels
• Resolution is measured as either
– the number of bits (more bits mean finer resolution)
– the difference between levels (smaller is better)
• Higher resolution requires more storage

resolution = R / (2^n − 1)

[Figure: quantization error for a 3-bit (8-level) quantizer over range R.]

Principles of Auditory Coding
• Time-frequency decomposition
– Divide the signal into frames
– Obtain the spectrum of each piece
• Use psycho-acoustic model to determine what information to keep
– Don’t store information outside the hearing range
(40 Hz to 15 kHz)
– Stereo info not stored for low frequencies
– Masking
• A component (at a given frequency) masks components at neighboring frequencies
• Store the information in the most compact way possible
– Minimize the bitrate requirement
– Maximize the audible auditory content

MP3 schematic
Input: 1.411 Mbit/s (16 bits per channel @ 44.1 kHz, stereo)
Output: coded audio signal at ~128 kbit/s
Without compression: 1.4 Mbit of data per second, or 84 Mbit per minute, or 5 Gbit per hour
MP3: 128 kbit of data per second, or 7.68 Mbit per minute, or 0.46 Gbit per hour
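These input and output rates imply the compression ratio quoted earlier:

```python
# Compression ratio implied by the rates on this slide
in_rate = 44100 * 16 * 2        # 1,411,200 bit/s uncompressed stereo input
out_rate = 128_000              # ~128 kbit/s coded MP3 output
print(in_rate / out_rate)       # roughly the 10:1 ratio quoted earlier
```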

[Schematic, left to right: frequency analysis (similar to a Fourier series) → a quantization stage, where the loss of information happens and the effects of masking are determined → lossless compression, where minor extra information is encoded.]
Non-uniform quantization
• MP3 compression quantizes the amplitudes of different frequency components
differently, depending upon masking.
• Frequency components near a dominant masker are quantized with fewer bits.
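As a toy illustration only (not the actual MP3 bit-allocation algorithm), non-uniform quantization can be pictured as giving each frequency component its own bit depth, with fewer bits near the dominant masker; the amplitudes and bit counts below are hypothetical:

```python
def quantize(a, n_bits):
    """Round an amplitude in [0, 1] to one of 2**n_bits levels."""
    step = 1.0 / (2 ** n_bits - 1)
    return round(a / step) * step

amps = [0.90, 0.20, 0.15, 0.40]   # 0.90 plays the role of the dominant masker
bits = [8, 3, 3, 6]               # hypothetical allocation: fewer bits near the masker
coded = [quantize(a, b) for a, b in zip(amps, bits)]
print(coded)
```

The masked components (coded with 3 bits) suffer larger rounding error, but by the masking argument that error is hard to hear, while the bits saved can be spent where the ear is more sensitive.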

Summary
• Audio waveforms are typically analyzed as a sequence of frames
– Within each frame, the signal can be well approximated by a few frequency
components
– The spectrogram can be used to visualize changes in the frequency content over
time
• Source coding/data compression
– Recode message stream to remove redundant information, aka compression. The
goal is to match data rate to actual information content.
– Two types of compression: Lossless vs lossy
• MP3 audio lossy compression combines framing and frequency analysis
with a non-uniform quantization based on a perceptual model
– By throwing away “unimportant” (imperceptible) information, we can obtain large
compression ratios

