A Novel Method of

This document proposes a new algorithm for speech compression using frequency domain analysis that embeds phase information in the magnitude spectrum to transmit fewer samples while maintaining high quality reconstruction. The algorithm divides speech into packets, takes the DFT, embeds phase in magnitude, selects αN samples to transmit, and reconstructs at the receiver. This allows transmitting fewer samples than traditional methods by exploiting properties of speech signals like being low pass and phase insensitive, reducing bandwidth needs while maintaining high fidelity.

Uploaded by

Claron Veigas
Copyright
© Attribution Non-Commercial (BY-NC)
SPEECH COMPRESSION WITH HIGHER BANDWIDTH EFFICIENCY


INTRODUCTION:

Rapid speech transmission has become critical in many applications. As end users demand
higher quality and bandwidth usage grows, the on-demand delivery of audio and allied
applications must keep pace.

In this paper, we present a new algorithm for speech compression using a
frequency-domain approach.

The same method has also been used to compress static images.

Several schemes exist for transmitting a speech signal digitally:

• Sampling the signal in the time domain (PCM, DPCM, ADPCM, DM).

• Dividing the signal into a number of sub-bands and encoding them separately
(adaptive sub-band coding).

• Encoding information about how the speech signal was produced by the human
vocal system (vocoders, RELP, CELP, LPC).

We introduce another scheme that exploits the properties of speech signals to transmit
at a lower bit rate and to reconstruct the signal with less distortion.

PROPERTIES OF SPEECH SIGNALS:

The following are some basic properties of speech signals:

• They are low pass in nature.

• Their power spectrum approaches zero at zero frequency and peaks in the
neighborhood of a few hundred hertz.

• The hearing mechanism is highly sensitive to frequency.

• The human ear is insensitive to phase variations.

• The frequency band from 300 to 3100 Hz is considered adequate for telephonic
communication.

These properties of the speech signal have enabled us to devise a new method of speech
compression.

[Figure: time-domain waveform of a typical speech signal]

[Figure: the corresponding spectrum]

DESCRIPTION:
Transmitting the spectrum of the signal instead of the original signal is far
more efficient. Because the energy of a speech signal above 4 kHz is negligible, we can
compute the spectrum of the signal and transmit only the samples that correspond to the
first 4 kHz of the spectrum, irrespective of the sampling frequency.

This type of transmission considerably reduces the bandwidth required. Moreover, it is
not necessary to transmit all the samples within the 4 kHz band: transmitting a fraction
of them is sufficient, without any degradation in the quality.
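As a rough illustration of "only the samples that correspond to 4 kHz", the sketch below counts how many DFT bins of an N-point packet lie at or below a cutoff frequency. The function name `bins_up_to` and the example sampling rates are assumptions made for illustration; they do not come from the paper.

```python
import math

def bins_up_to(cutoff_hz, fs_hz, n_points):
    """Count the DFT bins k = 0..K whose frequency k * fs / N is at most cutoff_hz."""
    if cutoff_hz >= fs_hz / 2:
        # The whole one-sided spectrum (k = 0..N/2) already lies below the cutoff.
        return n_points // 2 + 1
    return math.floor(cutoff_hz * n_points / fs_hz) + 1

# Hypothetical sampling rates: the retained band stays at 4 kHz in every case,
# so a higher sampling rate leaves a smaller share of each packet to transmit.
for fs in (8000, 16000, 44100):
    print(fs, bins_up_to(4000, fs, 64))
```

The higher the sampling rate, the smaller the share of each packet's spectrum that falls inside the 4 kHz band.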
Since this method works on the spectrum, both the magnitude and the phase
information must be transmitted to reproduce the signal without error, which requires twice
the bandwidth. This problem can be solved by exploiting a property of real and even
signals: their spectra are also real and even. The speech samples are already real, and
evenness is introduced artificially so that the resulting spectrum is real and even. The
complete phase information is thus embedded within the magnitude spectrum, and only ‘αN’
samples need to be sent instead of ‘2N’ samples of the spectrum (magnitude and phase).
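The effect of introducing evenness can be checked numerically. The sketch below assumes one particular mirror layout of length 2N (the packet, a repeated last sample at index N, then the mirror image), chosen so that the extended sequence is symmetric about n = 0; the paper does not spell out the layout, so `even_extend` is a hypothetical helper.

```python
import cmath

def even_extend(packet):
    """Assumed layout: x[0..N-1], x[N-1] repeated at index N, then x[N-1..1]."""
    return list(packet) + [packet[-1]] + list(packet[:0:-1])

def dft(x):
    """Direct O(L^2) DFT, adequate for a short demonstration."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / size) for n in range(size))
            for k in range(size)]

packet = [0.0, 0.8, 1.0, 0.3, -0.4, -0.9, -0.2, 0.5]   # arbitrary real samples
spectrum = dft(even_extend(packet))

# For a real, even sequence the imaginary parts vanish up to rounding error,
# so one real number per frequency sample carries the full information.
max_imag = max(abs(v.imag) for v in spectrum)
print(len(spectrum), max_imag)
```

With this layout each frequency sample is a single real number, which is what allows magnitude and phase to travel together.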

With these procedures, and with the phase information embedded in the magnitude
spectrum, a MATLAB simulation is used to determine the optimum values of ‘α’
and ‘N’. The result of the simulation is to be verified.

ALGORITHM:

• Divide the speech samples into a set of packets, each of size ‘N’.

• Compute the corresponding N-point DFT of each packet.

• By signal processing, embed the phase information into the magnitude spectrum.

• Select only ‘αN’ samples of each packet and transmit them.

• Follow the reverse process at the receiver, after appropriate zero padding, to
reconstruct the signal.
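The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the even extension uses an assumed 2N-point mirror layout, the transform is a direct cosine sum rather than an FFT, and the names `encode`, `decode` and `compress_and_restore` are hypothetical.

```python
import math

def even_extend(packet):
    # Assumed layout: x[0..N-1], x[N-1] repeated at index N, then the mirror x[N-1..1].
    return list(packet) + [packet[-1]] + list(packet[:0:-1])

def encode(packet, alpha):
    """Transmitter: keep the first round(alpha * N) real samples of the 2N-point spectrum."""
    n = len(packet)
    xt = even_extend(packet)
    keep = max(1, min(n, round(alpha * n)))
    # x_t is real and even, so its DFT reduces to a cosine sum with no imaginary part.
    return [sum(xt[m] * math.cos(math.pi * m * k / n) for m in range(2 * n))
            for k in range(keep)]

def decode(coeffs, n):
    """Receiver: treat the missing samples as zeros and apply the cosine reconstruction."""
    out = []
    for m in range(n):
        acc = coeffs[0]
        for k in range(1, len(coeffs)):
            acc += 2.0 * coeffs[k] * math.cos(math.pi * m * k / n)
        out.append(acc / (2 * n))
    return out

def compress_and_restore(samples, n, alpha):
    """Split into packets of size n (a short tail is dropped), encode, then decode each."""
    restored = []
    for start in range(0, len(samples) - n + 1, n):
        restored.extend(decode(encode(samples[start:start + n], alpha), n))
    return restored

N, ALPHA = 32, 0.25                      # illustrative values, not the paper's optimum
signal = [1.0 + 0.5 * math.cos(3 * math.pi * m / N) for m in range(2 * N)]
restored = compress_and_restore(signal, N, ALPHA)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(len(encode(signal[:N], ALPHA)), max_error)
```

For this smooth test signal, 8 real values per 32-sample packet are enough to bring the reconstruction error below one percent of the signal amplitude; real speech would need its own choice of α and N.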

From the above algorithm it is seen that a proper choice of α and N is important.

The inverse Fourier transform expresses the actual signal in terms of its spectral components
(N-point IDFT):

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j 2\pi n k / N} \tag{1}$$

Since the phase information has to be embedded in the magnitude spectrum at the transmitter,
the processed signal is

$$x_t[n] = \frac{1}{2N} \sum_{k=0}^{2N-1} X_t[k]\, e^{j 2\pi n k / (2N)} \tag{2}$$

where x_t[n] contains both x[n] and its mirror image, and X_t[k] is its 2N-point DFT.

Since only ‘αN’ samples of the spectrum are transmitted, the receiver forms the even spectrum
by padding N-αN-1 zeros at the end. The reconstructed signal is

$$x[n] = \frac{1}{2N} \left[ X_t[0] + 2 \sum_{k=1}^{\alpha N - 1} X_t[k] \cos\left(\frac{2\pi n k}{2N}\right) \right] \tag{3}$$
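The step from equation (2) to equation (3) can be spelled out as follows. This is a sketch that assumes the even extension is symmetric about n = 0, so that X_t[k] is real and X_t[2N-k] = X_t[k]; pairing the terms k and 2N-k of the 2N-point inverse DFT then gives:

```latex
\begin{aligned}
x_t[n] &= \frac{1}{2N}\sum_{k=0}^{2N-1} X_t[k]\, e^{j 2\pi n k/(2N)} \\
       &= \frac{1}{2N}\Big[ X_t[0] + (-1)^n X_t[N]
          + \sum_{k=1}^{N-1} X_t[k]\big(e^{j 2\pi n k/(2N)} + e^{-j 2\pi n k/(2N)}\big) \Big] \\
       &= \frac{1}{2N}\Big[ X_t[0] + (-1)^n X_t[N]
          + 2\sum_{k=1}^{N-1} X_t[k]\cos\Big(\frac{2\pi n k}{2N}\Big) \Big]
\end{aligned}
```

When only the first αN samples are kept and the remaining ones are replaced by zeros, X_t[N] and every term with k ≥ αN vanish, which leaves the cosine sum of equation (3).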

What do we require?
• We expect the value of α to be very low, to achieve the maximum reduction in
the number of samples to be transmitted.
• We expect ‘N’ to be very low, because it determines the speed of operation of the
transmitter: the ‘N’ samples are fed to a processor that computes their FFT, and
the time required for this operation is O(N log N).
Given these requirements, with small but optimum values of ‘α’ and ‘N’ the algorithm
still reproduces the signal faithfully, without added complexity at either the transmitter
or the receiver.

How does it work?

Simply put, the phase information is embedded in the magnitude of the frequency
samples by transforming them from complex to real values. This has an added
advantage: for any low-pass signal, the frequency spectrum obtained by this method is
found to roll off much more rapidly than the ordinary spectrum.

Hence the number of significant frequency samples obtained with this method is
far smaller than the number in the ordinary frequency spectrum of the signal. This lets us
reduce the number of samples to be selected, and thereby the number of samples
to be transmitted.

Thus, even though the signal is already low pass in nature, the phase-embedding method
requires fewer frequency samples than the ordinary method of computing the
signal spectrum.
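The faster roll-off can be illustrated with a small numerical experiment. The sketch below is not the paper's simulation: it uses a ramp as a hypothetical test signal, an assumed 2N-point mirror layout for the even extension, and a 99% energy threshold chosen only for illustration. It counts how many low-frequency samples (together with their mirror images) are needed to capture that share of the spectral energy.

```python
import cmath

def dft(x):
    """Direct O(L^2) DFT, adequate for a short demonstration."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / size) for n in range(size))
            for k in range(size)]

def even_extend(packet):
    # Assumed layout: x[0..N-1], x[N-1] repeated at index N, then the mirror x[N-1..1].
    return list(packet) + [packet[-1]] + list(packet[:0:-1])

def bins_needed(spectrum, fraction=0.99):
    """Smallest count of samples k = 0..K (plus mirrors) holding `fraction` of the energy."""
    size = len(spectrum)
    total = sum(abs(v) ** 2 for v in spectrum)
    captured = abs(spectrum[0]) ** 2
    k = 0
    while captured < fraction * total:
        k += 1
        captured += abs(spectrum[k]) ** 2
        if size - k != k:                 # add the mirrored sample once
            captured += abs(spectrum[size - k]) ** 2
    return k + 1

ramp = [float(n) for n in range(32)]      # hypothetical test packet
plain_count = bins_needed(dft(ramp))
embedded_count = bins_needed(dft(even_extend(ramp)))
print(plain_count, embedded_count)
```

The ordinary spectrum needs more samples, and each of them is complex, whereas the samples of the even-extended spectrum are real. The gap arises because the mirror image removes the jump at the packet boundary that the ordinary periodic extension introduces.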
Taking a pulse as the low-pass signal, a MATLAB simulation is used
to explain the method of compression.

ADVANTAGES:

The above method is advantageous because it reduces the transmission bandwidth.

• Since only ‘αN’ samples are transmitted, the minimum required bandwidth (Nyquist
bandwidth) is reduced to a fraction α of its original value.
• Since ‘N’ is small, the computation time of the FFT is reduced. If the computing
time, O(N log N), is less than N times the sampling period, successive samples
need not be queued in a buffer (memory). The N-point DFT can be computed on
high-speed processors with very little delay.
• The method requires no computation across adjacent samples to make any
decision; it simply collects the samples and computes the Fourier transform.
It can therefore be implemented in real time without any delay between
adjacent packets.
• The method is speaker independent. It requires neither a speaker model nor a
psychoacoustic model of the ear to make any decision, which keeps the method
very simple.
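The bandwidth and timing claims above reduce to simple arithmetic, sketched below. The sampling rate, N and α are assumed values for illustration only, and the butterfly count is the standard figure for a radix-2 FFT, not a measurement of any particular processor.

```python
import math

def transmitted_rate(fs_hz, alpha):
    """Samples per second sent when only alpha * N of every N samples are transmitted."""
    return alpha * fs_hz

def packet_duration_s(n, fs_hz):
    """Time taken to collect one packet: N times the sampling period."""
    return n / fs_hz

def fft_butterflies(n):
    """Butterfly count of a radix-2 FFT, (n / 2) * log2(n); n must be a power of two."""
    return (n // 2) * int(math.log2(n))

FS, N, ALPHA = 8000, 64, 0.25             # assumed example values
print(transmitted_rate(FS, ALPHA))        # sample rate after compression
print(packet_duration_s(N, FS))           # time budget for processing one packet
print(fft_butterflies(N))                 # work to be finished within that budget
```

Real-time operation without buffering requires the processor to finish the transform of one packet within the packet duration.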
