Audio and Digital Signal Processing
in Python
Create a sine wave
In this project, we are going to create a sine wave, and save it as a wav file.
Frequency: The frequency is the number of times a sine wave repeats per second. I will use a
frequency of 1 kHz.
Sampling rate: Most real world signals are analog, while computers are digital. So we need
an analog to digital converter to convert our analog signal to digital. Details of how the
converter works are beyond the scope of this book. The key thing is the sampling rate, which
is the number of times per second the converter takes a sample of the analog signal.
Now, the sampling rate doesn’t really matter for us, as we are doing everything digitally, but
it’s needed for our sine wave formula. I will use a value of 48000, which is a value commonly
used in professional audio equipment.
Sine Wave formula: If you forgot the formula, don’t worry. I had to check Wikipedia as
well.
    y(t) = A * sin(2 * pi * f * t)
y(t) is the y axis value we want to calculate at time t on the x axis.
f is the frequency.
t is the time of our sample. Since we are working with digital samples, we get t by dividing the sample number by the sampling rate.
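To make the divide-by-sampling-rate step concrete, here is a minimal sketch. The names n, t and y are mine, chosen for illustration; the frequency and sampling rate are the values picked above, and the amplitude is left at 1 for now.

```python
import math

frequency = 1000        # Hz
sampling_rate = 48000   # samples per second
A = 1.0                 # amplitude, left at 1 for this illustration

# Sample number n corresponds to the time t = n / sampling_rate seconds.
n = 12
t = n / sampling_rate

# 12 samples at 48 kHz is a quarter of one 1 kHz cycle,
# so the sine wave is at its peak here.
y = A * math.sin(2 * math.pi * frequency * t)
print(t, y)
```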
Amplitude
I mentioned the amplitude A. Most books just pick an arbitrary value for A, usually
1. But that won’t work for us. The sine wave we generate will be in floating point, and while
that is good enough for drawing a graph, it won’t work when we write to a file, because
there we are dealing with integers. Wave files are usually written as 16 bit
short integers. If we write a floating point number, it will not be represented correctly.
To get around this, we have to convert our floating point number to fixed point. One of the
ways to do so is to multiply it by a fixed constant. How do we calculate this constant? Well,
the maximum value of a signed 16 bit number is 32767 (2^15 - 1). (The left most bit is
reserved for the sign, leaving 15 bits. We raise 2 to the power of 15 and then subtract one,
because zero takes up one of the values.)
So if we wanted full scale audio, we’d multiply by 32767. But I want an audio signal that is
roughly half as loud as full scale, so I will use an amplitude of 16000.
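The arithmetic above can be checked with a small sketch. The sample values in float_samples are made up for illustration, and using NumPy’s astype for the conversion is my choice here; the chapter’s own code uses int() on each sample, which truncates in the same way.

```python
import numpy as np

# Largest value a signed 16 bit integer can hold: 2**15 - 1.
full_scale = 2**15 - 1
print(full_scale == np.iinfo(np.int16).max)

amplitude = 16000  # roughly half of full scale

# A few floating point sine values in the range -1.0 to 1.0 ...
float_samples = np.array([0.0, 0.5, 1.0, -1.0])

# ... scaled and truncated to 16 bit integers.
int_samples = (float_samples * amplitude).astype(np.int16)
print(full_scale, int_samples)
```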
To the code:
    import numpy as np
    import wave
    import struct
    import matplotlib.pyplot as plt

    # frequency is the number of times a wave repeats a second
    frequency = 1000
    num_samples = 48000

    # The sampling rate of the analog to digital converter
    sampling_rate = 48000.0
    amplitude = 16000
    file = "test.wav"
I have just set up the variables we need.
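The line that generates the sine wave itself is not shown in the listing above. A version consistent with the formula and the variable names in this chapter would look like the sketch below; treat it as a reconstruction under those assumptions, not necessarily the exact original line.

```python
import numpy as np

frequency = 1000
num_samples = 48000
sampling_rate = 48000.0

# y(t) = sin(2 * pi * f * t), with t = x / sampling_rate for sample number x.
# The amplitude is applied later, when the samples are written to the file.
sine_wave = [np.sin(2 * np.pi * frequency * x / sampling_rate)
             for x in range(num_samples)]
```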
But if you remember what I said, list comprehensions are one of the most powerful features of
Python. I could have written the above as a normal for loop, but I wanted to show you the
power of list comprehensions. The above code is quite simple once you understand it. Let’s break
it down, shall we? It will be easier if you have the source code open as well.
The first thing is that the equation is in [], which means the final answer will be converted to
a list.
    [ for x in range(num_samples)]
The range() function generates the numbers from 0 up to, but not including, num_samples. So we
are saying loop over a variable x from 0 to 47999, which gives us 48000 samples.
    2 * pi * f * t
So if we look at the code again:
    nframes=num_samples
    comptype="NONE"
    compname="not compressed"
    nchannels=1
    sampwidth=2
Okay, now it’s time to write the sine wave to a file. We are going to use Python’s built-in wave
library. Here we set the parameters. nframes is the number of frames or samples.
comptype and compname both signal the same thing: the data isn’t
compressed. nchannels is the number of channels, which is 1. sampwidth is the sample width
in bytes. As I mentioned earlier, wave files are usually 16 bits or 2 bytes per sample.
    wav_file=wave.open(file, 'w')
    wav_file.setparams((nchannels, sampwidth, int(sampling_rate), nframes, comptype, compname))
We open the file and set the parameters.
    for s in sine_wave:
        wav_file.writeframes(struct.pack('h', int(s*amplitude)))
This might require some explanation. We are writing the sine_wave sample by
sample. writeframes is the function that writes audio frames to the file. All that is simple. This might
confuse you:
    struct.pack('h', int(s*amplitude))
So let’s break it down into parts.
    int(s*amplitude)
s is the single sample of the sine_wave we are writing. I am multiplying it with the amplitude
here (to convert to fixed point). We could have done it earlier, but I’m doing it here, as this is
where it matters, when we are writing to a file.
Now, the data we have is just a list of numbers. If we write it to a file as it is, it will not be readable by
an audio player.
struct is a Python library that takes our data and packs it as binary data. The h in the code
means a signed 16 bit integer (a C short).
So we take the sin of 0.5, which is about 0.479, and convert it to a fixed point number by
multiplying it by 16000, giving 7664. Now if we were to write this to file as it is, it would
just write 7664 as a string, which would be wrong.
Let’s look at what struct does:
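A minimal sketch of that step. I use '<h' here instead of plain 'h' to force little-endian byte order, so the output is the same on every machine; plain 'h' uses the machine’s native order, which is little-endian on most desktop processors.

```python
import struct

value = 7664

# '<' = little-endian, 'h' = signed 16 bit integer
packed = struct.pack('<h', value)

print(packed)                          # b'\xf0\x1d'
print(hex(value))                      # 0x1df0
print(struct.unpack('<h', packed)[0])  # 7664
```

The two bytes come out low byte first, which is why 0xf0 appears before 0x1d.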
Why 0xf0 0x1d? Well, if you convert 7664 to hex, you will get 0x1df0. The low byte (0xf0) is
written first and the high byte (0x1d) second, which is the little-endian order that wave files use.
Why two values? Because we are using 16 bit values and our number can’t fit in one byte. So struct
broke it into two bytes.
Since the numbers are now packed as raw bytes, they can be read by other programs, including our audio
players.
    for s in sine_wave:
        wav_file.writeframes(struct.pack('h', int(s*amplitude)))
This will take our sine wave samples and write them to our file, test.wav, packed as 16 bit audio.
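Writing one sample per writeframes call works, but it gets slow for long signals. As an alternative sketch (not the method used in this chapter), the whole array can be converted to 16 bit integers with NumPy and written in one call. The file name and the temporary directory are my own choices for illustration.

```python
import os
import tempfile
import wave

import numpy as np

frequency = 1000
num_samples = 48000
sampling_rate = 48000
amplitude = 16000

# Build the sine wave as a NumPy array instead of a list.
t = np.arange(num_samples) / sampling_rate
sine_wave = np.sin(2 * np.pi * frequency * t)

# Scale to fixed point and store as little-endian signed 16 bit integers.
samples = (sine_wave * amplitude).astype('<i2')

path = os.path.join(tempfile.mkdtemp(), "test_vectorized.wav")
with wave.open(path, 'wb') as wav_file:
    wav_file.setnchannels(1)              # mono
    wav_file.setsampwidth(2)              # 2 bytes = 16 bits per sample
    wav_file.setframerate(sampling_rate)
    wav_file.writeframes(samples.tobytes())

# Read the header back to confirm how many frames were written.
with wave.open(path, 'rb') as check:
    frames_written = check.getnframes()
print(frames_written)
```

Using the with statement also closes the file for us, so the header is written out properly.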
Play the file in any audio player you have: Windows Media Player, VLC etc. You should hear
a short tone, one second long.
Now, we need to check if the frequency of the tone is correct. I am going to use Audacity, an
open source audio editor with a ton of features. One of them is that we can find the frequency
of audio files. Let’s open up Audacity.
So we have a sine wave. Note that the wave goes as high as 0.5, while 1.0 is the maximum
value. Remember we multiplied by 16000, which is roughly half of 32767, which was full scale?
Now to find the frequency. Go to Edit -> Select All (or press Ctrl A), then Analyse -> Plot
Spectrum.
You can see that the peak is at around 1000 Hz, which is the frequency we used to create our wave file.
Which is why I wasn’t happy when I had to study it again for my Masters. But I was in luck.
This time, the teacher was a practising engineer. He ran his own company and taught part
time. Unlike the university teachers, he actually knew what the equations were for.
He started us with the Discrete Fourier Transform (DFT). I had heard of the DFT, and had
no idea what it did. I could derive the equation, though fat lot of good it did me.
But this teacher (I forgot his name, he was a Danish guy) showed us a noisy signal, and then
took the DFT of it. He then showed the results in a graphical window. We clearly saw the
original sine wave and the noise frequency, and I understood for the first time what a DFT
does.
Use the DFT to get frequencies
To get the frequency of a sine wave, you need to get its Discrete Fourier Transform (DFT).
Contrary to what every book written by PhD types may have told you, you don’t need to
understand how to derive the transform. You just need to know how to use it.
In its simplest terms, the DFT takes a signal and calculates which frequencies are present in
it. In more technical terms, the DFT converts a time domain signal to a frequency domain one.
What does that mean? Let’s look at our sine wave.
The wave is changing with time. If this was an audio file, you could imagine the player moving
right as the file plays.
In the frequency domain, you see the frequency part of the signal. This image is taken from
later on in the chapter to show you what the frequency domain looks like:
The signal will change if you add or remove frequencies, but will not change in time. For
example, if you take a 1000 Hz audio tone and take its frequency, the frequency will remain
the same no matter how long you look at it. But if you look at it in the time domain, you will
see the signal moving.
Computing the DFT directly was really slow on the computers of the 1960s and 70s, so the Fast
Fourier Transform (FFT), a much faster way of computing the same result, was developed. The FFT
is what is normally used nowadays.
The way it works is, you take a signal and run the FFT on it, and you get back the frequencies
present in the signal.
If you have never used (or even heard of) an FFT, don’t worry. I’ll teach you how to start using
it, and you can read more online if you want. Most tutorials or books won’t teach you much
anyway. They’ll usually blast you with equations, without showing you what to do with them.
    frame_rate = 48000.0
    infile = "test.wav"
    num_samples = 48000
    wav_file = wave.open(infile, 'r')
    data = wav_file.readframes(num_samples)
    wav_file.close()
We are reading the wave file we generated in the last example. This code should be clear
enough. The wave readframes() function reads up to the number of frames we ask for (here, all
48000) and returns them as raw bytes.
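    data = struct.unpack('{n}h'.format(n=num_samples), data)
    data = np.array(data)
Since readframes gives us raw bytes, we first unpack them into 16 bit integers with struct (the
reverse of the pack step we used when writing). We then convert the data to a numpy array.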
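If you want to see the bytes-to-numbers step on its own, here is a small sketch. It uses np.frombuffer, which is one way of turning raw bytes into an array in a single step; that is my choice for the example. The sketch writes a tiny file first so it can run by itself, and the file name and sample values are made up.

```python
import os
import tempfile
import wave

import numpy as np

# Four made-up samples, stored as little-endian signed 16 bit integers.
original = np.array([0, 8000, 16000, -16000], dtype='<i2')
path = os.path.join(tempfile.mkdtemp(), "tiny.wav")

with wave.open(path, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(48000)
    w.writeframes(original.tobytes())

with wave.open(path, 'rb') as r:
    raw = r.readframes(r.getnframes())   # raw bytes, 2 per sample

# Interpret the bytes as little-endian 16 bit integers.
data = np.frombuffer(raw, dtype='<i2')
print(type(raw), len(raw), data)
```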
    data_fft = np.fft.fft(data)
We take the fft of the data. This will create an array with all the frequencies present in the
signal.
Now, here’s the problem. The fft returns an array of complex numbers that doesn’t tell us
anything. If I print out the first 8 values of the fft, I get:
    In [3]: data_fft[:8]
    Out[3]:
    array([ 13.00000000 +0.j        ,   8.44107682 -4.55121351j,
             6.24696630-11.98027552j,   4.09513760 -2.63009999j,
            -0.87934285 +9.52378503j,   2.62608334 +3.58733642j,
             4.89671762 -3.36196984j,  -1.26176048 +3.0234555j ])
If only there was a way to convert the complex numbers to real values we can use. Let’s try to
remember our high school formulas for converting complex numbers to real…
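The usual way to turn a complex number a + bj into a single real value is its magnitude, the square root of (a squared + b squared), which is what np.abs computes. A minimal sketch, with made-up numbers chosen so the answer is easy to check by hand:

```python
import numpy as np

z = 3 - 4j

# Magnitude by hand: square root of (real part squared + imaginary part squared).
by_hand = np.sqrt(z.real**2 + z.imag**2)

# np.abs does the same thing, and works on whole arrays of complex numbers.
magnitudes = np.abs(np.array([3 - 4j, 0 + 1j, 2 + 0j]))

print(by_hand, np.abs(z))   # 5.0 5.0
print(magnitudes)
```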
Side Detour
Now what if you have no 1Hz frequency in your signal? You will still get a value at data_fft[1],
but it will be minuscule. To give you an example, I will take the fft of a 1000 Hz wave:
    data_fft = np.fft.fft(sine_wave)

    abs(data_fft[0])
    Out[7]: 8.1289678326462086e-13

    abs(data_fft[1])
    Out[8]: 9.9475299243014428e-12

    abs(data_fft[1000])
    Out[11]: 24000.0
If you look at the absolute values for data_fft[0] or data_fft[1], you will see they are tiny.
The e-12 at the end means the number is multiplied by 10 to the power of -12, so something
like 0.0000000000099 for data_fft[1]. But if you look at data_fft[1000], the value is a
huge 24000. This can easily be plotted.
If we want to find the array element with the highest value, we can find it by:
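One way to do it, as a sketch, is np.argmax on the absolute values. I only search the first half of the array, because the fft of a real signal mirrors each peak in the second half. np.fft.fftfreq converts the index to Hz; with 48000 samples at 48000 samples per second, each index is exactly 1 Hz apart, so here the index and the frequency are the same number. The name frequencies for the absolute values is an assumption on my part, and the sine wave is rebuilt here so the example runs on its own.

```python
import numpy as np

frequency = 1000
num_samples = 48000
sampling_rate = 48000.0

t = np.arange(num_samples) / sampling_rate
sine_wave = np.sin(2 * np.pi * frequency * t)

data_fft = np.fft.fft(sine_wave)
frequencies = np.abs(data_fft)   # assumed name for the absolute values

# Index of the largest value in the first half of the spectrum.
peak_index = int(np.argmax(frequencies[:num_samples // 2]))

# Convert the index to Hz (the spacing is sampling_rate / num_samples).
bin_hz = np.fft.fftfreq(num_samples, d=1.0 / sampling_rate)
print(peak_index, bin_hz[peak_index])
```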
    plt.subplot(2,1,1)
    plt.plot(data[:300])
    plt.title("Original audio wave")

    plt.subplot(2,1,2)
    plt.plot(frequencies)
    plt.title("Frequencies found")
    plt.xlim(0,1200)

    plt.show()
This should be familiar to you. The only new thing is the subplot function, which allows you to
draw multiple plots in the same window.
subplot(2,1,1) means that we are plotting on a grid of 2 rows and 1 column. The 3rd number is the
plot number, and the only one that will change. It will become clearer when you see the graph.
And that’s it, folks. We took our audio file and calculated its frequency. Next, we will add
noise to our signal and then try to clean it.
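The code that follows uses two arrays, sine_noise and combined_signal, which have to exist first. A minimal sketch of how they could be built, assuming the noise is a plain 50 Hz sine wave (the noise frequency mentioned later in this section) with the same length and amplitude as the original. The name noisy_freq is made up for the example.

```python
import numpy as np

frequency = 1000          # the signal we want to keep
noisy_freq = 50           # assumed noise frequency
num_samples = 48000
sampling_rate = 48000.0

sine_wave = [np.sin(2 * np.pi * frequency * x / sampling_rate)
             for x in range(num_samples)]
sine_noise = [np.sin(2 * np.pi * noisy_freq * x / sampling_rate)
              for x in range(num_samples)]

# Convert to NumPy arrays so that + adds sample by sample
# (adding two Python lists would join them end to end instead).
sine_wave = np.array(sine_wave)
sine_noise = np.array(sine_noise)

combined_signal = sine_wave + sine_noise
```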
    plt.subplot(3,1,1)
    plt.title("Original sine wave")

    # Need to add empty space, else everything looks scrunched up!
    plt.subplots_adjust(hspace=.5)
    plt.plot(sine_wave[:500])

    plt.subplot(3,1,2)
    plt.title("Noisy wave")
    plt.plot(sine_noise[:4000])

    plt.subplot(3,1,3)
    plt.title("Original + Noise")
    plt.plot(combined_signal[:3000])

    plt.show()
Nothing shocking here.
    data_fft = np.fft.fft(combined_signal)
    freq = (np.abs(data_fft[:len(data_fft)]))
data_fft contains the fft of the combined noise+signal wave. freq contains the absolute values of the
frequency components found in it.
    plt.plot(freq)
    plt.title("Before filtering: Will have main signal (1000Hz) + noise frequency (50Hz)")
    plt.xlim(0,1200)
We take the fft of the signal, as before, and plot it. This time, we get two peaks: our sine
wave at 1000Hz and the noise at 50Hz.
Now, to filter the signal. I won’t cover filtering in any detail, as that can take a whole book.
Instead, I will create a simple filter just using the fft. The goal is to get you comfortable with
Numpy.
    filtered_freq = []
    index = 0
    for f in freq:
        # Filter between lower and upper limits
        # Choosing 950, as closest to 1000. In real world, won't get exact numbers like these
        if index > 950 and index < 1050:
            # Has a real value. I'm choosing >1, as many values are like 0.000000001 etc
            if f > 1:
                filtered_freq.append(f)
            else:
                filtered_freq.append(0)
        else:
            filtered_freq.append(0)
        index += 1
Since I know my frequency is 1000Hz, I will search around that. In the real world, we will
never get the exact frequency, as noise means some data will be lost. So I’m using a lower
limit of 950 and upper limit of 1050. I check if the frequency we are looping is within this
range.
    if f > 1:
        filtered_freq.append(f)
I mentioned this earlier as well: While all frequencies will be present, their absolute values
will be minuscule, usually less than 1. So if we find a value greater than 1, we save it to
our filtered_freq array.
            else:
                filtered_freq.append(0)
        else:
            filtered_freq.append(0)
        index += 1
If our frequency is not within the range we are looking for, or if the value is too low, we append
a zero. This is to remove all frequencies we don’t want. And then we increment index.
Update
As reader Jean Nassar pointed out, the whole code above can be replaced by one line.
So:
    for f in freq:
        # Filter between lower and upper limits
        # Choosing 950, as closest to 1000. In real world, won't get exact numbers like these
        if index > 950 and index < 1050:
            # Has a real value. I'm choosing >1, as many values are like 0.000000001 etc
            if f > 1:
                filtered_freq.append(f)
            else:
                filtered_freq.append(0)
        else:
            filtered_freq.append(0)
        index += 1
can be replaced by:
    filtered_freq = [f if (950 < index < 1050 and f > 1) else 0 for index, f in enumerate(freq)]
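Since the goal is to get comfortable with Numpy, here is a third variant, as a sketch: the same filter written with a boolean mask instead of a loop. The small freq array is made-up data to keep the example short, and using np.where is my choice, not something from the original code.

```python
import numpy as np

# Made-up values: 2000 entries that are almost zero, with peaks at 50 and 1000.
freq = np.full(2000, 1e-9)
freq[50] = 24000.0
freq[1000] = 24000.0

index = np.arange(len(freq))

# True only where the index is inside the band and the value is above 1.
keep = (index > 950) & (index < 1050) & (freq > 1)

# Keep the value where the mask is True, use 0 everywhere else.
filtered_freq = np.where(keep, freq, 0)

# The list comprehension version gives the same values.
as_list = [f if (950 < i < 1050 and f > 1) else 0 for i, f in enumerate(freq)]
print(np.count_nonzero(filtered_freq), filtered_freq[1000])
```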
Now back to the book…
    plt.plot(filtered_freq)
    plt.title("After filtering: Main signal only (1000Hz)")
    plt.xlim(0,1200)
    plt.show()
    plt.close()
And we plot what we have.
    recovered_signal = np.fft.ifft(filtered_freq)
Now we take the ifft, which stands for Inverse FFT. This will take our filtered frequencies and
convert them back to the time domain. We can now compare the result with our original noisy
signal. We do that by plotting:
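One thing to be aware of: filtered_freq holds absolute values, and the mirrored peak near the end of the array has been set to zero, so the ifft result is complex rather than a clean real sine wave. As an alternative sketch (a different technique from the one used in this chapter), np.fft.rfft and np.fft.irfft work on the non-mirrored half of the spectrum and keep the complex values, so the recovered signal is real and lines up with the original. The 50 Hz noise is an assumption, as are the names spectrum, keep and filtered_spectrum.

```python
import numpy as np

num_samples = 48000
sampling_rate = 48000.0
t = np.arange(num_samples) / sampling_rate

sine_wave = np.sin(2 * np.pi * 1000 * t)
sine_noise = np.sin(2 * np.pi * 50 * t)      # assumed 50 Hz noise
combined_signal = sine_wave + sine_noise

# rfft returns only the non-negative frequencies of a real signal.
spectrum = np.fft.rfft(combined_signal)

# Keep the complex values between index 950 and 1050, zero everything else.
index = np.arange(len(spectrum))
keep = (index > 950) & (index < 1050)
filtered_spectrum = np.where(keep, spectrum, 0)

# irfft converts back to a real time domain signal of the original length.
recovered_signal = np.fft.irfft(filtered_spectrum, n=num_samples)

# Largest difference between the recovered and the original sine wave.
print(recovered_signal.dtype, np.max(np.abs(recovered_signal - sine_wave)))
```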
    plt.subplot(3,1,1)
    plt.title("Original sine wave")

    # Need to add empty space, else everything looks scrunched up!
    plt.subplots_adjust(hspace=.5)
    plt.plot(sine_wave[:500])

    plt.subplot(3,1,2)
    plt.title("Noisy wave")
    plt.plot(combined_signal[:4000])

    plt.subplot(3,1,3)
    plt.title("Sine wave after clean up")
    plt.plot((recovered_signal[:500]))

    plt.show()
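Note that we will receive a warning about casting complex values to real. The ifft returns an
array of complex numbers, and plotting it discards the imaginary part.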
And there you go. Using our very simplistic filter, we have cleaned a sine wave. And this brings
us to the end of this chapter.