This HTML version of
You might prefer to read the PDF version, or you can buy a hard copy from Amazon.
Chapter 11 Modulation and sampling
In Section 2.3 we saw that when a signal is sampled at 10,000 Hz, a component at 5500 Hz is indistinguishable from a component at 4500 Hz. In this example, the folding frequency, 5000 Hz, is half of the sampling rate. But I didn’t explain why.
This chapter explores the effect of sampling and presents the Sampling Theorem, which explains aliasing and the folding frequency.
I’ll start by exploring the effect of convolution with impulses; then I’ll use that effect to explain amplitude modulation (AM), which turns out to be useful for understanding the Sampling Theorem.
The code for this chapter is in chap11.ipynb, which is in the repository for this book (see Section 0.2). You can also view it at http://tinyurl.com/thinkdsp11.
11.1 Convolution with impulses
As we saw in Section 10.4, convolution of a signal with a series of impulses has the effect of making shifted, scaled copies of the signal.
As an example, I’ll read signal that sounds like a beep:
filename = '253887__themusicalnomad__positive-beeps.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()
And I’ll construct a wave with four impulses:
imp_sig = thinkdsp.Impulses([0.005, 0.3, 0.6, 0.9],
amps=[1, 0.5, 0.25, 0.1])
impulses = imp_sig.make_wave(start=0, duration=1.0,
framerate=wave.framerate)
And then convolve them:
convolved = wave.convolve(impulses)
Figure 11.1 shows the results, with the signal in the top left, the impulses in the lower left, and the result on the right.
You can hear the result in chap11.ipynb; it sounds like a series of four beeps with decreasing loudness.
The point of this example is just to demonstrate that convolution with impulses makes shifted, scaled copies. This result will be useful in the next section.
11.2 Amplitude modulation
Amplitude modulation (AM) is used to broadcast AM radio, among other applications. At the transmitter, the signal (which might contain speech, music, etc.) is “modulated” by multiplying it with a cosine signal that acts as a “carrier wave”. The result is a high-frequency wave that is suitable for broadcast by radio. Typical frequencies for AM radio in the United States are 500–1600 kHz (see https://en.wikipedia.org/wiki/AM_broadcasting).
At the receiving end, the broadcast signal is “demodulated” to recover the original signal. Surprisingly, demodulation works by multiplying the broadcast signal, again, by the same carrier wave.
To see how that works, I’ll modulate a signal with a carrier wave at 10 kHz. Here’s the signal:
filename = '105977__wcfl10__favorite-station.wav'
wave = thinkdsp.read_wave(filename)
wave.unbias()
wave.normalize()
And here’s the carrier:
carrier_sig = thinkdsp.CosSignal(freq=10000)
carrier_wave = carrier_sig.make_wave(duration=wave.duration,
framerate=wave.framerate)
We can multiply them using the * operator, which multiplies the wave arrays elementwise:
modulated = wave * carrier_wave
The result sounds pretty bad. You can hear it in chap11.ipynb.
Figure 11.2 shows what’s happening in the frequency domain. The top row is the spectrum of the original signal. The next row is the spectrum of the modulated signal, after multiplying by the carrier. It contains two copies of the original spectrum, shifted by plus and minus 10 kHz.
To understand why, recall that convolution in the time domain corresponds to multiplication in the frequency domain. Conversely, multiplication in the time domain corresponds to convolution in the frequency domain. When we multiply the signal by the carrier, we are convolving its spectrum with the DFT of the carrier.
Since the carrier is a simple cosine wave, its DFT is two impulses, at plus and minus 10 kHz. Convolution with these impulses makes shifted, scaled copies of the spectrum. Notice that the amplitude of the spectrum is smaller after modulation. The energy from the original signal is split between the copies.
We demodulate the signal, by multiplying by the carrier wave again:
demodulated = modulated * carrier_wave
The third row of Figure 11.2 shows the result. Again, multiplication in the time domain corresponds to convolution in the frequency domain, which makes shifted, scaled copies of the spectrum.
Since the modulated spectrum contains two peaks, each peak gets split in half and shifted by plus and minus 20 kHz. Two of the copies meet at 0 kHz and get added together; the other two copies end up centered at plus and minus 20 kHz.
If you listen to the demodulated signal, it sounds pretty good. The extra copies of the spectrum add high frequency components that were not it the original signal, but they are so high that your speakers probably can’t play them, and most people can’t hear them. But if you have good speakers and good ears, you might.
In that case, you can get rid of the extra components by applying a low-pass filter:
demodulated_spectrum = demodulated.make_spectrum(full=True)
demodulated_spectrum.low_pass(10000)
filtered = demodulated_spectrum.make_wave()
The result is quite close to the original wave, although about half of the power is lost after demodulating and filtering. But that’s not a problem in practice, because much more of the power is lost in transmitting and receiving the broadcast signal. We have to amplify the result anyway, another factor of 2 is not an issue.
11.3 Sampling
I explained amplitude modulation in part because it is interesting, but mostly because it will help us understand sampling. “Sampling” is the process of measuring an analog signal at a series of points in time, usually with equal spacing.
For example, the WAV files we have used as examples were recorded by sampling the output of a microphone using an analog-to-digital converter (ADC). The sampling rate for most of them is 44.1 kHz, which is the standard rate for “CD quality” sound, or 48 kHz, which is the standard for DVD sound.
At 48 kHz, the folding frequency is 24 kHz, which is higher than most people can hear (see https://en.wikipedia.org/wiki/Hearing_range).
In most of these waves, each sample has 16 bits, so there are 216 distinct levels. This “bit depth” turns out to be enough that adding more bits does not improve the sound quality noticeably (see https://en.wikipedia.org/wiki/Digital_audio).
Of course, applications other than audio signals might require higher sampling rates, in order to capture higher frequencies, or higher bit-depth, in order to reproduce waveforms with more fidelity.
To demonstrate the effect of the sampling process, I am going to start with a wave sampled at 44.1 kHz and select samples from it at about 11 kHz. This is not exactly the same as sampling from an analog signal, but the effect is the same.
First I’ll load a recording of a drum solo:
filename = '263868__kevcio__amen-break-a-160-bpm.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()
Figure 11.3 (top) shows the spectrum of this wave. Now here’s the function that samples from the wave:
def sample(wave, factor=4):
ys = np.zeros(len(wave))
ys[::factor] = wave.ys[::factor]
return thinkdsp.Wave(ys, framerate=wave.framerate)
I’ll use it to select every fourth element:
sampled = sample(wave, 4)
The result has the same framerate as the original, but most of the elements are zero. If you play the sampled wave, it doesn’t sound very good. The sampling process introduces high-frequency components that were not in the original.
Figure 11.3 (bottom) shows the spectrum of the sampled wave. It contains four copies of the original spectrum (it looks like five copies because one is split between the highest and lowest frequencies).
To understand where these copies come from, we can think of the sampling process as multiplication with a series of impulses. Instead of using sample to select every fourth element, we could use this function to make a series of impulses, sometimes called an impulse train:
def make_impulses(wave, factor):
ys = np.zeros(len(wave))
ys[::factor] = 1
ts = np.arange(len(wave)) / wave.framerate
return thinkdsp.Wave(ys, ts, wave.framerate)
And then multiply the original wave by the impulse train:
impulses = make_impulses(wave, 4)
sampled = wave * impulses
The result is the same; it still doesn’t sound very good, but now we understand why. Multiplication in the time domain corresponds to convolution in the frequency domain. When we multiply by an impulse train, we are convolving with the DFT of an impulse train. As it turns out, the DFT of an impulse train is also an impulse train.
Figure 11.4 shows two examples. The top row is the impulse train in the example, with frequency 11,025 Hz. The DFT is a train of 4 impulses, which is why we get 4 copies of the spectrum. The bottom row shows an impulse train with a lower frequency, about 5512 Hz. Its DFT is a train of 8 impulses. In general, more impulses in the time domain correspond to fewer impulses in the frequency domain.
In summary:
- We can think of sampling as multiplication by an impulse train.
- Multiplying by an impulse train corresponds to convolution with an impulse train in the frequency domain.
- Convolution with an impulse train makes multiple copies of the signal’s spectrum.
11.4 Aliasing
Section 11.2, after demodulating an AM signal we got rid of the extra copies of the spectrum by applying a low-pass filter. We can do the same thing after sampling, but it turns out not to be a perfect solution.
Figure 11.5 shows why not. The top row is the spectrum of the drum solo. It contains high frequency components that extend past 10 kHz. When we sample this wave, we convolve the spectrum with the impulse train (second row), which makes copies of the spectrum (third row). The bottom row shows the result after applying a low-pass filter with a cutoff at the folding frequency, 5512 Hz.
If we convert the result back to a wave, it is similar to the original wave, but there are two problems:
- Because of the low-pass filter, the components above 5500 Hz have been lost, so the result sounds muted.
- Even the components below 5500 Hz are not quite right, because the include contributions from leftover from the spectral copies we tried to filter out.
If the spectral copies overlap after sampling, we lose information about the spectrum, and we won’t be able to recover it.
But if the copies don’t overlap, things work out pretty well. As a second example, I loaded a recording of a bass guitar solo.
Figure 11.6 shows its spectrum (top row), which contains no visible energy above 5512 Hz. The second row shows the spectrum of the sampled wave, and the third row shows the spectrum after the low pass filter. The amplitude is lower because we’ve filtered out some of the energy, but the shape of the spectrum is almost exactly what we started with. And if we convert back to a wave, it sounds the same.
11.5 Interpolation
The low pass filter I used in the last step is a so-called brick wall filter; frequencies above the cutoff are removed completely, as if they hit a brick wall.
Figure 11.7 (right) shows what this filter looks like. Of course, multiplication by this filter, in the frequency domain, corresponds to convolution with a window in the time domain. We can find out what that window is by computing the inverse DFT of the filter, which is shown in Figure 11.7 (left).
That function has a name; it is the normalized sinc function, or at least a discrete approximation of it (see https://en.wikipedia.org/wiki/Sinc_function):
sinc(x) = |
|
When we apply the low-pass filter, we are convolving with a sinc function. We can think of this convolution as the sum of shifted, scaled copies of the sinc function.
The value of sinc is 1 at 0 and 0 at every other integer value of x. When we shift the sinc function, we move the zero point. When we scale it, we change the height at the zero point. So when we add up the shifted, scaled copies, they interpolate between the sampled points.
Figure 11.8 shows how that works using a short segment of the bass guitar solo. The line across the top is the original wave. The vertical gray lines show the sampled values. The thin curves are the shifted, scaled copies of the sinc function. The sum of these sinc functions is identical to the original wave.
Read that last sentence again, because it is more surprising than it might seem. Because we started with a signal that contained no energy above 5512 Hz, and we sampled at 11,025 Hz, we were able to recover the original spectrum exactly.
In this example, I started with a wave that had already been sampled at 44,100 Hz, and I resampled it at 11,025 Hz. After resampling, the gap between the spectral copies is the sampling rate, 11.025 kHz. If the original signal contains components that exceed half of the sampling rate, 5512 Hz, the copies overlap and we lose information.
But if the signal is “bandwidth limited”; that is, it contains no energy above 5512 Hz, the spectral copies don’t overlap, we don’t lose information, and we can recover the original signal exactly.
This result is known as the Nyquist-Shannon sampling theorem (see https://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem).
This example does not prove the Sampling Theorem, but I hope it helps you understand what it says and why it works.
Notice that the argument I made does not depend on the original sampling rate, 44.1 kHz. The result would be the same if the original had been sampled at a higher frequency, or even if the original had been a continuous analog signal: if we sample at framerate f, we can recover the original signal exactly, as long as it contains no energy at frequencies above f/2.
11.6 Exercises
Solutions to these exercises are in chap11soln.ipynb.
It is a good idea to filter out these frequencies before sampling; a low-pass filter used for this purpose is called an anti-aliasing filter.
Returning to the drum solo example, apply a low-pass filter before sampling, then apply the low-pass filter again to remove the spectral copies introduced by sampling. The result should be identical to the filtered signal.