Several years back I had the pleasure of being able to learn the mathematics behind the Fourier Transform. At the time, the subject was overwhelmingly complex to me and finding simplified articles on what was going on was rather difficult. In this post, I am going to explain how it works while building up an arsenal of little mental concepts that will make the most complicated parts of the Fourier Transform easier to understand.
Samples and Frequencies
A sample is a single numeric value in a series of numeric values.
const samples = [0, 1, 0, 1];
Here we have four samples. It appears that they are binary, but we cannot know for sure. We need to have a range for the samples in order to be certain.
const samples = [0, 1, 0, 1];
const sampleRange = [0.0, 2.0];
In this case, our samples look binary but are actually in the range of 0.0 - 2.0. It is important to know the range of the samples before we interpret them.
When we sample something we take measurements over time. In order to be unbiased in our measurement, we would need to take the sample at discrete (unchanging distance between) intervals and the range of our measuring device should be fixed. In other words, we pick a fixed time gap between our measurements and an unchanging sensor and start measuring. A microphone with a digital converter is a good example of a sample measuring device. But we could also measure other types of input data too, as long as the measurement is a fixed between a minimum and a maximum and is a real number:
// Assume beach is a library hooked up to a sensor measuring wave height hitting the shore...
const samples = [];
const sampleRange = beach.getMinMaxWaveHeight();
const measureWaveHeight = function () {
const sample = beach.measureHeightOfWavesRightAtThisInstant();
samples.push(sample);
}
setInterval(measureWaveHeight, 1.0);
In the above example, we take a measurement of the wave height once every second, and add a new sample to our samples array. Over time, we build up enough sample data that we can begin to make measurements to determine the magnitude of the frequencies of the waves in our sample data.
Frequencies and the Fourier Transform
If we have enough samples and they are measured discretely, we have the data we need to extract frequency data from the samples themselves. The value of a frequency is traditionally referred to as an amplitude.
The Fourier Transform is the algorithm that converts a series of samples into approximated frequency amplitudes. The resulting frequency amplitudes are our best (but not perfect) approximation of the magnitude of that frequency in the samples that are provided. The magnitude could also be referred to as the energy level of the wave. The unit of the energy will differ depending on the medium we are measuring. Obviously this will differ greatly depending on whether you are measuring ocean waves, sound waves, or hormonal cycles in lab rats.
The Fourier Transform needs a series of samples in order to make a calculation. The more samples you provide, the larger the range of frequencies it can measure. The samples you provide to the Fourier Transform is called the sample window. A larger sample window is preferred, but requires more memory and processing power.
When we provide a sample window, we are uncertain of where in the sample window the frequencies begin and end. This is a practical application of the Uncertainty Principle. The larger the window, the more certain we might be that a frequency exists, but the less certain we are of when it began existing or when it ended existing.
Why is the output of the Fourier transform only an approximation of the frequency magnitude in the data? The reason is that we cannot be certain that the microsecond our sample window begins is perfectly aligned with the sinusoid of the frequency ranges that we are hoping to measure. What if the sinusoid we want to detect began its "wave" one sample before our window? What if the frequency stopped halfway through our window? We could have a frequency appear in the middle of our samples and end before our sample window ends! The fact that we have to provide a sample window, and the frequencies we measure are smaller in width than our window means that we cannot be sure when the frequency began or ended by calling the Fourier Transform. As a result, the algorithm simple looks at all of the samples and does its best to give us an estimated mangitude in the window we provide for a set of frequencies.
The best we can do is approximate the amplitude of a set of frequencies in the sample window we provide to the algorithm. In this sense, the Fourier transform could be likened to a statistical measurement of the presence of frequencies in the input samples. The mathematics of this is beyond this post, but this should give you a good idea why the output of the Fourier transform cannot just say with precision that your input sample window has a magnitude of 255.0 for the frequency 50 hertz. It can only tell us the approximated amplitude of a frequency near 50 hertz.
The Fast Fourier Transform is a special Fourier algorithm that optimizes the time it takes to run the Fourier algorithm.
In my opinion the Fast Fourier Transform is probably one of the most influential and useful algorithms every developed for computers. The reason is that without it reducing the time it takes to perform the algorithm, most computers would be heavily bogged down in their ability to do basic analysis of audio data and many of their lossy compression algorithms (yes the FFT and similar wavelet algorithms are used for lossy compression too).
There are couple of implementations of the Fast Fourier Transform but the one often used is the Cooley-Tukey algorithm. The time it takes to perform is O(N log N) where N is the number of samples in our sample window.
const samples = [0, 1, 2, 3, 2, 3, 2, 1, 0...]
const amplitudeBins = cooleyTukeyFFT(samples);
The number of amplitude bins and their frequencies is determined by the number of samples we provide and how often we measure our samples. Because the FFT is a recursive algorithm that repeatedly divides the samples in half, the lowest frequency range we can measure is going to be half of the sample window width. If our sample window is 1 full second of recording, then the lowest frequency we can measure is going to be 0.5 hertz, or half of our window length. The number of output frequency bins is going to be half of our sample window length. If we provide 512 samples, then we will have an output of 256 frequency bins. The maximum frequency we can measure is going to be determined by how small the gap is between each sample. If the gap between each sample is 1/1000 of a second, then the highest frequency we can measure is going to be half of that - or 500 hertz.
In other words, we are limited in the frequencies we can measure by the number of samples we provide and the time gap between each sample, If you want higher resolution in your frequency/amplitude results, you need to measure samples more often and provide a larger sample window to the FFT. Obviously there are limits here depending on the measuring instrument used and the CPU and memory you have.
Example with Audio Data
Imagine we have the following:
const hertz = 44100; // typical for wave files
const audioSamples = [0, 233, 12, 23, ...]; // Let's say we provide 2048 samples in this array.
const sampleRange = [0, 255]; // let's imagine each sample is 1 byte of unsigned data (but let's be fair, this is javascript...)
const bins = fft(audioSamples);
In this case our input hertz is 44.1khz, so we sample the sound pressure 44,100 times per second. This means that the time gap between each sample is going to be 1/44100 seconds.
We can predict what our bins should look like. Each bin will contain a single number, which is the amplitude (or energy) level estimated for that frequency. The bins will look like:
number_of_bins = (sample_length / 2) = 2048 / 2 = 1024
lowest_frequency_bin = (sample_length / 2) * time_gap_between_samples = (2048 / 2) * (1/44100) = 0.023 seconds = 43.5 hertz
highest_frequency_bin = (time_gap_between_samples * 2) = (1/44100) * 2 = 0.00004535147 seconds = 22050 hertz
We have 1024 bins returned, the lowest of which will be at 43.5 hertz. The highest will be 22050 hertz. The intermediate bins will be evenly distributed between the lowest and the highest bins.
At this point it is a trivial exercise to calculate the frequencies of the bins because we now know the lowest and the highest bin frequencies. However, this means that our intermediate bins will contain fixed, but strangely numbered, frequencies. If we want to find the frequency for a bin not listed in our results, but the best we can do is pick the bin that is closest to the frequency we want to measure and use that as our value. Additionally, we could do further averaging between bins as well, but that is beyond the scope of this post.
Example Code
The original library that I wrote for the Fast Fourier Transform can be found here on NPM or here on Github.
The source code in Javascript can be found here.
Using the above library, we can do something like this:
var fft = require('fft-js').fft,
fftUtil = require('fft-js').util,
signal = [1,0,1,0];
var phasors= fft(signal);
var frequencies = fftUtil.fftFreq(phasors, 8000), // Sample rate and coef is just used for length, and frequency step
magnitudes = fftUtil.fftMag(phasors);
var both = frequencies.map(function (f, ix) {
return {frequency: f, magnitude: magnitudes[ix]};
});
console.log(both);
Notice that when calculating the frequencies, we have to provide the sample rate (in samples per second). In this case we tell the library that our sample rate is 8000 samples per second. As a result, the algorithm now has the data it needs to determine the frequencies of all of the bins in our result.
Conclusion
The Fast Fourier Transform at first appears incredibly complex, but broken down it is not so bad. The results might appear a little confusing but with a little background knowledge about the mathematics you can get your head wrapped around in little time.