This session introduces the fast fourier transform (FFT) which is one of the most widely used numerical algorithms in the world. It exploits some features of the symmetry of the computation of the DFT to reduce the complexity to something that takes order $N^2$ ($O(N^2)$) complex operations to something that takes order $N log N$ ($O(N \log N)$) operations.
The material in this presentation and notes is based on Chapter 10 of Steven T. Karris, Signals and Systems: with Matlab Computation and Simulink Modelling, 5th Edition from the Required Reading List.
Consider a signal whose highest frequency is 18 kHz, the sampling frequency is 50 kHz, and 1024 samples are taken, i.e., $N=1024$.
The time required to compute the entire DFT would be:
$$t = \frac{1024\,\mathrm{samples}}{50\times 10^3\,\mathrm{samples\ per\ second}} = 20.48\,\mathrm{ms}$$
To compute the number of operations required to complete this task, let us expand the N-point DFT defined as:
$$X[m] = \sum_{n=0}^{N-1} x[n] W_N^{mn}$$
Then $$\begin{eqnarray*} X[0] &=& x[0]W_N^0 + x[1]W_N^0 + x[1]W_N^0 + \cdots + x[N - 1]W_N^0 \hfill \\ X[1] &=& x[0]W_N^0 + x[1]W_N^1 + x[1]W_N^2 + \cdots + x[N - 1]W_N^{N - 1} \hfill \\ X[2] &=& x[0]W_N^0 + x[1]W_N^2 + x[1]W_N^4 + \cdots + x[N - 1]W_N^{2(N - 1)} \hfill \\ \hfil &\cdots& \hfill \\ X[N - 1] &=& x[0]W_N^0 + x[1]W_N^{N - 1} + x[1]W_N^{2(N - 1)} + \cdots + x[N - 1]W_N^{{{(N - 1)}^2}} \hfill \\ \end{eqnarray*}$$
This is possible with the algorithm called the FTT (fast Fourier transform) that was originally developed by James Cooley and John Tukey and considerably refined since.
The FFT is very well documented, including in the text book, so we will only sketch its development and present its main result.
Much of the development follows from the properties of the rotating vector.4
$$W_N=e^{-\frac{j2\pi}{N}}$$
which results in some simplifications and mathematical short-cuts when $N$ is a power of 2.
The most useful properties are:
$$\begin{eqnarray*}W_N^N &=& e^{-j\frac{2\pi}{N}N} = e^{-j2\pi} = 1.\\ W_N^{N/2} &=& e^{-j\frac{2\pi}{N}\frac{N}{2}} = e^{-j\pi} = -1.\\ W_N^{N/4} &=& e^{-j\frac{2\pi}{N}\frac{N}{4}} = e^{-j\pi/2} = -j.\\ W_N^{3N/4} &=& e^{-j\frac{2\pi}{N}\frac{3N}{4}} = e^{-j3\pi/2} = j.\\ W_N^{kN} &=& e^{-j\frac{2\pi}{N}kN} = e^{-j2\pi} = 1,\,k=0,1,2,\ldots\\ W_N^{kN+r} &=& e^{-j\frac{2\pi}{N}kN}e^{-j\frac{2\pi}{N}r} = 1.W_N^r=W_N^r.\\ W_{2N}^{k} &=& e^{-j\frac{2\pi}{2N}k} = e^{-j\frac{2\pi}{N}\frac{k}{2}} = W_N^{k/2}. \end{eqnarray*}$$
Representing
$$\begin{eqnarray*} X[0] &=& x[0]W_N^0 + x[1]W_N^0 + x[1]W_N^0 + \cdots + x[N - 1]W_N^0 \hfill \\ X[1] &=& x[0]W_N^0 + x[1]W_N^1 + x[1]W_N^2 + \cdots + x[N - 1]W_N^{N - 1} \hfill \\ X[2] &=& x[0]W_N^0 + x[1]W_N^2 + x[1]W_N^4 + \cdots + x[N - 1]W_N^{2(N - 1)} \hfill \\ \hfil &\cdots& \hfill \\ X[N - 1] &=& x[0]W_N^0 + x[1]W_N^{N - 1} + x[1]W_N^{2(N - 1)} + \cdots + x[N - 1]W_N^{{{(N - 1)}^2}} \hfill \\ \end{eqnarray*}$$
in matrix form: $$ \left[ {\begin{array}{*{20}{c}} {X[0]}\\ {X[1]}\\ {X[2]}\\ \vdots \\ {X[N - 1]}\end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {W_N^0}&{W_N^0}&{W_N^0}&\cdots&{W_N^0}\\ {W_N^0}&{W_N^1}&{W_N^2}&\cdots&{W_N^{N - 1}}\\ {W_N^0}&{W_N^2}&{W_N^4}&\cdots&{W_N^{2(N - 1)}}\\ \cdots&\cdots&\cdots&\cdots&\cdots\\ {W_N^0}&{W_N^{N - 1}}&{W_N^{2(N - 1)}}&\cdots&{W_N^{{{(N - 1)}^2}}}\\ \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {x[0]}\\ {x[1]}\\ {x[2]}\\ \vdots \\ {x[N - 1]} \end{array}} \right].$$
This is a complex Vandemonde matrix and it is more compactly expressed as:
$$\mathbf{X}[m] = \mathbf{W_N} \mathbf{x}[n]$$
The algorithm developed by Cooley and Tukey is based on matrix decomposition methods, where the matrix $\mathbf{W_N}$ is factored into $L$ smaller matrices, that is: $$\mathbf{W_N} = \mathbf{W_1} \mathbf{W_2} \mathbf{W_3} \cdots \mathbf{W_L}$$ where $L$ is chosen as $L = \log_2N$ or $N=2^L$.
Each row of the matrices on the right side of the decomposition, contains only two, non-zero terms, unity and $W_N^k$. Then the vector $$\mathbf{X}[m] = \mathbf{W_1} \mathbf{W_2} \mathbf{W_3} \cdots \mathbf{W_L} \mathbf{x}[n].$$
The FFT computation starts with matrix $\mathbf{W}_L$. It operates on $\mathbf{x}[n]$ producing a row vector, and each component of the row vector is obtained by one multiplication and one addition. This is because there are only two non-zero elements on a given row, and one of those elements is unity. Since there are $N$ components of $\mathbf{x}[n]$, there will be $N$ complex multiplications and $N$ complex additions.
This new vector is then operated on by the $\mathbf{W}_{L-1}$ matrix, then on $\mathbf{W}_{L-1}$ and so on, until the entire operation is completed.
It appears that the entire operation would require $NL= N\log_2N$ complex additions and also $N\log_2N$ complex additions.
However, since $W_N^0 = 1$, $W_N^{N/2}=-1$, and other simplifications, it is estimated that only about half of these, that is, $N\log_2 N$ total arithmetic operations are required by the FFT versus the $N^2$ required by the DFT5.
Under the assumptions about the relative efficiency of the DFT and FFT we can create a table like that shown below:
DFT | FFT | FFT/DFT | |
---|---|---|---|
N | $N^2$ | $N\log_2 N$ | % |
8 | 64 | 24 | 37.5 |
16 | 256 | 64 | 25 |
32 | 1024 | 160 | 15.6 |
64 | 4096 | 384 | 9.4 |
128 | $16,384$ | 896 | 5.5 |
256 | $65,536$ | 2048 | 3.1 |
512 | $261,144$ | 4608 | 1.8 |
1024 | $1,048,576$ | $10,240$ | 1 |
2048 | $4,194,304$ | $22,528$ | 0.5 |
As you can see, the efficiy of the FFT actual gets better as the number of samples go up!
The FFT algorithm is implemented, in MATLAB, as the function fft
. We will conclude by working through Exercises 6 and 7 from section 10.8 of Karris.
Plot the Fourier transform of the rectangular pulse shown below, using the MATLAB fft
func-tion. Then, use the ifft
function to verify that the inverse transformation produces the rectangular pulse.
The rectangular pulse can be produced like so
x = [linspace(-2,-1,50) linspace(-1,1,100) linspace(1,2,50)];
y = [linspace(0,0,50) linspace(1,1,100) linspace(0,0,50)];
plot(x,y)
and the FFT is produced as
plot(x, abs(fft(y)))
unwind
plot(x, abs(fftshift(fft(y))))
The inverse FFT is obtained with
plot(x, ifft(fft(y)))
The triangular pulse is obtained with
x = linspace(-1,1,100);
y = [linspace(0,1,50) linspace(1,0,50)];
plot(x,y)
and the FFT is obtained with
plot(x, abs(fftshift(fft(y))))
The inverse FFT is obtained with
plot(x, ifft(fft(y)))
Read the rest of Chapter 10 of Karris from page 10.9 and make your own notes on the implementation of the FFT.