In Chapter 6 we found that there is little conceptual difference between the Fourier analysis of discrete and continuous functions over a finite interval. Both types of functions have discrete spectra, the only difference being that discrete functions have a finite number of harmonic components (Case 1 in Fig. 5.1), whereas a continuous function, in principle, has an infinite number of harmonics (Case 3 in Fig. 5.1). However, a curious result emerges from the homework set #7. In one problem (7.1), it is found that the spectrum of a continuous function and the sampled version of that function are significantly different, whereas in the other problem (7.2) the two spectra are exactly the same over the range of harmonics for which a comparison is possible. This indicates that under certain circumstances a function is completely determined by a finite number of samples. Exploring the conditions for which this result obtains is the topic of this chapter.
In general, a continuous function defined over a finite interval can be represented by a Fourier series with an infinite number of terms as shown in Chapter 6
[6.1] |
However, it may happen that for some functions the Fourier coefficients are all zero for all frequencies greater than some value W. For such spectra, we may think of W as the bandwidth of the spectrum. Notice a subtle difference between previous usage and this use of the term bandwidth. When discussing discrete functions, which have finite spectra, we used the term bandwidth to mean the highest frequency defined in the spectrum. In that case, W=D/2L. That definition implies that all continuous functions have infinite bandwidth because D = ∞. An alternative definition allows continuous functions to have finite bandwidth if the magnitude of all Fourier coefficients are zero beyond some limiting frequency W. This is a much more useful definition for practical purposes and so that is how the term bandwidth is normally used.
Consider now the consequences of a continuous function having finite bandwidth. An example is shown in Fig. 7.1 in which the frequency spectrum is non-zero only for the constant term and the first three harmonics. Now imagine trying to "reconstruct" the space/time function from this spectrum. We have two options in this regard. First, we could pretend that all frequencies higher than W don't exist, in which case the IDFT operation will produce a discrete function shown by the solid points in the left side of Fig. 7.1 On the other hand, we could use the entire frequency spectrum which is defined for an infinite number of harmonics and in this case the Fourier series will produce the continuous function shown in the figure. Now we assert that the continuous reconstructed function must pass through the discrete points shown. This is true because although the higher harmonics have been admitted, they make no contribution since all of the coefficients are zero. Therefore, it doesn't matter in the reconstruction whether harmonics greater than W are included or not. The only difference is that when an infinite number of coefficients are defined, and given the value zero, then the original function can be evaluated legitimately at every x-value instead of just the x-values which correspond to multiples of Δx=1/2W.
We can put the preceding argument on a quantitative footing by letting f(xj) be the discrete function defined by
| |
[7.1] |
and by letting g(x) be the continuous function defined by
[7.2] |
Since the right hand sides of eqns. [7.1] and [7.2] are equal, it follows that f(x)=g(x) for the sample points xj. For all other x-values, g(x) interpolates f(x).
The preceding discussion has shown that the continuous function g(x) is a reasonable interpolation of the discrete function f(x). The skeptical student may still be unconvinced that it is the correct interpolation. To be convinced, suppose we start with the continuous function g(x) which we know in advance is band limited to W. Given this information, we then sample g(x) to produce the discrete function f(x). The above arguments have demonstrated that provided we sample at a rate R=2W, then the spectrum of f(x) will be identical to the spectrum of g(x) for all frequencies less than W. Therefore, the spectrum of g(x) can be derived from the computed spectrum of f(x) simply by adding an infinite number of higher harmonics, all with amplitude zero. Since this synthesized spectrum will reconstruct g(x) exactly, we conclude that
All of the information necessary to reconstruct a band limited function exactly over a finite interval is contained in a finite number of samples. The reconstruction will be without error provided (1) the sampling rate R exceeds 2W, where W is the bandwidth of the given function, and (2) the sampling process is without error.
This is the famous sampling theorem of Whittaker (1935), who developed the theorem in the context of interpolation theory, and Shannon (1949) who developed the theorem in the context of information theory. It is an historical oversight that Bergmann (1858) discovered this idea in the context of neural sampling of the retinal image, which was later popularized by Helmholtz (1867) in his much quoted rule that visual resolution requires at least one relatively unstimulated neuron between two relatively unstimulated neurons (i.e. at least 2 neural samples per cycle of a visual pattern). This rule was then rediscovered by Nyquist, a communications engineer at Bell Telephone Laboratory in the 1930's.
If the required condition of the sampling theorem that R>2W is not met, then errors will occur in the reconstruction. When such errors arise due to undersampling, aliasing is said to occur. The word aliasing is used in this context because a high-frequency component is mistaken for, or masquerades as, a low-frequency component when the sampling rate is too low. An example is shown in Fig. 7.2 for a case of i=4 samples over the interval. Thus the highest frequency which would be adequately sampled according to the sampling theorem is ΔfD/2 which in this example is 2Δf. This critical frequency is called the Nyquist frequency and is denoted fN in the Figure. Since the solid curve has a frequency below the critical frequency, it satisfies the sampling theorem requirement and can be faithfully reconstructed from the frequency spectrum.
However, the dashed curve has a frequency higher than the Nyquist frequency and thus is undersampled. The spectrum for the undersampled dashed curve will be the same as the spectrum for the solid curve since the two functions are equal at the sample points. Thus, the dashed curve will appear to have a spectrum shown by the open circle marked "alias". Although we are not yet in a position to prove the claim, it turns out that the spectrum of an undersampled function can be predicted from the true spectrum by reflecting the true spectrum about the critical Nyquist frequency. For this reason, the Nyquist frequency is sometimes called the "folding frequency".
In Chapter 3 we introduced Parseval's Theorem for discrete functions. It was promised that in future chapters we would see how to interpret Parseval's theorem as a kind of energy-conservation theorem which says that a signal contains a given amount of energy regardless of whether that energy is computed in the space/time domain or in the Fourier/frequency domain. We are now in a position to take up that challenge.
In Chapter 5 we noted that the inner product of a continuous function with itself has an important interpretation in many physical situations as the total amount of energy in a signal
| |
[5.7] |
If we substitute for v(t) the corresponding Fourier series, and choose a convenient interval of observation (0, 2π) to simplify notation, we obtain the infinite series
![]() |
[7.3] |
Now, because of orthogonality, this infinite series of integrals, each of which has an infinite number of terms in the integrand, telescopes down to a manageable result.
![]() |
[7.4] |
We found earlier (eqn. [5.5]) that the integral of sin2x or cos2x over the interval (0, 2π) is equal to π. Therefore, eqn. [7.4] simplifies to
| |
[7.5] |
Combining eqns. [5.7] and [7.5] we get Parseval's theorem for continuous functions.
![]() |
[7.6] |
Thus Parseval's theorem is telling us that total power equals the square of the mean plus one half of the sum of the squared amplitudes of the sinusoidal Fourier components, which is the same as half the squared length of the vector of Fourier coefficients. But half the squared amplitude of any given Fourier component is just the power in that component. Thus the total power is the sum of the powers in each Fourier component. Frequently the mean (DC) term is not of interest because information is being carried solely by the variation of a signal about the mean (e.g. spatial or temporal contrast of a visual stimulus). In this case we would say that the signal power equals half the sum of the squared amplitudes of the Fourier components.
One practical use of Parseval's theorem is to assess the impact of truncating an infinite Fourier series. Although an infinite number of Fourier coefficients are required in theory to reconstruct a continuous function, in real life we can only hope to include a finite number of terms. According to Parseval's theorem, truncating the series removes some of the energy from the signal. This suggests that one way of assessing the impact of truncation is to calculate the fraction of the total amount of energy which is deleted by truncation. If the amount of energy being thrown away is a negligible fraction of the total, then one may argue that truncation of the series will have negligible effect.
To pursue a slightly different line of reasoning, suppose we are given a function f(x) defined over the interval (-π,π). We know that f(x) is exactly equal to the infinite Fourier series
| |
[7.7] |
However, we wish to approximate f(x) by a finite Fourier series band limited to the M-th harmonic. That is, we seek to approximate f(x) by the truncated Fourier series g(x)
| |
[7.8] |
In the statistical theory of regression, the "goodness of fit" of an approximation is often measured by the "mean squared error". By this metric, the error introduced by the above approximation can be quantified by the mean squared error e defined as
![]() |
[7.9] |
By Parseval's theorem, we interpret the first integral in [7.9] as the power in the data function and the third integral is the power in the Fourier series model. The middle term is the inner product of the model with the given function.
The goal now is to discover what values of the Fourier coefficients αk, βk will minimize ε. To see how the answer is going to turn out, consider the simplest case of M=1. That is, we wish to approximate f(x) by the function g(x)=α0 / 2. In this case, the error is
![]() |
[7.10] |
Adopting the standard approach to minimization problems of this kind, to minimize the error we must differentiate this quadratic eqn. [7.9] with respect to α0 , set the result to zero, and solve for α0. This yields
![]() |
[7.11] |
In words, this result shows that if we approximate f(x) by the function g(x)=constant, the constant which gives the best fit is just the first Fourier coefficient of f(x). In other words, the truncated Fourier series is also the "least squares estimate" of the function f(x).
Before accepting the above conclusion as being generally true, let us consider approximating f(x) by a 3-term Fourier series
| |
[7.12] |
Again we investigate the mean squared error
![]() |
[7.13] |
The middle term generates three inner products between the given signal and each of the three components of the model. But these inner products are recognized as the definition of Fourier coefficients. Thus the error reduces to
![]() |
[7.14] |
Now we minimize this error first with respect to α0,
![]() |
[7.15] |
then with respect to α1,
![]() |
[7.16] |
and then with respect to β1.
![]() |
[7.17] |
Again we find that the truncated Fourier series is also the model which minimizes the mean squared error. Given these specific examples, we assert without further proof that the truncated Fourier series of a function is always the least squares Fourier model for that function. The same holds true for a partial Fourier series (i.e. a series that is missing some terms) since the proof is essentially term-by-term.
Note that the inclusion of fundamental harmonics did not alter the value of the best constant determined in the first example. This result is due to the orthogonality of the basis functions in the Fourier series. Thus it is true in general that the best Fourier coefficient calculated for the k-th harmonic is independent of the calculation of any other coefficient. Other methods of fitting a function with a series of terms, for example a polynomial or a Taylor series, do not have this nice property. Thus, when one models data with a polynomial or a Taylor series, the coefficients obtained depend upon the number of terms included in the series. This difficulty does not arise with a Fourier series model.