In his historical introduction to the classic text Theory of Fourier's Series and Integrals, Carslaw (1921) recounts the controversy which raged in the middle of the eighteenth century over the question of whether an arbitrary function can be represented as the sum of a series of weighted sinusoids. Many of Europe's most prominent mathematicians participated in the debate, including Euler, D'Alembert, Bernoulli, Lagrange, Dirichlet, and Poisson. At that time, it was recognized that periodic functions, like the vibration of a string, could be represented by a trigonometrical series. However, it was not at all clear that a non-periodic function could be so represented, especially if it required an unlimited number of sines and cosines which might not necessarily converge to a stable sum. Fourier, who is credited with resolving this issue, was interested in the theory of heat. One practical problem of the day was to understand the temperature profile of the earth at various depths beneath the surface. A typical profile Y=f(X) might look like the heavy curve illustrated in Fig. 3.1
Although the temperature profile in this example is defined only from the earth's surface (X=0) to a depth X=L, suppose we replicate the given curve many times and connect the segments end-to-end as shown by the light curves in the figure. The result would be a periodic function of period L which can be made to stretch as far as we like in both the positive-X and negative-X directions . This trick is intended to raise our expectations of success in representing such a function by a sum of sinusoids, which also exist over the entire length of the X-axis. Obviously the sum of sinusoids will fit the original function over the interval (0-L) if it fits the periodic function over the entire X-axis. The figure also makes it clear that the period of each sinusoidal component must be some multiple of L because otherwise some intervals will be different from others, which is incompatible with the idea of periodicity. The name given to the harmonically related series of sinusoids required to reproduce the function Y=f(X) exactly is the Fourier series.
Before tackling the more difficult problem posed above, finding the Fourier series for a continuous function defined over a finite interval, we will begin with the easier problem of determining the Fourier series when the given function consists of a discrete series of Y-values spaced equally along the X-axis. This problem is not only easier but also of more practical interest to the experimentalist who needs to analyze a series of discrete measurements of some physical variable, or perhaps uses a computer to sample a continuous process at regular intervals.
Consider the extreme case where a function Y(X) is known for only one value of X. For example, in the study of heat illustrated in Fig. 3.1, suppose we have a raw data set consisting of a single measurement Y0 taken at the position X=0. To experience the flavor of more challenging cases to come, imagine modeling this single measurement by the function
| Y = f ( X ) = m | [3.1] |
where m is an unknown parameter for which we need to determine a numerical value. The obvious choice is to let m=Y0. Graphically, the problem is illustrated in Fig. 3.2 and the solution is the heavy, horizontal line passing through the datum point. This horizontal line is the Fourier series for this case of D=1 samples and the parameter m is called a Fourier coefficient. Although this is a rather trivial example of Fourier analysis, it illustrates the first step in a more general process which we will be developing.
If we engage in the process of replicating the data set in order to produce a periodic function, the same line passes through all of the replicated data points as expected. Notice, however, that the fitted line is an exceedingly poor representation of the "true" curve. The reason, of course, is that we were provided with only a single measurement; to achieve a better fit to the underlying function we need more samples.
Suppose next that measurements Y0 and Y1 are taken at two depths, X=0 and X=L/2, respectively, as illustrated in Fig. 3.3. These values of X correspond to the beginnings of two sub-intervals obtained by dividing the total interval L into D=2 parts of equal length. Now that we have a second measurement available, we are able to add a second term to our model. Thus we seek to fit the data with the Fourier series
| [3.2] |
To determine the unknown Fourier coefficients m and a, we evaluate equation [3.2] at the two X-values for which we have measurements. The result is the pair of equations
| [3.3] |
The solution to this pair of equations is
![]() |
[3.4] |
Therefore, we conclude that the model
| [3.5] |
will pass through the original two data points exactly, as shown in Fig. 3.3.
Notice that the cosine term in this model undergoes one complete cycle in the interval (0-L). In other words, the frequency of the cosine term is the same as the frequency of the periodic waveform of the underlying function. For this reason, this cosine is said be the fundamental harmonic term in the Fourier series. This new model of a constant plus a cosine function is clearly a better match to the underlying function than was the previous model, which had only the constant term. On the other hand, it is important to remember that although the mathematical model exists over the entire X-axis, it only applies to the physical problem over the interval of observation (0-L) and within that interval the model strictly applies only to the two points actually measured. Outside the interval the model is absurd since there is every reason to expect that the physical profile is not periodic. Even within the interval the model may not make sense for X-values in between the actual data points. To judge the usefulness of the model as an interpolation function we must appeal to our understanding of the physical system under study. In the next chapter we will expand this example to include D=3 samples, but before we do that we need to look more deeply into the process of Fourier analysis which we have just introduced.
As the preceding example has demonstrated, Fourier analysis of discrete functions yields a model which consists of a weighted sum of sinusoids plus a constant. These weights, or Fourier coefficients, are unknown parameters of the model which must be determined from the data. The computational method used to produce the values of the coefficients in the above example was to solve two linear equations in two unknowns. This suggests that we write equations [3.3] in matrix form as follows
| [3.6] |
When written this way, Fourier analysis looks like the kind of linear transformation described in Chapter 2. That is, if we think of our two sample points Y0 and Y1 as the components of a 2-dimensional data vector v, then v is evidently a transformation of some other vector f which is comprised of the Fourier coefficients m and a. In matrix/vector notation, the claim is that v=M•f. The inverse transformation, f=M-1v, indicates that the vector of Fourier coefficients f is a linear transformation of the given data vector v. To show this, first we find the inverse of matrix M and then use the result to write
| [3.7] |
which of course is equivalent to equations [3.4]. (The common factor 1/2 has been extracted from each element of M-1 in this equation.) In the language of Fourier analysis, equation [3.7] describes the Discrete Fourier Transform (DFT) of data vector v into the vector of Fourier coefficients f. Conversely, equation [3.6] describes the Inverse DFT of f to reproduce the data vector v.
Another way of viewing Fourier analysis suggests itself if we rewrite equation [3.6] as a vector sum
![]() |
[3.8] |
where C0 is the constant function cos(0*2πx/L) evaluated at the given sample points and C1. is the fundamental harmonic function cos(1*2πx/L) evaluated at the given sample points,
| [3.9] |
In words, these equations say that data vector v is the weighted sum of two other vectors C0 and C1. Notice that vectors C0 and C1 are orthogonal and so they could serve as alternative basis vectors for the 2-dimensional space used to represent v. This interpretation is illustrated in Fig. 3.4. The advantage of viewing the problem this way is that it provides a geometrical interpretation of the calculation of the unknown parameters m and a. These parameters represent the components of v in the directions of basis vectors C0 and C1. From Chapter 2 we know that these components are found by projecting v onto the basis vectors and the lengths of these projections may be computed by the inner product. Accordingly, we can compute m by evaluating the inner product v•C0 as follows
![]() |
[3.10] |
Notice that the second term in [3.10] drops out because of orthogonality. Rearranging terms, we get
![]() |
[3.11] |
Similarly, we can compute a by evaluating the inner product v•C1 as follows
![]() |
[3.12] |
Again one term drops out because of orthogonality. Rearranging terms, we get
![]() |
[3.13] |
The repeated formation of an inner product of the data vector v with vectors C0 and C1 suggests the (somewhat unconventional) matrix notation
| [3.14] |
which is the same as equation [3.7]. In
this equation,
indicates
a row vector containing samples of the cosine waveform.
The conclusion to be drawn from this discussion is that if we represent a series of two samples of a function as a data vector v, then we can perform Fourier analysis on the sampled function by creating two orthogonal vectors C0 and C1 and projecting the data vector onto these new basis vectors. The lengths of these projections may be interpreted as the amount of C0 or C1 present in the data vector. Since C0 is just the sampled constant function and C1 is the sampled cosine function, these projections tell us how much constant and how much cosine is present in the data. That is what Fourier analysis is all about! In the next section we will expand this line of reasoning to deal with the case of D=3 samples and then generalize the result for an arbitrary number of samples.
Let us return now to the problem of measuring the temperature function of Fig. 3.2. We have seen that if we sample the temperature at two points, then a Fourier series with two terms fits the two measurements exactly. To generalize this observation, we might expect three samples will be fit exactly by a 3-term Fourier series, that four samples will be fit by a 4-term series, and so on. The obvious question of "how many terms are enough?" will be dealt with in later chapters. For the moment, we can say two things. First, to fit a continuous function perfectly with a Fourier model will require an infinite number of terms. Second, we might imagine that if enough points are sampled, i.e. if samples are sufficiently close together, then errors made by interpolating between points with a Fourier model will not be wrong by too much. To sharpen this imprecise statement further we will need to define criteria for what is meant by "enough points" and when a model is "wrong by too much". Then, to judge whether a particular model satisfies these criteria we must ultimately depend on our understanding of the physics of the problem. But before we can tackle these important issues we need to develop general formulas for Fourier series which work for any number of sample points. We will be in a position to do this after we consider one more specific example, that of D=3.
In order to obtain a better model of the underlying function in Fig. 3.2, suppose that measurements Y0, Y1, and Y2 are taken at three depths, X=0, X=L/3, and X=2L/3, respectively, as illustrated in Fig. 3.5. These values of X correspond to the beginnings of three sub-intervals obtained by dividing the total interval L into D=3 parts of equal length.
Now that we have a third measurement available, we are able to add a third term to our trigonometrical model. Thus we seek to fit the data with the Fourier series
| [3.15] |
To determine the three unknown Fourier coefficients m, a and b, we evaluate equation [3.15] at the three X-values for which we have measurements. The result is the system of 3 linear equations
![]() |
[3.16] |
To solve this system of equations we first write them in the matrix form v=M•f as follows
![]() |
[3.17] |
When written this way, the columns of matrix M are immediately recognized as the column vectors C0, C1, and S1 which are the sampled trigonometric functions which form the basis of our Fourier model. This suggests the more compact notation (using arrows to indicate column vectors in the matrix)
| [3.18] |
from which it follows that vector v is the weighted sum of the basis vectors
| v = mC0 + aC1 + bS1 | [3.19] |
In other words, if we imagine forming a 3-dimensional data vector from the 3 given Y-values, then this vector exists in a 3-dimensional space which is also spanned by an alternative set of basis vectors - the sampled trigonometric functions.
![]() |
Fig. 3.6 Vector Representation of Discrete
Function of 3 Points
|
Thus our data vector may be conceived as the weighted sum of these basis vectors and the corresponding weights are the unknown Fourier coefficients we desire. This geometrical perspective on the problem suggests that to determine the Fourier coefficients we should project the data vector onto each of the basis vectors in turn by forming the inner product. For example, projecting v onto C0 gives
![]() |
[3.20] |
Note that all but one term vanishes because of orthogonality. From exercise 3.3 we know that the squared length of C0 is D, the dimensionality of the problem (D=3 in this example). Thus the first Fourier coefficient is
| [3.21] |
By a similar line of reasoning we obtain the other two Fourier coefficients
![]() |
[3.22] |
These last two equations may be combined to give the matrix form (using arrows to denote row vectors in the matrix)
![]() |
[3.23] |
which can be simplified by extracting the common factor 2/D to give
![]() |
[3.24] |
In summary, equations [3.16] through [3.19] are different ways of expressing the inverse DFT for the case of D=3. As such, they tell us how to "reconstruct" the measured data points from the Fourier coefficients. But we needed to solve the opposite problem: given the measured values, determine the unknown Fourier coefficients. In other words, we sought the forward DFT, f=M-1v, which means we had to find the inverse of matrix M. Our method of solution took advantage of the orthogonality of the trigonometrical basis vectors by repeatedly forming the inner product of the data vector with each of the trigonometrical basis vectors. When each inner product is divided by the length of the corresponding basis function, the result is interpreted geometrically as the length of the projection of v onto the basis vectors. These lengths equal the Fourier coefficients in the model.
The preceding examples of D=1, 2, and 3 point the way towards a general method for determining the Fourier coefficients for a trigonometrical model suitable for an arbitrary number of points. In section 3.A we reasoned that the periods of all of the trigonometric elements in the model must be some integer fraction of the observation interval. That is, the period must equal L/k where k is an integer which we will call the harmonic number. Thus, the model we seek will be a Fourier series of the form
| [3.25] |
which is perhaps easier to grasp if we simplify the notation a bit by making a change of variables
| [3.26] |
where D = 2N when D is even, and D=2N+1 when D is odd. As before, we assume the X-interval from 0 to L is subdivided into D equal parts and the Y-values occur at the beginnings of each of these sub-intervals so that θj = 2πj / D. If the interval runs from S to S+L, then the values of θ are θj = 2π(S + jL / D) / L.
For D data points there are D unknown Fourier coefficients to be determined. If equation [3.26] is evaluated at every X-value for which we have measurements, then the result is a system of D linear equations which, when written in matrix form, are
![]() |
[3.27] |
which is compactly written as an inner product of f with a matrix of column vectors
| [3.28] |
Equations [3.27] and [3.28] shows the inverse discrete Fourier transform (IDFT) when D is odd. When D is even, the last column (SN) in the matrix and the last Fourier coefficient (bN) are omitted.
To obtain the forward DFT, we conceive of our D data points as a vector in D-dimensional space which may be expressed as the weighted sum of trigonometrical basis vectors. The corresponding weights are the unknown Fourier coefficients which are obtained by projecting the data vector onto each of the basis vectors in turn by forming the inner product. For example, projecting v onto Ck gives
![]() |
[3.29] |
Note that all but one term in the expanded inner product vanishes because of orthogonality. Recall from exercise 3.3 that the squared length of Ck equals D/2, so the k-th cosine Fourier coefficient is given by the simple formula
![]() |
[3.30] |
A corresponding formula holds for the k-th sine coefficient
![]() |
[3.31] |
These last two equations thus define every Fourier coefficient except for two special cases which do not include the factor 2. The reason this factor is missing is because the squared length of the corresponding basis vector is D rather than D/2. One special case is the first Fourier coefficient, m, which is the mean value and the other special case is the last Fourier coefficient, CN, when D is even.
Notice that these results demonstrate that it is possible to calculate any particular Fourier coefficient without regard to the other Fourier coefficients. This is a consequence of the orthogonality of the sampled sinusoids. One practical implication of this result is that it is not necessary to calculate all of the Fourier coefficients of the model if only some are of interest. On the other hand, if all of the Fourier coefficients are to be computed, then it is convenient to calculate them all at once by the matrix multiplication as follows
![]() |
[3.32] |
which can be simplified by extracting the common factor 2/D to give
![]() |
[3.33] |
Although valid, the general equations just developed have a couple of aesthetically unpleasant features which obscure an important geometrical interpretation of Fourier analysis. First, there is the need for a special formula for calculating the mean coefficient m. Equation [3.30] generates an answer which is exactly twice the mean if we set k=0. That is, a0 = 2m. This suggests that we make a change of variables in our model so that we won't need to regard the constant term as a special case. With this change of variable, the model becomes
| [3.34] |
which is the form of the Fourier series generally quoted in text books.
Next, there is the unsightly factor 2 in the top element of the matrix in equation [3.33]. This anomaly results from the fact that all of the trigonometric basis vectors have squared length D/2 except for C0, which has squared length D. This suggests that we multiply C0 in equation [3.27] by √2 / √2 and then factor out the √2 in the numerator and put it into the a0 term in the vector of Fourier coefficients as shown below in equation [3.35]
![]() |
[3.35] |
Since every column vector in the trigonometrical matrices now has length √(D/2), let us extract this common factor so that each column will have unit length. To keep the equations from looking messy, we first define the following unit basis vectors
![]() |
[3.36] |
so that equation [3.35] now becomes
![]() |
[3.37] |
which is written in our compact notation as
| [3.38] |
taking care to remember that the first element of f, the vector of Fourier coefficients, is different by the factor √2 from earlier usage.
The advantage of expressing the inverse DFT in the form of equation [3.38] is that the matrix of trigonometric vectors is orthonormal, which means that the inverse matrix is just the transpose. Therefore, we can immediately write the forward DFT by inspection:
![]() |
[3.39] |
The symmetrical pair of equations, [3.38] and [3.39] would be easily programmed on a computer.
A second advantage of this form of the DFT equations is that it reveals the essential simplicity of the underlying geometry of Fourier analysis. Since the matrix of trigonometric vectors is orthonormal, it is recognized as a rotation matrix. In other words, the data vector v is transformed into a Fourier vector f by a rotation and change of scale (represented by the constant multiplier √(2/D) in equation 3.39). Naturally the length of the data vector should be the same regardless of which frame of reference it is calculated in. This geometrical notion is formulated below as Parseval's Theorem, which can be interpreted as a statement of conservation of energy.
Having defined two sets of orthonormal basis vectors for the D-dimensional space in which the data vector v is represented, it is instructive to compute the length of v with respect to these two coordinate reference frames for comparison. We have learned previously that the inner product of v with itself yields the squared length of the vector. We must be careful in such a calculation, however, because we have seen that the two reference frames differ in scale. One way to make sure that these scale factors are accounted for is to represent the data vector as a weighted sum of unit vectors pointing in the cardinal directions of the coordinate reference frame. In other words, let
| v = Y1u1 + Y2u2 + ... + YDuD | [3.40] |
where the unit basis vector uk with respect to the first reference frame has coordinate 0 for every dimension except the k-th dimension, for which the coordinate is 1. For example,
![]() |
[3.41] |
Now forming the inner product of v with itself is easily evaluated because many terms vanish due to the orthogonality of the basis vectors, and the remaining terms contain the factor u•u, which is unity because the basis vectors are unit length. Thus we have,
![]() |
[3.42] |
which is the expected result based on the Pythagorean theorem for
D-dimensional space. Notice that equation [3.42] simplifies so dramatically because each line in the expanded form reduces to a single term due to the orthogonality of the unit basis vectors ui.
By a similar line of reasoning, we find that
| [3.43] |
Now if we repeat the above process when v is represented in the alternative coordinate reference frame defined by the sampled trigonometric functions according to eqn. [3.38], the result is
![]() |
[3.44] |
Combining the results of eqns. [3.43] and [3.44] we obtain the following identity,
| [3.45] |
which is known as Parseval's Theorem. In words, Parseval's theorem states that the length of the data vector may be computed either in the space/time domain (the first coordinate reference frame) or in the Fourier domain (the second coordinate reference frame). The significance of the theorem is that it provides a link between the two domains which is based on the squared length of the data vector. In future chapters we will see how the squared length of the vector may be interpreted in many physical situations as the amount of energy in a signal and the squared Fourier coefficients are equivalent to the amount of energy in the terms of the Fourier series model. In this sense, Parseval's theorem is an energy-conservation theorem since it says that a signal contains the same amount of energy regardless of whether that energy is computed in the space/time domain or in the Fourier/frequency domain.
Another interpretation of Parseval's theorem is in connection with statistics. If we rearrange eqn. [3.45] by moving the constant term (m=a0/2) to the left side we get
| [3.46] |
Now the left side of this equation is recognized as the variance of the Y-values in the discrete function being modeled whereas the right side of the equation is one-half of the sum of the squared amplitudes of the trigonometric Fourier coefficients. Again, since the squared amplitudes of the Fourier coefficients are associated with the energy in the model, this equation says that the variance of a set of data points may also be thought of as a measure of the amount of energy in the signal.
Other aspects of statistics also take on a geometrical perspective once data are conceived as vectors in D-space. For example, the correlation between a set of
X-values and corresponding Y-values is defined by Pearson's correlation coefficient
![]() |
[3.47] |
If we conceive of the set of X-values as a vector x, then we can "normalize" x by subtracting off the mean X-value from each component of x to produce a new vector x' as follows
![]() |
[3.48] |
If we do the same for the Y-values to produce the normalized vector y', then the numerator of eqn. [3.46] is recognized as the inner product of x' and y'. In a similar flash of insight we notice that the denominator of this equation is the product of the lengths of these two vectors. In short, we see that
![]() |
[3.49] |
In other words, the correlation between two variables in a sample of (X,Y) data values is equal to the angle between the two corresponding normalized data vectors in D-dimensional space.
A similar line of reasoning shows that the slope of the best-fitting linear regression line equals
![]() |
[3.50] |