IMA Journal of Numerical Analysis (2015) Page 1 of 13
doi:10.1093/imanum/drnxxx
A fast FFT-based discrete Legendre transform
Nicholas Hale† and Alex Townsend‡
[Received on ; revised on ]
An $O(N(\log N)^2/\log\log N)$ algorithm for computing the discrete Legendre transform and its inverse is
described. The algorithm combines a recently developed fast transform for converting between Legendre
and Chebyshev coefficients with a Taylor series expansion for Chebyshev polynomials about equally-spaced points in the frequency domain. Both components are based on the FFT, and as an intermediate
step we obtain an $O(N\log N)$ algorithm for evaluating a degree $N-1$ Chebyshev expansion at an $N$-point
Legendre grid. Numerical results are given to demonstrate performance and accuracy.
Keywords: discrete Legendre transform; Legendre polynomials; Chebyshev polynomials; fast Fourier
transform
1. Introduction
Given $N$ values $c_0,\ldots,c_{N-1}$, the discrete Legendre transform (DLT) or finite Legendre transform calculates the discrete sums
\[
f_k = \sum_{n=0}^{N-1} c_n P_n(x_k^{\mathrm{leg}}), \qquad 0 \le k \le N-1, \tag{1.1}
\]
where $P_n(x)$ denotes the degree $n$ Legendre polynomial and the Legendre nodes, $x_0^{\mathrm{leg}} > \cdots > x_{N-1}^{\mathrm{leg}} \in (-1,1)$, are the roots of $P_N(x)$. The inverse discrete Legendre transform (IDLT), which computes
$c_0,\ldots,c_{N-1}$ given $f_0,\ldots,f_{N-1}$, can be expressed as
\[
c_n = (n+\tfrac{1}{2}) \sum_{k=0}^{N-1} w_k^{\mathrm{leg}} f_k P_n(x_k^{\mathrm{leg}}), \qquad 0 \le n \le N-1, \tag{1.2}
\]
where $w_0^{\mathrm{leg}},\ldots,w_{N-1}^{\mathrm{leg}}$ are the Gauss–Legendre quadrature weights. It is the goal of this paper to describe
fast algorithms for computing the DLT and IDLT.
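To make (1.1) and (1.2) concrete, the following is a minimal reference sketch of the direct $O(N^2)$ transforms, not the fast algorithm of this paper. It assumes the nodes are found by a fixed number of Newton steps on $P_N$ (the paper itself uses Bogaert's algorithm); the function names `legendre_values`, `legendre_nodes_weights`, `dlt` and `idlt` are illustrative only.

```python
import math

def legendre_values(N, x):
    """Return [P_0(x), ..., P_{N-1}(x)] via the three-term recurrence."""
    vals = [1.0, x][:N]
    for n in range(1, N - 1):
        vals.append(((2 * n + 1) * x * vals[n] - n * vals[n - 1]) / (n + 1))
    return vals

def legendre_nodes_weights(N, newton_steps=10):
    """Gauss-Legendre nodes (decreasing order) and weights for N >= 1.

    Assumption: plain Newton iteration on P_N, started from cos((k+3/4)pi/(N+1/2)).
    """
    nodes, weights = [], []
    for k in range(N):
        x = math.cos((k + 0.75) * math.pi / (N + 0.5))
        for step in range(newton_steps + 1):
            p = legendre_values(N + 1, x)                    # P_0(x), ..., P_N(x)
            dp = N * (x * p[N] - p[N - 1]) / (x * x - 1.0)   # P_N'(x)
            if step < newton_steps:                          # last pass only refreshes dp
                x -= p[N] / dp
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def dlt(c):
    """Direct evaluation of (1.1): f_k = sum_n c_n P_n(x_k)."""
    N = len(c)
    x, _ = legendre_nodes_weights(N)
    return [sum(cn * pn for cn, pn in zip(c, legendre_values(N, xk))) for xk in x]

def idlt(f):
    """Direct evaluation of (1.2): c_n = (n + 1/2) sum_k w_k f_k P_n(x_k)."""
    N = len(f)
    x, w = legendre_nodes_weights(N)
    P = [legendre_values(N, xk) for xk in x]
    return [(n + 0.5) * sum(w[k] * f[k] * P[k][n] for k in range(N)) for n in range(N)]
```

Applying `idlt(dlt(c))` should return `c` up to rounding, because the $N$-point Gauss–Legendre rule integrates the products $P_m P_n$ with $m, n \le N-1$ exactly.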
Expansions in Legendre polynomials can be preferred over Chebyshev expansions because Legendre
polynomials are orthogonal in the standard L2 -norm. However, Legendre expansions are less convenient
in practice because fast algorithms are not as readily available or involve a precomputational cost that
inhibits applications employing adaptive discretizations. By deriving a convenient fast algorithm for the
DLT and IDLT we take some steps towards removing this barrier and allowing Legendre expansions
and adaptive discretizations at Legendre nodes to become a practical tool in computational science.
Our approach makes use of the fast Chebyshev–Legendre transform described in (Hale & Townsend,
2014), which is based on carefully exploiting an asymptotic expansion of Legendre polynomials and the
fast Fourier transform (FFT). Using the FFT has advantages and disadvantages. On the one hand, fast
and industrial-strength implementations of algorithms for computing the FFT are ubiquitous. On the
other hand, the FFT is restricted to equally-spaced samples and is therefore not immediately applicable to situations like computing the DLT. In this paper we overcome the equally-spaced restriction by
considering the Legendre nodes as a perturbation of a Chebyshev-like grid and employing a truncated
Taylor series expansion.

† Department of Applied Mathematics, University of Stellenbosch, Stellenbosch, 7600, South Africa. (nickhale@sun.ac.za)
‡ Department of Mathematics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139-4307. (ajt@mit.edu)
© The author 2015. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

[Figure 1: diagram connecting "Values at Chebyshev points", "Chebyshev coefficients", "Legendre coefficients", "Values at Legendre points" and "Values in the complex plane" by arrows labelled DCT, NDCT, DLT and IDLT, annotated with the citations (Potts et al., 1998), (Mori et al., 1999), (Alpert & Rokhlin, 1991), (Hale & Townsend, 2014), (Iserles, 2011) and (Tygert, 2010).]
FIG. 1. Existing fast algorithms related to the DLT and IDLT. Solid lines depict the transforms discussed in this paper. The
“non-uniform discrete cosine transform” (NDCT) is described in Section 2, the DLT in Section 3, and the IDLT in Section 4.
Examples of existing approaches that may be used to compute the DLT are butterfly schemes (O’Neil
et al., 2010) and the oversampled non-uniform discrete cosine transform (Potts, 2003). Similarly to
the algorithm in (Townsend, 2015), the algorithm presented here has two advantages over existing approaches: (1) There is essentially no precomputational cost; and, (2) The algorithmic parameters are
precisely determined by analysis. This means that the new algorithm is well-suited to applications
where $N$ is not known in advance or when there are multiple accuracy goals. The algorithmic parameters are selected by error analysis so that they are asymptotically optimal for large
$N$. Figure 1 summarizes existing fast algorithms for related transforms.¹
The paper is structured as follows: In the next section we develop an $O(N\log N)$ algorithm for
evaluating a Chebyshev series expansion at Legendre nodes (we refer to this as a “non-uniform discrete
cosine transform” or NDCT). In Section 3 we combine this with the fast Chebyshev–Legendre transform
from (Hale & Townsend, 2014) to obtain an $O(N(\log N)^2/\log\log N)$ algorithm for computing the DLT.
A similar algorithm for the IDLT is described in Section 4.
Throughout we use the following notation: A column vector with entries $v_0,\ldots,v_{N-1}$ is denoted by
$v$, a row vector by $v^T$, a diagonal matrix with entries $v_0,\ldots,v_{N-1}$ by $D_v$, and the $N\times N$ matrix with $(j,k)$
entry $P_k(x_j)$ by $P_N(x)$.
2. A non-uniform discrete cosine transform
Before considering the DLT in (1.1) we describe an $O(N\log N)$ algorithm for evaluating a Chebyshev
expansion at Legendre nodes,
\[
f_k = \sum_{n=0}^{N-1} c_n T_n(x_k^{\mathrm{leg}}), \qquad 0 \le k \le N-1. \tag{2.1}
\]
This will become an important step in the DLT. Here, $T_n(x) = \cos(n\cos^{-1}x)$ is the degree $n$ Chebyshev
polynomial of the first kind and the sums in (2.1) may be rewritten as
\[
f_k = \sum_{n=0}^{N-1} c_n \cos(n\theta_k^{\mathrm{leg}}), \qquad 0 \le k \le N-1, \tag{2.2}
\]
where $x_k^{\mathrm{leg}} = \cos(\theta_k^{\mathrm{leg}})$. If the Legendre nodes in (2.2) are replaced by the Chebyshev points of the first
kind, i.e.,
\[
x_k^{\mathrm{cheb}_1} = \cos(\theta_k^{\mathrm{cheb}_1}) = \cos\!\left(\frac{(k+\tfrac{1}{2})\pi}{N}\right), \qquad 0 \le k \le N-1, \tag{2.3}
\]
then (2.1) can be computed in $O(N\log N)$ operations via a diagonally-scaled discrete cosine transform
of type III (DCT-III) (Gentleman, 1972). Since (2.2) has a similar form to a DCT, but with non-equally-spaced samples $\theta_k^{\mathrm{leg}}$, we refer to (2.1) as a “non-uniform discrete cosine transform” (NDCT).

¹ Figure 1 is reproduced with some modifications from (Hale & Townsend, 2014).
Our algorithm to compute the NDCT is summarized as follows: consider the transformed Legendre
nodes, $\theta_0^{\mathrm{leg}},\ldots,\theta_{N-1}^{\mathrm{leg}}$, as a perturbation of an equally-spaced grid, $\theta_0^*,\ldots,\theta_{N-1}^*$, i.e.,
\[
\theta_k^{\mathrm{leg}} = \theta_k^* + \delta\theta_k, \qquad 0 \le k \le N-1,
\]
and then approximate each $\cos(n\theta_k^{\mathrm{leg}})$ term in (2.2) by a truncated Taylor series expansion about $\theta_k^*$. If
$|\delta\theta_k|$ is small (see Section 2.2) then only a few terms in the Taylor series expansion are required. Moreover, since the $\theta_0^*,\ldots,\theta_{N-1}^*$ are equally-spaced, each term in the series can be computed in $O(N\log N)$
operations via a DCT (see Section 2.3). The Legendre nodes, both $x_0^{\mathrm{leg}},\ldots,x_{N-1}^{\mathrm{leg}}$ and $\theta_0^{\mathrm{leg}},\ldots,\theta_{N-1}^{\mathrm{leg}}$, can
be computed in $O(N)$ operations using the algorithm described in (Bogaert, 2014).

2.1 Computing an NDCT via a Taylor series expansion
As in (Hale & Townsend, 2014) we find it convenient to work in the $\theta = \cos^{-1}x$ variable. There are
three reasons for this:
(1) The quantities $\theta_k^{\mathrm{leg}}$ can be computed more accurately than $x_k^{\mathrm{leg}}$ (Swarztrauber, 2003).
(2) The function $\cos^{-1}x$ is ill-conditioned near $x = \pm 1$. In particular, since $\frac{d}{dx}\cos^{-1}x = -(1-x^2)^{-1/2}$,
$1-(x_0^{\mathrm{leg}})^2 = O(N^{-2})$, and $1-(x_{N-1}^{\mathrm{leg}})^2 = O(N^{-2})$, one can expect rounding errors that grow like
$O(N)$ when evaluating $\cos^{-1}(x_0^{\mathrm{leg}})$ and $\cos^{-1}(x_{N-1}^{\mathrm{leg}})$. Furthermore, when evaluating $T_{N-1}(\cos^{-1}(x_0^{\mathrm{leg}}))$
and $T_{N-1}(\cos^{-1}(x_{N-1}^{\mathrm{leg}}))$ this rounding error can increase to $O(N^2)$;
(3) The Taylor series of $T_n(\cos(\theta+\delta\theta)) = \cos(n(\theta+\delta\theta))$ about $\theta$ involves simple trigonometric
terms and is more convenient to work with than the ultraspherical polynomials that arise in a
Taylor series of $T_n(x+\delta x)$ about $x$.
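The Taylor series expansion of $T_n(\cos(\theta+\delta\theta))$ about $\theta\in[0,\pi]$ can be expressed as
\[
\cos(n(\theta+\delta\theta)) = \sum_{\ell=0}^{\infty} \cos^{(\ell)}(n\theta)\,\frac{(\delta\theta)^\ell}{\ell!}
= \sum_{\ell=0}^{\infty} (-1)^{\lfloor(\ell+1)/2\rfloor}\,\Phi_\ell(n\theta)\,\frac{(n\delta\theta)^\ell}{\ell!}, \tag{2.4}
\]
where $\cos^{(\ell)}(n\theta)$ denotes the $\ell$th derivative of $\cos(n\theta)$ with respect to $\theta$ and
\[
\Phi_\ell(\theta) = \begin{cases} \cos(\theta), & \ell \text{ even}, \\ \sin(\theta), & \ell \text{ odd}. \end{cases}
\]
If $|\delta\theta|$ is small then a good approximation to $T_n(\cos(\theta+\delta\theta))$ can be obtained by truncating this series
after just a handful of terms. The following lemma determines the accuracy we can expect from taking
the first $L$ terms.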
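LEMMA 2.1 For any $L \ge 1$ and $n \ge 0$,
\[
R_{L,n,\delta\theta} := \max_{\theta\in[0,\pi]} \left| \cos(n(\theta+\delta\theta)) - \sum_{\ell=0}^{L-1} \cos^{(\ell)}(n\theta)\,\frac{(\delta\theta)^\ell}{\ell!} \right| \le \frac{(n|\delta\theta|)^L}{L!}.
\]
Proof. By the mean-value form of the remainder we have
\[
R_{L,n,\delta\theta} \le \max_{\hat\theta\in[0,\pi]} |\cos^{(L)}(n\hat\theta)|\,\frac{|\delta\theta|^L}{L!} \le \frac{(n|\delta\theta|)^L}{L!},
\]
as required.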
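For $0 \le k \le N-1$, we write $\theta_k^{\mathrm{leg}} = \theta_k^* + \delta\theta_k$ and substitute the first $L$ terms of (2.4) into (2.1) to
obtain,
\[
f_{k,L,\delta\theta_k} = \sum_{n=0}^{N-1} c_n \left( \sum_{\ell=0}^{L-1} (-1)^{\lfloor(\ell+1)/2\rfloor}\,\Phi_\ell(n\theta_k^*)\,\frac{(n\delta\theta_k)^\ell}{\ell!} \right) \tag{2.5}
\]
\[
\phantom{f_{k,L,\delta\theta_k}} = \sum_{\ell=0}^{L-1} (-1)^{\lfloor(\ell+1)/2\rfloor}\,\frac{\delta\theta_k^\ell}{\ell!} \left( \sum_{n=0}^{N-1} (n^\ell c_n)\,\Phi_\ell(n\theta_k^*) \right), \tag{2.6}
\]
where the second equality follows by interchanging the order of the summations.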
COROLLARY 2.1 If $f_k$ is as in (2.1) and $f_{k,L,\delta\theta_k}$ as in (2.5), then for any $L \ge 1$,
\[
|f_k - f_{k,L,\delta\theta_k}| \le \sum_{n=0}^{N-1} |c_n|\,R_{L,n,\delta\theta_k} \le \frac{(N|\delta\theta_k|)^L}{L!}\,\|c\|_1, \qquad 0 \le k \le N-1.
\]
Proof. This follows immediately from applying Lemma 2.1 to each of the terms in (2.1).
Hence, if we approximate $\theta_0^{\mathrm{leg}},\ldots,\theta_{N-1}^{\mathrm{leg}}$ by points $\theta_0^*,\ldots,\theta_{N-1}^*$ such that
• $\max_{0\le k\le N-1} |\theta_k^{\mathrm{leg}} - \theta_k^*|$ is sufficiently small (see Section 2.2); and,
• $\theta_0^*,\ldots,\theta_{N-1}^*$ is equally spaced,
then the number of terms, $L$, required to achieve an accuracy of $|f_k - f_{k,L,\delta\theta_k}| \le \varepsilon\|c\|_1$ is small. Moreover, the inner sums of (2.6) can be computed via discrete cosine and sine transforms (see Section 2.3),
which can be computed in $O(N\log N)$ operations via an FFT. This gives us the foundations of a fast algorithm.
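The truncated expansion (2.6) and the bound of Corollary 2.1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the inner sums are evaluated directly in $O(N^2)$ operations rather than by a DCT or DST, any grid and perturbation may be supplied (they need not be Legendre nodes), and the names `ndct_direct` and `ndct_taylor` are hypothetical.

```python
import math

def ndct_direct(c, theta):
    """Evaluate f_k = sum_n c_n cos(n * theta_k) directly, as in (2.2)."""
    return [sum(cn * math.cos(n * t) for n, cn in enumerate(c)) for t in theta]

def ndct_taylor(c, theta_star, delta, L):
    """Evaluate the L-term approximation (2.6) about the grid theta_star.

    Assumption: inner sums are computed directly; the paper uses DCTs/DSTs here.
    """
    f = [0.0] * len(theta_star)
    for ell in range(L):
        sign = (-1) ** ((ell + 1) // 2)                    # (-1)^floor((ell+1)/2)
        phi = math.cos if ell % 2 == 0 else math.sin        # Phi_ell
        scaled = [(n ** ell) * cn for n, cn in enumerate(c)]  # entries n^ell * c_n
        for k, (ts, d) in enumerate(zip(theta_star, delta)):
            inner = sum(s * phi(n * ts) for n, s in enumerate(scaled))
            f[k] += sign * d ** ell / math.factorial(ell) * inner
    return f
```

With `theta = theta_star + delta`, the difference between `ndct_direct(c, theta)` and `ndct_taylor(c, theta_star, delta, L)` should stay below $(N\max_k|\delta\theta_k|)^L\|c\|_1/L!$ until rounding error dominates.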
2.2 Legendre nodes as a perturbation of a Chebyshev-like grid
The Chebyshev points of the first kind, $x_0^{\mathrm{cheb}_1},\ldots,x_{N-1}^{\mathrm{cheb}_1}$, are an approximation to $x_0^{\mathrm{leg}},\ldots,x_{N-1}^{\mathrm{leg}}$ which
satisfy the requirements in the two bullet points above. In particular, letting $\theta_k^{\mathrm{cheb}_1} = \cos^{-1}(x_k^{\mathrm{cheb}_1})$, one
can readily show that (Olver et al., 2010, (18.16.3))
\[
|\theta_k^{\mathrm{leg}} - \theta_k^{\mathrm{cheb}_1}| \le \frac{\pi}{2(N+1)}, \qquad 0 \le k \le N-1.
\]
However, this bound is a little weak. A better bound is given in Lemma 2.2.
Another possibility is to consider the leading order term of the asymptotic expansion of Legendre
polynomials of large degree (Stieltjes, 1890):
\[
P_N(\cos\theta) \sim \sqrt{\frac{4}{\pi}}\,\frac{\Gamma(N+1)}{\Gamma(N+\tfrac{3}{2})}\,\frac{\cos\!\left((N+\tfrac{1}{2})\theta - \tfrac{\pi}{4}\right)}{(2\sin\theta)^{1/2}}, \qquad N \to \infty.
\]
The zeros of this leading term are
\[
\theta_k^{\mathrm{cheb}_*} = \frac{(k+\tfrac{3}{4})\pi}{N+\tfrac{1}{2}}, \qquad 0 \le k \le N-1,
\]
which are also equally-spaced and provide us with an approximation to Legendre nodes. In fact, multiplying the numerator and denominator by a factor of 2 and comparing to (2.3) we see that these points
correspond precisely to the odd terms (i.e., when the index $k$ is odd) in the $(2N+1)$-point Chebyshev
grid of the first kind.
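The following lemma shows how closely $\theta_k^{\mathrm{cheb}_*}$ and $\theta_k^{\mathrm{cheb}_1}$ approximate $\theta_k^{\mathrm{leg}}$, and Figure 2 demonstrates that the derived bounds are tight.
LEMMA 2.2 Let $\theta_0^{\mathrm{leg}},\ldots,\theta_{N-1}^{\mathrm{leg}}$ denote the $N$ roots of $P_N(\cos\theta)$. Then,
\[
\max_{0\le k\le N-1} \left|\theta_k^{\mathrm{leg}} - \theta_k^{\mathrm{cheb}_*}\right| \le \frac{1}{3\pi(2N+1)} \le \frac{1}{6\pi N}, \tag{2.7}
\]
and
\[
\max_{0\le k\le N-1} \left|\theta_k^{\mathrm{leg}} - \theta_k^{\mathrm{cheb}_1}\right| \le \frac{3\pi^2+2}{6\pi(2N+1)} \le \frac{0.83845}{N}.
\]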
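Proof. We first consider $|\theta_k^{\mathrm{leg}} - \theta_k^{\mathrm{cheb}_*}|$ and restrict our attention to $0 \le k \le \lfloor N/2\rfloor - 1$. From (Szegö,
1936, Thm. 6.3.2) we know that $\theta_k^{\mathrm{cheb}_*} < \theta_k^{\mathrm{leg}}$. Moreover, by (Brass et al., 1996, Thm. 3.1 & eqn. (3.2))
and $\cot(x) < 1/x$, we have
\[
\theta_k^{\mathrm{leg}} - \theta_k^{\mathrm{cheb}_*} \le \frac{1}{2(2N+1)^2}\cot\theta_k^{\mathrm{cheb}_*} \le \frac{1}{2(2N+1)^2}\,\frac{4N+2}{(4k+3)\pi} \le \frac{1}{3(2N+1)\pi}, \qquad 0 \le k \le \lfloor N/2\rfloor - 1.
\]
By symmetry of the $\theta_k^{\mathrm{leg}}$ and $\theta_k^{\mathrm{cheb}_*}$, the same bound holds for $k > \lfloor N/2\rfloor - 1$ when we take the modulus.
The bound on $|\theta_k^{\mathrm{leg}} - \theta_k^{\mathrm{cheb}_1}|$ follows from the triangle inequality. That is, for $0 \le k \le N-1$, we
have
\[
\left|\theta_k^{\mathrm{leg}} - \theta_k^{\mathrm{cheb}_1}\right| \le \left|\theta_k^{\mathrm{leg}} - \theta_k^{\mathrm{cheb}_*}\right| + \left|\theta_k^{\mathrm{cheb}_1} - \theta_k^{\mathrm{cheb}_*}\right| \le \frac{1}{3\pi(2N+1)} + \frac{\pi(2k-N+1)}{2N(2N+1)}.
\]
Since $k \le N-1$, we obtain
\[
\left|\theta_k^{\mathrm{leg}} - \theta_k^{\mathrm{cheb}_1}\right| \le \frac{1}{3\pi(2N+1)} + \frac{\pi(N-1)}{4N(N+1/2)} \le \frac{3\pi^2+2}{6\pi(2N+1)}, \qquad 0 \le k \le N-1.
\]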
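The two bounds of Lemma 2.2 are easy to check numerically for moderate $N$. The sketch below assumes the roots $\theta_k^{\mathrm{leg}}$ are obtained by a fixed number of Newton steps applied to $P_N(\cos\theta)$ in the $\theta$ variable, started from $\theta_k^{\mathrm{cheb}_*}$; this is a simple stand-in for Bogaert's algorithm, and the names `legendre_theta_nodes` and `lemma_gaps` are illustrative.

```python
import math

def legendre_theta_nodes(N, newton_steps=10):
    """Roots theta_k of P_N(cos(theta)), increasing in k, for N >= 1.

    Assumption: plain Newton iteration in theta, not the method used in the paper.
    """
    thetas = []
    for k in range(N):
        t = (k + 0.75) * math.pi / (N + 0.5)   # initial guess theta_k^{cheb*}
        for _ in range(newton_steps):
            x = math.cos(t)
            p_prev, p = 1.0, x                  # P_0(x), P_1(x)
            for n in range(1, N):
                p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
            # d/dtheta P_N(cos theta) = N (x P_N - P_{N-1}) / sin(theta)
            t -= p * math.sin(t) / (N * (x * p - p_prev))
        thetas.append(t)
    return thetas

def lemma_gaps(N):
    """Return (max |theta_leg - theta_cheb*|, max |theta_leg - theta_cheb1|)."""
    leg = legendre_theta_nodes(N)
    gap_star = max(abs(t - (k + 0.75) * math.pi / (N + 0.5)) for k, t in enumerate(leg))
    gap_one = max(abs(t - (k + 0.5) * math.pi / N) for k, t in enumerate(leg))
    return gap_star, gap_one
```

For a given `N`, the first returned value can be compared with $1/(3\pi(2N+1))$ and the second with $(3\pi^2+2)/(6\pi(2N+1))$.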
[Figure 2: log-log plot of $\|\theta^{\mathrm{leg}} - \theta^{\mathrm{cheb}_1}\|_\infty$ and $\|\theta^{\mathrm{leg}} - \theta^{\mathrm{cheb}_*}\|_\infty$ against $N$ for $1 \le N \le 10^4$, with dashed reference lines $0.83845N^{-1}$ and $(6\pi N)^{-1}$.]
FIG. 2. The maximum distance between corresponding entries of θ leg and θ cheb1 (upper solid line) and θ leg and θ cheb∗ (lower
solid line) as a function of N. The dashed lines depict the bounds derived in Lemma 2.2, which appear to be tight.
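COROLLARY 2.2 Choosing $\theta_k^* = \theta_k^{\mathrm{cheb}_1}$ or $\theta_k^* = \theta_k^{\mathrm{cheb}_*}$ for $0 \le k \le N-1$ in (2.6), we have
\[
\max_{0\le k\le N-1} \left| f_k - f_{k,L,\delta\theta_k^{\mathrm{cheb}_1}} \right| \le \frac{(0.83845)^L}{L!}\,\|c\|_1 \tag{2.8}
\]
or
\[
\max_{0\le k\le N-1} \left| f_k - f_{k,L,\delta\theta_k^{\mathrm{cheb}_*}} \right| \le \frac{1}{(6\pi)^L L!}\,\|c\|_1, \tag{2.9}
\]
respectively.
Proof. Follows immediately from combining Corollary 2.1 and Lemma 2.2.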
We note that in (2.8) and (2.9) the terms multiplying $\|c\|_1$ are independent of $N$, and so the number
of terms required in (2.6) to obtain a given precision is bounded independently of $N$. Table 1 shows
these terms for $1 \le L \le 18$. Double precision is obtained for $L \gtrsim 9$ terms when $\theta_k^* = \theta_k^{\mathrm{cheb}_*}$ and $L \gtrsim 17$
terms when $\theta_k^* = \theta_k^{\mathrm{cheb}_1}$.
 L    0.83845^L / L!    ((6π)^L L!)^-1
 1    8.4 × 10^-1       5.3 × 10^-2
 2    3.5 × 10^-1       1.4 × 10^-3
 3    9.8 × 10^-2       2.5 × 10^-5
 4    2.1 × 10^-2       3.3 × 10^-7
 5    3.5 × 10^-3       3.5 × 10^-9
 6    4.8 × 10^-4       3.1 × 10^-11
 7    5.8 × 10^-5       2.4 × 10^-13
 8    6.1 × 10^-6       1.6 × 10^-15
 9    5.6 × 10^-7       9.2 × 10^-18
10    4.7 × 10^-8       4.9 × 10^-20
11    3.6 × 10^-9       2.3 × 10^-22
12    2.5 × 10^-10      1.0 × 10^-24
13    1.6 × 10^-11      4.2 × 10^-27
14    9.7 × 10^-13      1.6 × 10^-29
15    5.4 × 10^-14      5.7 × 10^-32
16    2.9 × 10^-15      1.9 × 10^-34
17    1.4 × 10^-16      5.9 × 10^-37
18    6.6 × 10^-18      1.7 × 10^-39
Table 1. The terms multiplying $\|c\|_1$ in (2.8) and (2.9) for $1 \le L \le 18$. The shaded boxes show the smallest value of L for which
these fall below single and double precision.
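Because the factors in (2.8) and (2.9) do not depend on $N$, the number of Taylor terms for a given tolerance can be found by a short search. The sketch below is one way to do this, assuming the tolerance is compared directly against the factor multiplying $\|c\|_1$; the function name `terms_needed` and the two constants are illustrative.

```python
import math
import sys

CHEB1_CONSTANT = 0.83845                       # from (2.8), theta* = theta^{cheb_1}
CHEB_STAR_CONSTANT = 1.0 / (6.0 * math.pi)     # from (2.9), theta* = theta^{cheb_*}

def terms_needed(constant, tol):
    """Smallest L >= 1 such that constant**L / L! <= tol."""
    L = 1
    while constant ** L / math.factorial(L) > tol:
        L += 1
    return L

# Number of terms for a tolerance equal to double-precision machine epsilon.
L_cheb1 = terms_needed(CHEB1_CONSTANT, sys.float_info.epsilon)
L_cheb_star = terms_needed(CHEB_STAR_CONSTANT, sys.float_info.epsilon)
```

With machine epsilon as the tolerance this search reproduces the values $L = 17$ and $L = 9$ read off from Table 1.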
2.3 Computing the discrete cosine and sine transforms
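To complete our fast algorithm for the NDCT we require a fast way to compute the sums contained
inside the parentheses of (2.6). In particular, writing (2.6) in vector form, we have
\[
f_L = \sum_{\substack{\ell=0 \\ \ell\ \mathrm{even}}}^{L-1} \frac{(-1)^{\lfloor(\ell+1)/2\rfloor}}{\ell!}\,\mathrm{diag}(\delta\theta)^\ell\,\mathrm{DCT}(c^{[\ell]})
+ \sum_{\substack{\ell=1 \\ \ell\ \mathrm{odd}}}^{L-1} \frac{(-1)^{\lfloor(\ell+1)/2\rfloor}}{\ell!}\,\mathrm{diag}(\delta\theta)^\ell\,\mathrm{DST}(c^{[\ell]}), \tag{2.10}
\]
where $f_L$ is the vector with entries $f_{k,L,\delta\theta_k}$ and $[c^{[\ell]}]_n = n^\ell c_n$, $n = 0,\ldots,N-1$. When $\theta^* = \theta^{\mathrm{cheb}_1}$ the DCT and DST take the form
\[
\sum_{n=0}^{N-1} n^\ell c_n \cos\!\left(\frac{n(k+\tfrac{1}{2})\pi}{N}\right), \qquad
\sum_{n=0}^{N-1} n^\ell c_n \sin\!\left(\frac{n(k+\tfrac{1}{2})\pi}{N}\right), \qquad 0 \le k \le N-1,
\]
which are readily related to the DCT-III and DST-III, respectively. When $\theta^* = \theta^{\mathrm{cheb}_*}$ they become
\[
\sum_{n=0}^{N-1} n^\ell c_n \cos\!\left(\frac{n((2k+1)+\tfrac{1}{2})\pi}{2N+1}\right), \qquad
\sum_{n=0}^{N-1} n^\ell c_n \sin\!\left(\frac{n((2k+1)+\tfrac{1}{2})\pi}{2N+1}\right), \qquad 0 \le k \le N-1,
\]
which are equivalent to the odd terms of a DCT-III and DST-III of length $2N+1$. Hence, both may be
evaluated in $O(N\log N)$ operations.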
REMARK 2.1 Although, for an accuracy of double precision, the $\theta^{\mathrm{cheb}_*}$ points require around half the
number of terms in the Taylor series compared to $\theta^{\mathrm{cheb}_1}$ (see Table 1), we see from above that each
term will be roughly twice as expensive to evaluate. Hence, we expect little difference between the two
approaches when there is a 16-digit accuracy goal. However, for an accuracy goal of, say, $10^{-3}$ one
should pick $\theta^* = \theta^{\mathrm{cheb}_*}$.
REMARK 2.2 The numerical results in this paper are computed in MATLAB. Although MATLAB
uses FFTW for FFT computations (Frigo & Johnson, 2005), it does not allow access to FFTW’s DCT
and DST routines (REDFT01 and RODFT01, respectively).² Instead, we use the DCT and DST codes
implemented in Chebfun (Driscoll et al., 2014), which compute the DCT and DST via a complex-valued FFT of double the length. Based on our experiments, we believe this to be a factor of 2-4 slower
than calling REDFT01 and RODFT01 directly.
2.4 Direct approach
To test the fast algorithm above we construct $T_N(x^{\mathrm{leg}})$ via the three-term recurrence relation:
\[
T_0(x) = 1, \qquad T_1(x) = x, \qquad T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x), \qquad n \ge 1.
\]
Although this approach requires $O(N^2)$ operations to compute $T_N(x^{\mathrm{leg}})c$, the memory cost can easily
be reduced to $O(N)$ by computing the matrix-vector product as a recurrence. Hereinafter, we refer to
this as the direct approach.
We note that one could instead compute the NDCT using the closed form expression of $T_n(\cos\theta)$ to
construct $T_N(x^{\mathrm{leg}}) = \cos(\theta^{\mathrm{leg}}[0,1,\ldots,N-1])$ and then evaluate the matrix-vector product $f = T_N(x^{\mathrm{leg}})c$.
However, such a closed form is not available when we come to compute $P_N(x^{\mathrm{leg}})$ in the next section.
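The direct approach can be sketched in a few lines. The version below is a minimal illustration, assuming a nonempty coefficient list and arbitrary evaluation points; it accumulates the sum while running the Chebyshev recurrence, so the matrix $T_N(x)$ is never stored. The name `ndct_recurrence` is illustrative.

```python
import math

def ndct_recurrence(c, x):
    """Evaluate f_k = sum_n c_n T_n(x_k) with T_{n+1} = 2 x T_n - T_{n-1}.

    Uses O(N^2) operations but only O(N) memory, since each T_n(x_k) is
    consumed as soon as it is produced. Assumes len(c) >= 1.
    """
    f = []
    for xk in x:
        t_prev, t = 1.0, xk                      # T_0(x_k), T_1(x_k)
        total = c[0] + (c[1] * xk if len(c) > 1 else 0.0)
        for cn in c[2:]:
            t_prev, t = t, 2.0 * xk * t - t_prev  # advance to the next T_n(x_k)
            total += cn * t
        f.append(total)
    return f
```

Swapping the Chebyshev recurrence for the Legendre one gives the analogous direct evaluation of (1.1), which is why the recurrence form is the natural baseline for both transforms.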
² The MATLAB signal processing toolbox has a dct() function, but this is computed via an FFT of double the length.
[Figure 3: two log-log panels of absolute error against $N$ (from $10^2$ to $10^3$), each with one curve per decay rate $O(n^0)$, $O(n^{-0.5})$, $O(n^{-1})$ and $O(n^{-1.5})$, and dashed reference lines showing the observed error growth.]
FIG. 3. Errors in computing the NDCT of vectors with various rates of decay using the direct method (left) and the new FFT-based
approach with θ ∗ = θ cheb1 (right). In both cases, a vector of length N is created using randn(N,1) in MATLAB and then scaled
so that the nth entry is O(n0 ), O(n−0.5 ), O(n−1 ), and O(n−1.5 ), giving the four curves in each panel. The dashed lines depict
heuristic observations of the error growth in each case. The log factors may seem somewhat arbitrary, but the lines give a much
better fit to the data when these are included.
2.5 Numerical results for the NDCT
All the numerical results were performed on a 3.40GHz Intel Core i7-2600 PC with MATLAB 2015a
in Ubuntu Linux. The vector of transformed nodes θ leg is computed with the legpts() command in
Chebfun (Driscoll et al., 2014, v5.2) which uses Bogaert’s algorithm (Bogaert, 2014). As discussed in
Remark 2.2, DCTs and DSTs are computed using the chebfun.dct() and chebfun.dst() routines.
MATLAB codes for reproducing all of the results contained within this paper are available online (Hale
& Townsend, 2015).
As a first test we take randomly distributed vectors $c$ with various rates of decay and compare
the accuracy of both the direct and FFT-based approaches against an extended precision computation
(Figure 3).³ The results for $\theta^{\mathrm{cheb}_1}$ and $\theta^{\mathrm{cheb}_*}$ in the FFT-based approach are virtually indistinguishable
so we show only the former. We find the FFT-based approaches are more accurate than the direct
approach, despite our algorithm involving many significant approximations. In many applications the
Chebyshev expansion in (2.1) is derived by an approximation of a smooth function. In this setting,
if the function is Hölder continuous with $\alpha > 1/2$, we observe that the FFT-based algorithm has
essentially no error growth with $N$.
Our second test investigates the time taken to compute a real-valued NDCT of length N. Figure 4
compares the direct approach (squares) and the FFT-based approach with θ cheb1 (triangles) and θ cheb∗
(circles), in the range 100 6 N 6 10,000. The variation of computational times between consecutive
values of N is typical of FFT-based algorithms, where the theoretical cost of the FFT as well as the
precise algorithm employed by FFTW depends on the prime factorization of N. The triangles and
circles lie on a piecewise linear least squares fit to the computational timings, to suggest some kind of
‘average’ time for near-by values of N. The variation also makes it difficult to say when the FFT-based
algorithm becomes more efficient than the direct approach, but it occurs around N = 1,000.
³ In particular, the vector corresponding to, say, $N = 50$ with $O(n^{-0.5})$ decay can be reproduced exactly by the MATLAB code
rng(0,'twister'); c = rand(50,1)./sqrt(1:50).';
[Figure 4: log-log plot of time (secs) against $N$ (from $10^2$ to $10^4$) for the recurrence, cheb1 and cheb∗ approaches, with reference lines $O(N^2)$ and $O(N\log N)$.]
FIG. 4. Time to compute an NDCT of various lengths using the direct approach (squares) and FFT-based approaches from Section 2
with $x^{\mathrm{cheb}_1}$ (triangles) and $x^{\mathrm{cheb}_*}$ (circles). For larger values of $N$ the two FFT-based approaches require a comparable time, and
the crossover with the direct approach occurs at around $N = 1{,}000$. We believe the jump in the $x^{\mathrm{cheb}_*}$ time (circles) at around
$N = 650$ is caused by a switch in algorithmic details of the FFT routine.
3. The discrete Legendre transform
Our procedure for computing the DLT in (1.1) splits naturally into two stages. The first deals with the
non-uniform frequency present in $P_n(\cos\theta)$ and converts the transform to one involving $T_n(\cos\theta)$ with
uniform frequency in $\theta$. The second stage deals with the non-uniform grid $\cos(\theta_0^{\mathrm{leg}}),\ldots,\cos(\theta_{N-1}^{\mathrm{leg}})$ and
is the NDCT from the previous section.
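Denote by $M$ the $N\times N$ upper-triangular matrix given by
\[
M_{kn} = \begin{cases}
\dfrac{1}{\pi}\left(\dfrac{\Gamma(n/2+1/2)}{\Gamma(n/2+1)}\right)^{2}, & 0 = k \le n \le N-1,\ n \text{ even}, \\[2ex]
\dfrac{2}{\pi}\,\dfrac{\Gamma((n-k)/2+1/2)}{\Gamma((n-k)/2+1)}\,\dfrac{\Gamma((n+k)/2+1/2)}{\Gamma((n+k)/2+1)}, & 0 < k \le n \le N-1,\ k+n \text{ even}, \\[2ex]
0, & \text{otherwise}.
\end{cases}
\]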
If $c$ is the vector of Legendre coefficients in (1.1) and $\hat{c} = Mc$, then we have (Alpert & Rokhlin, 1991)
\[
f_k = \sum_{n=0}^{N-1} c_n P_n(x_k^{\mathrm{leg}}) = \sum_{n=0}^{N-1} \hat{c}_n T_n(x_k^{\mathrm{leg}}), \qquad 0 \le k \le N-1. \tag{3.1}
\]
Now, the summation on the right-hand side takes the form of the NDCT (2.1) and we may use the
algorithm described in Section 2.3 to compute $f$ in $O(N\log N)$ operations.
To compute $\hat{c}$ directly, i.e. via the matrix-vector product $Mc$, requires $O(N^2)$ operations. Instead,
we note that the transform $\hat{c} = Mc$ can be computed in $O(N(\log N)^2/\log\log N)$ operations using the
FFT-based algorithm described in (Hale & Townsend, 2014) or by the fast multipole-based approach
in (Alpert & Rokhlin, 1991). In this paper we use the algorithm in (Hale & Townsend, 2014) because it
has no precomputational cost and so allows for adaptive discretizations.
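For small $N$ the conversion $\hat{c} = Mc$ can be checked with a dense matrix. The sketch below is only the $O(N^2)$ product, not the fast transform of (Hale & Townsend, 2014). It assumes the standard closed form $M_{0n} = \Lambda(n/2)^2/\pi$ and $M_{kn} = (2/\pi)\Lambda((n-k)/2)\Lambda((n+k)/2)$ for $k + n$ even, with $\Lambda(z) = \Gamma(z+1/2)/\Gamma(z+1)$, which agrees with $P_2 = \tfrac{3}{4}T_2 + \tfrac{1}{4}T_0$ and $P_3 = \tfrac{5}{8}T_3 + \tfrac{3}{8}T_1$. The names `lam`, `leg2cheb_matrix` and `leg2cheb` are illustrative.

```python
import math

def lam(z):
    """Lambda(z) = Gamma(z + 1/2) / Gamma(z + 1), evaluated via log-gamma."""
    return math.exp(math.lgamma(z + 0.5) - math.lgamma(z + 1.0))

def leg2cheb_matrix(N):
    """Dense N x N upper-triangular Legendre-to-Chebyshev matrix M (assumed form)."""
    M = [[0.0] * N for _ in range(N)]
    for n in range(N):
        for k in range(n % 2, n + 1, 2):       # 0 <= k <= n with k + n even
            if k == 0:
                M[0][n] = lam(n / 2) ** 2 / math.pi
            else:
                M[k][n] = 2.0 / math.pi * lam((n - k) / 2) * lam((n + k) / 2)
    return M

def leg2cheb(c):
    """Chebyshev coefficients c_hat = M c of the Legendre series with coefficients c."""
    N = len(c)
    M = leg2cheb_matrix(N)
    return [sum(M[k][n] * c[n] for n in range(k, N)) for k in range(N)]
```

Evaluating $\sum_n c_n P_n(x)$ and $\sum_n \hat{c}_n T_n(x)$ at any point $x \in [-1, 1]$ should then give the same value up to rounding, which is the content of (3.1).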
Another convenient property of the transforms in this paper is that there is an asymptotically optimal
selection of algorithmic parameters for any working tolerance. In the NDCT, the number of terms in
the Taylor series expansion can be precisely determined based on the working tolerance. We would like
this to also hold for the $O(N(\log N)^2/\log\log N)$ Chebyshev–Legendre transform described in (Hale &
Townsend, 2014). We remark that though that paper considered only a working tolerance of machine
precision throughout, the algorithmic parameters were derived and determined in terms of $\varepsilon > 0$. We
slightly modified our implementation to exploit arbitrary working tolerances.

[Figure 5: two log-log panels of absolute error against $N$ (from $10^2$ to $10^3$), each with one curve per decay rate $O(n^0)$, $O(n^{-0.5})$, $O(n^{-1})$ and $O(n^{-1.5})$, and dashed reference lines showing the observed error growth.]
FIG. 5. Errors in computing the DLT of vectors with various rates of decay using the direct approach (left) and the FFT-based
approach (right). In both cases, a vector of length $N$ is created using randn(N,1) in MATLAB and then scaled so that the
$n$th entry is $O(n^0)$, $O(n^{-0.5})$, $O(n^{-1})$, and $O(n^{-1.5})$, giving the four curves in each panel. The dashed lines depict heuristic
observations of the error growth in each case.
3.1 Numerical results for the discrete Legendre transform
First, we take random vectors $c$ with normally distributed entries and then scale them to have algebraic
rates of decay $O(n^0)$, $O(n^{-0.5})$, $O(n^{-1})$, and $O(n^{-1.5})$, as in Section 2.5. The accuracy of both the direct
and FFT-based approaches is compared against an extended precision computation. The results are
shown in Figure 5 and one can see that the errors for the direct and FFT-based approach are comparable.
In most practical applications the Legendre coefficients are associated with a smooth function so they
decay at least algebraically and usually the transform has no error growth with $N$.
While we achieve a similar error to the direct approach, our FFT-based approach has a lower complexity and is more computationally efficient for large $N$. Figure 6 shows a comparison of the computational times for the direct and FFT-based approach. When comparing with Figure 4 we see that the
computational time for the DLT is dominated by the leg2cheb transformation. For an accuracy goal of
double precision, the crossover between the direct and FFT-based approach is approximately $N = 5{,}000$.
4. The inverse discrete Legendre transform
We now turn to the IDLT, which can be seen as taking values of a function on an $N$-point Legendre grid and returning the corresponding Legendre coefficients of the degree $N-1$ polynomial interpolant. The basis of the algorithm we propose is analogous to the forward transform and has the same
$O(N(\log N)^2/\log\log N)$ complexity.
In this section, because of the inverse, it is more convenient to work with matrix notation rather than
sums. In this vein we denote the $N\times N$ matrix with $(j,k)$ entries $P_k(x_j^{\mathrm{leg}})$ as $P_N(x^{\mathrm{leg}})$ and the $N\times N$ matrix with
$(j,k)$ entry $T_k(x_j^{\mathrm{leg}})$ as $T_N(x^{\mathrm{leg}})$. Using the orthogonality of Legendre polynomials and the exactness of
Gauss–Legendre quadrature, the IDLT (1.2) can be conveniently written in matrix notation as
\[
c = P_N^{-1}(x^{\mathrm{leg}})\,f = D_s\,P_N^{T}(x^{\mathrm{leg}})\,D_{w^{\mathrm{leg}}}\,f, \tag{4.1}
\]
where $s = (\|P_0\|_2^{-2},\ldots,\|P_{N-1}\|_2^{-2})^T$ and $w^{\mathrm{leg}}$ is the vector of Gauss–Legendre quadrature weights.
From (3.1) we know that, for any vector $c$, $P_N(x^{\mathrm{leg}})c = T_N(x^{\mathrm{leg}})\hat{c} = T_N(x^{\mathrm{leg}})Mc$, and hence we have the
matrix decomposition $P_N(x^{\mathrm{leg}}) = T_N(x^{\mathrm{leg}})M$. Substituting this into (4.1) we obtain
\[
c = D_s\,M^T\,T_N^{T}(x^{\mathrm{leg}})\,D_w\,f.
\]

[Figure 6: log-log plot of time (secs) against $N$ (from $10^3$ to $10^5$) for the recurrence (direct) and cheb1 (FFT-based) approaches, with reference lines $O(N^2)$ and $O(N(\log N)^2/\log\log N)$.]
FIG. 6. Time taken to compute the DLT of length $N$ using the direct approach (squares) and FFT-based approach (triangles). Here
we choose 1000 logarithmically spaced values of $N$ in the range $10^3$–$10^5$. The larger squares and triangles are every 50th one of
these. We omit the results for the FFT-based approach using the points $x^{\mathrm{cheb}_*}$ as they are indistinguishable from those of $x^{\mathrm{cheb}_1}$.
Thus, $c$ can also be computed in $O(N(\log N)^2/\log\log N)$ operations because:
1. $D_w$ and $D_s$ are diagonal matrices, so can readily be applied to a vector in $O(N)$ operations,
2. $T_N^{T}(x^{\mathrm{leg}})$ is precisely the transpose of the NDCT in Section 2. Since the transpose of the DCTs
and DSTs in Section 2.3 can themselves be expressed in terms of DCTs and DSTs (of type-II),
$T_N^{T}(x^{\mathrm{leg}})$ can be approximated in $O(N\log N)$ operations by taking the transpose of (2.10).
3. The matrix-vector product with $M^T$ is closely related to the Chebyshev–Legendre transform and
can be computed in $O(N(\log N)^2/\log\log N)$ operations by a slight modification of the algorithm
in (Hale & Townsend, 2014). (Alternatively, one could use the ideas in (Alpert & Rokhlin, 1991).)
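The identity $P_N^{-1}(x^{\mathrm{leg}}) = D_s P_N^{T}(x^{\mathrm{leg}}) D_{w^{\mathrm{leg}}}$ behind (4.1) can be written out in a few lines. The derivation below is a sketch that assumes only two standard facts: $N$-point Gauss–Legendre quadrature is exact for polynomials of degree at most $2N-1$, and $\|P_n\|_2^2 = 2/(2n+1)$.

```latex
% For 0 <= m, n <= N-1 the product P_m P_n has degree at most 2N-2,
% so the quadrature rule integrates it exactly:
\[
\left[ P_N^{T}(x^{\mathrm{leg}})\, D_{w^{\mathrm{leg}}}\, P_N(x^{\mathrm{leg}}) \right]_{mn}
  = \sum_{k=0}^{N-1} w_k^{\mathrm{leg}}\, P_m(x_k^{\mathrm{leg}})\, P_n(x_k^{\mathrm{leg}})
  = \int_{-1}^{1} P_m(x) P_n(x)\, dx
  = \delta_{mn}\, \|P_n\|_2^{2}.
\]
% Hence P^T D_w P = D_s^{-1}, and multiplying on the left by D_s gives
\[
D_s\, P_N^{T}(x^{\mathrm{leg}})\, D_{w^{\mathrm{leg}}}\, P_N(x^{\mathrm{leg}}) = I
\quad\Longrightarrow\quad
P_N^{-1}(x^{\mathrm{leg}}) = D_s\, P_N^{T}(x^{\mathrm{leg}})\, D_{w^{\mathrm{leg}}},
\qquad s_n = \|P_n\|_2^{-2} = n + \tfrac{1}{2}.
\]
```

The factor $s_n = n + \tfrac{1}{2}$ is the same scaling that appears in front of the sum in (1.2).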
4.1 Numerical results for the inverse discrete Legendre transform
For the inverse transform, the direct approach requires $O(N^2)$ operations to compute the IDLT of length
$N$, rather than the naive $O(N^3)$ standard inversion algorithm. This is because the direct approach
we have implemented exploits the discrete orthogonality relation that holds for Legendre polynomials (see (4.1)). The matrix-vector product with $P_N^{T}(x^{\mathrm{leg}})$ in (4.1) is computed by using the three-term
recurrence relations satisfied by Legendre polynomials, except now along rows instead of columns.
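[Figure 7: two log-log panels comparing the recurrence (direct) and cheb1 (FFT-based) approaches. Left: absolute error against $N$ (from $10^2$ to $10^3$) with a dashed reference line for the observed error growth. Right: time (secs) against $N$ (from $10^3$ to $10^5$) with reference lines $O(N^2)$ and $O(N(\log N)^2/\log\log N)$.]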
FIG. 7. Left: Errors in computing the IDLT of vectors using the direct method and the FFT-based approach. A vector of length N
is created using randn(N,1) in MATLAB with the dashed line showing the observed growth of the maximum absolute error.
Right: Time taken to compute the IDLT of length N using the direct approach (squares) and FFT-based approach (triangles).
For our first numerical experiment we take randomly generated vectors with normally distributed
entries and compare the accuracy of both the direct and FFT-based approach against an extended precision computation (see Figure 7, left). Unlike in Section 2.5 and Section 3.1 we do not consider decay
in these vectors, as they will typically correspond to function values in space where there is no reason
to expect decay. We find that the errors incurred by our FFT-based IDLT are consistent with the direct
approach.
In Figure 7 (right) we show the execution time for computing the IDLT with the direct approach
(squares) and FFT-based approach (triangles). As with the DLT, our IDLT algorithm is more computationally efficient than the direct approach when N > 5,000.
Conclusion
We have presented fast and simple algorithms for the discrete Legendre transform and its inverse, which
rely on a nonuniform discrete cosine transform and the Chebyshev–Legendre transform. Both components are based on the fast Fourier transform and have no precomputational cost. For an $N$-point
transform both algorithms have a complexity of $O(N(\log N)^2/\log\log N)$ operations and are faster than
the direct approach when N > 5,000. MATLAB code to compute the transformations (and all the results
contained in this paper) is available online (Hale & Townsend, 2015). The codes are also accessible in
Chebfun (Driscoll et al., 2014) via chebfun.dlt() and chebfun.idlt(), respectively.
Acknowledgments
We thank André Weideman and Anthony Austin for reading a draft version of this manuscript.
REFERENCES
ALPERT, B. K. & ROKHLIN, V. (1991) A fast algorithm for the evaluation of Legendre expansions. SIAM J. Sci. Statist. Comput., 12, 158–179.
BOGAERT, I. (2014) Iteration-free computation of Gauss–Legendre quadrature nodes and weights. SIAM J. Sci. Comput., 36, A1008–A1026.
BRASS, H., FISCHER, J.-W. & PETRAS, K. (1996) The Gaussian quadrature method. Abh. Braunschweig. Wiss. Ges., 47, 115–150 (1997).
DLMF (2014) http://dlmf.nist.gov/ (Release 1.0.9). Online companion to Olver et al. (2010).
DRISCOLL, T. A., HALE, N. & TREFETHEN, L. N. (2014) Chebfun Guide. Pafnuty Publications, Oxford, UK.
FRIGO, M. & JOHNSON, S. G. (2005) The design and implementation of FFTW3. Proceedings of the IEEE, 93, 216–231. Special issue on “Program Generation, Optimization, and Platform Adaptation”.
GENTLEMAN, W. M. (1972) Implementing Clenshaw–Curtis quadrature II: Computing the cosine transformation. Communications of the ACM, 15, 343–346.
HALE, N. & TOWNSEND, A. (2014) A fast, simple, and stable Chebyshev–Legendre transform using an asymptotic formula. SIAM J. Sci. Comput., 36, A148–A167.
HALE, N. & TOWNSEND, A. (2015) http://www.github.com/nickhale/dlt/ (accessed 23 April 2015). MATLAB code to reproduce the results in this paper.
ISERLES, A. (2011) A fast and simple algorithm for the computation of Legendre coefficients. Numer. Math., 117, 529–553.
MORI, A., SUDA, R. & SUGIHARA, M. (1999) An improvement on Orszag’s fast algorithm for Legendre polynomial transform. Trans. Inform. Process. Soc. Japan, 40, 3612–3615.
OLVER, F. W. J., LOZIER, D. W., BOISVERT, R. F. & CLARK, C. W. (eds) (2010) NIST Handbook of Mathematical Functions. New York, NY: Cambridge University Press. Print companion to DLMF (2014).
O’NEIL, M., WOOLFE, F. & ROKHLIN, V. (2010) An algorithm for the rapid evaluation of special function transforms. Applied and Computational Harmonic Analysis, 28, 203–226.
POTTS, D., STEIDL, G. & TASCHE, M. (1998) Fast algorithms for discrete polynomial transforms. Math. Comp., 67, 1577–1590.
POTTS, D. (2003) Fast algorithms for discrete polynomial transforms on arbitrary grids. Linear Algebra Appl., 366, 353–370. Special issue on structured matrices: analysis, algorithms and applications (Cortona, 2000).
STIELTJES, T.-J. (1890) Sur les polynômes de Legendre. Ann. Fac. Sci. Toulouse Sci. Math. Sci. Phys., 4, G1–G17.
SWARZTRAUBER, P. N. (2003) On computing the points and weights for Gauss–Legendre quadrature. SIAM J. Sci. Comput., 24, 945–954.
SZEGÖ, G. (1936) Inequalities for the zeros of Legendre polynomials and related functions. Trans. Amer. Math. Soc., 39, 1–17.
TOWNSEND, A. (2015) A fast analysis-based discrete Hankel transform using asymptotic expansions. (Submitted).
TYGERT, M. (2010) Recurrence relations and fast algorithms. Appl. Comput. Harmon. Anal., 28, 121–128.