Fast Integer Multiplication using Modular Arithmetic
April 7, 2008
Abstract
We give an O(N · log N · 2^{O(log* N)}) algorithm for multiplying two N-bit integers that improves
the O(N · log N · log log N) algorithm by Schönhage-Strassen [SS71]. Both these algorithms use
modular arithmetic. Recently, Fürer [Für07] gave an O(N · log N · 2^{O(log* N)}) algorithm which
however uses arithmetic over complex numbers as opposed to modular arithmetic. In this paper,
we use multivariate polynomial multiplication along with ideas from Fürer’s algorithm to achieve
this improvement in the modular setting. Our algorithm can also be viewed as a p-adic version
of Fürer’s algorithm. Thus, we show that the two seemingly different approaches to integer
multiplication, modular and complex arithmetic, are similar.
1 Introduction
Computing the product of two N-bit integers is an important problem in algorithmic number theory
and algebra. A naive approach leads to an algorithm that uses O(N^2) bit operations. Karatsuba and
Ofman [KO63] showed that some multiplication operations of such an algorithm can be replaced by
less costly addition operations, which reduces the overall running time of the algorithm to O(N^{log_2 3})
bit operations. Shortly afterwards this result was improved by Toom [Too63] who showed that for
any ε > 0, integer multiplication can be done in O(N^{1+ε}) time. This led to the question as to
whether the time complexity can be improved further by replacing the term O(N^ε) by a poly-
logarithmic factor. In a major breakthrough, Schönhage and Strassen [SS71] gave two efficient
algorithms for multiplying integers using fast polynomial multiplication. One of the algorithms
achieved a running time of O(N · log N · log log N · · · 2^{O(log* N)}) using arithmetic over complex
numbers (approximated to suitable precision), while the other used arithmetic modulo carefully
chosen integers to improve the complexity further to O(N · log N · log log N ). Despite many efforts,
the modular algorithm remained the best until a recent remarkable result by Fürer [Für07]. Fürer
gave an algorithm that uses arithmetic over complex numbers and runs in O(N · log N · 2^{O(log* N)})
time. To date this is the best time complexity result known for integer multiplication.
Schönhage and Strassen introduced two seemingly different approaches to polynomial multipli-
cation – using complex and modular arithmetic. Fürer’s algorithm improves the time complexity
in the complex arithmetic setting by cleverly reducing some costly multiplications to simple shifts.
However, the algorithm needs to approximate the complex numbers to certain precisions during
computation. This introduces the added task of bounding the total truncation errors in the analysis
of the algorithm. In contrast, in the modular setting the error analysis is virtually absent.
In addition, modular arithmetic gives a discrete approach to a discrete problem like integer mul-
tiplication. Therefore it is natural to ask whether we can achieve a similar improvement in time
complexity of this problem in the modular arithmetic setting. In this paper, we answer this ques-
tion affirmatively. We give an O(N · log N · 2^{O(log* N)}) algorithm for integer multiplication using
modular arithmetic, thus matching the improvement made by Fürer.
The use of inner and outer DFT plays a central role in both Fürer’s as well as our algo-
rithm. Towards understanding the notion of inner and outer DFT in the context of multivariate
polynomials, we present a group theoretic interpretation of Discrete Fourier Transform (DFT).
Arguing along the lines of Fürer [Für07], we show that repeated use of efficient computation of inner
DFTs using some special roots of unity in R makes the overall process efficient and leads to an
O(N · log N · 2^{O(log* N)}) time algorithm.
Given integers a and b, each of N bits, we encode them as polynomials a(X) and b(X) and
compute the product polynomial. The product a · b can be recovered by substituting X_s = q^{M^{s−1}},
for 1 ≤ s ≤ k, and α = 2^u in the polynomial a(X) · b(X). The coefficients in the product
polynomial could be as large as M^k · m · 2^{2u} and hence it is sufficient to do arithmetic modulo
p^c where p^c > M^k · m · 2^{2u}. Therefore, a(X) can indeed be considered as a polynomial over
R = Z[α]/(p^c, α^m + 1). Our choice of the prime p ensures that c is in fact a constant (see Section 5).
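To make the encode–multiply–evaluate pattern concrete, here is a minimal Python sketch of the univariate (k = 1) analogue: the integer is cut into blocks of u bits, the blocks become polynomial coefficients, and the integer product is recovered by evaluating the product polynomial at 2^u. The block size u, the number of blocks and the schoolbook polynomial product are illustrative stand-ins, not the parameters M, m, u or the FFT-based multiplication used by the actual algorithm.

def encode(n, u, num_blocks):
    """Split the integer n into num_blocks blocks of u bits; block i is the coefficient of X^i."""
    mask = (1 << u) - 1
    return [(n >> (u * i)) & mask for i in range(num_blocks)]

def poly_mult(a, b):
    """Schoolbook product of coefficient lists (a stand-in for the FFT-based step)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def multiply(x, y, u=16, num_blocks=8):
    """Multiply via polynomial encoding: a(2^u) * b(2^u) = (a*b)(2^u)."""
    prod = poly_mult(encode(x, u, num_blocks), encode(y, u, num_blocks))
    # Evaluating at X = 2^u re-assembles the integer product (the carries are absorbed by the sum).
    return sum(c << (u * i) for i, c in enumerate(prod))

assert multiply(123456789, 987654321) == 123456789 * 987654321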
Definition 2.1. An n-th root of unity ζ ∈ R is said to be primitive if it generates a cyclic group
of order n under multiplication. Furthermore, it is said to be principal if n is coprime to the
characteristic of R and ζ satisfies Σ_{i=0}^{n−1} ζ^{ij} = 0 for all 0 < j < n.
In Z/p^c Z, a 2M-th root of unity is principal if and only if 2M | p − 1 (see also Section 6). As a
result, we need to choose the prime p from the arithmetic progression {1 + i · 2M}_{i>0}, which is the
main bottleneck of our approach. We now explain how this bottleneck can be circumvented.
An upper bound for the least prime in an arithmetic progression is given by the following
theorem [Lin44]:
Theorem 2.2 (Linnik). There exist absolute constants ℓ and L such that for any pair of coprime
integers d and n, the least prime p such that p ≡ d (mod n) is less than ℓ · n^L.
Heath-Brown [HB92] showed that the Linnik constant L ≤ 5.5. Recall that M is chosen such
that M^k is Θ(N / log^2 N). If we choose k = 1, that is if we use univariate polynomials to encode
integers, then the parameter M = Θ(N / log^2 N). Hence the least prime p ≡ 1 (mod 2M) could be as
large as N^L. Since all known deterministic sieving procedures take at least N^L time, this is clearly
infeasible (for a randomized approach see Section 5.1). However, by choosing a larger k we can
ensure that the least prime p ≡ 1 (mod 2M) is O(N^ε) for some constant ε < 1.
Remark 2.3. If k is any integer greater than L + 1, then M^L = O(N^{L/(L+1)}) and hence the least
prime p ≡ 1 (mod 2M) can be found in o(N) time.
Lemma 2.4. Let ζ_s be a primitive (p − 1)-th root of unity in Z/p^s Z. Then there exists a unique
primitive (p − 1)-th root of unity ζ_{s+1} in Z/p^{s+1} Z such that ζ_{s+1} ≡ ζ_s (mod p^s). This unique root
is given by ζ_{s+1} = ζ_s − f(ζ_s)/f'(ζ_s) where f(X) = X^{p−1} − 1.
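The lemma is a Newton–Hensel iteration modulo increasing powers of p. The sketch below (with an illustrative small prime, not the prime p ≡ 1 (mod 2M) chosen in Section 5) lifts a root of f(X) = X^{p−1} − 1 from Z/pZ to Z/p^c Z one power at a time; the division by f'(ζ_s) is valid because f'(ζ_s) is a unit modulo p.

def hensel_lift_root(zeta, p, c):
    """Lift a root zeta of f(X) = X^(p-1) - 1 from Z/pZ to Z/p^c Z
    via the iteration zeta <- zeta - f(zeta)/f'(zeta) (mod p^(s+1))."""
    mod = p
    for _ in range(c - 1):
        mod *= p
        f = (pow(zeta, p - 1, mod) - 1) % mod
        df = ((p - 1) * pow(zeta, p - 2, mod)) % mod   # f'(zeta), a unit mod p
        zeta = (zeta - f * pow(df, -1, mod)) % mod
    return zeta

# Illustrative parameters, not the ones used by the algorithm:
p, c = 13, 4
zeta = 2                      # 2 generates F_13^*, so it is a primitive 12th root of unity mod 13
z = hensel_lift_root(zeta, p, c)
assert pow(z, p - 1, p**c) == 1 and z % p == zeta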
(b) The polynomial x^m + 1 = Π_{i=1}^{m} (x − σ^{2i−1}) in Z/p^c Z. Moreover, for any 0 ≤ i < j ≤ 2m, the
ideals generated by (x − σ^i) and (x − σ^j) are comaximal in Z[x]/p^c Z.
(c) The roots {σ^{2i−1}}_{1≤i≤m} are distinct modulo p and therefore the difference of any two of them
is a unit in Z[x]/p^c Z.
We then, through interpolation, solve for a polynomial ρ(α) such that ρ(σ^{2i+1}) = ω^{2i+1} for all
1 ≤ i ≤ m. The first two parts of the claim justify the Chinese Remaindering, and the interpolating
polynomial is given explicitly by

ρ(α) = Σ_{i=1}^{m} ω^{2i+1} · ( Π_{j≠i} (α − σ^{2j+1}) ) / ( Π_{j≠i} (σ^{2i+1} − σ^{2j+1}) )

The division by (σ^{2i+1} − σ^{2j+1}) is justified as it is a unit in Z/p^c Z (part (c) of Claim 2.5).
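A minimal sketch of this interpolation step under toy parameters: given point–value pairs whose pairwise differences of points are units modulo p^c, the Lagrange formula above produces ρ with ρ(point_i) = value_i, and all arithmetic stays inside Z/p^c Z. The points and values below are arbitrary illustrations, not the actual σ^{2i+1} and ω^{2i+1}.

def lagrange_mod(points, values, mod):
    """Coefficients (constant term first) of the polynomial rho with rho(points[i]) = values[i],
    all arithmetic done modulo `mod`; differences of points must be units mod `mod`."""
    n = len(points)
    rho = [0] * n
    for i in range(n):
        num = [1]          # numerator prod_{j != i} (X - points[j]), built coefficient by coefficient
        denom = 1
        for j in range(n):
            if j == i:
                continue
            num = [(c - points[j] * d) % mod for c, d in zip([0] + num, num + [0])]
            denom = denom * (points[i] - points[j]) % mod
        scale = values[i] * pow(denom, -1, mod) % mod
        rho = [(r + scale * c) % mod for r, c in zip(rho, num)]
    return rho

mod = 13**4                 # illustrative p^c, not the algorithm's parameters
pts, vals = [3, 5, 11], [7, 2, 9]
rho = lagrange_mod(pts, vals, mod)
assert all(sum(c * pow(x, e, mod) for e, c in enumerate(rho)) % mod == v
           for x, v in zip(pts, vals))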
2. Encode the integers a and b as k-variate polynomials a(X) and b(X) respectively over the
ring R = Z[α]/(p^c, α^m + 1) (Section 2.1).
4. Use ρ(α) as the principal 2M -th root of unity to compute the Fourier transforms of the k-
variate polynomials a(X) and b(X). Multiply component-wise and take the inverse Fourier
transform to obtain the product polynomial.
5. Evaluate the product polynomial at appropriate powers of two to recover the integer product
and return it (Section 2.1).
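As a sanity check of the transform–multiply–interpolate pattern of steps 2–5, here is a small Python sketch over the toy ring Z/17Z with the principal 8-th root of unity 2 (so the transform length 8 divides p − 1 = 16). It uses a naive quadratic DFT in place of the FFT of Section 4, and univariate polynomials in place of the k-variate encoding; only the overall shape of the computation is meant to match the algorithm.

p, n, w = 17, 8, 2            # toy prime, transform length, and a principal 8th root of unity mod 17

def dft(f, root):
    """Naive length-n DFT over Z/pZ: F[j] = sum_i f[i] * root^(i*j)."""
    return [sum(f[i] * pow(root, i * j, p) for i in range(n)) % p for j in range(n)]

def multiply_mod_xn_minus_1(a, b):
    """Cyclic convolution of a and b via DFT, pointwise product, inverse DFT."""
    fa, fb = dft(a, w), dft(b, w)
    fc = [x * y % p for x, y in zip(fa, fb)]
    w_inv, n_inv = pow(w, -1, p), pow(n, -1, p)
    return [c * n_inv % p for c in dft(fc, w_inv)]

a = [1, 2, 3, 0, 0, 0, 0, 0]  # 1 + 2X + 3X^2
b = [4, 5, 0, 0, 0, 0, 0, 0]  # 4 + 5X
# The degrees are small enough that no wrap-around occurs, so this is the true product mod 17.
assert multiply_mod_xn_minus_1(a, b) == [4, 13, 5, 15, 0, 0, 0, 0]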
The only missing piece is the Fourier transform for multivariate polynomials. The following
section gives a group theoretic description of FFT.
4 Fourier Transform
A convenient way to study polynomial multiplication is to interpret it as multiplication in a group
algebra.
Definition 4.1 (Group Algebra). Let G be a group. The group algebra of G over a ring R is
the set of formal sums Σ_{g∈G} α_g g where α_g ∈ R, with addition defined point-wise and multiplication
defined via convolution as follows:

( Σ_g α_g g ) ( Σ_h β_h h ) = Σ_u ( Σ_{gh=u} α_g β_h ) u
Multiplying univariate polynomials over R of degree less than n can be seen as multiplication
in the group algebra R[G] where G is the cyclic group of order 2n. Similarly, multiplying k-variate
polynomials of degree less than n in each variable can be seen as multiplying in the group algebra
R[Gk ], where Gk denotes the k-fold product group G × . . . × G.
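The correspondence is simply that convolution adds indices in the group: for G = Z/2nZ, multiplication in R[G] is cyclic convolution of length 2n, and since two factors of degree less than n produce a product of degree at most 2n − 2, no wrap-around occurs. A small illustrative sketch with integer coefficients and n = 4:

def group_algebra_mult(f, g, order):
    """Multiply two elements of R[Z/order Z] given as coefficient lists:
    (f*g)[u] = sum over i + j = u (mod order) of f[i]*g[j]."""
    h = [0] * order
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[(i + j) % order] += fi * gj
    return h

n = 4
a = [1, 2, 3, 4] + [0] * n    # a degree-3 polynomial, padded into R[Z/8Z]
b = [5, 6, 7, 8] + [0] * n
prod = group_algebra_mult(a, b, 2 * n)
# With the padding, no index wraps past 2n - 1, so this equals the polynomial product.
assert prod == [5, 16, 34, 60, 61, 52, 32, 0]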
In this section, we study the Fourier transform over the group algebra R[E] where E is an
additive abelian group. Most of this, albeit in a different form, is well known [Sha99, Chapter 17],
but is provided here for completeness.
In order to simplify our presentation, we will fix the base ring to be C, the field of complex
numbers. Let n be the exponent of E, that is the maximum order of any element in E. A similar
approach can be followed for any other base ring as long as it has a principal n-th root of unity.
We consider C[E] as a vector space with basis {x}_{x∈E} and use the Dirac notation to represent
elements of C[E] — the vector |x⟩, x in E, denotes the element 1·x of C[E].
An example of a character of E is the trivial character, which we will denote by 1, that assigns
to every element of E the complex number 1. If χ_1 and χ_2 are two characters of E then their
product χ_1 · χ_2 is defined as (χ_1 · χ_2)(x) = χ_1(x)χ_2(x).
Proposition 4.3. [Sha99, Chapter 17, Theorem 1] Let E be an additive abelian group of exponent
n. Then the values taken by any character of E are n-th roots of unity. Furthermore, the characters
form a multiplicative abelian group Ê which is isomorphic to E.
An important property that the characters satisfy is the following [Isa94, Corollary 2.14].
Definition 4.5 (Fourier Transform). Let E be an additive abelian group and let x ↦ χ_x be an
isomorphism between E and Ê. The Fourier transform over E is the linear map from C[E] to C[E]
that sends |x⟩ to |χ_x⟩.
Thus, the Fourier transform is a change of basis from the point basis {|x⟩}_{x∈E} to the Fourier
basis {|χ_x⟩}_{x∈E}. The Fourier transform is unique only up to the choice of the isomorphism x ↦ χ_x.
This isomorphism is determined by the choice of the principal root of unity.
Remark 4.6. Given an element |f⟩ ∈ C[E], to compute its Fourier transform it is sufficient to
compute the Fourier coefficients {⟨χ|f⟩}_{χ∈Ê}.
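For a concrete instance, take E = Z/nZ: its characters are x ↦ ω^{ax} with ω = e^{2πi/n}, and by Remark 4.6 the transform of |f⟩ is determined by the n numbers ⟨χ_a|f⟩ = Σ_x χ_a(x) f_x. A brief numerical sketch (the sign convention follows this section rather than the usual engineering DFT):

import cmath

n = 4
omega = cmath.exp(2j * cmath.pi / n)        # a primitive n-th root of unity in C

def chi(a, x):
    """The character chi_a of Z/nZ: chi_a(x) = omega^(a*x)."""
    return omega ** (a * x)

f = [1, 2, 0, 3]                            # coefficients of |f> in the point basis
coeffs = [sum(chi(a, x) * f[x] for x in range(n)) for a in range(n)]   # <chi_a|f>

# Character orthogonality inverts the transform: f[x] = (1/n) sum_a chi_a(-x) <chi_a|f>.
recovered = [sum(chi(a, -x) * coeffs[a] for a in range(n)) / n for x in range(n)]
assert all(abs(r - v) < 1e-9 for r, v in zip(recovered, f))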
Proposition 4.7. 1. Every character λ of B can be “lifted” to a character of E (which will also
be denoted by λ), defined as follows: λ(x) = λ(x + A).
2. Let χ1 and χ2 be two characters of E that when restricted to A are identical. Then χ1 = χ2 λ
for some character λ of B.
3. The group B̂ is (isomorphic to) a subgroup of Ê with the quotient group Ê/B̂ being (isomor-
phic to) Â.
We now consider the task of computing the Fourier transform of an element |f⟩ = Σ_x f_x |x⟩
presented as a list of coefficients {f_x} in the point basis. For this, it is sufficient to compute the
Fourier coefficients {⟨χ|f⟩} for each character χ of E (Remark 4.6). To describe the Fast Fourier
transform we fix two sets of coset representatives, one of A in E and one of B̂ in Ê, as follows.
2. For each character ϕ of A, fix a character χϕ of E such that χϕ restricted to A is the character
ϕ. The characters {χϕ } form (can be thought of as) a set of coset representatives of B̂ in Ê.
Since {x_b}_{b∈B} forms a set of coset representatives, any |f⟩ ∈ C[E] can be written uniquely as
|f⟩ = Σ f_{b,a} |x_b + a⟩.
Proposition 4.8. Let |f⟩ = Σ f_{b,a} |x_b + a⟩ be an element of C[E]. For each b ∈ B and ϕ ∈ Â let
|f_b⟩ ∈ C[A] and |f_ϕ⟩ ∈ C[B] be defined as follows.

|f_b⟩ = Σ_{a∈A} f_{b,a} |a⟩
|f_ϕ⟩ = Σ_{b∈B} χ_ϕ(x_b) ⟨ϕ|f_b⟩ |b⟩

Then for any character χ of E, which can be expressed as χ = λ · χ_ϕ, the Fourier coefficient
⟨χ|f⟩ = ⟨λ|f_ϕ⟩.
Proof. Recall that λ(x + A) = λ(x), and ϕ is the restriction of χ to the subgroup A.

⟨χ|f⟩ = Σ_b Σ_a χ(x_b + a) f_{b,a}
      = Σ_b Σ_a λ(x_b + a) χ_ϕ(x_b + a) f_{b,a}
      = Σ_b λ(b) χ_ϕ(x_b) Σ_a ϕ(a) f_{b,a}
      = Σ_b λ(b) χ_ϕ(x_b) ⟨ϕ|f_b⟩
      = ⟨λ|f_ϕ⟩
We are now ready to describe the Fast Fourier transform given an element |f⟩ = Σ f_x |x⟩.
1. For each b ∈ B compute the Fourier transform of |f_b⟩. This requires #B many Fourier
transforms over A.

2. As a result of the previous step we have, for each b ∈ B and ϕ ∈ Â, the Fourier coeffi-
cients ⟨ϕ|f_b⟩. Compute for each ϕ the vectors |f_ϕ⟩ = Σ_{b∈B} χ_ϕ(x_b) ⟨ϕ|f_b⟩ |b⟩. This requires
#Â · #B = #E many multiplications by roots of unity.

3. For each ϕ ∈ Â compute the Fourier transform of |f_ϕ⟩. This requires #Â = #A many Fourier
transforms over B.

4. Any character χ of E is of the form χ_ϕ λ for some ϕ ∈ Â and λ ∈ B̂. Using Proposition 4.8
we have, at the end of Step 3, all the Fourier coefficients ⟨χ|f⟩ = ⟨λ|f_ϕ⟩.
If the quotient group B itself has a subgroup that is isomorphic to A then we can apply this
process recursively on B to obtain a divide and conquer procedure to compute the Fourier transform.
In the standard FFT we use E = Z/2^n Z. The subgroup A is 2^{n−1}E, which is isomorphic to Z/2Z,
and the quotient group B is Z/2^{n−1} Z.
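The following short Python sketch carries out Steps 1–4 for E = Z/nZ over C, with A = {0, n/2} ≅ Z/2Z and B ≅ Z/(n/2)Z, and checks the result against the direct definition of Remark 4.6. It is only an illustration of the decomposition; the algorithm described earlier runs the same scheme over the ring R with ρ(α) as the root of unity.

import cmath

def char_dft(f):
    """Direct transform from Remark 4.6: F[c] = <chi_c|f> = sum_x omega^(c*x) f[x]."""
    n = len(f)
    omega = cmath.exp(2j * cmath.pi / n)
    return [sum(omega ** (c * x) * f[x] for x in range(n)) for c in range(n)]

def fft(f):
    """FFT over E = Z/nZ following Steps 1-4, with A = {0, n/2} and B = Z/(n/2)Z."""
    n = len(f)
    if n == 1:
        return list(f)
    omega = cmath.exp(2j * cmath.pi / n)
    half = n // 2
    # Step 1: 2-point transforms of each |f_b> = (f[b], f[b + n/2]) over A.
    # Step 2: multiply <phi|f_b> by chi_phi(x_b), i.e. by 1 or omega^b.
    f_phi0 = [f[b] + f[b + half] for b in range(half)]
    f_phi1 = [omega ** b * (f[b] - f[b + half]) for b in range(half)]
    # Step 3: recurse on the quotient group B (its root of unity is omega^2).
    F0, F1 = fft(f_phi0), fft(f_phi1)
    # Step 4: <chi|f> = <lambda|f_phi>; the lifted character lambda_d corresponds to chi_{2d}.
    out = [0] * n
    out[0::2], out[1::2] = F0, F1
    return out

f = [complex(x) for x in [3, 1, 4, 1, 5, 9, 2, 6]]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(f), char_dft(f)))

With A ≅ Z/2Z this is exactly the familiar radix-2 decimation-in-frequency FFT, recovered from the group-theoretic description.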
where M_R denotes the complexity of multiplications in R. The first term comes from the #B many
Fourier transforms over A (Step 1 of FFT), the second term corresponds to the multiplications by
roots of unity (Step 2) and the last term comes from the #A many Fourier transforms over B
(Step 3).
Since A is a subgroup of B as well, Fourier transforms over B can be recursively computed in a
similar way, with B playing the role of E. Therefore, by simplifying the recurrence in Equation 1
we get:

F(2M, k) = O( (M^k log M / (m^k log m)) · F(2m, k) + (M^k log M / log m) · M_R )    (2)
Proof. The FFT over a group of size n is usually done by taking 2-point FFTs followed by n/2-point
FFTs. This involves O(n log n) multiplications by roots of unity and additions in the base ring. Using
this method, Fourier transforms over A can be computed with O(m^k log m) multiplications and
additions in R. Since each multiplication is between an element of R and a power of α, this can
be efficiently achieved through shifting operations. This is dominated by the addition operation,
which takes O(m log p) time, since this involves adding m coefficients from Z/p^c Z.

F(2M, k) = O( M^k log M · m · log p + (M^k log M / log m) · M_R )    (3)
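The shift claim can be seen directly from α^m = −1: multiplying an element of R by α^j rotates its coefficient vector by j positions and negates the coefficients that wrap around. A small sketch with illustrative parameters:

def mult_by_alpha_power(coeffs, j, pc):
    """Multiply an element of Z[alpha]/(p^c, alpha^m + 1), given by its m coefficients,
    by alpha^j: a negacyclic rotation, since alpha^m = -1."""
    m = len(coeffs)
    out = [0] * m
    for i, c in enumerate(coeffs):
        q, r = divmod(i + j, m)
        out[r] = (out[r] + (-1) ** q * c) % pc
    return out

# Check against reduction by hand for small illustrative parameters m = 4, p^c = 13**2.
pc = 13**2
x = [3, 1, 4, 1]                       # 3 + alpha + 4 alpha^2 + alpha^3
assert mult_by_alpha_power(x, 3, pc) == [(-1) % pc, (-4) % pc, (-1) % pc, 3]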
5 Complexity Analysis
The choice of parameters should ensure that the following constraints are satisfied:
1. M^k = Θ(N / log^2 N) and m = O(log N).
2. M^L = O(N^ε) where L is the Linnik constant (Theorem 2.2) and ε is any constant less than
1. Recall that this makes picking the prime by brute force feasible (see Remark 2.3).
3. p^c > M^k · m · 2^{2u} where u = 2N / (M^k m). This is to prevent overflows during modular arithmetic
(see Section 2.1).
It is straightforward to check that k > L + 1 and c > 5(k + 1) satisfy the above constraints. Heath-
Brown [HB92] showed that L ≤ 5.5 and therefore c = 42 clearly suffices.
Let T(N) denote the time complexity of multiplying two N-bit integers. This consists of:
(e) Evaluating the product polynomial.
As argued before, the prime p can be chosen in o(N) time. To compute ρ(α), we need to lift
a generator of F_p^* to Z/p^c Z followed by an interpolation. Since c is a constant and p is a prime of
O(log N) bits, the time required for Hensel Lifting and interpolation is o(N).
The encoding involves dividing the bits into smaller blocks and expressing the exponents of q in
base M (Section 2.1); all of this takes O(N) time since M is a power of 2. Similarly, evaluation
of the product polynomial takes linear time as well. Therefore, the time complexity is dominated
by the time taken for polynomial multiplication.
F(2M, k) = O( M^k log M · m · log p + (M^k log M / log m) · M_R )
Proposition 5.1 ([Sch82]). Multiplication in the ring R reduces to multiplying O(log^2 N)-bit integers
and therefore M_R = T(O(log^2 N)).
Proof. Elements of R can be seen as polynomials in α over Z/p^c Z with degree at most m. Given
two such polynomials f(α) and g(α), encode them as follows: replace α by 2^d, transforming the
polynomials f(α) and g(α) to the integers f(2^d) and g(2^d) respectively. The parameter d is chosen
such that the coefficients of the product h(α) = f(α)g(α) can be recovered from the product
f(2^d) · g(2^d). For this, it is sufficient to ensure that the maximum coefficient of h(α) is less than
2^d. Since f and g are polynomials of degree m, we would want 2^d to be greater than m · p^{2c}, which
can be ensured by choosing d = Θ(log N). The integers f(2^d) and g(2^d) are bounded by 2^{md} and
hence the task of multiplying in R reduces to O(log^2 N)-bit integer multiplication.
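The reduction in this proof is Kronecker substitution. A compact sketch with illustrative small parameters (the algorithm itself takes d = Θ(log N) so that m · p^{2c} < 2^d):

def ring_mult_via_integers(f, g, pc, d):
    """Multiply two polynomials over Z/p^c Z (coefficient lists, constant term first)
    by packing them into integers at alpha = 2^d and multiplying the integers."""
    pack = lambda poly: sum(c << (d * i) for i, c in enumerate(poly))
    prod = pack(f) * pack(g)                  # one big-integer multiplication
    mask = (1 << d) - 1
    out = []
    for _ in range(len(f) + len(g) - 1):      # unpack base-2^d digits = product coefficients
        out.append((prod & mask) % pc)
        prod >>= d
    return out

pc, m = 13**2, 4                              # illustrative p^c and m
d = (m * pc * pc).bit_length() + 1            # coefficients of h fit comfortably in d bits
f, g = [3, 1, 4, 1], [1, 5, 9, 2]
expected = [0] * (2 * m - 1)
for i, fi in enumerate(f):
    for j, gj in enumerate(g):
        expected[i + j] = (expected[i + j] + fi * gj) % pc
assert ring_mult_via_integers(f, g, pc, d) == expected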
Therefore, two N-bit integers can be multiplied in

T(N) = O(F(2M, k))
     = O( M^k log M · m · log p + (M^k log M / log m) · M_R )
     = O( N log N + (N / (log N · log log N)) · T(O(log^2 N)) )

time.
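To see why this recurrence yields the bound claimed in the abstract, here is the standard unrolling, sketched only for completeness: substituting T(N) = N · log N · S(N) and using log(O(log^2 N)) = O(log log N) turns the recurrence into

S(N) ≤ c + c · S(O(log^2 N))

for some constant c. Since the problem size drops from N to O(log^2 N) at every level, the recursion bottoms out after O(log* N) levels, so S(N) = 2^{O(log* N)} and hence T(N) = O(N · log N · 2^{O(log* N)}).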
5.1 Choosing the Prime Randomly
To ensure that the search for a prime p ≡ 1 (mod 2M ) does not affect the overall time complexity
of the algorithm, we considered multivariate polynomials to restrict the value of M ; an alternative
is to use randomization.
Proposition 5.3. Assuming ERH, a prime p such that p ≡ 1 (mod 2M) can be computed by a
randomized algorithm with expected running time Õ(log^3 M).
Proof. Titchmarsh [Tit30] (see also Tianxin [Tia90]) showed, assuming ERH, that the number of
primes less than x in the arithmetic progression {1 + i · 2M}_{i>0} is given by

π(x, 2M) = Li(x) / ϕ(2M) + O(√x · log x)

for 2M ≤ √x · (log x)^{−2}, where Li(x) = Θ(x / log x) and ϕ is the Euler totient function. In our case, since
M is a power of two, ϕ(2M) = M, and hence for x ≥ 4M^2 · log^6 M we have π(x, 2M) = Ω(x / (M log x)).
Therefore, for an i chosen uniformly at random in the range 1 ≤ i ≤ 2M · log^6 M, the probability that
i · 2M + 1 is prime is at least d / log x for a constant d. Furthermore, a primality test on an O(log M)-bit
number can be done in Õ(log^2 M) time using the Rabin–Miller primality test [Mil76, Rab80]. Hence,
with x = 4M^2 · log^6 M, a suitable prime for our algorithm can be found in expected Õ(log^3 M)
time.
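A direct, illustrative rendering of this procedure in Python: sample i, test 1 + 2Mi for primality with Rabin–Miller, and repeat. The bit-length-based bound below stands in for log M, and the parameters are toy choices rather than the ones the algorithm would use.

import random

def is_probable_prime(n, rounds=20):
    """Rabin-Miller probabilistic primality test [Mil76, Rab80]."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime_in_progression(M):
    """Sample i until 1 + 2Mi is prime; under ERH this takes O(log M) trials in expectation."""
    bound = 2 * M * M.bit_length() ** 6       # stands in for 2M * log^6 M
    while True:
        candidate = 1 + 2 * M * random.randrange(1, bound + 1)
        if is_probable_prime(candidate):
            return candidate

p = random_prime_in_progression(1 << 10)      # illustrative M = 1024
assert p % (2 << 10) == 1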
6 A Different Perspective
Our algorithm can be seen as a p-adic version of Fürer’s integer multiplication algorithm, where the
field C is replaced by Qp , the field of p-adic numbers (for a quick introduction, see Baker’s online
notes [Bak07]). Much like C, where representing a general element (say in base 2) takes infinitely
many bits, representing an element in Qp takes infinitely many p-adic digits. Since we cannot work
with infinitely many digits, all arithmetic has to be done with finite precision. Modular arithmetic
in the base ring Z[α]/(p^c, α^m + 1) can be viewed as arithmetic in the ring Qp[α]/(α^m + 1), keeping
a precision of ε = p^{−c}.
Arithmetic with finite precision naturally introduces some errors in computation. However, the
nature of Qp makes the error analysis simpler. The field Qp comes with a norm |·|_p called the p-adic
norm, which satisfies the stronger triangle inequality |x + y|_p ≤ max(|x|_p, |y|_p) [Bak07, Proposi-
tion 2.6]. As a result, unlike in C, the errors in computation do not compound.
Recall that the efficiency of FFT crucially depends on a special principal 2M-th root of unity in
Qp[α]/(α^m + 1). Such a root is constructed with the help of a primitive 2M-th root of unity in Qp.
The field Qp has a primitive 2M-th root of unity if and only if 2M divides p − 1 [Bak07, Theorem
5.12]. Also, if 2M divides p − 1, a 2M-th root can be obtained from a (p − 1)-th root of unity by
taking a suitable power. A primitive (p − 1)-th root of unity in Qp can be constructed, to sufficient
precision, using Hensel Lifting starting from a generator of F_p^*.
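For example, with the toy prime p = 97 and 2M = 32 (so that 2M | p − 1), a primitive 2M-th root of unity modulo p can be obtained by raising a generator of F_p^* to the power (p − 1)/2M; Hensel lifting it as in Lemma 2.4 then gives the root in Z/p^c Z to the required precision. A brief sketch of the mod-p step:

def prime_factors(n):
    """Distinct prime factors of n by trial division (fine for the toy sizes here)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def primitive_2m_root_mod_p(p, two_m):
    """Return a primitive two_m-th root of unity mod p, assuming two_m divides p - 1."""
    assert (p - 1) % two_m == 0
    for g in range(2, p):
        # g generates F_p^* iff g^((p-1)/q) != 1 for every prime q dividing p - 1.
        if all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1)):
            return pow(g, (p - 1) // two_m, p)

zeta = primitive_2m_root_mod_p(97, 32)
assert pow(zeta, 32, 97) == 1 and pow(zeta, 16, 97) != 1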
7 Conclusion
There are two approaches for multiplying integers, one using arithmetic over complex numbers, and
the other using modular arithmetic. Using complex numbers, Schönhage and Strassen [SS71] gave
an O(N · log N · log log N · · · 2^{O(log* N)}) algorithm. Fürer [Für07] improved this complexity to O(N ·
log N · 2^{O(log* N)}) using some special roots of unity. The other approach, that is modular arithmetic,
can be seen as arithmetic in Qp with certain precision. A direct adaptation of the Schönhage-
Strassen algorithm in the modular setting leads to an O(N · log N · log log N · · · 2^{O(log* N)}) algorithm.
In this paper, we showed that by choosing an appropriate prime and a special root of unity, a running
time of O(N · log N · 2^{O(log* N)}) can be achieved through modular arithmetic as well. Therefore, in
a way, we have unified the two paradigms.
Acknowledgement
We thank V. Vinay, Srikanth Srinivasan and the anonymous referees for many helpful suggestions
that improved the overall presentation of this paper.
References
[Bak07] Alan J. Baker. An introduction to p-adic numbers and p-adic analysis. Online Notes,
2007. http://www.maths.gla.ac.uk/~ajb/dvi-ps/padicnotes.pdf.
[Für07] Martin Fürer. Faster integer multiplication. Proceedings of the 39th ACM Symposium on
Theory of Computing, pages 57–66, 2007.
[HB92] D. R. Heath-Brown. Zero-free regions for Dirichlet L-functions, and the least prime in an
arithmetic progression. Proceedings of the London Mathematical Society, 64(3):265–338,
1992.
[Isa94] I. Martin Isaacs. Character theory of finite groups. Dover publications Inc., New York,
1994.
[Lin44] Yuri V. Linnik. On the least prime in an arithmetic progression, I. The basic theorem,
II. The Deuring-Heilbronn’s phenomenon. Rec. Math. (Mat. Sbornik), 15:139–178 and
347–368, 1944.
[Mil76] G. L. Miller. Riemann’s hypothesis and tests for primality. Journal of Computer and
System Sciences, 13:300–317, 1976.
[NZM91] Ivan Niven, Herbert S. Zuckerman, and Hugh L. Montgomery. An Introduction to the
Theory of Numbers. John Wiley and Sons, Singapore, 1991.
[Rab80] Michael O. Rabin. Probabilistic algorithm for testing primality. Journal of Number
Theory, 12:128–138, 1980.
[Sch82] Arnold Schönhage. Asymptotically fast algorithms for the numerical multiplication and
division of polynomials with complex coefficients. In Computer Algebra, EUROCAM,
volume 144 of Lecture Notes in Computer Science, pages 3–15, 1982.
[Sha99] Igor R. Shafarevich. Basic Notions of Algebra. Springer Verlag, USA, 1999.
[SS71] A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–
292, 1971.
[Tia90] Cai Tianxin. Primes representable by polynomials and the lower bound of the least primes
in arithmetic progressions. Acta Mathematica Sinica, New Series, 6:289–296, 1990.
[Tit30] E. C. Titchmarsh. A divisor problem. Rend. Circ. Mat. Palerme, 54:414–429, 1930.
[Too63] A. L. Toom. The complexity of a scheme of functional elements simulating the multipli-
cation of integers. English Translation in Soviet Mathematics, 3:714–716, 1963.