
Fast Integer Multiplication Using Modular Arithmetic

Anindya De, Piyush P Kurur∗, Chandan Saha


Dept. of Computer Science and Engineering
Indian Institute of Technology, Kanpur
Kanpur, UP, India, 208016
{anindya,ppk,csaha}@cse.iitk.ac.in
Ramprasad Saptharishi†
Chennai Mathematical Institute
Plot H1, SIPCOT IT Park
Padur PO, Siruseri, India, 603103
and
Dept. of Computer Science and Engineering
Indian Institute of Technology, Kanpur
Kanpur, UP, India, 208016
[email protected]

April 7, 2008

Abstract
We give an $O(N \cdot \log N \cdot 2^{O(\log^* N)})$ algorithm for multiplying two N-bit integers that improves
the O(N · log N · log log N) algorithm by Schönhage-Strassen [SS71]. Both these algorithms use
modular arithmetic. Recently, Fürer [Für07] gave an $O(N \cdot \log N \cdot 2^{O(\log^* N)})$ algorithm which
however uses arithmetic over complex numbers as opposed to modular arithmetic. In this paper,
we use multivariate polynomial multiplication along with ideas from Fürer’s algorithm to achieve
this improvement in the modular setting. Our algorithm can also be viewed as a p-adic version
of Fürer’s algorithm. Thus, we show that the two seemingly different approaches to integer
multiplication, modular and complex arithmetic, are similar.

1 Introduction
Computing the product of two N-bit integers is an important problem in algorithmic number theory
and algebra. A naive approach leads to an algorithm that uses $O(N^2)$ bit operations. Karatsuba and
Ofman [KO63] showed that some multiplication operations of such an algorithm can be replaced by
less costly addition operations, which reduces the overall running time of the algorithm to $O(N^{\log_2 3})$
bit operations. Shortly afterwards this result was improved by Toom [Too63], who showed that for
any ε > 0, integer multiplication can be done in $O(N^{1+\varepsilon})$ time. This led to the question as to
whether the time complexity can be improved further by replacing the term $O(N^{\varepsilon})$ by a
polylogarithmic factor. In a major breakthrough, Schönhage and Strassen [SS71] gave two efficient
algorithms for multiplying integers using fast polynomial multiplication. One of the algorithms
achieved a running time of $O(N \cdot \log N \cdot \log\log N \cdots 2^{O(\log^* N)})$ using arithmetic over complex
numbers (approximated to suitable precision), while the other used arithmetic modulo carefully
chosen integers to improve the complexity further to O(N · log N · log log N). Despite many efforts,
the modular algorithm remained the best until a recent remarkable result by Fürer [Für07]. Fürer
gave an algorithm that uses arithmetic over complex numbers and runs in $O(N \cdot \log N \cdot 2^{O(\log^* N)})$
time. To date this is the best time complexity result known for integer multiplication.
Schönhage and Strassen introduced two seemingly different approaches to polynomial multiplication,
using complex and modular arithmetic. Fürer’s algorithm improves the time complexity
in the complex arithmetic setting by cleverly reducing some costly multiplications to simple shifts.
However, the algorithm needs to approximate the complex numbers to certain precisions during
computation. This introduces the added task of bounding the total truncation errors in the analysis
of the algorithm. In contrast, in the modular setting the error analysis is virtually absent.
In addition, modular arithmetic gives a discrete approach to a discrete problem like integer
multiplication. Therefore it is natural to ask whether we can achieve a similar improvement in time
complexity of this problem in the modular arithmetic setting. In this paper, we answer this
question affirmatively. We give an $O(N \cdot \log N \cdot 2^{O(\log^* N)})$ algorithm for integer multiplication using
modular arithmetic, thus matching the improvement made by Fürer.

∗ Research supported through Research I Foundation project NRNM/CS/20030163
† Research done while visiting IIT Kanpur under Project FLW/DST/CS/20060225

Overview of our result


As is the case in both Schönhage-Strassen’s and Fürer’s algorithms, we start by reducing the prob-
lem to polynomial multiplication over a ring R by properly encoding the given integers. Polynomials
can be multiplied efficiently using Discrete Fourier Transforms (DFT), which uses special roots of
unity. For instance, to multiply two polynomials of degree less than M using the Fourier transform,
we require a principal 2M -th root of unity (see Definition 2.1 for principal root). An efficient way
of computing the DFT of a polynomial is through the Fast Fourier Transform (FFT). In addition,
if multiplications by these roots are efficient, we get a faster algorithm. Since multiplication by 2
is a shift, it would be good to have a ring with 2 as a root of unity. One way to construct such
a ring in the modular setting is to consider rings of the form $R = \mathbb{Z}/(2^M + 1)\mathbb{Z}$, as is the case in
Schönhage and Strassen [SS71]. However, this makes the size of R equal to $2^M$, which, although it
works in the case of Schönhage and Strassen’s algorithm, is a little too large to handle in our case. We
would like to find a ring whose size is bounded by some polynomial in M and which also contains
a principal 2M-th root of unity. In fact, it is this choice of ring that poses the primary challenge in
adapting Fürer’s algorithm and making it work in the discrete setting. In order to overcome this
hurdle we choose the ring to be $R = \mathbb{Z}/p^c\mathbb{Z}$, for a prime p and a constant c such that $p^c = \mathrm{poly}(M)$.
The ring $\mathbb{Z}/p^c\mathbb{Z}$ has a principal 2M-th root of unity if and only if 2M divides p − 1, which means
that we need to search for a prime p in the arithmetic progression $\{1 + i \cdot 2M\}_{i>0}$. To make this
search computationally efficient, we need the degree of the polynomials M to be sufficiently small
compared to the input size. It turns out that this can be achieved by considering multivariate poly-
nomials instead of univariate polynomials. We use enough variables to make sure that the search
for such a prime does not affect the overall running time; the number of variables finally chosen is
a constant as well. In fact, the use of multivariate polynomial multiplications and a small ring are
the main steps where our algorithm differs from earlier algorithms by Schönhage-Strassen and Fürer.

The use of inner and outer DFT plays a central role in both Fürer’s as well as our algo-
rithm. Towards understanding the notion of inner and outer DFT in the context of multivariate
polynomials, we present a group theoretic interpretation of Discrete Fourier Transform (DFT).
Arguing along the lines of Fürer [Für07], we show that repeated use of efficient computation of inner
DFTs using some special roots of unity in R makes the overall process efficient and leads to an
$O(N \cdot \log N \cdot 2^{O(\log^* N)})$ time algorithm.

2 The Ring, the Prime and the Root of Unity


We work with the ring $R = \mathbb{Z}[\alpha]/(p^c, \alpha^m + 1)$ for some m, a constant c and a prime p. Elements
of R are thus polynomials in α of degree at most m − 1 with coefficients from $\mathbb{Z}/p^c\mathbb{Z}$. By construction, α is
a 2m-th root of unity and multiplication of any element in R by any power of α can be achieved
by shifting operations; this property is crucial in making some multiplications in the FFT less
costly (Section 4.2).
Given an N-bit number a, we encode it as a k-variate polynomial over R with degree in each
variable less than M. The parameters M and m are powers of two such that $M^k$ is roughly $N/\log^2 N$
and m is roughly log N. The parameter k will ultimately be chosen to be a constant (see Section 5). We
now explain the details of this encoding.

2.1 Encoding Integers into multivariate Polynomials


Given an N-bit integer a, we first break these N bits into $M^k$ blocks of roughly $N/M^k$ bits each. This
corresponds to representing a in base $q = 2^{N/M^k}$. Let $a = a_0 + a_1 q + \cdots + a_{M^k-1}\, q^{M^k-1}$ where $a_i < q$. The
number a is converted into a polynomial as follows:
1. Express i in base M as $i = i_1 + i_2 M + \cdots + i_k M^{k-1}$.
2. Encode each term $a_i q^i$ as the monomial $a_i \cdot X_1^{i_1} \cdots X_k^{i_k}$. As a result, the number a gets
converted to the polynomial $\sum_{i=0}^{M^k-1} a_i \cdot X_1^{i_1} \cdots X_k^{i_k}$.
Further, we break each $a_i$ into $\frac{m}{2}$ equal sized blocks where the number of bits in each block
is $u = \frac{2N}{M^k \cdot m}$. Each coefficient $a_i$ is then encoded as a polynomial in α of degree less than $\frac{m}{2}$. The
polynomials are then padded with zeroes to stretch their degrees to m. Thus, the N-bit number a
is converted to a k-variate polynomial a(X) over $\mathbb{Z}[\alpha]/(\alpha^m + 1)$.

Given integers a and b, each of N bits, we encode them as polynomials a(X) and b(X) and
compute the product polynomial. The product a · b can be recovered by substituting $X_s = q^{M^{s-1}}$,
for 1 ≤ s ≤ k, and $\alpha = 2^u$ in the polynomial a(X) · b(X). The coefficients in the product
polynomial could be as large as $M^k \cdot m \cdot 2^{2u}$ and hence it is sufficient to do arithmetic modulo
$p^c$ where $p^c > M^k \cdot m \cdot 2^{2u}$. Therefore, a(X) can indeed be considered as a polynomial over
$R = \mathbb{Z}[\alpha]/(p^c, \alpha^m + 1)$. Our choice of the prime p ensures that c is in fact a constant (see Section 5).
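
The encoding and the recovery of the product can be illustrated with a small sketch. It is a toy under stated assumptions, not the paper's implementation: the function names `encode`, `multiply` and `evaluate` are hypothetical, the product is computed by schoolbook multiplication over the integers (no FFT and no reduction modulo $p^c$), and N is assumed to be divisible by $M^k$ and by $M^k \cdot m/2$.

```python
def encode(a, N, M, k, m):
    """Encode an N-bit integer as {(i_1, ..., i_k): [m coefficients in alpha]}."""
    blocks = M ** k
    assert N % blocks == 0 and (2 * N) % (blocks * m) == 0
    q_bits = N // blocks          # a is written in base q = 2**q_bits
    u = 2 * N // (blocks * m)     # number of bits in each alpha-coefficient
    poly = {}
    for i in range(blocks):
        a_i = (a >> (i * q_bits)) & ((1 << q_bits) - 1)
        # Exponent vector: the digits of i in base M, least significant first.
        idx, t = [], i
        for _ in range(k):
            idx.append(t % M)
            t //= M
        # Split a_i into m/2 chunks of u bits, then pad with zeroes up to m.
        coeffs = [(a_i >> (j * u)) & ((1 << u) - 1) for j in range(m // 2)]
        poly[tuple(idx)] = coeffs + [0] * (m // 2)
    return poly, q_bits, u


def multiply(pa, pb):
    """Schoolbook product of two encoded polynomials (illustration only)."""
    prod = {}
    for ia, ca in pa.items():
        for ib, cb in pb.items():
            idx = tuple(x + y for x, y in zip(ia, ib))
            acc = prod.setdefault(idx, [0] * (len(ca) + len(cb) - 1))
            for s, x in enumerate(ca):
                for t, y in enumerate(cb):
                    acc[s + t] += x * y
    return prod


def evaluate(poly, q_bits, u, M):
    """Substitute X_s = q**(M**(s-1)) and alpha = 2**u."""
    total = 0
    for idx, coeffs in poly.items():
        i = sum(d * M ** s for s, d in enumerate(idx))
        c = sum(cj << (j * u) for j, cj in enumerate(coeffs))
        total += c << (i * q_bits)
    return total
```

For instance, with N = 64, M = 2, k = 2 and m = 4, evaluating `multiply(pa, pb)` at the substitution points returns the ordinary integer product of the two inputs.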

2.2 Choosing the prime


The prime p should be chosen such that the ring $\mathbb{Z}/p^c\mathbb{Z}$ has a principal 2M-th root of unity, which
is required for polynomial multiplication using FFT. A principal root of unity is defined as follows.

Definition 2.1. An n-th root of unity ζ ∈ R is said to be primitive if it generates a cyclic group
of order n under multiplication. Furthermore, it is said to be principal if n is coprime to the
characteristic of R and ζ satisfies $\sum_{i=0}^{n-1} \zeta^{ij} = 0$ for all 0 < j < n.

In $\mathbb{Z}/p^c\mathbb{Z}$, a 2M-th root of unity is principal if and only if 2M | p − 1 (see also Section 6). As a
result, we need to choose the prime p from the arithmetic progression $\{1 + i \cdot 2M\}_{i>0}$, which is the
main bottleneck of our approach. We now explain how this bottleneck can be circumvented.
An upper bound for the least prime in an arithmetic progression is given by the following
theorem [Lin44]:

Theorem 2.2 (Linnik). There exist absolute constants ℓ and L such that for any pair of coprime
integers d and n, the least prime p such that p ≡ d mod n is less than $\ell n^L$.

Heath-Brown [HB92] showed that the Linnik constant L ≤ 5.5. Recall that M is chosen such
that $M^k$ is $\Theta\left(\frac{N}{\log^2 N}\right)$. If we choose k = 1, that is if we use univariate polynomials to encode
integers, then the parameter $M = \Theta\left(\frac{N}{\log^2 N}\right)$. Hence the least prime p ≡ 1 (mod 2M) could be as
large as $N^L$. Since all known deterministic sieving procedures take at least $N^L$ time this is clearly
infeasible (for a randomized approach see Section 5.1). However, by choosing a larger k we can
ensure that the least prime p ≡ 1 (mod 2M) is $O(N^{\varepsilon})$ for some constant ε < 1.

Remark 2.3. If k is any integer greater than L + 1, then $M^L = O\left(N^{\frac{L}{L+1}}\right)$ and hence the least
prime p ≡ 1 mod 2M can be found in o(N) time.
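
The brute-force search that Remark 2.3 makes affordable can be sketched as follows. This is a minimal illustration: the names `is_prime` and `least_prime_in_progression` are hypothetical, and trial division stands in for whatever sieving procedure one would actually use.

```python
def is_prime(n):
    """Trial division; adequate only because the candidates are small."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def least_prime_in_progression(M):
    """Return the least prime p with p = 1 (mod 2M), scanning 1 + i*2M for i = 1, 2, ..."""
    i = 1
    while True:
        p = 1 + i * 2 * M
        if is_prime(p):
            return p
        i += 1
```

For M = 16 the candidates are 33, 65, 97, and the search stops at 97.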

2.3 The Root of Unity


We require a principal 2M-th root of unity in R to compute the Fourier transforms. This root ρ(α)
should also have the property that its $\frac{M}{m}$-th power is α, so as to make some multiplications in
the FFT efficient (Lemma 4.9). Such a root can be computed by interpolation in a way similar to
that in Fürer’s algorithm [Für07, Section 3], but we briefly sketch the procedure for completeness.
We first obtain a (p − 1)-th root of unity ζ in $\mathbb{Z}/p^c\mathbb{Z}$ by lifting a generator of $\mathbb{F}_p^*$. The $\frac{p-1}{2M}$-th
power of ζ gives us a 2M-th root of unity ω. A generator of $\mathbb{F}_p^*$ can be computed by brute force, as
p is sufficiently small. Having obtained a generator, we can use Hensel Lifting [NZM91, Theorem
2.23].

Lemma 2.4. Let $\zeta_s$ be a primitive (p − 1)-th root of unity in $\mathbb{Z}/p^s\mathbb{Z}$. Then there exists a unique
primitive (p − 1)-th root of unity $\zeta_{s+1}$ in $\mathbb{Z}/p^{s+1}\mathbb{Z}$ such that $\zeta_{s+1} \equiv \zeta_s \pmod{p^s}$. This unique root
is given by $\zeta_{s+1} = \zeta_s - \frac{f(\zeta_s)}{f'(\zeta_s)}$ where $f(X) = X^{p-1} - 1$.
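
The lifting step of Lemma 2.4 can be sketched in a few lines. The helper names `find_generator` and `lift_root` are hypothetical, and the code assumes p is a small odd prime and Python 3.8 or later (for modular inverses via `pow(x, -1, mod)`).

```python
def find_generator(p):
    """Brute-force a generator of F_p^* (p is assumed to be a small prime)."""
    factors, n, d = set(), p - 1, 2
    while d * d <= n:                 # collect the prime factors of p - 1
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    for g in range(2, p):
        # g generates F_p^* iff g^((p-1)/r) != 1 for every prime r dividing p - 1
        if all(pow(g, (p - 1) // r, p) != 1 for r in factors):
            return g
    raise ValueError("no generator found; is p an odd prime?")


def lift_root(g, p, c):
    """Lift g to a (p-1)-th root of unity modulo p**c, one Newton step per level."""
    zeta = g % p
    for s in range(1, c):
        mod = p ** (s + 1)
        f = pow(zeta, p - 1, mod) - 1                 # f(zeta) with f(X) = X^(p-1) - 1
        fprime = (p - 1) * pow(zeta, p - 2, mod)      # f'(zeta), a unit modulo p
        zeta = (zeta - f * pow(fprime, -1, mod)) % mod
    return zeta
```

With p = 17, c = 3 and M = 8, the lifted root ζ satisfies ζ^16 ≡ 1 (mod 17^3), and ω = ζ^((p−1)/(2M)) satisfies ω^M ≡ −1 (mod 17^3), as expected of a 2M-th root of unity of exact order 2M.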

We need the following claims to compute the root ρ(α).

Claim 2.5. Let ω be a principal 2M-th root of unity in $\mathbb{Z}/p^c\mathbb{Z}$.

(a) If $\sigma = \omega^{M/m}$, then σ is a principal 2m-th root of unity.

(b) The polynomial $x^m + 1 = \prod_{i=1}^{m} (x - \sigma^{2i-1})$ in $\mathbb{Z}/p^c\mathbb{Z}$. Moreover, for any 0 ≤ i < j ≤ 2m, the
ideals generated by $(x - \sigma^i)$ and $(x - \sigma^j)$ are comaximal in $\mathbb{Z}[x]/p^c\mathbb{Z}$.

(c) The roots $\{\sigma^{2i-1}\}_{1 \le i \le m}$ are distinct modulo p and therefore the difference of any two of them
is a unit in $\mathbb{Z}[x]/p^c\mathbb{Z}$.

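We then, through interpolation, solve for a polynomial ρ(α) such that $\rho(\sigma^{2i+1}) = \omega^{2i+1}$ for all
1 ≤ i ≤ m. Then,

\begin{align*}
& \rho(\sigma^{2i+1}) = \omega^{2i+1}, && 1 \le i \le m \\
\implies\ & \left(\rho(\sigma^{2i+1})\right)^{M/m} = \omega^{(2i+1)M/m} = \sigma^{2i+1} \\
\implies\ & (\rho(\alpha))^{M/m} = \alpha \pmod{\alpha - \sigma^{2i+1}}, && 1 \le i \le m \\
\implies\ & (\rho(\alpha))^{M/m} = \alpha \pmod{\alpha^m + 1}
\end{align*}

The first two parts of the claim justify the Chinese Remaindering. Finally, computing a polynomial
ρ(α) such that $\rho(\sigma^{2i+1}) = \omega^{2i+1}$ can be done through interpolation.

\[ \rho(\alpha) = \sum_{i=1}^{m} \omega^{2i+1} \, \frac{\prod_{j \neq i} (\alpha - \sigma^{2j+1})}{\prod_{j \neq i} (\sigma^{2i+1} - \sigma^{2j+1})} \]

The division by $(\sigma^{2i+1} - \sigma^{2j+1})$ is justified as it is a unit in $\mathbb{Z}/p^c\mathbb{Z}$ (part (c) of Claim 2.5).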
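
A small numerical check of this construction is sketched below. Several things are assumptions made for brevity rather than the paper's procedure: the toy parameters p = 17, c = 2, M = 8, m = 4 and the generator g = 3 of $\mathbb{F}_{17}^*$ are hard-coded; the (p − 1)-th root of unity is obtained as $g^{p^{c-1}}$ modulo $p^c$, a shortcut substituted here for the Newton iteration of Lemma 2.4; and the helper names are hypothetical.

```python
def poly_mul_linear(poly, r, mod):
    """Multiply poly (coefficients low to high) by (alpha - r) modulo mod."""
    out = [0] * (len(poly) + 1)
    for d, coef in enumerate(poly):
        out[d + 1] = (out[d + 1] + coef) % mod
        out[d] = (out[d] - r * coef) % mod
    return out


def interpolate_rho(omega, M, m, mod):
    """Lagrange interpolation: rho(sigma^(2i+1)) = omega^(2i+1) for 1 <= i <= m."""
    sigma = pow(omega, M // m, mod)
    nodes = [pow(sigma, 2 * i + 1, mod) for i in range(1, m + 1)]
    values = [pow(omega, 2 * i + 1, mod) for i in range(1, m + 1)]
    rho = [0] * m
    for i in range(m):
        num, den = [1], 1
        for j in range(m):
            if j != i:
                num = poly_mul_linear(num, nodes[j], mod)
                den = den * (nodes[i] - nodes[j]) % mod
        scale = values[i] * pow(den, -1, mod) % mod   # den is a unit (Claim 2.5 (c))
        for d in range(m):
            rho[d] = (rho[d] + scale * num[d]) % mod
    return rho


def mul_in_R(f, g, m, mod):
    """Schoolbook product in Z[alpha]/(mod, alpha^m + 1)."""
    out = [0] * m
    for s, x in enumerate(f):
        for t, y in enumerate(g):
            if s + t < m:
                out[s + t] = (out[s + t] + x * y) % mod
            else:                                      # alpha^m = -1
                out[s + t - m] = (out[s + t - m] - x * y) % mod
    return out


def power_in_R(f, e, m, mod):
    """Compute f**e in R by repeated multiplication (e is tiny in this example)."""
    result = [1] + [0] * (m - 1)
    for _ in range(e):
        result = mul_in_R(result, f, m, mod)
    return result


# Toy parameters (assumptions): 2M = 16 divides p - 1 = 16, and 3 generates F_17^*.
p, c, M, m, g = 17, 2, 8, 4, 3
mod = p ** c
omega = pow(g, p ** (c - 1) * ((p - 1) // (2 * M)), mod)   # 2M-th root of unity mod p^c
rho = interpolate_rho(omega, M, m, mod)
```

Under these assumptions, `power_in_R(rho, M // m, m, mod)` is the coefficient vector of α, and `power_in_R(rho, M, m, mod)` is the coefficient vector of −1, which is what the derivation above predicts.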

3 The Integer Multiplication Algorithm


We are given two integers $a, b < 2^N$ to multiply. We fix constants k and c whose values are given
in Section 5. The algorithm is as follows:

1. Choose M and m as powers of two such that $M^k \approx \frac{N}{\log^2 N}$ and m ≈ log N. Find the least
prime p ≡ 1 (mod 2M) (Remark 2.3).

2. Encode the integers a and b as k-variate polynomials a(X) and b(X) respectively over the
ring $R = \mathbb{Z}[\alpha]/(p^c, \alpha^m + 1)$ (Section 2.1).

3. Compute the root ρ(α) (Section 2.3).

4. Use ρ(α) as the principal 2M -th root of unity to compute the Fourier transforms of the k-
variate polynomials a(X) and b(X). Multiply component-wise and take the inverse Fourier
transform to obtain the product polynomial.

5. Evaluate the product polynomial at appropriate powers of two to recover the integer product
and return it (Section 2.1).

The only missing piece is the Fourier transforms for multivariate polynomials. The following
section gives a group theoretic description of FFT.

4 Fourier Transform
A convenient way to study polynomial multiplication is to interpret it as multiplication in a group
algebra.

Definition 4.1 (Group Algebra). Let G be a group. The group algebra of G over a ring R is
the set of formal sums $\sum_{g \in G} \alpha_g g$ where $\alpha_g \in R$ with addition defined point-wise and multiplication
defined via convolution as follows

\[ \left( \sum_{g} \alpha_g g \right) \left( \sum_{h} \beta_h h \right) = \sum_{u} \left( \sum_{gh = u} \alpha_g \beta_h \right) u \]

Multiplying univariate polynomials over R of degree less than n can be seen as multiplication
in the group algebra R[G] where G is the cyclic group of order 2n. Similarly, multiplying k-variate
polynomials of degree less than n in each variable can be seen as multiplying in the group algebra
$R[G^k]$, where $G^k$ denotes the k-fold product group G × . . . × G.
In this section, we study the Fourier transform over the group algebra R[E] where E is an
additive abelian group. Most of this, albeit in a different form, is well known but is provided here
for completeness [Sha99, Chapter 17].
In order to simplify our presentation, we will fix the base ring to be C, the field of complex
numbers. Let n be the exponent of E, that is the maximum order of any element in E. A similar
approach can be followed for any other base ring as long as it has a principal n-th root of unity.
We consider C[E] as a vector space with basis $\{x\}_{x \in E}$ and use the Dirac notation to represent
elements of C[E]: the vector $|x\rangle$, x in E, denotes the element 1.x of C[E].

Definition 4.2 (Characters). Let E be an additive abelian group. A character of E is a
homomorphism from E to $\mathbb{C}^*$.

An example of a character of E is the trivial character, which we will denote by 1, that assigns
to every element of E the complex number 1. If χ1 and χ2 are two characters of E then their
product χ1 .χ2 is defined as χ1 .χ2 (x) = χ1 (x)χ2 (x).

Proposition 4.3. [Sha99, Chapter 17, Theorem 1] Let E be an additive abelian group of exponent
n. Then the values taken by any character of E are n-th roots of unity. Furthermore, the characters
form a multiplicative abelian group Ê which is isomorphic to E.

An important property that the characters satisfy is the following [Isa94, Corollary 2.14].

Proposition 4.4 (Schur’s Orthogonality). Let E be an additive abelian group. Then

\[ \sum_{x \in E} \chi(x) = \begin{cases} 0 & \text{if } \chi \neq 1, \\ \#E & \text{otherwise,} \end{cases}
\qquad
\sum_{\chi \in \hat{E}} \chi(x) = \begin{cases} 0 & \text{if } x \neq 0, \\ \#E & \text{otherwise.} \end{cases} \]

It follows from Schur’s orthogonality that the collection of vectors $|\chi\rangle = \sum_x \chi(x) |x\rangle$ forms a
basis of C[E]. We will call this basis the Fourier basis of C[E].

Definition 4.5 (Fourier Transform). Let E be an additive abelian group and let $x \mapsto \chi_x$ be an
isomorphism between E and Ê. The Fourier transform over E is the linear map from C[E] to C[E]
that sends $|x\rangle$ to $|\chi_x\rangle$.

Thus, the Fourier transform is a change of basis from the point basis $\{|x\rangle\}_{x \in E}$ to the Fourier
basis $\{|\chi_x\rangle\}_{x \in E}$. The Fourier transform is unique only up to the choice of the isomorphism $x \mapsto \chi_x$.
This isomorphism is determined by the choice of the principal root of unity.

Remark 4.6. Given an element $|f\rangle \in C[E]$, to compute its Fourier transform it is sufficient to
compute the Fourier coefficients $\{\langle\chi|f\rangle\}_{\chi \in \hat{E}}$.

4.1 Fast Fourier Transform


We now describe the Fast Fourier Transform for general abelian groups in the character theoretic
setting. For the rest of the section fix an additive abelian group E over which we would like to
compute the Fourier transform. Let A be any subgroup of E and let B = E/A. For any such pair
of abelian groups A and B, we have an appropriate Fast Fourier transformation, which we describe
in the rest of the section.

Proposition 4.7. 1. Every character λ of B can be “lifted” to a character of E (which will also
be denoted by λ) defined as follows λ(x) = λ(x + A).

2. Let χ1 and χ2 be two characters of E that when restricted to A are identical. Then χ1 = χ2 λ
for some character λ of B.

3. The group B̂ is (isomorphic to) a subgroup of Ê with the quotient group Ê/B̂ being (isomor-
phic to) Â.
We now consider the task of computing the Fourier transform of an element $|f\rangle = \sum f_x |x\rangle$
presented as a list of coefficients $\{f_x\}$ in the point basis. For this, it is sufficient to compute the
Fourier coefficients $\{\langle\chi|f\rangle\}$ for each character χ of E (Remark 4.6). To describe the Fast Fourier
transform we fix two sets of coset representatives, one of A in E and one of B̂ in Ê as follows.

1. For each b ∈ B, b being a coset of A, fix a coset representative $x_b \in E$ such that $b = x_b + A$.

2. For each character ϕ of A, fix a character $\chi_\varphi$ of E such that $\chi_\varphi$ restricted to A is the character
ϕ. The characters $\{\chi_\varphi\}$ form (can be thought of as) a set of coset representatives of B̂ in Ê.

Since $\{x_b\}_{b \in B}$ forms a set of coset representatives, any $|f\rangle \in C[E]$ can be written uniquely as
$|f\rangle = \sum f_{b,a} |x_b + a\rangle$.

Proposition 4.8. Let $|f\rangle = \sum f_{b,a} |x_b + a\rangle$ be an element of C[E]. For each b ∈ B and ϕ ∈ Â let
$|f_b\rangle \in C[A]$ and $|f_\varphi\rangle \in C[B]$ be defined as follows.

\begin{align*}
|f_b\rangle &= \sum_{a \in A} f_{b,a} |a\rangle \\
|f_\varphi\rangle &= \sum_{b \in B} \chi_\varphi(x_b) \langle\varphi|f_b\rangle \, |b\rangle
\end{align*}

Then for any character χ of E, which can be expressed as $\chi = \lambda \cdot \chi_\varphi$, the Fourier coefficient
$\langle\chi|f\rangle = \langle\lambda|f_\varphi\rangle$.

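Proof. Recall that λ(x + A) = λ(x), and ϕ is the restriction of χ to the subgroup A.

\begin{align*}
\langle\chi|f\rangle &= \sum_{b} \sum_{a} \chi(x_b + a) f_{b,a} \\
&= \sum_{b} \sum_{a} \lambda(x_b + a) \, \chi_\varphi(x_b + a) f_{b,a} \\
&= \sum_{b} \lambda(b) \chi_\varphi(x_b) \sum_{a} \varphi(a) f_{b,a} \\
&= \sum_{b} \lambda(b) \chi_\varphi(x_b) \langle\varphi|f_b\rangle \\
&= \langle\lambda|f_\varphi\rangle
\end{align*}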

We are now ready to describe the Fast Fourier transform given an element $|f\rangle = \sum f_x |x\rangle$.
1. For each b ∈ B compute the Fourier transforms of $|f_b\rangle$. This requires #B many Fourier
transforms over A.
2. As a result of the previous step we have for each b ∈ B and ϕ ∈ Â the Fourier coefficients
$\langle\varphi|f_b\rangle$. Compute for each ϕ the vectors $|f_\varphi\rangle = \sum_{b \in B} \chi_\varphi(x_b) \langle\varphi|f_b\rangle \, |b\rangle$. This requires
#Â · #B = #E many multiplications by roots of unity.
3. For each ϕ ∈ Â compute the Fourier transform of $|f_\varphi\rangle$. This requires #Â = #A many Fourier
transforms over B.
4. Any character χ of E is of the form $\chi_\varphi \lambda$ for some ϕ ∈ Â and λ ∈ B̂. Using Proposition 4.8
we have at the end of Step 3 all the Fourier coefficients $\langle\chi|f\rangle = \langle\lambda|f_\varphi\rangle$.
If the quotient group B itself has a subgroup that is isomorphic to A then we can apply this
process recursively on B to obtain a divide and conquer procedure to compute the Fourier transform.
In the standard FFT we use $E = \mathbb{Z}/2^n\mathbb{Z}$. The subgroup A is $2^{n-1}E$, which is isomorphic to $\mathbb{Z}/2\mathbb{Z}$,
and the quotient group B is $\mathbb{Z}/2^{n-1}\mathbb{Z}$.
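
The four steps can be sketched for the cyclic group E = Z/nZ over the complex numbers. The choices below are assumptions made for illustration: A is the subgroup of size `a` generated by n/a, the coset representatives are $x_b = b$ for 0 ≤ b < n/a, the characters are $\chi_y(x) = e^{2\pi i xy/n}$, and the function names are hypothetical.

```python
import cmath


def naive_dft(f):
    """Fourier coefficients <chi_y|f> = sum_x chi_y(x) f_x with chi_y(x) = e^(2 pi i x y / n)."""
    n = len(f)
    return [sum(cmath.exp(2j * cmath.pi * x * y / n) * f[x] for x in range(n))
            for y in range(n)]


def subgroup_fft(f, a=2):
    """FFT over Z/nZ using the subgroup A of size a and the quotient B of size n/a."""
    n = len(f)
    if n <= a or n % a != 0:
        return naive_dft(f)
    b = n // a
    # Step 1: for each coset representative beta, a Fourier transform over A.
    inner = [naive_dft([f[beta + b * j] for j in range(a)]) for beta in range(b)]
    out = [0] * n
    for s in range(a):                       # s indexes the characters phi of A
        # Step 2: multiply by chi_phi(x_beta) = e^(2 pi i s beta / n).
        f_s = [cmath.exp(2j * cmath.pi * s * beta / n) * inner[beta][s]
               for beta in range(b)]
        # Step 3: a Fourier transform over B, computed recursively.
        outer = subgroup_fft(f_s, a)
        # Step 4: the character chi_phi * lambda_t corresponds to y = s + a*t.
        for t in range(b):
            out[s + a * t] = outer[t]
    return out
```

With `a=2` this is the usual radix-2 recursion; a larger `a` corresponds to a larger inner transform over A, which is the situation exploited in Section 4.2.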

4.2 Analysis of the Fourier Transform


Our goal is to multiply k-variate polynomials over R, with the degree in each variable less than
M. This can be achieved by embedding the polynomials into the algebra of the product group
$E = \left(\mathbb{Z}/2M\mathbb{Z}\right)^k$ and multiplying them as elements of the algebra. Since the exponent of E is 2M, we
require a principal 2M-th root of unity in the ring R. We shall use the root ρ(α) (as defined in
Section 2.3) for the Fourier transform over E.
For every subgroup A of E, we have a corresponding FFT. We choose the subgroup A as $\left(\mathbb{Z}/2m\mathbb{Z}\right)^k$
and let B be the quotient group E/A. The group A has exponent 2m and α is a principal 2m-th
root of unity. Since α is a power of ρ(α), we can use it for the Fourier transform over A. As
multiplications by powers of α are just shifts, this makes Fourier transform over A efficient.
Let $\mathcal{F}(2M, k)$ denote the complexity of computing the Fourier transform over $\left(\mathbb{Z}/2M\mathbb{Z}\right)^k$. We
have

\[ \mathcal{F}(2M, k) = \left(\frac{M}{m}\right)^k \mathcal{F}(2m, k) + M^k \mathcal{M}_R + (2m)^k \, \mathcal{F}\!\left(\frac{M}{m}, k\right) \tag{1} \]

where $\mathcal{M}_R$ denotes the complexity of multiplications in R. The first term comes from the #B many
Fourier transforms over A (Step 1 of FFT), the second term corresponds to the multiplications by
roots of unity (Step 2) and the last term comes from the #A many Fourier transforms over B
(Step 3).
Since A is a subgroup of B as well, Fourier transforms over B can be recursively computed in a
similar way, with B playing the role of E. Therefore, by simplifying the recurrence in Equation 1
we get:

\[ \mathcal{F}(2M, k) = O\left( \frac{M^k \log M}{m^k \log m} \, \mathcal{F}(2m, k) + \frac{M^k \log M}{\log m} \, \mathcal{M}_R \right) \tag{2} \]
mk log m log m

Lemma 4.9. $\mathcal{F}(2m, k) = O(m^{k+1} \log m \cdot \log p)$

Proof. The FFT over a group of size n is usually done by taking 2-point FFT’s followed by $\frac{n}{2}$-point
FFT’s. This involves O(n log n) multiplications by roots of unity and additions in base ring. Using
this method, Fourier transforms over A can be computed with $O(m^k \log m)$ multiplications and
additions in R. Since each multiplication is between an element of R and a power of α, this can
be efficiently achieved through shifting operations. This is dominated by the addition operation,
which takes O(m log p) time, since this involves adding m coefficients from $\mathbb{Z}/p^c\mathbb{Z}$.
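
The shifting operation mentioned in the proof can be made concrete with a short sketch. The function name `mul_by_alpha_power` is hypothetical, and elements of R are represented here as lists of m coefficients, lowest degree first.

```python
def mul_by_alpha_power(f, e, mod):
    """Multiply f by alpha**e in Z[alpha]/(mod, alpha^m + 1) using only shifts and sign flips."""
    m = len(f)
    e %= 2 * m                      # alpha is a 2m-th root of unity
    out = [0] * m
    for d, coef in enumerate(f):
        wraps, pos = divmod(d + e, m)
        # Each wrap past degree m - 1 contributes a factor alpha^m = -1.
        out[pos] = (-coef if wraps % 2 else coef) % mod
    return out
```

For example, with m = 4 and modulus 17, multiplying 1 + 2α + 3α² + 4α³ by α moves each coefficient up one position and wraps the top coefficient around with a sign change, giving −4 + α + 2α² + 3α³. No multiplication of coefficients is needed.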

Therefore, from Equation 2,

\[ \mathcal{F}(2M, k) = O\left( M^k \log M \cdot m \cdot \log p + \frac{M^k \log M}{\log m} \, \mathcal{M}_R \right) \tag{3} \]

5 Complexity Analysis
The choice of parameters should ensure that the following constraints are satisfied:

1. $M^k = \Theta\left(\frac{N}{\log^2 N}\right)$ and m = O(log N).

2. $M^L = O(N^{\varepsilon})$ where L is the Linnik constant (Theorem 2.2) and ε is any constant less than
1. Recall that this makes picking the prime by brute force feasible (see Remark 2.3).

3. $p^c > M^k \cdot m \cdot 2^{2u}$ where $u = \frac{2N}{M^k m}$. This is to prevent overflows during modular arithmetic
(see Section 2.1).

It is straightforward to check that k > L + 1 and c > 5(k + 1) satisfy the above constraints. Heath-
Brown [HB92] showed that L ≤ 5.5 and therefore c = 42 clearly suffices.

Let T (N ) denote the time complexity of multiplying two N bit integers. This consists of:

(a) Time required to pick a suitable prime p,

(b) Computing the root ρ(α),

(c) Encoding the input integers as polynomials,

(d) Multiplying the encoded polynomials,

(e) Evaluating the product polynomial.

As argued before, the prime p can be chosen in o(N) time. To compute ρ(α), we need to lift
a generator of $\mathbb{F}_p^*$ to $\mathbb{Z}/p^c\mathbb{Z}$ followed by an interpolation. Since c is a constant and p is a prime of
O(log N) bits, the time required for Hensel Lifting and interpolation is o(N).
The encoding involves dividing bits into smaller blocks, and expressing the exponents of q in
base M (Section 2.1) and all these take O(N ) time since M is a power of 2. Similarly, evaluation
of the product polynomial takes linear time as well. Therefore, the time complexity is dominated
by the time taken for polynomial multiplication.

Time complexity of Polynomial Multiplication


From Equation 3, the complexity of Fourier transform is given by

\[ \mathcal{F}(2M, k) = O\left( M^k \log M \cdot m \cdot \log p + \frac{M^k \log M}{\log m} \, \mathcal{M}_R \right) \]

Proposition 5.1. [Sch82] Multiplication in the ring R reduces to multiplying $O(\log^2 N)$ bit integers
and therefore $\mathcal{M}_R = T\left(O(\log^2 N)\right)$.

Proof. Elements of R can be seen as polynomials in α over $\mathbb{Z}/p^c\mathbb{Z}$ with degree at most m. Given
two such polynomials f(α) and g(α), encode them as follows: Replace α by $2^d$, transforming the
polynomials f(α) and g(α) to the integers $f(2^d)$ and $g(2^d)$ respectively. The parameter d is chosen
such that the coefficients of the product h(α) = f(α)g(α) can be recovered from the product
$f(2^d) \cdot g(2^d)$. For this, it is sufficient to ensure that the maximum coefficient of h(α) is less than
$2^d$. Since f and g are polynomials of degree m, we would want $2^d$ to be greater than $m \cdot p^{2c}$, which
can be ensured by choosing d = Θ(log N). The integers $f(2^d)$ and $g(2^d)$ are bounded by $2^{md}$ and
hence the task of multiplying in R reduces to $O(\log^2 N)$ bit integer multiplication.
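
The substitution α ↦ $2^d$ in this proof can be sketched as follows. The function name `mul_in_R_kronecker` is hypothetical, Python's built-in big-integer product stands in for the recursive call T(O(log² N)), and the final reduction modulo $\alpha^m + 1$ and $p^c$ is done coefficient by coefficient.

```python
def mul_in_R_kronecker(f, g, p, c):
    """Multiply f, g in Z[alpha]/(p^c, alpha^m + 1) via one big-integer product."""
    m = len(f)
    mod = p ** c
    # Choose d with 2**d > m * p**(2c), so product coefficients never overlap.
    d = (m * mod * mod).bit_length()
    F = sum(coef << (d * i) for i, coef in enumerate(f))   # f(2^d)
    G = sum(coef << (d * i) for i, coef in enumerate(g))   # g(2^d)
    H = F * G                                              # the only large multiplication
    mask = (1 << d) - 1
    h = [(H >> (d * i)) & mask for i in range(2 * m - 1)]  # coefficients of f*g over Z
    out = [0] * m
    for i, coef in enumerate(h):
        if i < m:
            out[i] = (out[i] + coef) % mod
        else:                                              # alpha^m = -1
            out[i - m] = (out[i - m] - coef) % mod
    return out
```

As a check, with m = 2, p = 7 and c = 2, the product (1 + 2α)(3 + 4α) = 3 + 10α + 8α² reduces to −5 + 10α, that is, 44 + 10α modulo 49.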

Multiplication of two polynomials involves a Fourier transform followed by component-wise
multiplications and an inverse Fourier transform. Since the number of component-wise multiplications
is only $M^k$, the time taken is $M^k \cdot \mathcal{M}_R$, which is clearly subsumed in $\mathcal{F}(2M, k)$. Therefore, the
time taken for multiplying the polynomials is $O(\mathcal{F}(2M, k))$. Thus, the complexity of our integer
multiplication algorithm T(N) is given by,

\begin{align*}
T(N) &= O(\mathcal{F}(2M, k)) \\
&= O\left( M^k \log M \cdot m \cdot \log p + \frac{M^k \log M}{\log m} \, \mathcal{M}_R \right) \\
&= O\left( N \log N + \frac{N}{\log N \cdot \log\log N} \, T\left(O(\log^2 N)\right) \right)
\end{align*}

The above recurrence leads to the following theorem.



Theorem 5.2. Given two N bit integers, their product can be computed in $O(N \cdot \log N \cdot 2^{O(\log^* N)})$
time.
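
One way to see how the recurrence for T(N) yields this bound is the following sketch. The normalized quantity S(N) and the constants C and C′ are notation introduced here for illustration only and do not appear in the paper.

```latex
% Sketch only: S, C and C' are illustrative notation, not taken from the paper.
\begin{align*}
T(N) &\le C\left( N\log N + \frac{N}{\log N\cdot\log\log N}\, T(C'\log^2 N) \right),
\qquad S(N) := \frac{T(N)}{N\log N}, \\
T(C'\log^2 N) &= C'\log^2 N \cdot \log(C'\log^2 N) \cdot S(C'\log^2 N)
             = O\!\left(\log^2 N\cdot\log\log N\right)\cdot S(C'\log^2 N), \\
S(N) &\le C + O(1)\cdot S(C'\log^2 N).
\end{align*}
% Each level of the recursion multiplies S by a constant factor, and the map
% N -> C' log^2 N reaches a constant after O(log^* N) levels, so
% S(N) = 2^{O(log^* N)} and T(N) = O(N log N 2^{O(log^* N)}).
```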

5.1 Choosing the Prime Randomly
To ensure that the search for a prime p ≡ 1 (mod 2M ) does not affect the overall time complexity
of the algorithm, we considered multivariate polynomials to restrict the value of M ; an alternative
is to use randomization.

Proposition 5.3. Assuming ERH, a prime p such that p ≡ 1 (mod 2M) can be computed by a
randomized algorithm with expected running time $\tilde{O}(\log^3 M)$.

Proof. Titchmarsh [Tit30] (see also Tianxin [Tia90]) showed, assuming ERH, that the number of
primes less than x in the arithmetic progression $\{1 + i \cdot 2M\}_{i>0}$ is given by,

\[ \pi(x, 2M) = \frac{\mathrm{Li}(x)}{\varphi(2M)} + O(\sqrt{x} \log x) \]

for $2M \le x \cdot (\log x)^{-2}$, where $\mathrm{Li}(x) = \Theta\left(\frac{x}{\log x}\right)$ and ϕ is the Euler totient function. In our case, since
M is a power of two, ϕ(2M) = M, and hence for $x \ge 4M^2 \cdot \log^6 M$, we have $\pi(x, 2M) = \Omega\left(\frac{x}{M \log x}\right)$.
Therefore, for an i chosen uniformly randomly in the range $1 \le i \le 2M \cdot \log^6 M$, the probability that
i · 2M + 1 is a prime is at least $\frac{d}{\log x}$ for a constant d. Furthermore, primality test of an O(log M) bit
number can be done in $\tilde{O}(\log^2 M)$ time using Rabin-Miller primality test [Mil76, Rab80]. Hence,
with $x = 4M^2 \cdot \log^6 M$ a suitable prime for our algorithm can be found in expected $\tilde{O}(\log^3 M)$
time.
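
The randomized search can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the logarithm in the sampling range is taken to base 2 (the text does not fix the base), M ≥ 2 is assumed, and the Miller-Rabin test is probabilistic, so a returned value is only prime with overwhelming probability.

```python
import math
import random


def is_probable_prime(n, rounds=20):
    """Miller-Rabin primality test with randomly chosen bases."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:               # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False            # a witnesses that n is composite
    return True


def random_prime_in_progression(M):
    """Sample i in [1, 2M log^6 M] until 1 + i*2M passes the primality test."""
    bound = int(2 * M * math.log2(M) ** 6)
    while True:
        i = random.randint(1, bound)
        p = 1 + i * 2 * M
        if is_probable_prime(p):
            return p
```

Unlike the brute-force scan of Section 2.2, this does not return the least prime in the progression, only some prime below the stated bound.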

6 A Different Perspective
Our algorithm can be seen as a p-adic version of Fürer’s integer multiplication algorithm, where the
field C is replaced by Qp , the field of p-adic numbers (for a quick introduction, see Baker’s online
notes [Bak07]). Much like C, where representing a general element (say in base 2) takes infinitely
many bits, representing an element in Qp takes infinitely many p-adic digits. Since we cannot work
with infinitely many digits, all arithmetic has to be done with finite precision. Modular arithmetic
in the base ring $\mathbb{Z}[\alpha]/(p^c, \alpha^m + 1)$ can be viewed as arithmetic in the ring $\mathbb{Q}_p[\alpha]/(\alpha^m + 1)$ keeping
a precision of $\varepsilon = p^{-c}$.
Arithmetic with finite precision naturally introduces some errors in computation. However, the
nature of $\mathbb{Q}_p$ makes the error analysis simpler. The field $\mathbb{Q}_p$ comes with a norm $|\cdot|_p$ called the p-adic
norm, which satisfies the stronger triangle inequality $|x + y|_p \le \max\left(|x|_p, |y|_p\right)$ [Bak07, Proposition 2.6].
As a result, unlike in C, the errors in computation do not compound.

Recall that the efficiency of FFT crucially depends on a special principal 2M-th root of unity in
$\mathbb{Q}_p[\alpha]/(\alpha^m + 1)$. Such a root is constructed with the help of a primitive 2M-th root of unity in $\mathbb{Q}_p$.
The field $\mathbb{Q}_p$ has a primitive 2M-th root of unity if and only if 2M divides p − 1 [Bak07, Theorem
5.12]. Also, if 2M divides p − 1, a 2M-th root can be obtained from a (p − 1)-th root of unity by
taking a suitable power. A primitive (p − 1)-th root of unity in $\mathbb{Q}_p$ can be constructed, to sufficient
precision, using Hensel Lifting starting from a generator of $\mathbb{F}_p^*$.

7 Conclusion
There are two approaches for multiplying integers, one using arithmetic over complex numbers, and
the other using modular arithmetic. Using complex numbers, Schönhage and Strassen [SS71] gave
an $O(N \cdot \log N \cdot \log\log N \cdots 2^{O(\log^* N)})$ algorithm. Fürer [Für07] improved this complexity to
$O(N \cdot \log N \cdot 2^{O(\log^* N)})$ using some special roots of unity. The other approach, that is modular arithmetic,
can be seen as arithmetic in $\mathbb{Q}_p$ with certain precision. A direct adaptation of the Schönhage-Strassen
algorithm in the modular setting leads to an $O(N \cdot \log N \cdot \log\log N \cdots 2^{O(\log^* N)})$ algorithm.
In this paper, we showed that by choosing an appropriate prime and a special root of unity, a running
time of $O(N \cdot \log N \cdot 2^{O(\log^* N)})$ can be achieved through modular arithmetic as well. Therefore, in
a way, we have unified the two paradigms.

Acknowledgement
We thank V. Vinay, Srikanth Srinivasan and the anonymous referees for many helpful suggestions
that improved the overall presentation of this paper.

References
[Bak07] Alan J. Baker. An introduction to p-adic numbers and p-adic analysis. Online Notes,
2007. http://www.maths.gla.ac.uk/~ajb/dvi-ps/padicnotes.pdf.

[Für07] Martin Fürer. Faster integer multiplication. Proceedings of the 39th ACM Symposium on
Theory of Computing, pages 57–66, 2007.

[HB92] D. R. Heath-Brown. Zero-free regions for Dirichlet L-functions, and the least prime in an
arithmetic progression. In Proceedings of the London Mathematical Society, 64(3), pages
265–338, 1992.

[Isa94] I. Martin Isaacs. Character theory of finite groups. Dover publications Inc., New York,
1994.

[KO63] A Karatsuba and Y Ofman. Multiplication of multidigit numbers on automata. English
Translation in Soviet Physics Doklady, 7:595–596, 1963.

[Lin44] Yuri V. Linnik. On the least prime in an arithmetic progression, I. The basic theorem,
II. The Deuring-Heilbronn’s phenomenon. Rec. Math. (Mat. Sbornik), 15:139–178 and
347–368, 1944.

[Mil76] G. L. Miller. Riemann’s hypothesis and tests for primality. Journal of Computer and
System Sciences, 13:300–317, 1976.

[NZM91] Ivan Niven, Herbert S. Zuckerman, and Hugh L. Montgomery. An Introduction to the
Theory of Numbers. John Wiley and Sons, Singapore, 1991.

[Rab80] Michael O. Rabin. Probabilistic algorithm for testing primality. Journal of Number
Theory, 12:128–138, 1980.

[Sch82] Arnold Schönhage. Asymptotically fast algorithms for the numerical multiplication and
division of polynomials with complex coefficients. In Computer Algebra, EUROCAM,
volume 144 of Lecture Notes in Computer Science, pages 3–15, 1982.

[Sha99] Igor R. Shafarevich. Basic Notions of Algebra. Springer Verlag, USA, 1999.

[SS71] A Schönhage and V Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–
292, 1971.

[Tia90] Cai Tianxin. Primes representable by polynomials and the lower bound of the least primes
in arithmetic progressions. Acta Mathematica Sinica, New Series, 6:289–296, 1990.

[Tit30] E. C. Titchmarsh. A divisor problem. Rend. Circ. Mat. Palermo, 54:414–429, 1930.

[Too63] A L. Toom. The complexity of a scheme of functional elements simulating the multipli-
cation of integers. English Translation in Soviet Mathematics, 3:714–716, 1963.
