Factorization
Unique and Otherwise

CMS Treatises in Mathematics


Published by the Canadian Mathematical Society

Traités de mathématiques de la SMC


Publié par la Société mathématique du Canada

Editorial Board/Conseil de rédaction


James G. Arthur
Ivar Ekeland
Arvind Gupta
Barbara Lee Keyfitz
François Lalonde

CMS Associate Publisher/Éditeur associé


Jonathan Borwein

Factorization
Unique and Otherwise

Steven H. Weintraub
Lehigh University

Canadian Mathematical Society
Société mathématique du Canada
Ottawa, Ontario

A K Peters, Ltd.
Wellesley, Massachusetts

Sales and Customer Service
A K Peters, Ltd.
888 Worcester Street, Suite 230
Wellesley, MA 02482
www.akpeters.com

CMS Executive Office / Bureau administratif de la SMC
Canadian Mathematical Society
Société mathématique du Canada
577 King Edward
Ottawa, Ontario
Canada K1N 6N5
www.cms.math.ca/Publications

Copyright © 2008 A K Peters, Ltd.

All rights reserved. No part of the material protected by this copyright notice
may be reproduced or utilized in any form, electronic or mechanical, including
photocopying, recording, or by any information storage and retrieval system,
without written permission from the copyright owner.

Tous droits réservés. Il est interdit de reproduire ou d’utiliser le matériel protégé


par cet avis de droit d’auteur, sous quelque forme que ce soit, numérique ou
mécanique, notamment de l’enregistrer, de le photocopier ou de l’emmagasiner
dans un système de sauvegarde et de récupération de l’information, sans la per-
mission écrite du titulaire du droit d’auteur.

Library of Congress Cataloging-in-Publication Data

Weintraub, Steven H.
Factorization : unique and otherwise / Steven H. Weintraub.
p. cm. -- (CMS Treatises in mathematics)
Includes index.
ISBN 978-1-56881-241-0 (alk. paper)
1. Factorization (Mathematics). 2. Rings of integers. 3. Rings
(Algebra). I. Title.

QA161.F3W45 2008
512.7'2--dc22
2007049328

Printed in Canada
12 11 10 09 08 10 9 8 7 6 5 4 3 2 1

To my nephews, nieces, and grandkids:

Wendy, Jenny, and William;


Erica, Jordan, and Allison;
Blake, Natalie, and Ethan

Contents

Preface

Introduction

1 Basic Notions
1.1 Integral Domains
1.2 Quadratic Fields
1.3 Exercises

2 Unique Factorization
2.1 Euclidean Domains
2.2 The GCD-L Property and Euclid’s Algorithm
2.3 Ideals and Principal Ideal Domains
2.4 Unique Factorization Domains
2.5 Nonunique Factorization: The Case D < 0
2.6 Nonunique Factorization: The Case D > 0
2.7 Summing Up
2.8 Exercises

3 The Gaussian Integers
3.1 Fermat’s Theorem
3.2 Factorization into Primes
3.3 Exercises

4 Pell’s Equation
4.1 Representations and Their Composition
4.2 Solving Pell’s Equation
4.3 Numerical Examples and Further Results
4.4 Units in $O(\sqrt{D})$
4.5 Exercises

5 Towards Algebraic Number Theory
5.1 Algebraic Numbers and Algebraic Integers
5.2 Ideal Theory
5.3 Dedekind Domains
5.4 Algebraic Number Fields and Dedekind Domains
5.5 Prime Ideals in $O(\sqrt{D})$
5.6 Examples of Ideals in $O(\sqrt{D})$
5.7 Behavior of Ideals in Algebraic Number Fields
5.8 Ideal Elements
5.9 Dirichlet’s Unit Theorem
5.10 Exercises

A Mathematical Induction
A.1 Mathematical Induction and Its Equivalents
A.2 Consequences of Mathematical Induction
A.3 Exercises

B Congruences
B.1 The Notion of Congruence
B.2 Linear Congruences
B.3 Quadratic Congruences
B.4 Proof of the Law of Quadratic Reciprocity
B.5 Primitive Roots
B.6 Exercises

C Continuations from Chapter 2
C.1 Continuation of the Proof of Theorem 2.8
C.2 Continuation of Example 2.26
C.3 Exercises

Index

Preface

In this book, we introduce the reader to some beautiful and interesting
mathematics, which is not only historically important but also still very
much alive today. Indeed, it plays a central role in modern mathematics.
The mathematical content of this book is outlined in the introduction,
but we shall preview it here. It is a basic property of the integers, known
as the Fundamental Theorem of Arithmetic, that every integer can be fac-
tored into a product of primes in an essentially unique way. Our principal
objective in this book is to investigate somewhat more general but still
relatively concrete systems (known as rings of integers in quadratic fields)
and see when this property does or does not hold for them. We accomplish
this objective in Chapters 1 and 2. But this investigation naturally leads
us into further investigations—mathematics is like that—and we consider
related questions in Chapters 3 and 4, where we investigate the Gaussian
integers and Pell’s equation, respectively.
The questions we investigate here were at the roots of the development
of algebraic number theory. In Chapter 5 we provide an overview of alge-
braic number theory with emphasis on how the results for quadratic fields
generalize to arbitrary algebraic number fields.
We envision several ways in which this book can be used. One way
is for a first course in number theory. In our investigations, we begin at
the beginning, so this book is suitable for that purpose. Indeed, one of
the themes of this book is that one can go a long way with only elementary
methods. To be sure, the topics covered here are not the traditional
topics for a first course in number theory (though there is considerable
overlap), but there is no reason that the traditional topics need be
sacrosanct.
Another way to use this book is for a more advanced course in number
theory, and there is plenty of appropriate material here for such a course.
Indeed, there is far more than a semester’s worth of material here, even for
an advanced course.


In this regard, we call the reader’s attention to Appendices A and B, on
mathematical induction and congruences, respectively. If this book is used
as a text for a first course, much of the material in these two appendices
should be covered. If this book is used as a text for a more advanced course,
these appendices will serve as background.
We have not tried to write a textbook on algebraic number theory in
Chapter 5, but rather to provide an overview of the field. But we feel that
this overview can serve as a valuable introduction to, and guide for, the
student who wishes to study this field, and can also serve as a concrete
reference for some of the general results that a student of this field will
encounter.

Steven H. Weintraub
Bethlehem, PA, USA
August 2007

Introduction

We shall here be concerned with the circle of ideas that surrounds the
Fundamental Theorem of Arithmetic.
First we recall the usual definition of a prime: a prime number is a
positive integer, other than 1, that has no divisors except itself and 1. For
example 2 and 3 are primes, but 6 = 2 · 3 and 10 = 2 · 5 are not.
Then the Fundamental Theorem of Arithmetic states that every posi-
tive integer can be factored into primes in an essentially unique way. For
example,

$$
\begin{aligned}
1 &= 1,\\
2 &= 2,\\
6 &= 2 \cdot 3,\\
10 &= 2 \cdot 5,\\
15 &= 3 \cdot 5,\\
2499 &= 3 \cdot 7^2 \cdot 17.
\end{aligned}
$$

By “essentially unique,” we mean unique up to the order of the factors,
so that we consider 6 = 2 · 3 = 3 · 2 to be the same factorization. (Note
that 1 is a special case. We think of it as having an “empty” factorization,
as it is not divisible by any prime.)
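
As an illustration of the statement above, here is a minimal Python sketch that factors a positive integer by trial division. The function name `prime_factorization` is our own choice and does not come from the text; trial division is used only because it is the simplest method, not because it is efficient.

```python
def prime_factorization(n: int) -> dict[int, int]:
    """Return {prime: exponent} for a positive integer n, by trial division."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        # Divide out p as often as possible; p is prime whenever this succeeds,
        # because all smaller primes have already been removed from n.
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # whatever is left over is itself a prime
        factors[n] = factors.get(n, 0) + 1
    return factors


print(prime_factorization(2499))  # {3: 1, 7: 2, 17: 1}
print(prime_factorization(1))     # {} (the "empty" factorization)
```

The result is stored as a dictionary rather than a list precisely because the order of the factors does not matter.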
As its name implies, unique factorization is a fundamental property of
the positive integers, a property that was known to the ancient Greeks. We
will prove this property, and indeed our proof will follow that of Euclid.
But we will be interested in examining this proof and seeing what makes
it really “work,” with an idea of seeing when we can extend it to more
general situations.

For example, let us consider numbers of the form $a + b\sqrt{-1}$ with a and
b integers. It turns out, and we shall prove, that numbers of this form also
have unique factorization. For example, we have the following factorizations
into primes for numbers of this form:

$$
\begin{aligned}
3 &= 3,\\
5 &= (2 + \sqrt{-1})(2 - \sqrt{-1}),\\
7 &= 7,\\
11 &= 11,\\
13 &= (3 + 2\sqrt{-1})(3 - 2\sqrt{-1}),\\
17 &= (4 + \sqrt{-1})(4 - \sqrt{-1}).
\end{aligned}
$$

On the other hand, let us consider numbers of the form $a + b\sqrt{-5}$ with
a and b integers. Numbers of this form do not have unique factorization.
For example, we have the following two factorizations of 6 into irreducibles:

$$6 = (2)(3) = (1 + \sqrt{-5})(1 - \sqrt{-5}).$$

We can also consider numbers of the form $a + b\sqrt{10}$ with a and b integers.
Numbers of this form also do not have unique factorization. For example,
we have the following two factorizations of 10 into irreducibles:

$$10 = (2)(5) = (\sqrt{10})^2.$$

We have used the word “irreducible” rather than “prime” here as that turns
out to be the correct mathematical language.
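
The irreducibility claimed in the first example can be made plausible with a short computation. The sketch below assumes the norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$ and its multiplicativity, both of which are only introduced in Chapter 1; the function name `norms_up_to` is ours. Since 2, 3, and $1 \pm \sqrt{-5}$ have norms 4, 9, and 6, a proper factor of any of them would need norm 2 or 3, and the code lists which small norms actually occur.

```python
def norms_up_to(limit: int) -> set[int]:
    """All values a^2 + 5*b^2 <= limit, i.e. the norms of a + b*sqrt(-5)."""
    values = set()
    a = 0
    while a * a <= limit:
        b = 0
        while a * a + 5 * b * b <= limit:
            values.add(a * a + 5 * b * b)
            b += 1
        a += 1
    return values


norms = norms_up_to(9)
# N(2) = 4, N(3) = 9, N(1 +/- sqrt(-5)) = 6, and 4 * 9 = 6 * 6 = 36 = N(6).
print(sorted(norms))           # [0, 1, 4, 5, 6, 9]
print(2 in norms, 3 in norms)  # False False
```

Because no element has norm 2 or 3, none of the four factors splits further, and because 4 and 9 differ from 6, neither 2 nor 3 is a unit multiple of $1 \pm \sqrt{-5}$.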
In fact, we will prove the Fundamental Theorem of Arithmetic in a way
that enables us to establish it in many cases, including the two we have
mentioned (the ordinary integers, and numbers of the form $a + b\sqrt{-1}$ with
a and b integers), simultaneously.
On the other hand, we will also be able to systematically show that in
many cases, including the two we have mentioned (numbers of the form
$a + b\sqrt{-5}$ with a and b integers, and numbers of the form $a + b\sqrt{10}$ with a
and b integers), unique factorization does not hold.
As we will see, instead of unique factorization being the norm and non-
unique factorization the exception, the situation is reversed! It is really
a very special property, though a crucially important one, of the ordinary
integers that the Fundamental Theorem of Arithmetic holds for them.
Chapters 1 and 2 of this book are basically devoted to proving unique
and nonunique factorization for ordinary integers and for numbers of the
form $a + b\sqrt{D}$. (Here a and b are not always integers, but this is a technical
point we will defer until later.)

In Chapter 3, we investigate numbers of the form $a + b\sqrt{-1}$ with a and
b integers. Numbers of this form are called the Gaussian integers. As we


have remarked, in the Gaussian integers we do have unique factorization
into primes, but we would like to know what the primes are. Here we
will show that the following is always true (compare the factorizations
above): every ordinary prime that leaves a remainder of 3 when divided by
4 remains a prime in the Gaussian integers, but every prime that leaves a
remainder of 1 when divided by 4 factors into a product of two “conjugate”
primes in the Gaussian integers. In fact, this is closely related to a famous
theorem of Fermat: every prime that leaves a remainder of 1 when divided
by 4 can be written as a sum of two squares. (For example, $5 = 2^2 + 1^2$,
$13 = 3^2 + 2^2$, $17 = 4^2 + 1^2$, but also $97 = 9^2 + 4^2$, $101 = 10^2 + 1^2$, and
$99989 = 230^2 + 217^2$.)
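
Representations of this kind can be found by a direct search. The following Python sketch is only a brute-force illustration (the function name `two_squares` is ours); it is unrelated to the proofs discussed in this book.

```python
from math import isqrt
from typing import Optional, Tuple


def two_squares(p: int) -> Optional[Tuple[int, int]]:
    """Return (x, y) with x >= y >= 0 and x*x + y*y == p, or None if there is none."""
    for y in range(isqrt(p // 2) + 1):
        x = isqrt(p - y * y)  # largest integer with x*x <= p - y*y
        if x * x + y * y == p:
            return (x, y)
    return None


for p in (5, 13, 17, 97, 101, 99989):
    print(p, two_squares(p))
print(7, two_squares(7))  # 7 leaves remainder 3 when divided by 4: None
```

The search only needs to try y up to the square root of p/2, since the smaller of the two squares is at most p/2.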
Actually, we will give several proofs of this theorem. One, due to Euler,
is believed to be along the lines of Fermat’s original proof (which he never
wrote down for posterity). It uses a technique known as composition. An-
other one uses unique factorization to prove Fermat’s theorem quickly and
easily. (It is a bit surprising that this abstract result gives such a concrete
fact, but mathematics is full of beautiful surprises.)
To describe our next objective, we have to get a bit more technical.
The ancient Greeks considered the positive integers, but when we wish to
generalize our investigations, we no longer have the idea of positivity. (We
cannot make any sense out of saying that $a + b\sqrt{-1}$ is positive.) So we
have to consider all integers. But when we do, we see that we have typical
factorizations:

6 = (2)(3) = (−1)(−2)(3) = (−2)(−3).

These are different factorizations, but we do not want to consider these
to be essentially different. How do they differ? The answer is that 1 can
be factored as 1 = (−1)(−1) and we are simply distributing the factors
of 1 differently. We give a name to this situation. Factors of 1 are called
units, and factorizations that differ merely because we have redistributed
the units are essentially the same.
We can show that the units in the Gaussian integers are precisely $1$,
$-1$, $\sqrt{-1}$, and $-\sqrt{-1}$. Then we also have the prime factorization in the
Gaussian integers:

$$2 = (-\sqrt{-1})(1 + \sqrt{-1})^2.$$

Here the first factor is a unit, so what we see is that, up to a unit factor,
2 is a square in the Gaussian integers.
In factoring numbers, we do not really care about units, but still it is
an interesting question—indeed, a very interesting question—to ask what


the units are. We have given the answer for the Gaussian integers, but we
can ask the same question in other cases as well. Here we ask the question
for numbers of the form $a + b\sqrt{D}$ for D positive and not a perfect square.
If D = 2 we have units

$$
\begin{aligned}
1 &= (3 + 2\sqrt{2})(3 - 2\sqrt{2}),\\
1 &= (17 + 12\sqrt{2})(17 - 12\sqrt{2}),\\
1 &= (99 + 70\sqrt{2})(99 - 70\sqrt{2}),\\
1 &= (577 + 408\sqrt{2})(577 - 408\sqrt{2}).
\end{aligned}
$$

Note that a factorization $1 = (a + b\sqrt{D})(a - b\sqrt{D})$ gives a solution of
the equation $a^2 - b^2 D = 1$, and vice versa. Thus the search for units is
intimately related to the search for solutions of the equation $a^2 - b^2 D = 1$.
The units above correspond to solutions for D = 2:
$1 = 3^2 - 2^2 \cdot 2 = 17^2 - 12^2 \cdot 2 = 99^2 - 70^2 \cdot 2 = 577^2 - 408^2 \cdot 2$.
But we can consider this
equation for other values of D as well. For example, for D = 61 we have
the solution

$$1 = (1766319049)^2 - (226153980)^2 \cdot 61$$

and for D = 109 we have the solution

$$1 = (158070671986249)^2 - (15140424455100)^2 \cdot 109.$$

In fact, for any such D there are infinitely many solutions (and hence
infinitely many units). We shall prove this in Chapter 4 where we investigate
the equation $a^2 - b^2 D = 1$, known as Pell’s equation. Our proof
is a variant of the cakravala method experimentally developed by Indian
mathematicians between the ninth and twelfth centuries. This is also a
result known to Fermat, and his proof of this result may well have been
along the lines of ours, as our proof uses a method of “composition” very
closely related to our method in Chapter 3. Also, our proof is constructive,
enabling us to find solutions by hand for values of D that are not too large.
The above solutions for D = 61 and for D = 109 were known to have been
found by Fermat (by hand, obviously, since computers did not exist in the
seventeenth century).
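
The solutions quoted above are easy to confirm with exact integer arithmetic. The sketch below is a verification only and says nothing about how such solutions are found; the helper name `is_pell_solution` is ours. It also regenerates the four solutions for D = 2 by repeatedly multiplying by $3 + 2\sqrt{2}$, using $(a + b\sqrt{2})(3 + 2\sqrt{2}) = (3a + 4b) + (2a + 3b)\sqrt{2}$.

```python
def is_pell_solution(a: int, b: int, D: int) -> bool:
    """Check a^2 - b^2 * D == 1 using exact integer arithmetic."""
    return a * a - b * b * D == 1


print(is_pell_solution(1766319049, 226153980, 61))             # True
print(is_pell_solution(158070671986249, 15140424455100, 109))  # True

# Successive powers of 3 + 2*sqrt(2): (a, b) -> (3a + 4b, 2a + 3b).
a, b = 3, 2
for _ in range(4):
    print(a, b, is_pell_solution(a, b, 2))
    a, b = 3 * a + 4 * b, 2 * a + 3 * b
```

Python integers have arbitrary precision, so no rounding is involved even for the fifteen-digit values.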
Our investigations in Chapters 1 through 4 can be considerably gener-
alized. To use the appropriate technical language, in these chapters we are
considering quadratic fields, and we can consider analogous problems for
algebraic number fields. Indeed, our treatment here parallels the historical
development of the subject. Quadratic fields were investigated first, and
the phenomena that arose there motivated the development of the general


theory. This subject is known as algebraic number theory. In Chapter 5
we survey some of the highlights of this subject. As we have seen, unique
factorization of elements holds in the integers Z, but it does not always
hold in more general settings. While unique factorization of elements is the most straightforward
generalization of the situation in the integers, it is not the right generaliza-
tion. The right generalization is unique factorization of ideals, which does
hold. Therefore in Chapter 5 we will focus (though not exclusively) on ide-
als in general. But we will also provide a wealth of examples, interesting
in themselves, that show how quadratic fields fit into the general case. For
a precise description of the scope of our investigations in this chapter, we
simply refer the reader to the table of contents.
We have three appendices. Appendix A is a careful treatment of mathe-
matical induction, an essential proof technique. Appendix B is a treatment
of congruences. Here we begin with the definition, and proceed through lin-
ear congruences (including the Chinese Remainder Theorem) and quadratic
congruences (including the Law of Quadratic Reciprocity). Appendix C is
a technical one, dealing with some of the more complicated cases of results
in Chapter 2.

Chapter 1

Basic Notions

In this chapter we introduce the objects we will be studying. First we
introduce the general class of objects, and then we focus on the particular
examples that will concern us.

1.1 Integral Domains


We begin by carefully defining the class of objects we shall be studying.

Definition 1.1. An integral domain is a set R equipped with two operations,
addition and multiplication, that satisfy the following properties:

(1) R is closed under addition, i.e., if a and b are any two elements of R,
then a + b is an element of R.

(2) Addition is commutative, i.e., if a and b are any two elements of R, then
a + b = b + a.

(3) Addition is associative, i.e., if a, b, and c are any three elements of R, then
(a + b) + c = a + (b + c).

(4) There is an additive identity 0 in R, i.e., there is an element 0 of R
with the property that for any element a of R, 0 + a = a + 0 = a.

(5) Every element of R has an additive inverse, i.e., if a is any element of
R, there is an element −a of R with the property that a + (−a) = (−a) + a = 0.

(6) R is closed under multiplication, i.e., if a and b are any two elements
of R, then ab is an element of R.

(7) Multiplication is commutative, i.e., if a and b are any two elements of R, then
ab = ba.

(8) Multiplication is associative, i.e., if a, b, and c are any three elements of R,
then (ab)c = a(bc).

(9) There is a multiplicative identity 1 in R, i.e., there is an element 1 of
R with the property that for any element a of R, 1a = a1 = a.

(10) Multiplication distributes over addition, i.e., if a, b, and c are any three
elements of R, then a(b + c) = ab + ac and (b + c)a = ba + ca.

(11) R has no zero divisors, i.e., if a and b are any two non-zero elements
of R, then their product ab is also non-zero. Note that, by taking the
contrapositive, this condition may be equivalently rephrased as follows:
if a and b are any two elements of R with ab = 0, then a = 0 or b = 0.
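
To see that property (11) is a genuine restriction, it may help to look at a system that satisfies properties (1) through (10) but not (11). The sketch below uses arithmetic modulo n (congruences are treated in Appendix B) as an illustrative assumption of ours, not as an example taken from this section; the function name `zero_divisor_pairs` is hypothetical.

```python
def zero_divisor_pairs(n: int) -> list[tuple[int, int]]:
    """Pairs (a, b) of nonzero residues mod n, a <= b, whose product is 0 mod n."""
    return [(a, b)
            for a in range(1, n)
            for b in range(a, n)
            if (a * b) % n == 0]


print(zero_divisor_pairs(6))  # [(2, 3), (3, 4)]
print(zero_divisor_pairs(7))  # []
```

Modulo 6 the nonzero residues 2 and 3 multiply to 0, so property (11) fails there, whereas modulo 7 no such pair exists.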

An important property of integral domains is the cancellation property,
which holds for both addition and multiplication.

Lemma 1.2. Let R be an integral domain and let a, b, and c be elements
of R.

(1) If a + b = a + c then b = c.

(2) If $a \neq 0$ and ab = ac then b = c.

Proof:

(1) a + b = a + c
    −a + (a + b) = −a + (a + c)
    (−a + a) + b = (−a + a) + c
    0 + b = 0 + c
    b = c

(2) ab = ac
    ab + a(−c) = ac + a(−c)
    a(b − c) = a(c − c)
    a(b − c) = a0
    a(b − c) = 0

But by property (11), a(b − c) = 0 implies a = 0 or b − c = 0. Since we
are assuming $a \neq 0$, we must have b − c = 0, so b = c. □


Here are some examples of integral domains.

Example 1.3.

(1) The ordinary integers Z form an integral domain. (Indeed, the term
“integral domain” has its origin in this fact.)

(2) The rational numbers Q form an integral domain. Q is just the set
of fractions {a/b} with a and b integers, with the usual addition and
multiplication of fractions. (Note that Q includes Z, as the integer a
is equal to the fraction a/1.)

(3) Fix an integer D that is not a perfect square, and let

$$R = \{a + b\sqrt{D} \mid a \text{ and } b \text{ are integers}\}.$$

Then R is an integral domain. Let us examine R a little more carefully.
First, if $\alpha = a + b\sqrt{D}$ and $\beta = c + d\sqrt{D}$ are in R, then

$$\alpha + \beta = (a + b\sqrt{D}) + (c + d\sqrt{D}) = (a + c) + (b + d)\sqrt{D}$$

is in R, and

$$\alpha\beta = (a + b\sqrt{D})(c + d\sqrt{D}) = (ac + bdD) + (ad + bc)\sqrt{D}$$

is in R. The remaining properties of R follow directly from the corresponding
properties of Z, except for the last one, the absence of zero
divisors. We leave this for the exercises.

(4) Fix an integer D ≡ 1 (mod 4) that is not a perfect square, and let

$$R = \{(a + b\sqrt{D})/2 \mid a \text{ and } b \text{ are integers, and either they are both even or they are both odd}\}.$$

We shall abbreviate this condition by saying that a and b have the
same parity. Then R is also an integral domain. This requires more
care.

First, if $\alpha = \frac{a + b\sqrt{D}}{2}$ with a and b having the same parity, and
$\beta = \frac{c + d\sqrt{D}}{2}$ with c and d having the same parity, then

$$\alpha + \beta = \frac{a + b\sqrt{D}}{2} + \frac{c + d\sqrt{D}}{2} = \frac{(a + c) + (b + d)\sqrt{D}}{2}$$

and it is easy to check that a + c and b + d have the same parity, so
α + β is in R. Also,

$$\alpha\beta = \left(\frac{a + b\sqrt{D}}{2}\right)\left(\frac{c + d\sqrt{D}}{2}\right) = \frac{(ac + bdD) + (ad + bc)\sqrt{D}}{4}$$

and now it is a little more work to check that, in all cases, since D ≡
1 (mod 4), ac + bdD = 2e, with e an integer, and ad + bc = 2f, with f
an integer, and e and f have the same parity, so $\alpha\beta = \frac{e + f\sqrt{D}}{2}$ is in R.
Again, we leave the proof that R has no zero divisors to the exercises.

(5) Let D be a fixed integer that is not a perfect square, and let

$$R = \{a + b\sqrt{D} \mid a \text{ and } b \text{ are rational numbers}\}.$$

Then R is an integral domain. Once again, properties (1)–(10) of an
integral domain are easy to check and we defer the proof of property
(11) to Example 1.8(5).
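
The parity bookkeeping in Example 1.3(4) can be checked by brute force on a small range. The sketch below is a numerical spot check under our own naming (`half_product`, `closed_under_multiplication`) and with an arbitrary search bound; it is not a substitute for the general argument. An element $(a + b\sqrt{D})/2$ is stored as the pair of numerators (a, b).

```python
from itertools import product


def half_product(a: int, b: int, c: int, d: int, D: int) -> tuple[int, int]:
    """Multiply (a + b*sqrt(D))/2 by (c + d*sqrt(D))/2.

    Returns (e, f) such that the product equals (e + f*sqrt(D))/2, and raises
    ValueError if the numerators ac + bdD and ad + bc are not both even.
    """
    num_rational = a * c + b * d * D  # should equal 2e
    num_radical = a * d + b * c       # should equal 2f
    if num_rational % 2 or num_radical % 2:
        raise ValueError("product is not of the form (e + f*sqrt(D))/2")
    return num_rational // 2, num_radical // 2


def closed_under_multiplication(D: int, bound: int = 6) -> bool:
    """Check closure for all same-parity pairs with entries in [-bound, bound]."""
    rng = range(-bound, bound + 1)
    same_parity = [(a, b) for a, b in product(rng, rng) if (a - b) % 2 == 0]
    for (a, b), (c, d) in product(same_parity, same_parity):
        try:
            e, f = half_product(a, b, c, d, D)
        except ValueError:
            return False
        if (e - f) % 2 != 0:  # e and f must again have the same parity
            return False
    return True


print(closed_under_multiplication(5))   # True  (5 leaves remainder 1 mod 4)
print(closed_under_multiplication(-3))  # True  (-3 leaves remainder 1 mod 4)
print(closed_under_multiplication(3))   # False (3 leaves remainder 3 mod 4)
```

The failing case D = 3 shows why the condition D ≡ 1 (mod 4) is imposed: there, already the square of $(1 + \sqrt{3})/2$ leaves the set.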

The integral domains in Example 1.3(3) look pretty natural, but the
integral domains in Example 1.3(4) look rather artificial. It turns out to
be the case that, depending on the value of D, we sometimes want to
consider the former and sometimes the latter. See the exercises for why
this is the case.
We now make a further definition.

Definition 1.4. F is a field if it satisfies properties (1)–(10) and the following
additional property:

(12) Every nonzero element of F has a multiplicative inverse, i.e., if a is any
nonzero element of F there is an element $a^{-1}$ with $aa^{-1} = a^{-1}a = 1$.

Lemma 1.5. Every field is an integral domain.

Proof: Suppose F satisfies properties (1)–(10) and (12). We need to show
it satisfies properties (1)–(10) and (11). The fact that it satisfies properties
(1)–(10) is immediate, as that is part of our hypothesis. So we need only
show that it satisfies property (11). That is, we must show that a field has
no zero divisors.
So let a and b be two elements of F with ab = 0. We wish to show that
a = 0 or b = 0. If a = 0, we are done, so suppose $a \neq 0$. Then a has an
inverse $a^{-1}$, and we see

$$
\begin{aligned}
ab &= 0\\
a^{-1}(ab) &= a^{-1}(0)\\
(a^{-1}a)b &= 0\\
1b &= 0\\
b &= 0,
\end{aligned}
$$

so b = 0 as required. □

Let us make one more definition.


Definition 1.6. An element a of an integral domain R is a unit if a has a
multiplicative inverse. We let

$$R^* = \{\text{units of } R\}.$$

Remark 1.7. Note that an integral domain R is a field if and only if every
nonzero element of R is a unit.

Example 1.8. Let us consider the integral domains in Example 1.3.

(1) Z is not a field. The only elements of Z that have inverses are 1 (where
$1^{-1} = 1$) and −1 (where $(-1)^{-1} = -1$).

(2) Q is a field. The inverse of the nonzero element a/b is $(a/b)^{-1} = b/a$.

(3) For any fixed integer D that is not a perfect square,
$R = \{a + b\sqrt{D} \mid a \text{ and } b \text{ integers}\}$ is not a field.

(4) For any fixed integer D ≡ 1 (mod 4) that is not a perfect square,
$R = \{(a + b\sqrt{D})/2 \mid a \text{ and } b \text{ integers having the same parity}\}$ is not a field.

(5) For any fixed integer D that is not a perfect square,
$R = \{a + b\sqrt{D} \mid a \text{ and } b \text{ rational numbers}\}$ is a field.
To show this, we explicitly find the inverse of any nonzero element
α of R. Let $\alpha = a + b\sqrt{D}$. We define $\bar{\alpha}$ to be $\bar{\alpha} = a - b\sqrt{D}$. Then
$\alpha\bar{\alpha} = (a + b\sqrt{D})(a - b\sqrt{D}) = a^2 - b^2 D$, so

$$\alpha^{-1} = \frac{\bar{\alpha}}{a^2 - b^2 D} = \frac{a}{a^2 - b^2 D} + \frac{-b}{a^2 - b^2 D}\sqrt{D} = e + f\sqrt{D}.$$

In particular, $\alpha^{-1}$ is of the form $e + f\sqrt{D}$ where e and f are rational
numbers (to be precise, $e = a/(a^2 - b^2 D)$ and $f = -b/(a^2 - b^2 D)$), so
$\alpha^{-1}$ is an element of R. Thus we see that α indeed has an inverse in R, as
claimed.
For this to make sense we need to know that the denominator is nonzero.
If b = 0, then $a \neq 0$ because α is nonzero, and the denominator is $a^2 \neq 0$.
If $b \neq 0$, then $a^2 - b^2 D = 0$ gives $D = (a/b)^2$, contradicting our choice of D not a
perfect square.
(Actually, to be totally honest, a perfect square is usually defined to be
the square of an integer, so, with this definition, in order to conclude that
$D = (a/b)^2$ is impossible, we need to know that if D is not the square of an integer, it
is not the square of a rational number, and this fact already uses unique
factorization of the integers.)
Also, since R is a field, we conclude from Lemma 1.5 that it is also an
integral domain.
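
The inverse formula in Example 1.8(5) translates directly into exact rational arithmetic. The following sketch uses Python's standard `fractions.Fraction`; the function names `inverse` and `multiply` and the sample values are our own. An element $a + b\sqrt{D}$ is represented by the pair (a, b).

```python
from fractions import Fraction


def inverse(a: Fraction, b: Fraction, D: int) -> tuple[Fraction, Fraction]:
    """Return (e, f) with (a + b*sqrt(D)) * (e + f*sqrt(D)) = 1."""
    norm = a * a - b * b * D
    if norm == 0:
        raise ZeroDivisionError("a + b*sqrt(D) is zero (or D is a perfect square)")
    return a / norm, -b / norm


def multiply(a, b, c, d, D):
    """(a + b*sqrt(D)) * (c + d*sqrt(D)) as (rational part, coefficient of sqrt(D))."""
    return a * c + b * d * D, a * d + b * c


a, b, D = Fraction(1, 2), Fraction(3), 7
e, f = inverse(a, b, D)
print(e, f)                     # -2/251 12/251
print(multiply(a, b, e, f, D))  # (Fraction(1, 1), Fraction(0, 1))
```

Here $a^2 - b^2 D = 1/4 - 63 = -251/4$, which explains the denominator 251 in the output.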
Remark 1.9. Let a and b be elements of an integral domain R with $b \neq 0$,
and consider the equation bx = a. This equation may or may not have a
solution, but if it has a solution, that solution x is unique. In this case
we say that b divides a and we write x = a/b. With this definition, the
“usual” rules of fractions hold; see the exercises. (Note that b divides 1 if
and only if b is a unit, and then $1/b = b^{-1}$. In particular, note that in a
field we can divide any element by any nonzero element.)

1.2 Quadratic Fields



We denote the field R of Example 1.3(5) by $Q(\sqrt{D})$, i.e.,

$$Q(\sqrt{D}) = \{a + b\sqrt{D} \mid a \text{ and } b \text{ are rational numbers}\}.$$

We have imposed the restriction that D not be a perfect square, but
now we want to impose a further restriction: we want D also to be square-free,
i.e., not divisible by any perfect square, except for 1. This is purely to
avoid duplication. For suppose D were not square-free, i.e., that $D = n^2 D'$
for some integers n and $D'$. Then we would have (as you can check in the
exercises) $Q(\sqrt{D}) = Q(\sqrt{D'})$, and we would just be repeating ourselves.
With this restriction, we let

$$O(\sqrt{D}) = \{a + b\sqrt{D} \mid a \text{ and } b \text{ integers}\}$$

if D ≡ 2 or 3 (mod 4), i.e., if D leaves a remainder of 2 or 3 when divided
by 4 (i.e., D = . . ., −10, (not −9), −6, −5, −2, −1, 2, 3, 6, 7, 10, 11, 14,
15, (not 18), 19, . . .), and

$$O(\sqrt{D}) = \left\{ \frac{a + b\sqrt{D}}{2} \;\middle|\; a \text{ and } b \text{ integers having the same parity} \right\}$$

if D ≡ 1 (mod 4), i.e., if D leaves a remainder of 1 when divided by 4 (i.e.,
D = . . ., −31, (not −27), −23, −19, −15, −11, −7, −3, (not 1), 5, (not 9),
13, 17, . . .).
(Note that D cannot be divisible by $4 = 2^2$, as we are assuming that D
is square-free.)
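
The reduction to square-free D and the case distinction modulo 4 can be sketched as follows. The function names `squarefree_part` and `ring_of_integers_form` are hypothetical, and the square-free part is found by naive trial division, which is adequate only for small D.

```python
def squarefree_part(D: int) -> tuple[int, int]:
    """Return (n, D1) with D == n*n*D1 and D1 square-free, for nonzero D."""
    if D == 0:
        raise ValueError("D must be nonzero")
    n, D1 = 1, D
    p = 2
    while p * p <= abs(D1):
        while D1 % (p * p) == 0:  # strip the square factor p^2
            D1 //= p * p
            n *= p
        p += 1
    return n, D1


def ring_of_integers_form(D: int) -> str:
    """Describe the elements of O(sqrt(D)) for a square-free D other than 1."""
    if D % 4 == 1:  # Python's % returns 1 for -3, -7, ... as required
        return "(a + b*sqrt(D))/2 with a, b integers of the same parity"
    return "a + b*sqrt(D) with a, b integers"


print(squarefree_part(18))   # (3, 2): Q(sqrt(18)) = Q(sqrt(2))
print(squarefree_part(-27))  # (3, -3): Q(sqrt(-27)) = Q(sqrt(-3))
print(ring_of_integers_form(-3))
print(ring_of_integers_form(10))
```

This matches the lists above: 18 and −27 are skipped because they reduce to 2 and −3, respectively.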

Definition 1.10. Let $D \neq 1$ be a square-free integer. $Q(\sqrt{D})$ is called a
quadratic field and $O(\sqrt{D})$ is called the ring of integers of $Q(\sqrt{D})$. More
precisely, $Q(\sqrt{D})$ is called a real quadratic field if D > 0 and an imaginary
quadratic field if D < 0.

In computing with quadratic fields, there are some quantities that are
extremely useful.

Definition 1.11. Let $\alpha = a + b\sqrt{D}$ be an element of $Q(\sqrt{D})$. Then its
conjugate $\bar{\alpha}$ is defined by

$$\bar{\alpha} = a - b\sqrt{D},$$

its norm $N(\alpha)$ is defined by

$$N(\alpha) = \alpha\bar{\alpha} = a^2 - b^2 D,$$

and its trace $\mathrm{Tr}(\alpha)$ is defined by

$$\mathrm{Tr}(\alpha) = \alpha + \bar{\alpha} = 2a.$$

The following properties are crucial.

Lemma 1.12. Let α and β be any two elements of $Q(\sqrt{D})$. Then,

(1) $\overline{\alpha + \beta} = \bar{\alpha} + \bar{\beta}$ and $\overline{\alpha\beta} = \bar{\alpha}\bar{\beta}$;

(2) $\mathrm{Tr}(\bar{\alpha}) = \mathrm{Tr}(\alpha)$ and $N(\bar{\alpha}) = N(\alpha)$;

(3) $\mathrm{Tr}(\alpha + \beta) = \mathrm{Tr}(\alpha) + \mathrm{Tr}(\beta)$ and $N(\alpha\beta) = N(\alpha)N(\beta)$;

(4) If $N(\alpha) = 0$ then $\alpha = 0$.

Proof: (1), (2), and (3) are easy to check by direct computation, and we
leave them as exercises. We prove (4).
Let $\alpha = a + b\sqrt{D}$ and suppose $N(\alpha) = 0$. Then

$$0 = N(\alpha) = a^2 - b^2 D$$

so $a^2 = b^2 D$ and, if $b \neq 0$, then $(a/b)^2 = D$. But we assumed D was not
a perfect square, so this is impossible. Hence b = 0 and then a = 0, so
α = 0. □
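
Parts (1) and (3) of Lemma 1.12 can be spot-checked numerically before they are proved in the exercises. The sketch below tests random integer pairs; the function names, the list of D values, and the sample ranges are arbitrary choices of ours, and passing these checks is of course not a proof. An element $a + b\sqrt{D}$ is represented by the pair (a, b).

```python
import random


def multiply(x, y, D):
    """(a + b*sqrt(D)) * (c + d*sqrt(D)) for pairs x = (a, b), y = (c, d)."""
    (a, b), (c, d) = x, y
    return (a * c + b * d * D, a * d + b * c)


def conjugate(x):
    a, b = x
    return (a, -b)


def norm(x, D):
    a, b = x
    return a * a - b * b * D


def trace(x):
    return 2 * x[0]


random.seed(0)  # fixed seed so that the run is reproducible
for _ in range(1000):
    D = random.choice([-5, -1, 2, 3, 10])
    x = (random.randint(-50, 50), random.randint(-50, 50))
    y = (random.randint(-50, 50), random.randint(-50, 50))
    # (1): the conjugate of a product is the product of the conjugates.
    assert conjugate(multiply(x, y, D)) == multiply(conjugate(x), conjugate(y), D)
    # (3): the norm is multiplicative and the trace is additive.
    assert norm(multiply(x, y, D), D) == norm(x, D) * norm(y, D)
    assert trace((x[0] + y[0], x[1] + y[1])) == trace(x) + trace(y)
print("all checks passed")
```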


Lemma 1.13. For any element x of $O(\sqrt{D})$, N(x) is an integer.

Proof: If $x = a + b\sqrt{D}$, then $N(x) = a^2 - b^2 D$, so if a and b are integers, N(x)
is certainly an integer. Thus the only case we need to check is that of
$x = (a + b\sqrt{D})/2$ where a and b are both odd and D ≡ 1 (mod 4). In this case
we write a = 2m + 1, b = 2n + 1, D = 4E + 1. Then

$$N(x) = \frac{a^2 - b^2 D}{4} = \frac{(2m + 1)^2 - (2n + 1)^2 (4E + 1)}{4} = m^2 + m - 4n^2 E - 4nE - E - n^2 - n$$

is an integer. □
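
A quick numerical check of Lemma 1.13 for half-integer elements, written as a sketch with our own helper name `norm_half` and an arbitrary range of odd numerators:

```python
from fractions import Fraction


def norm_half(a: int, b: int, D: int) -> Fraction:
    """N((a + b*sqrt(D))/2) = (a^2 - b^2 * D)/4 as an exact fraction."""
    return Fraction(a * a - b * b * D, 4)


odd = range(-9, 10, 2)
for D in (-3, 5, 13):  # each D leaves remainder 1 when divided by 4
    assert all(norm_half(a, b, D).denominator == 1 for a in odd for b in odd)

print(norm_half(1, 1, 5))  # -1
print(norm_half(1, 1, 3))  # -1/2
```

The last line shows what goes wrong without the congruence condition: for D = 3 the number $(1 + \sqrt{3})/2$ has norm −1/2, which is not an integer.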

Lemma 1.14. Let $R = O(\sqrt{D})$. Then the units of R are precisely those
elements x of R with |N(x)| = 1.

Proof: First suppose |N(x)| = 1. Then N(x) = ±1. But $N(x) = x\bar{x}$. Thus
either $x\bar{x} = 1$, in which case x has inverse $x^{-1} = \bar{x}$, or $x\bar{x} = -1$, in which
case x has inverse $x^{-1} = -\bar{x}$, so in either case x is a unit.
Conversely, suppose that x is a unit. Then there is an element y of
R with xy = 1. Then on the one hand $\overline{xy} = \bar{1} = 1$, and on the other
hand $\overline{xy} = \bar{x}\bar{y}$, by Lemma 1.12. Thus $\bar{x}\bar{y} = 1$. Multiplying, we see that
$x\bar{x}y\bar{y} = 1$, i.e., that N(x)N(y) = 1. However, by Lemma 1.13, N(x) and
N(y) are both integers. Therefore, we must have either N(x) = N(y) = 1
or N(x) = N(y) = −1. But in either case, we conclude |N(x)| = 1. □

Let us use Lemma 1.14 to try to find the units in $O(\sqrt{D})$.

Corollary 1.15.

(1) The units in $O(\sqrt{-1})$ are $\{\pm 1, \pm i\}$, where $i = \sqrt{-1}$.

(2) The units in $O(\sqrt{-3})$ are $\{\pm 1, \pm(1 + \sqrt{-3})/2, \pm(1 - \sqrt{-3})/2\}$.

(3) For any other negative value of D, the units in $O(\sqrt{D})$ are $\{\pm 1\}$.

Proof: We leave this proof for the exercises. □
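
For negative D the norm $a^2 - b^2 D = a^2 + b^2 |D|$ is never negative, so by Lemma 1.14 the units are the elements of norm exactly 1 and a finite search finds them all. The sketch below is such a search (it does not replace the proof asked for in the exercises); the function name `units_negative_D` and the triple encoding are our own.

```python
def units_negative_D(D: int) -> list[tuple[int, int, int]]:
    """Units of O(sqrt(D)) for square-free D < 0.

    Each unit is returned as a triple (a, b, k) standing for (a + b*sqrt(D))/k,
    with k equal to 1 or 2.
    """
    if D >= 0:
        raise ValueError("this finite search is only valid for D < 0")
    units = []
    # Elements a + b*sqrt(D): a^2 + b^2*|D| = 1 forces |a| <= 1 and |b| <= 1.
    for a in range(-1, 2):
        for b in range(-1, 2):
            if a * a - b * b * D == 1:
                units.append((a, b, 1))
    # If D leaves remainder 1 mod 4: elements (a + b*sqrt(D))/2 with a, b odd,
    # where a^2 + b^2*|D| = 4 forces |a| <= 2 and |b| <= 2.
    if D % 4 == 1:
        for a in range(-2, 3):
            for b in range(-2, 3):
                if a % 2 == 1 and b % 2 == 1 and a * a - b * b * D == 4:
                    units.append((a, b, 2))
    return units


for D in (-1, -2, -3, -5, -7):
    print(D, units_negative_D(D))
```

The output has four units for D = −1, six for D = −3, and only ±1 for the other values tried, in line with the corollary.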



Thus, we have completely found the units of $O(\sqrt{D})$ when D is negative.
But when D is positive the situation is much more involved.
So let D be positive. Then, by Lemma 1.14, $x = a + b\sqrt{D}$ is a unit
exactly when $N(x) = a^2 - b^2 D = \pm 1$ and $x = (a + b\sqrt{D})/2$ is a unit exactly
when $N(x) = (a^2 - b^2 D)/4 = \pm 1$, i.e., when $a^2 - b^2 D = \pm 4$, where in
the first case a and b are both integers, and in the second case a and b
are both integers having the same parity. (The first case happens for all
values of D, regardless of D (mod 4), while the second case happens only for
D ≡ 1 (mod 4).) Here it is certainly not a priori clear that |N(x)| = 1 has
a solution, other than x = ±1, and, even if we know there are solutions, it
is completely unclear how to find them.
Nevertheless, let us experiment a bit, by taking small values of D.

Example 1.16.

(1) Let D = 2. Then $N(1 + \sqrt{2}) = 1^2 - 1^2 \cdot 2 = -1$ so $x = 1 + \sqrt{2}$ is a
unit and its inverse is $-\bar{x} = -(1 - \sqrt{2}) = -1 + \sqrt{2}$. Since
$x x^{-1} = 1$, we have $x^k x^{-k} = (x x^{-1})^k = 1^k = 1$, so $x^k$ is also a unit
for any k. So, for example, $x^2 = (1 + \sqrt{2})^2 = 3 + 2\sqrt{2}$ is a unit, as
is $x^3 = (1 + \sqrt{2})^3 = 7 + 5\sqrt{2}$, etc. Note that x > 1 so
$\{1, x, x^2, x^3, \ldots\}$ is a steadily increasing sequence of numbers, so in
particular they are all distinct. Moreover, we see that
$\{\ldots, \pm x^{-3}, \pm x^{-2}, \pm x^{-1}, \pm 1, \pm x, \pm x^2, \pm x^3, \ldots\}$ are all distinct
as well, giving an infinite set of distinct units in this case.

(2) Let D = 3. Then $N(2 + \sqrt{3}) = 2^2 - 1^2 \cdot 3 = 1$ so $x = 2 + \sqrt{3}$ is a
unit and its inverse is $\bar{x} = 2 - \sqrt{3}$. Again, $x^k$ is a unit for any k,
so for example, $x^2 = (2 + \sqrt{3})^2 = 7 + 4\sqrt{3}$ and
$x^3 = (2 + \sqrt{3})^3 = 26 + 15\sqrt{3}$ are units. Again,
$\{\ldots, \pm x^{-3}, \pm x^{-2}, \pm x^{-1}, \pm 1, \pm x, \pm x^2, \pm x^3, \ldots\}$ is an infinite
set of distinct units.

(3) Let D = 5. Then $N((1 + \sqrt{5})/2) = (1^2 - 1^2 \cdot 5)/4 = -1$ so
$x = (1 + \sqrt{5})/2$ is a unit and its inverse is
$-\bar{x} = -(1 - \sqrt{5})/2 = (-1 + \sqrt{5})/2$. Again, $x^k$ is a unit for any k,
so for example, $x^2 = (3 + \sqrt{5})/2$ and $x^3 = 2 + \sqrt{5}$ are units, and
once again $\{\ldots, \pm x^{-3}, \pm x^{-2}, \pm x^{-1}, \pm 1, \pm x, \pm x^2, \pm x^3, \ldots\}$
is an infinite set of distinct units.
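The powers computed in Example 1.16 are easy to reproduce with exact rational arithmetic. The sketch below is only an illustration; the helper names `mul`, `norm`, and `powers`, and the encoding of $a + b\sqrt{D}$ as the pair (a, b), are our own assumptions.

```python
from fractions import Fraction

def mul(x, y, D):
    """Multiply x = (a, b) and y = (c, d), meaning a + b*sqrt(D) and c + d*sqrt(D)."""
    a, b = x
    c, d = y
    return (a * c + b * d * D, a * d + b * c)

def norm(x, D):
    """N(a + b*sqrt(D)) = a^2 - b^2 * D."""
    a, b = x
    return a * a - b * b * D

def powers(x, D, n):
    """Return the list [x^1, x^2, ..., x^n]."""
    out = []
    p = (Fraction(1), Fraction(0))
    for _ in range(n):
        p = mul(p, x, D)
        out.append(p)
    return out

units_2 = powers((Fraction(1), Fraction(1)), 2, 3)        # powers of 1 + sqrt(2)
units_3 = powers((Fraction(2), Fraction(1)), 3, 3)        # powers of 2 + sqrt(3)
units_5 = powers((Fraction(1, 2), Fraction(1, 2)), 5, 3)  # powers of (1 + sqrt(5))/2

for D, units in ((2, units_2), (3, units_3), (5, units_5)):
    for u in units:
        print(D, u, norm(u, D))
```

Every printed norm is 1 or −1, in agreement with Lemma 1.14.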

Remark 1.17. Suppose we have integers a and b satisfying the equation
$a^2 - b^2 D = 1$. We have the identity $a^2 - b^2 D = (a + b\sqrt{D})(a - b\sqrt{D})$
and so these values of a and b give a factorization
$1 = (a + b\sqrt{D})(a - b\sqrt{D})$. Thus we see that in this case $a + b\sqrt{D}$ is
a unit of $O(\sqrt{D})$ (as is $a - b\sqrt{D}$).
This equation, $a^2 - b^2 D = 1$, is known as Pell’s equation, and has a long
history. A priori it is unclear that Pell’s equation has a solution other than
a = ±1, b = 0 (which just gives the units ±1) but indeed it does. We
devote Chapter 4 to studying Pell’s equation, where we will show that for
any positive integer D that is not a perfect square, Pell’s equation always
has a solution. That is the hard part, but once we have that we will also
show that it always has infinitely many solutions, and furthermore that the
pattern of units in $O(\sqrt{D})$ is always as in Example 1.16.
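For small D one can find a nontrivial solution of Pell’s equation by sheer trial. The sketch below is a naive search, not the method of Chapter 4; the function name `smallest_pell_solution` and the cutoff `b_max` are hypothetical choices of ours, and for some D the smallest solution is far too large for this approach.

```python
from math import isqrt

def smallest_pell_solution(D, b_max=10_000):
    """Return (a, b) with the smallest b >= 1 such that a^2 - D*b^2 = 1, or None."""
    for b in range(1, b_max + 1):
        a_squared = D * b * b + 1
        a = isqrt(a_squared)
        if a * a == a_squared:   # D*b^2 + 1 is a perfect square
            return (a, b)
    return None                  # nothing found below the cutoff

for D in (2, 3, 5, 7, 13):
    print(D, smallest_pell_solution(D))
```

For D = 2 the search returns (3, 2), which is the unit $3 + 2\sqrt{2} = (1 + \sqrt{2})^2$ of Example 1.16.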


1.3 Exercises
Exercise 1.1. Let R be an integral domain and let S be a subset of R that
satisfies the following four conditions:

(1) 1 is in S;
(2) if a is in S, then −a is in S;
(3) if a and b are in S, then a + b is in S;
(4) if a and b are in S, then ab is in S.

(a) Show that S is an integral domain.

(b) Give examples to show that if S satisfies any three of these four condi-
tions then S may not be an integral domain. (Thus you will need four
examples, one for each omitted condition.)

Exercise 1.2. Show that the usual rules of signs hold in any integral do-
main R:

(a) −(−a) = a;

(b) −(a + b) = (−a) + (−b);

(c) (−1)a = −a;

(d) (−a)b = a(−b) = −(ab);

(e) (−a)(−b) = ab.

Exercise 1.3.

(a) Let R be an integral domain and consider the equation bx = a in R.
Show that if this equation has a solution, that solution is unique.

(b) Suppose that b is a unit. Show that for any a, bx = a has the solution
x = ab−1 .

Exercise 1.4. Recall from Remark 1.9 that if a and b are elements of an
integral domain R, we say that b divides a if there is an element x satisfying
the equation bx = a, in which case we write x = a/b. Show that with this
definition, the usual rules of fractions hold:

(a) b(a/b) = a;

(b) (a/b)(b/a) = 1;

(c) a(b/c) = (ab)/c;

(d) (ab)/(ac) = b/c;

(e) (a/b)(c/d) = (ac)/(bd);

(f) (a/c) + (b/c) = (a + b)/c;

(g) (a/b) + (c/d) = (ad + bc)/(bd);

(h) (a/b) = (c/d) ⇔ ad = bc.

(Note that in some cases the right-hand side of the above equalities may
be defined when the left-hand side is not. We mean these equalities to hold
when both sides are defined.)

Exercise 1.5. Let R be an arbitrary integral domain. Show that if α is a
unit of R then $\alpha^k$ is a unit of R for any integer k.

Exercise 1.6. Show that R as in Example 1.3(3) and as in Example 1.3(4)
has no zero divisors (and hence is an integral domain).

Exercise 1.7. If $D' = n^2 D$ for some integer (or more generally some rational
number) n, show that $Q(\sqrt{D'}) = Q(\sqrt{D})$.

Exercise 1.8. Prove Lemma 1.12(1), (2), and (3).

Exercise 1.9. Let $R = O(\sqrt{D})$ and suppose that $\alpha = a + b\sqrt{D}$ is a unit of
R. Show that $\bar{\alpha} = a - b\sqrt{D}$ is also a unit of R.

Exercise 1.10. Let $R = O(\sqrt{D})$ with D > 0 and suppose that $\alpha = a + b\sqrt{D}$
is a unit of R, $\alpha \neq \pm 1$. Show that
$\{\ldots, \pm\alpha^{-3}, \pm\alpha^{-2}, \pm\alpha^{-1}, \pm 1, \pm\alpha, \pm\alpha^2, \pm\alpha^3, \ldots\}$
are all distinct (and hence that R has infinitely many units).

Exercise 1.11. Prove Corollary 1.15(3): for D < 0, $D \neq -1$, and $D \neq -3$,
the only units in $O(\sqrt{D})$ are {±1}.

Exercise 1.12. Let $\alpha = a + b\sqrt{D}$ be an element of $Q(\sqrt{D})$. Show that α is a
root of a monic quadratic polynomial (i.e., a quadratic whose $x^2$ coefficient
is 1). Furthermore, express the coefficients of this quadratic in terms of
N(α) (the norm of α) and Tr(α) (the trace of α).

Exercise 1.13. Recall Definition 1.10:

$$O(\sqrt{D}) = \{a + b\sqrt{D} \mid a \text{ and } b \text{ integers}\}$$

if D ≡ 2 or 3 (mod 4), while

$$O(\sqrt{D}) = \left\{ \frac{a + b\sqrt{D}}{2} \;\middle|\; a \text{ and } b \text{ integers having the same parity} \right\}$$

if D ≡ 1 (mod 4). You may wonder why we made a distinction between
these two cases. The answer is that we want the ring of integers to be
naturally defined in terms of some property that it satisfies. Here is the
property:

$O(\sqrt{D})$ is the set of elements of $Q(\sqrt{D})$ that are roots of a
monic quadratic with integer coefficients (i.e., roots of a quadratic
polynomial $f(x) = x^2 + mx + n$ with m and n integers).

(a) Verify that this is true for the following elements α of $O(\sqrt{D})$:

(i) $\alpha = 3 + 8\sqrt{6}$;

(ii) $\alpha = 7 - 10\sqrt{11}$;

(iii) $\alpha = 2 + 9\sqrt{5}$;

(iv) $\alpha = 4 + 5\sqrt{-2}$;

(v) $\alpha = -6 + 11\sqrt{-5}$;

(vi) $\alpha = \frac{3}{2} + \frac{7}{2}\sqrt{-3}$.

(b) Show that this is the case in general. That is, show that

(i) if D ≡ 2 or 3 (mod 4), then $\alpha = c + d\sqrt{D}$ is a root of a monic
quadratic with integer coefficients if and only if c and d are both
integers;

(ii) if D ≡ 1 (mod 4), then $\alpha = c + d\sqrt{D}$ is a root of a monic quadratic
with integer coefficients if and only if either c and d are both
integers or c = a/2 and d = b/2 with a and b both odd integers.

In the text of this book, we treat integral domains of the form $O(\sqrt{D})$.
But many of the statements we make have analogs for polynomials, and
we leave the treatment of the polynomial situation to the exercises. Here
is the first case: Let R be an integral domain. Then

R[X] = {polynomials in the variable X with coefficients in R}
     = {$a_n X^n + a_{n-1} X^{n-1} + \cdots + a_1 X + a_0$ | $a_i$ in R for every i}.

(In considering R[X] you may assume the usual properties of polynomial
arithmetic. The cases we will be most concerned with here are R = Q and
R = Z and indeed for the purposes of this book you may confine your
attention to these.)
Exercise 1.14. Show that R[X] is an integral domain.

Exercise 1.15. Show that R[X]∗ = R∗ (i.e., that the units of R[X] are
the constant polynomials {a} for those values of a that are units of R).
In particular, if R is a field, the units of R[X] are the nonzero constant
polynomials.


Chapter 2

Unique Factorization

We now embark on the proof that a number of the integral domains we are
interested in satisfy unique factorization. We have written “proof” rather
than “proofs” as it is our goal to establish a framework that will enable us
to come up with one proof that handles all these cases simultaneously. To
be precise, our strategy will be as follows:

Step 1a. Define “Euclidean domain.”

Step 1b. Prove that certain integral domains are Euclidean domains.

Step 2a. Define “Principal ideal domain.”

Step 2b. Prove that every Euclidean domain is a principal ideal domain.

Step 3a. Define “Unique factorization domain.”

Step 3b. Prove that every principal ideal domain is a unique factorization
domain.

Thus, putting all of these steps together, we see that certain integral
domains are unique factorization domains.
The obvious question now is: “Which ones?” As we shall see, these
include the integers Z, and the integral domains $O(\sqrt{D})$ for some
(definitely not all!) values of D.
Indeed, the first part of this chapter will be devoted to the general
argument we have just described, and to proving that $O(\sqrt{D})$ is a unique
factorization domain in many cases. However, once we accomplish that we
will turn our attention to the opposite phenomenon, and will prove that
$O(\sqrt{D})$ is not a unique factorization domain in many other cases.


We will not be able to settle the issue in all cases, and in fact, in
complete generality the answer is unknown. We will describe our (that is,
mathematicians’) present state of knowledge about this question.

2.1 Euclidean Domains


A Euclidean domain is, roughly speaking, an integral domain in which we
can divide one number by another, obtaining a quotient and a remainder
that is smaller than the divisor. In order to say what smaller is, we must
have a notion of size. We first define this:

Definition 2.1. Let R be an integral domain. Then $\|\cdot\|$ is a norm on R if

(1) for every nonzero element a of R, $\|a\|$ is a nonnegative integer;

(2) for every two nonzero elements a and b of R,

$\|a\| \le \|ab\|$.

Remark 2.2.

(1) Note that we do not require $\|0\|$ to be defined, though it may be.

(2) Note that under this definition it is possible that $\|a\| = 0$ even though
$a \neq 0$.

Lemma 2.3. The following are norms:

(1) R = Z and $\|a\| = |a|$.

(2) $R = O(\sqrt{D})$ and $\|\alpha\| = |N(\alpha)|$ (where N(α) is defined in
Definition 1.11).

Proof:

(1) We need to check both properties of the norm:

Property (1): Certainly if a is an integer, |a| is a nonnegative integer.

Property (2): For any nonzero integer b, 1 ≤ |b|. Then for any nonzero
integer a, by the properties of the absolute value,

$\|a\| = |a| \le |a| \cdot |b| = |ab| = \|ab\|$.

(2) This follows from earlier work we have done. Let us see this.

Property (1): We showed in Lemma 1.13 that N(α) is an integer, so
$\|\alpha\| = |N(\alpha)|$ is a nonnegative integer.

Property (2): For any β, |N(β)| is a nonnegative integer, and if $\beta \neq 0$,
$N(\beta) \neq 0$. (This is the contrapositive of Lemma 1.12(4).)
Thus for any $\beta \neq 0$, $\|\beta\| \ge 1$. For any α, N(αβ) =
N(α) N(β) by Lemma 1.12(3). Thus

$\|\alpha\| = |N(\alpha)| \le |N(\alpha)| \cdot |N(\beta)| = |N(\alpha\beta)| = \|\alpha\beta\|$.


Remark 2.4. Unfortunately, the word “norm” is used to mean two slightly
different things. We called N(α) a norm in Chapter 1. Note that N(α) is
defined on $Q(\sqrt{D})$, may be negative, and need not be an integer. In our
definition here, the norm is required to be a nonnegative integer, and so we
must consider |N(α)|, and only for α in $O(\sqrt{D})$. In this chapter, we will
always use a norm in the sense of Definition 2.1, and we will always denote
such a norm by $\|\cdot\|$.

Now we can define what we mean by division with a small remainder.

Definition 2.5. An integral domain R with a norm $\|\cdot\|$ is a Euclidean
domain if it has the following property: for any element a of R, and any
nonzero element b of R, there is an element q of R (the quotient) and an
element r of R (the remainder) with

a = bq + r and r = 0 or $\|r\| < \|b\|$.

Remark 2.6.

(1) This is a familiar property for the integers, which you probably learned
in elementary school: 75 = 17 · 4 + 7, 93 = 11 · 8 + 5, 115 = 23 · 5 + 0.
Nevertheless, it requires proof! We shall prove it momentarily.
(2) Note we are not claiming that the quotient and remainder are unique.
For example, 100 = 3 · 33 + 1 = 3 · 34 + (−2) both work.
(3) Strictly speaking, the definition of a Euclidean domain includes the
integral domain R and the norm $\|\cdot\|$. We usually say, however, “R is
a Euclidean domain” when the norm is understood.


Lemma 2.7. The integers Z are a Euclidean domain.

Proof: We are claiming that for any integer a, and any integer $b \neq 0$, there
is an integer q and an integer r with a = bq + r and r = 0 or $\|r\| < \|b\|$.
For each fixed value of b, we prove this claim by complete induction on
$\|a\|$. We shall prove this claim in the case a ≥ 0 and b > 0 here.
So suppose a ≥ 0 and b > 0. Note then that $\|a\| = |a| = a$ and
$\|b\| = |b| = b$.
If $\|a\| = 0$, then a = 0, and this claim is certainly true: a = b · 0 + 0 so
q = 0 and r = 0.
Also, if $0 < \|a\| < \|b\|$, then 0 < a < b and this claim is also true:
a = b · 0 + a so q = 0 and r = a satisfies $\|r\| < \|b\|$.
Now assume that this claim is true for all integers $a'$ with $\|a'\| < \|a\|$.
Consider a. We have just proved this claim if $\|a\| = 0$ or $0 < \|a\| < \|b\|$,
so we may restrict our attention to the case that $\|b\| \le \|a\|$. But in this
case 0 < b ≤ a so 0 ≤ a − b < a. Set $a' = a - b$. Then we can apply the
inductive hypothesis to $a'$ to conclude that $a' = bq' + r'$ for some $r'$ with
$r' = 0$ or $\|r'\| < \|b\|$. Substituting, we see that $a - b = bq' + r'$ and hence
that $a = b(q' + 1) + r' = bq + r$ with $q = q' + 1$ and $r = r'$. But then also
r = 0 or $\|r\| < \|b\|$, as required.
Hence our claim is true for a, and so by complete induction we may
conclude that our claim is true for every a ≥ 0.
Thus, we have proved the lemma in this case. We leave the remaining
cases as exercises.
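The inductive step of this proof replaces a by a − b and increases the quotient by 1, so unwinding the induction gives division by repeated subtraction. Here is a minimal sketch of that idea for the case a ≥ 0 and b > 0 treated above; the function name `divide` is our own.

```python
def divide(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, for a >= 0 and b > 0."""
    if a < 0 or b <= 0:
        raise ValueError("this sketch only covers a >= 0 and b > 0")
    q, r = 0, a
    # Each pass is one application of the inductive step: a' = a - b, q = q' + 1.
    while r >= b:
        r -= b
        q += 1
    return q, r

print(divide(75, 17))   # quotient and remainder of 75 by 17
print(divide(100, 3))   # one of the two decompositions in Remark 2.6(2)
```

This always produces the nonnegative remainder; the alternative 100 = 3 · 34 + (−2) of Remark 2.6(2) is equally valid under Definition 2.5 but is not what this loop returns.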

The point of introducing the notion of a Euclidean domain is that it
applies in many cases other than that of the ordinary integers. In particular,
we have the following theorem.

Theorem 2.8. Let $R = O(\sqrt{D})$. Then for the following values of D, R is
a Euclidean domain:

D = −11, −7, −3, −2, −1, 2, 3, 5, 6, 7, 11, 13, 17, 21, and 29.

Proof: This is a very long proof, so let us begin by describing our
strategy. We are trying to investigate an algebraic question (when $O(\sqrt{D})$
is Euclidean), but we will convert this question to a geometric question.
Then we will solve this question, only using basic analytic geometry. The
geometric idea is simple, but we will work very hard at it and obtain our
results. Thus, this proof is an illustration of the fact that often in
mathematics one may start with a simple idea and by pushing it hard enough
go a long way with it.


As we go further, the details, though not the basic idea, get more
complicated. The easiest cases are D = −1, −2, and −3, and the second
easiest cases are D = 2 and 3. We do these cases here. The other cases are
considerably more involved, and we defer them to Appendix C.
Let α and β be elements of R with $\beta \neq 0$. We wish to show that we may
always write α = βγ + ρ where γ is some element of R and ρ is an element of
R with ρ = 0 or $\|\rho\| < \|\beta\|$. To be concrete, let us write $\alpha = a + b\sqrt{D}$ and
$\beta = c + d\sqrt{D}$. Then we may form the quotient
$\alpha/\beta = (a + b\sqrt{D})/(c + d\sqrt{D})$. In general this quotient will not be an
element of $O(\sqrt{D})$ but will be an element of $Q(\sqrt{D})$. In fact, as we saw
in Chapter 1,

$$
\begin{aligned}
\frac{a + b\sqrt{D}}{c + d\sqrt{D}}
  &= \frac{(a + b\sqrt{D})(c - d\sqrt{D})}{(c + d\sqrt{D})(c - d\sqrt{D})} \\
  &= (a + b\sqrt{D})\left(\frac{c}{c^2 - d^2 D} + \frac{-d}{c^2 - d^2 D}\sqrt{D}\right) \\
  &= \frac{ac - bdD}{c^2 - d^2 D} + \frac{-ad + bc}{c^2 - d^2 D}\sqrt{D} = e + f\sqrt{D},
\end{aligned}
$$

where $e = (ac - bdD)/(c^2 - d^2 D)$ and $f = (-ad + bc)/(c^2 - d^2 D)$. If it
happens that $e + f\sqrt{D}$ is an element of $R = O(\sqrt{D})$, then set
$\gamma = e + f\sqrt{D}$. Then α = βγ with γ in R, so β divides α and α = βγ + 0
so we simply set ρ = 0, and we are done.
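The formulas for e and f are easy to evaluate exactly. The following sketch (the function name `quotient` is ours) computes the pair (e, f) for integer a, b, c, d using Python's `fractions.Fraction`, so no rounding error enters.

```python
from fractions import Fraction

def quotient(a, b, c, d, D):
    """Return (e, f) with (a + b*sqrt(D)) / (c + d*sqrt(D)) = e + f*sqrt(D)."""
    den = c * c - d * d * D            # this is N(c + d*sqrt(D))
    if den == 0:
        raise ZeroDivisionError("the divisor must be nonzero")
    e = Fraction(a * c - b * d * D, den)
    f = Fraction(-a * d + b * c, den)
    return e, f

# The lucky case: (7 + 5*sqrt(2)) / (1 + sqrt(2)) lies in O(sqrt(2)).
print(quotient(7, 5, 1, 1, 2))
# The usual case: (23 - i) / (24 + 6i) has non-integer coordinates.
print(quotient(23, -1, 24, 6, -1))
```

In the first example both coordinates are integers, so the remainder is 0; in the second they are proper fractions, which is the situation the geometric argument is designed to handle.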
Usually we will not be so lucky, however, so we turn to geometry in
order to proceed further. We observe that e and f are certainly rational
numbers. We now represent $\gamma_0 = e + f\sqrt{D}$ with e and f rational numbers,
or equivalently with $\gamma_0$ in $Q(\sqrt{D})$, by the point (e, f) in the plane. If
D = −1 this is just the usual representation of the complex number e + fi as
the point (e, f) in the complex plane. If $D \neq -1$ it is a new representation
of $e + f\sqrt{D}$, but an equally valid one. Note that the points in the plane that
represent elements of $Q(\sqrt{D})$ are precisely the points (e, f) where both e
and f are rational numbers. We will call these points Q-points.
Along with this new representation of points we introduce a new metric
(i.e., measure of distance) in the plane. We let $\|(e, f)\|_D = |e^2 - f^2 D|$ and
this measures the distance from the point (e, f) to the origin. Note that
(e, f) corresponds to $e + f\sqrt{D}$. More generally, if (s, t) is another point of
the plane, corresponding to $s + t\sqrt{D}$, then (e, f) − (s, t) = (e − s, f − t)
corresponds to the difference
$(e + f\sqrt{D}) - (s + t\sqrt{D}) = (e - s) + (f - t)\sqrt{D}$,
and then $\|(e, f) - (s, t)\|_D = \|(e - s, f - t)\|_D$ is the distance between these
two points. (Remember that we are restricting ourselves to Q-points as
those are the points that correspond to elements of $Q(\sqrt{D})$.)


Let us make some important observations about $\|\cdot\|_D$. First of all, if
(e, f) corresponds to an element $e + f\sqrt{D}$ of $R = O(\sqrt{D})$, then
$\|(e, f)\|_D = \|e + f\sqrt{D}\|$. Thus, in our identification of $Q(\sqrt{D})$ with
Q-points in the plane, $\|\cdot\|_D$ agrees with our norm $\|\cdot\|$ on O-points, i.e.,
$\|\alpha_0\|_D = \|\alpha_0\|$ for any $\alpha_0$ in $O(\sqrt{D})$. Also, we know that
$\|\alpha_1 \alpha_2\| = \|\alpha_1\| \cdot \|\alpha_2\|$ for any two elements $\alpha_1$ and $\alpha_2$ of
$O(\sqrt{D})$. This is still true if $\alpha_1$ and $\alpha_2$ are any two elements of
$Q(\sqrt{D})$. For we see, from Definition 1.11 and Lemma 1.12, that, for any two
elements $\alpha_1$ and $\alpha_2$ of $Q(\sqrt{D})$,
$\|\alpha_1 \alpha_2\|_D = |N(\alpha_1 \alpha_2)| = |N(\alpha_1) N(\alpha_2)| = |N(\alpha_1)| \cdot |N(\alpha_2)| = \|\alpha_1\|_D \cdot \|\alpha_2\|_D$.
But $\|\cdot\|_D$ is not a norm on Q-points in the sense of Definition 2.1, as for a
general Q-point (e, f), $\|(e, f)\|_D$ need not be an integer. For example,
$\|(1/2, 0)\|_D = 1/4$. Similarly, it is not always the case that
$\|\alpha_2\|_D \le \|\alpha_1 \alpha_2\|_D$ for $\alpha_1$ and $\alpha_2$ elements of $Q(\sqrt{D})$, as, for
example, $\|(1/2)\alpha_2\|_D = (1/4)\|\alpha_2\|_D$ for any $\alpha_2$.

Now let us return to our problem. We have $\alpha/\beta = e + f\sqrt{D}$. Write
$\gamma_0 = e + f\sqrt{D}$, with $\gamma_0$ in $Q(\sqrt{D})$. Our objective is to find
$\gamma = s + t\sqrt{D}$ with γ in $R = O(\sqrt{D})$ and $\|\gamma_0 - \gamma\|_D < 1$.
Let us assume for the moment that we have succeeded in doing so. Then

$\alpha = \beta\gamma_0 = \beta\gamma + \beta(\gamma_0 - \gamma) = \beta\gamma + \rho$

where we set $\rho = \beta(\gamma_0 - \gamma)$. Then
$\|\rho\|_D = \|\beta(\gamma_0 - \gamma)\|_D = \|\beta\|_D \cdot \|\gamma_0 - \gamma\|_D$
and, since $\|\gamma_0 - \gamma\|_D < 1$, we have $\|\rho\|_D < \|\beta\|_D$, so we have found
values for γ and ρ that satisfy the conditions of a Euclidean domain. (We are
assuming that γ is an element of R, and then we see that ρ = α − βγ is
also an element of R, and furthermore we see that (and this is the crucial
point!) $\|\rho\|_D < \|\beta\|_D$.)
Hence we have reduced our problem to the problem of showing that for
any $e + f\sqrt{D}$ in $Q(\sqrt{D})$, there is an $s + t\sqrt{D}$ in $O(\sqrt{D})$ with
$\|(e + f\sqrt{D}) - (s + t\sqrt{D})\|_D < 1$. Translating this into our geometric
language, we need to show that for any point (e, f) in the plane with rational
coordinates, there is a point (s, t) in the plane corresponding to an element
of R, with $\|(e, f) - (s, t)\|_D < 1$.
We will need to know which points of the plane correspond to elements
of $R = O(\sqrt{D})$, so let us determine that now. The answer depends on D.
If D ≡ 2 or 3 (mod 4), then $s + t\sqrt{D}$ is in R when both s and t are integers,
so the points in the plane corresponding to elements of R are the points
(s, t) with both coordinates integers. If D ≡ 1 (mod 4), then $s + t\sqrt{D}$ is in
R when both s and t are integers or when both s and t are half-integers, so
the points in the plane corresponding to elements of R are the points (s, t)
with both coordinates integers or both coordinates half-integers. In any
event, we will refer to the points corresponding to elements of $R = O(\sqrt{D})$
as O-points.

Figure 2.1. The case D ≡ 2 or 3 (mod 4).
First, let us consider the case D ≡ 2 or 3 (mod 4). Here the O-points
are points with integer coordinates. Let us divide the plane into regions
consisting of points that are nearest each of these points in the usual metric
on the plane, not in the new metric  · D . We shall call these points
apparently nearest, since they “look” nearest when we look at the plane.
The points apparently nearest the point (s, t) are the points in a square of
side 1 centered at (s, t), as in Figure 2.1.
Next, we consider the case D ≡ 1 (mod 4). Here the O-points are points
with both coordinates integers or half-integers, and the points apparently
nearest the point (s, t) are the points in a diamond with diagonals of length
1 centered at (s, t), as in Figure 2.2.
These regions of apparently nearest points cover the plane, so for any
Q-point γ0 = (e, f ) there is an O-point γ1 = (s, t) to which it is apparently
nearest. (Usually there will be exactly one such point, but if γ0 is on the
border of one of these squares or diamonds there will be more than one
such point. In that case, choose γ1 to be any of them—it doesn’t matter
which we choose.)
Figure 2.2. The case D ≡ 1 (mod 4).

Now the distance of $\gamma_0$ from $\gamma_1$, $\|\gamma_0 - \gamma_1\|_D$, is the same as the distance
$\|(\gamma_0 - \gamma_1) - 0\|_D$ from $\gamma_0 - \gamma_1$ to the origin. (Note we are now using our
new metric, the one in which we are really interested.) A little thought
shows that we have simply translated (i.e., shifted) the problem to points
apparently nearest the origin, and we can always translate the problem in
this way. Thus, if we can show every Q-point apparently nearest the origin
is within a distance of 1 from some O-point (s, t), this will be true of all
Q-points in the plane, and again we will be done.
Thus, we have reduced our problem to considering the Q-points that
are apparently nearest the origin. We shall denote this region by $\mathcal{R}_0$. Now
the real work begins.

Figure 2.3. The case D = −1.

Figure 2.4. The case D = −2.

Now we bring in the analytic geometry. First, we consider the case
when D < 0, and ask for what points (x, y) we have $\|(x, y)\|_D < 1$. Now
$\|(x, y)\|_D = |x^2 - Dy^2| = x^2 + (-D)y^2$ and since D < 0, −D > 0. Then
$\|(x, y)\|_D = 1$ is the equation $x^2 + (-D)y^2 = 1$, which we recognize as an
ellipse centered at the origin with semi-major axis of length 1 along the
x-axis, and semi-minor axis of length $1/\sqrt{-D}$ along the y-axis. (Actually,
there is one exception. When D = −1, the semi-minor axis also has length
1, and the curve is a circle.) Thus, the points with $\|(x, y)\|_D < 1$ are the
points that are strictly inside (that is, inside and not on) this ellipse (or
circle, when D = −1). But now, for D = −1, −2, or −3, every point
apparently nearest the origin is in this region, as we see from Figures 2.3,
2.4, and 2.5.

Figure 2.5. The case D = −3.

Tracing the argument back, we see that for the Q-point $\gamma_0 = (e, f)$,
corresponding to the element $e + f\sqrt{D}$ of $Q(\sqrt{D})$, if $\gamma_1 = (s, t)$ is the
O-point, corresponding to the element $s + t\sqrt{D}$ of $R = O(\sqrt{D})$, apparently
nearest to $\gamma_0$, then choosing $\gamma = \gamma_1$ we have found an O-point γ with
$\|\gamma_0 - \gamma\|_D < 1$, as required, in the cases D = −1, −2, or −3, completing
the proof in these cases.
Now we consider the case D > 0, and again ask for what points (x, y) we
have $\|(x, y)\|_D < 1$. Here $\|(x, y)\|_D = |x^2 - Dy^2|$. We recognize
$|x^2 - Dy^2| = 1$ as the equation of two pairs of hyperbolas. The equation
$x^2 - Dy^2 = 1$ gives a pair of hyperbolas, one opening to the right and one
opening to the left, having vertices 1 unit to the right and 1 unit to the left
of the origin, respectively, and the equation $x^2 - Dy^2 = -1$ gives a pair of
hyperbolas, one opening up and one opening down, having vertices
$1/\sqrt{D}$ units above and $1/\sqrt{D}$ units below the origin, respectively. (We
shall say these two pairs of hyperbolas are centered at the origin and have
semi-major axis 1 and semi-minor axis $1/\sqrt{D}$, although those terms are
usually just used for ellipses.) Now the points (x, y) with $\|(x, y)\|_D < 1$ are
the points with $|x^2 - Dy^2| < 1$, i.e., with $-1 < x^2 - Dy^2 < 1$, so they are
the points “inside” of these hyperbolas. That is, they are the points in the
region bounded by all four of these curves, consisting of a rectangular area
in the center adjoined by four tails that go apparently infinitely far out
toward the northeast, northwest, southeast, and southwest. (Actually, the
exact direction they go out depends on D. To be precise, they go out around
the asymptotes $y = x/\sqrt{D}$ and $y = -x/\sqrt{D}$.) But for D = 2 or 3, every point
apparently nearest the origin is in this region, as we see from Figures 2.6
and 2.7.

Figure 2.6. The case D = 2.

Figure 2.7. The case D = 3.
Again, tracing the argument back, we see that for the Q-point
$\gamma_0 = (e, f)$, corresponding to the element $e + f\sqrt{D}$ of $Q(\sqrt{D})$, if
$\gamma_1 = (s, t)$ is the O-point, corresponding to the element $s + t\sqrt{D}$ of
$R = O(\sqrt{D})$, apparently nearest to $\gamma_0$, then choosing $\gamma = \gamma_1$ we have
found an O-point γ with $\|\gamma_0 - \gamma\|_D < 1$, as required, in the cases D = 2
or 3, completing the proof in these cases as well.
This concludes the argument in the (relatively) easy cases. As we
mentioned above, the other cases are handled in Appendix C, to which we
refer the interested reader.
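For D = −2, −1, 2, and 3 the O-points are the points with integer coordinates, and the argument above says that rounding e and f to nearest integers always produces a valid quotient. The sketch below carries this out; the names `divide`, `mul`, and `norm` are our own, and the case D = −3 is deliberately left out because there the O-points also include half-integer points.

```python
from fractions import Fraction
from math import floor

def norm(x, D):
    """N(a + b*sqrt(D)) = a^2 - b^2 * D for x = (a, b)."""
    a, b = x
    return a * a - b * b * D

def mul(x, y, D):
    """Product of a + b*sqrt(D) and c + d*sqrt(D) as a pair."""
    a, b = x
    c, d = y
    return (a * c + b * d * D, a * d + b * c)

def divide(alpha, beta, D):
    """Return (gamma, rho) with alpha = beta*gamma + rho and |N(rho)| < |N(beta)|."""
    if D not in (-2, -1, 2, 3):
        raise ValueError("rounding is only justified here for D = -2, -1, 2, 3")
    a, b = alpha
    c, d = beta
    den = norm(beta, D)
    if den == 0:
        raise ZeroDivisionError("beta must be nonzero")
    e = Fraction(a * c - b * d * D, den)      # alpha / beta = e + f*sqrt(D)
    f = Fraction(b * c - a * d, den)
    # gamma is an apparently nearest O-point: round each coordinate.
    gamma = (floor(e + Fraction(1, 2)), floor(f + Fraction(1, 2)))
    bg = mul(beta, gamma, D)
    rho = (a - bg[0], b - bg[1])
    return gamma, rho

gamma, rho = divide((23, -1), (24, 6), -1)
print(gamma, rho, abs(norm(rho, -1)), abs(norm((24, 6), -1)))
```

When $\gamma_0$ lies on the border of a square the rounding rule picks one of the apparently nearest points; as noted in the proof, it does not matter which.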

We have just proved that, for some values of D, $O(\sqrt{D})$ is a Euclidean
domain. In fact, the only negative values of D for which $O(\sqrt{D})$ is a
Euclidean domain are the ones we have given, and we shall prove that now.
Before we do, we will remark that our list for positive D is not complete.
Also, it is much harder to prove that $O(\sqrt{D})$ is not a Euclidean domain for
a positive value of D. The problem here is that the “tails” of the hyperbolas
apparently go infinitely far out and so it is possible that some point γ of
$O(\sqrt{D})$ apparently very far away from $\gamma_0$ will really be within a distance
of 1 from it (or perhaps even closer). We saw some examples of this in the
part of the proof of Theorem 2.8 that appears in Appendix C, but with
more work we can come up with very dramatic examples. For example, taking
D = 41, we have that $\gamma = 46 - 7\sqrt{41}$, which is apparently very far away
from $\gamma_0 = (23/125)\sqrt{41}$, is actually very close to $\gamma_0$. Calculation shows
$\|\gamma_0 - \gamma\|_{41} = 64/15625 \approx 0.004$!
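The D = 41 example can be checked with exact fractions. In this sketch the helper `norm_D` is our own name for the distance $\|(e, f)\|_D = |e^2 - f^2 D|$, and points are given by their coordinates (e, f).

```python
from fractions import Fraction

def norm_D(e, f, D):
    """The distance ||(e, f)||_D = |e^2 - f^2 * D| from (e, f) to the origin."""
    return abs(e * e - f * f * D)

gamma0 = (Fraction(0), Fraction(23, 125))   # (23/125) * sqrt(41)
gamma = (Fraction(46), Fraction(-7))        # 46 - 7 * sqrt(41)

dist = norm_D(gamma0[0] - gamma[0], gamma0[1] - gamma[1], 41)
print(dist, float(dist))
```

The two points differ by 46 in the first coordinate and by more than 7 in the second, yet the computed distance is a small fraction, roughly 0.004.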


Lemma 2.9. If D < 0 and $D \neq -1, -2, -3, -7$, or $-11$, then $R = O(\sqrt{D})$
is not a Euclidean domain with respect to its norm $\|\cdot\|$.

Proof: We shall continue to use the language and notation of the proof of
Theorem 2.8.
To show that $R = O(\sqrt{D})$ is not a Euclidean domain with respect to
its norm, we need only find a point $\gamma_0$ of the region $\mathcal{R}_0$ of points
apparently nearest the origin that is not within a distance of 1, in the norm
$\|\cdot\|_D$, of any point γ of R, i.e., which is not in the interior of an ellipse
centered at any γ.
First, suppose D ≡ 2 or 3 (mod 4). We are excluding D = −1 or −2,
so we have D ≤ −5, i.e., |D| ≥ 5. Now each ellipse has semiminor axis
$1/\sqrt{-D} < 1/2$ and is centered at a point $\gamma = s + t\sqrt{D}$ where both s and t,
and in particular t, are integers. Thus, in order for $\gamma_0 = e + f\sqrt{D}$ to be in
such an ellipse, we must have f within a distance of $1/\sqrt{-D}$ of some integer.
But f = 1/2 is a distance of $1/2 > 1/\sqrt{-D}$ from the nearest integer, and
hence any integer, so $\gamma_0 = (1/2)\sqrt{D}$ is a suitable point.
Now suppose D ≡ 1 (mod 4). We are excluding D = −3, −7, or −11,
so we have D ≤ −15, i.e., |D| ≥ 15. Suppose in fact that $D \neq -15$, so
|D| ≥ 17. The argument here is very similar to the previous case. Each
ellipse has semiminor axis $1/\sqrt{-D} < 1/4$ and is centered at a point
$\gamma = s + t\sqrt{D}$ where both s and t, and in particular t, are integers or
half-integers. Thus, in order for $\gamma_0 = e + f\sqrt{D}$ to be in such an ellipse, we
must have f within a distance of $1/\sqrt{-D}$ of some integer or half-integer.
But f = 1/4 is a distance of $1/4 > 1/\sqrt{-D}$ from the nearest integer or
half-integer, so $\gamma_0 = (1/4)\sqrt{D}$ is a suitable point.
Thus, we have proved the lemma for every value of D except for D =
−15, and our proof does not work in that case, for the point
$\gamma_0 = (1/4)\sqrt{-15}$ is indeed within a distance of 1 from γ = 0, as
$\|(0, 1/4)\|_{-15} = 15/16 < 1$. But here we make a different choice of $\gamma_0$.
Here we choose $\gamma_0 = (3/11)\sqrt{-15}$. If γ = 0, then
$\|\gamma_0 - \gamma\|_{-15} = \|\gamma_0\|_{-15} = \|(0, 3/11)\|_{-15} = 135/121 > 1$; if
$\gamma = (\pm 1/2) + (1/2)\sqrt{-15}$ then
$\|\gamma_0 - \gamma\|_{-15} = \|(\mp 1/2, -5/22)\|_{-15} = 496/484 > 1$; and for any other
value of γ we see that $\|\gamma_0 - \gamma\|_{-15}$ is even larger (as the difference of the
y-coordinates is larger), so $\gamma_0$ is a suitable point.
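The D = −15 case can be confirmed by listing the O-points near $\gamma_0 = (3/11)\sqrt{-15}$ and computing each distance exactly. This is a sketch under our own naming (`norm_D`, `o_points`), with a search window of our choosing; the proof above explains why points outside the window are even farther away.

```python
from fractions import Fraction

D = -15
gamma0 = (Fraction(0), Fraction(3, 11))

def norm_D(e, f, D):
    """The distance ||(e, f)||_D = |e^2 - f^2 * D|."""
    return abs(e * e - f * f * D)

def o_points(bound):
    """O-points for D = 1 mod 4 with |s|, |t| <= bound: both integers or both half-integers."""
    pts = []
    for a in range(-2 * bound, 2 * bound + 1):
        for b in range(-2 * bound, 2 * bound + 1):
            if (a - b) % 2 == 0:               # same parity: (a/2, b/2) is an O-point
                pts.append((Fraction(a, 2), Fraction(b, 2)))
    return pts

distances = {p: norm_D(gamma0[0] - p[0], gamma0[1] - p[1], D) for p in o_points(3)}
closest = min(distances.values())
print(closest, float(closest))
```

The smallest distance found is 124/121 (that is, 496/484), attained at $(\pm 1/2) + (1/2)\sqrt{-15}$, and it exceeds 1.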

Remark 2.10.

(1) Actually, the complete list of positive values of D for which
$R = O(\sqrt{D})$ is a Euclidean domain with respect to its norm $\|\cdot\|$ is known.
We state this result without proof. These values of D are D = 2, 3, 5, 6, 7,
11, 13, 17, 19, 21, 29, 33, 37, 41, 57, and 73.

(2) To be precise, what we showed is that for the values of D in Lemma 2.9,
$O(\sqrt{D})$ is not a Euclidean domain with respect to the norm $\|\cdot\|$ that
we have defined. This leaves open the possibility that $O(\sqrt{D})$ is a
Euclidean domain with respect to some different norm. We shall not
investigate this question.

Remark 2.11. We are left with a final question: where does the name
“Euclidean” come from? The answer is that in a Euclidean domain, we may
perform Euclid’s algorithm. We shall save this for the next section, when
we will learn not only how to do it, but also what it is good for.

We close this section by recording a lemma that will enable us to easily
tell when an element of a Euclidean domain is a unit. (Recall from
Definition 1.6 that an element β of R is a unit if there is an element $\beta'$ of R
with $\beta\beta' = 1$.) This will provide a generalization of Lemma 1.14. (Actually,
using this lemma would enable us to simplify some of our earlier proofs
slightly, but in the interest of directness we did not do so.)

Lemma 2.12. Let R be a Euclidean domain and let β be any nonzero
element of R. Then $\|\beta\| \ge \|1\|$ and $\|\beta\| = \|1\|$ if and only if β is a unit.

Proof: For the first inequality, set a = 1 and b = β in Definition 2.1(2) to
conclude that, for any element β,

$\|1\| \le \|1 \cdot \beta\| = \|\beta\|$.

Now suppose β is a unit, and let $\beta\beta' = 1$. Then, setting a = β and $b = \beta'$
in Definition 2.1(2),

$\|\beta\| \le \|\beta\beta'\| = \|1\|$

so combining these two inequalities shows $\|\beta\| = \|1\|$.
On the other hand, suppose $\|\beta\| = \|1\|$. By Definition 2.5, we can find
elements q and r of R with

1 = βq + r and r = 0 or $\|r\| < \|\beta\|$.

But by assumption, $\|\beta\| = \|1\|$, and by what we have just proved, there
are no nonzero elements r of R with $\|r\| < \|1\|$, so we must have r = 0 and
then $1 = \beta\beta'$ with $\beta' = q$, so β is a unit in R.

2.2 The GCD-L Property and Euclid’s Algorithm


Let us begin by recalling a definition that may be familiar to you in the
case of positive integers: Let a and b be positive integers. Their greatest


common divisor g = gcd(a, b) is the unique positive integer with the
property that (1) g divides both a and b; and (2) if d is any integer dividing
both a and b, then d divides g.
We should point out that a priori the gcd may not exist. We are claiming
that there is one and only one positive integer with a certain property, and
a priori there may be no such integer or more than one such integer.
But for the positive integers the gcd does indeed exist and can be found
by taking the common prime factors of a and b. For example, if
$a = 360 = 2^3 \cdot 3^2 \cdot 5$ and $b = 2268 = 2^2 \cdot 3^4 \cdot 7$, then
$g = 2^2 \cdot 3^2 = 36$. If a = 37 = 37 and b = 143 = 11 · 13, then g = 1 (as they
have no prime factors in common). If a = 280067 = 229 · 1223 and
$b = 227168 = 2^5 \cdot 31 \cdot 229$, then g = 229.
This is not really a satisfactory answer, however, because this assumes
unique factorization, which we have not shown yet. In fact, we will use the
gcd to prove unique factorization, not the other way around. (It is also not
really satisfactory from a practical viewpoint either, since this method of
finding the gcd requires us to factor a and b into a product of primes, and
this is not so easy, unless a and b are small.) Moreover, we will see that we
have a gcd in any Euclidean domain. Thus, since we have already shown
that $O(\sqrt{-1})$ is Euclidean, we can consider elements of that ring.
For example, if a = 23 − i and b = 24 + 6i, then g = −1 + i. This
comes from the prime factorization 23 − i = −(−1 + i)(2 + i)(7 + 2i) and
24 + 6i = −(1 + i)(−1 + i)(3)(4 + i), and as difficult as it may be to find prime
factorizations in Z, it is more difficult to find them in $O(\sqrt{-1})$. (Actually, I
have exaggerated here to make a point. We will develop a lot more theory,
which will tell us how to do prime factorization in $O(\sqrt{-1})$, and we will
see that it is not too much more difficult than in Z.)
What we shall use is not only the property that elements α and β of
R have a gcd, but in addition, that the gcd can be written as a linear
combination of α and β. That is, if γ is the gcd of α and β, then there are
elements δ and ε of R with γ = αδ + βε. For example,

36 = 360 · 19 + 2268 · (−3),


1 = 37 · 58 + 143 · (−15),
229 = 280067 · (−73) + 227168 · 90,
−1 + i = (23 − i)(−6 + 5i) + (24 + 6i)(4 − 6i).

Even given the prime factorizations of the numbers involved, it is no simple
task to find a linear combination that yields their gcd, as the above
examples show.
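The four identities above are easy to check by machine. The following sketch does so in Python; the pair representation (a, b) for a + bi and the helper names `gmul` and `gadd` are assumptions made for this illustration only.

```python
# Integer examples: Python integers are exact, so these are genuine checks.
assert 360 * 19 + 2268 * (-3) == 36
assert 37 * 58 + 143 * (-15) == 1
assert 280067 * (-73) + 227168 * 90 == 229

# Gaussian integer example, with a + bi stored as the pair (a, b).
def gmul(x, y):
    """Product of two Gaussian integers given as pairs."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)

def gadd(x, y):
    """Sum of two Gaussian integers given as pairs."""
    return (x[0] + y[0], x[1] + y[1])

lhs = gadd(gmul((23, -1), (-6, 5)), gmul((24, 6), (4, -6)))
print(lhs)  # (-1, 1), i.e. -1 + i
```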


We will develop an algorithm, known as Euclid’s algorithm, and we will


see that Euclid’s algorithm provides an effective way to find the gcd of α
and β (without having to factor them first), and as a byproduct yields an
expression γ = αδ + βε of their gcd γ as a linear combination of α and β.
For example, α = 1123456789 and β = 876543210 have gcd γ = 1, and
furthermore
1 = 1123456789(356396689) + 876543210(−456790122).

(I have no idea what the prime factorizations of 1123456789 and 876543210


are.)
But, far more important than the practical aspect, we will be able to
prove using Euclid’s algorithm that, in this situation, it is always possible
to express γ as γ = αδ + βε, and this theoretical result will be the key to
proving unique factorization.
With these examples in mind, we get to work.

Definition 2.13. Let R be an integral domain and let {αi } be a set of ele-
ments of R, not all of which are zero. Then an element γ of R is a greatest
common divisor (gcd) of {αi }, γ = gcd({αi }) if

(1) γ divides each αi ,

(2) if ζ is any element of R that divides each αi , then ζ divides γ.

In general, a gcd may or may not exist. We shall soon explore the
question of when it does. But for the moment, let us assume that a gcd
does exist and explore the consequences of that assumption.

Lemma 2.14.

(1) Let γ be a gcd of {αi } and let ε be any unit of R. Then γ′ = γε is also
a gcd of {αi }.

(2) If γ and γ′ are any two gcd’s of {αi }, then γ′ = γε for some
unit ε.

Proof:

(1) By the definition of a unit, there is an element ε′ of R with εε′ = 1.
Then γ = γ · 1 = γ(εε′) = (γε)ε′ = γ′ε′. Thus, γ divides γ′ and also
γ′ divides γ. With that in mind, let us check that γ′ satisfies both
properties of a gcd.

Property (1): Since γ′ divides γ and γ divides each αi , γ′ divides
each αi .

Property (2): If ζ divides each αi , then ζ divides γ, and since γ divides γ′, ζ divides γ′.

(2) By the definition of a gcd, γ divides γ′, so γ′ = γε for some ε, and,
again by the definition of a gcd, γ′ divides γ, so γ = γ′ε′ for some ε′.
Then γ = γ′ε′ = γεε′. Since not all of the αi are zero, γ is nonzero, so we may
cancel γ to obtain 1 = εε′, and hence ε is a unit.

Still assuming that a gcd exists, we have the following important


definition.
Definition 2.15. If {αi } has a gcd of 1, then {αi } is relatively prime.

The next lemma shows we can “factor out” a gcd.


Lemma 2.16. Let {αi } have a gcd of γ, and for each i, write αi = γα′i .
Then {α′i } is relatively prime.

Proof: We must show that 1 has the two properties of a gcd of {α′i }. Now
1 has property (1) of a gcd of {α′i } as 1 certainly divides each α′i .
Suppose now that ζ is any element of R that divides each α′i . Then γζ
divides each γα′i . But γα′i = αi so γζ divides each αi . By property (2) of
a gcd of {αi }, we have that γζ divides γ, and hence ζ divides 1. Thus we
see that 1 also has property (2) of a gcd of {α′i }, so we conclude that 1 is
a gcd of {α′i }.

It is sometimes convenient to have a stronger condition than that in
Definition 2.15. For further reference, we define that now.
Definition 2.17. If {αi } is a set of elements of R such that any two distinct
elements of this set have a gcd of 1, then {αi } is pairwise relatively prime.

To see the distinction between these two definitions, let us consider the
set of integers {6, 10, 15}. This set is relatively prime, as it has a gcd of
1, but is not pairwise relatively prime. Looking at pairs of elements, we
see that 6 and 10 have a gcd of 2, that 6 and 15 have a gcd of 3, and that
10 and 15 have a gcd of 5. Thus in this set, no two distinct elements are
relatively prime.
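For integers, the difference between the two definitions can be seen directly with a short Python sketch. The helper names are hypothetical; `math.gcd` is the standard library's integer gcd.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def is_relatively_prime(nums):
    """True if the whole set has gcd 1 (Definition 2.15, for integers)."""
    return reduce(gcd, nums) == 1

def is_pairwise_relatively_prime(nums):
    """True if every pair of distinct elements has gcd 1 (Definition 2.17)."""
    return all(gcd(a, b) == 1 for a, b in combinations(nums, 2))

nums = [6, 10, 15]
print(is_relatively_prime(nums))           # True
print(is_pairwise_relatively_prime(nums))  # False
print([(a, b, gcd(a, b)) for a, b in combinations(nums, 2)])
# [(6, 10, 2), (6, 15, 3), (10, 15, 5)]
```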
We have been proceeding a bit hypothetically, assuming a gcd exists
and exploring the consequences of that assumption. Now let us turn to the
question of when a gcd actually does exist.
We shall formulate a stronger property than the mere existence of the
gcd, and investigate that.


Definition 2.18. An integral domain R has the GCD-L property if the fol-
lowing is true:

(1) Every set of elements A = {αi } in R, not all zero, has a gcd γ, and

(2) it is possible to write the gcd γ as a linear combination of the elements
of A, i.e., there are elements {βi } of R such that

γ = Σ αi βi .

This is a very important property, as we shall see, but GCD-L is not
standard mathematical language.
Remark 2.19. In this definition, A = {αi } may be infinite. In that case, we
(implicitly) require that only finitely many of {βi } be nonzero, as otherwise
we would have an infinite sum, which would not make sense.

Here is our main theorem. First, we will give a very short and slick (but
nonconstructive) proof of this theorem. Then we will give a constructive
proof that will lead us to Euclid’s algorithm.
Theorem 2.20. Every Euclidean domain R has the GCD-L property.

First Proof: Let R be a Euclidean domain with norm ‖·‖, and let A = {αi }
be any set of elements of R not all of which are zero.
Let S be the set of all linear combinations of the elements of A,

S = {Σ αi βi | each βi is in R, and only finitely many βi ≠ 0}.

Observe that S contains each element of A, as for any value of i we
may write αi = αi · 1. (That is, we write αi as a linear combination of the
elements of A by choosing βi = 1 and βi′ = 0 for i′ ≠ i.) In particular,
since not all of the {αi } are zero, S contains at least one nonzero element.
Let
S′ = {nonzero elements of S}.
Now let T′ be the set of norms of the elements of S′,

T′ = {‖α‖ | α is in S′}.

The set T′ is a nonempty set of nonnegative integers, so by the Well-Ordering
Principle it has a smallest element t. Let γ be an element of S′ with
‖γ‖ = t. By the definition of S′, γ is a linear combination γ = Σ αi βi^0
for some elements {βi^0 }. We claim that γ is a gcd of {αi }. To see this, we
must verify the properties of the gcd.


Property (1): γ divides each αi . Actually, we shall prove that γ divides
each element of S. Since, as we have observed, each αi is
in S, that shows what we need. To see this, let α be an
arbitrary element of S. Then, by definition, α = Σ αi βi for
some {βi }. Since R is a Euclidean domain, we know we can
write
α = γδ + ρ
for some δ in R and some ρ in R with ρ = 0 or ‖ρ‖ < ‖γ‖.
Substituting, we see

Σ αi βi = (Σ αi βi^0 )δ + ρ,

and solving for ρ we find

ρ = Σ αi (βi − βi^0 δ),

which we recognize as a linear combination of {αi }, i.e., as
an element of S. On the one hand, by our choice of ρ, we
have that ρ is an element of S with ρ = 0 or ‖ρ‖ < ‖γ‖.
On the other hand, by the definition of γ, ‖γ‖ = t is the
smallest norm of any nonzero element of S, so there are no
nonzero elements ρ of S with ‖ρ‖ < ‖γ‖. Hence, the only possibility
for ρ is ρ = 0. But, substituting, this gives α = γδ, and so γ
divides α.

Property (2): If ζ divides each αi , then ζ divides γ. To see this, observe
that since ζ divides each αi , we may write each αi = ζθi for
some element θi of R.
We know

γ = Σ αi βi^0 ,

so, substituting, we find

γ = Σ ζθi βi^0 = ζ Σ θi βi^0 ,

and hence we see that ζ divides γ.
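In R = Z, where the norm is the absolute value, the first proof says that a gcd of A is a nonzero linear combination of smallest absolute value. The following Python sketch illustrates this by brute force. It is not an algorithm anyone would use in practice: the coefficient bound is an assumption of the illustration (the proof itself places no bound on the coefficients), and the function name is hypothetical.

```python
from functools import reduce
from itertools import product
from math import gcd

def smallest_positive_combination(A, bound):
    """Search coefficients in [-bound, bound] for the smallest positive value of sum(a*b)."""
    best = None
    for coeffs in product(range(-bound, bound + 1), repeat=len(A)):
        value = sum(a * b for a, b in zip(A, coeffs))
        if value > 0 and (best is None or value < best[0]):
            best = (value, coeffs)
    return best

value, coeffs = smallest_positive_combination([360, 2268], bound=20)
print(value)                                  # 36
print(value == reduce(gcd, [360, 2268]))      # True
print(360 * coeffs[0] + 2268 * coeffs[1])     # 36
```

The search is exponential in the size of A, which is one way to appreciate why this proof is called nonconstructive.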

We shall build up to our second proof gradually.


Definition 2.21. For A = {αi } a set of elements of an integral domain R,


not all of which are zero, let

D(A) = {ζ in R | ζ divides each αi in A}.

Remark 2.22. We observe that D(A) is a nonempty set, as it contains the
identity element 1 of R. (Certainly 1 divides every αi .) Referring to
Definition 2.13, we also observe that A has a gcd if and only if there is
some element γ of D({αi }) divisible by every element of D({αi }), in which
case γ is the gcd.

Our next lemma shows that we may modify a set of elements A =
{α1 , α2 } in a controlled way without changing D(A).

Lemma 2.23. Let α1 and α2 be any two elements of R, and let δ be any
element of R. Set α′2 = α2 + δα1 . Then D({α1 , α2 }) = D({α1 , α′2 }).

Proof: Let us set D = D({α1 , α2 }) and D′ = D({α1 , α′2 }). We want to
show that these two sets are the same, and we show this by showing that
every element of D is an element of D′ and vice versa.
First, suppose that β is in D = D({α1 , α2 }). Then β divides α1 , so
α1 = βε1 , and β divides α2 , so α2 = βε2 , for some elements ε1 and ε2 of R.
But then α′2 = α2 + δα1 = βε2 + δβε1 = β(ε2 + δε1 ), so β divides α′2 . Hence
β is in D({α1 , α′2 }) = D′.
The argument in the other direction uses the identical logic. Suppose
that β′ is in D′ = D({α1 , α′2 }). Then β′ divides α1 , so α1 = β′ε′1 , and
β′ divides α′2 , so α′2 = β′ε′2 , for some elements ε′1 and ε′2 of R. But then
α2 = α′2 − δα1 = β′ε′2 − δβ′ε′1 = β′(ε′2 − δε′1 ), so β′ divides α2 . Hence β′ is
in D({α1 , α2 }) = D.
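Lemma 2.23 can be checked numerically in Z. The sketch below lists only the positive common divisors (in Z the set D(A) also contains their negatives); the function name `common_divisors` and the sample values of δ are assumptions of this illustration.

```python
def common_divisors(a, b):
    """Positive common divisors of the integers a and b (not both zero)."""
    limit = max(abs(a), abs(b))
    return {d for d in range(1, limit + 1) if a % d == 0 and b % d == 0}

a1, a2 = 360, 2268
print(sorted(common_divisors(a1, a2)))  # [1, 2, 3, 4, 6, 9, 12, 18, 36]

# Replacing a2 by a2 + delta*a1 leaves the set of common divisors unchanged.
for delta in (-6, -1, 0, 5):
    a2_new = a2 + delta * a1
    assert common_divisors(a1, a2) == common_divisors(a1, a2_new)
```

The choice δ = −6 gives α′2 = 108, which is precisely the first remainder produced by Euclid's algorithm for 2268 and 360.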

Here is another lemma about changing sets in a (different) controlled
way. Note that at this point we do not want to assume that every set of
elements of R, not all of which are zero, has a gcd. (To be sure, we proved
that in Theorem 2.20, but we are building up to a second, independent
proof of that theorem here.) So we include the assumption that the sets A
and {α′, γ} each have a gcd as part of our hypothesis.

Lemma 2.24. Let A = {αi } be any set of elements of R, not all of which
are zero, and suppose that A has a gcd γ. Let α′ be any element of R. If
{α′, γ} has a gcd γ′, then γ′ is also a gcd of the set A′ = {α′} ∪ A. (In
particular, the set A′ has a gcd.)


Proof: We shall show that D({α′, γ}) = D({α′} ∪ A). In light of
Remark 2.22, this proves the lemma.
Again, we show that these two sets are equal by showing that every
element of one of them is also an element of the other.
Suppose δ is in D({α′, γ}). Then δ divides α′ and δ divides γ. Since γ
divides each αi , we see that δ divides each αi , so δ is in D({α′} ∪ A).
On the other hand, suppose δ′ is in D({α′} ∪ A). Then δ′ divides α′,
and δ′ divides each αi . But then, by the definition of the gcd of A, δ′ divides γ.
Hence δ′ is in D({α′, γ}).

With these results in hand, we can now give our second proof of Theo-
rem 2.20. This proof only applies, however, to the case that A = {αi } is a
finite set.
Second Proof of Theorem 2.20: Suppose that A = {αi } = {α1 , . . . , αn } is
a finite set consisting of n elements. We prove the theorem by induction
on n.
First, suppose n = 1. Then the gcd of {α1 } is clearly γ = α1 . (α1
divides α1 and any β that divides α1 divides α1 .) So {α1 } has a gcd, and
furthermore α1 = α1 · 1 so we see that both conditions in Definition 2.18
are satisfied.
Next, suppose n = 2. This is the crucial case. To handle this case we
employ a procedure known as Euclid’s algorithm. Consider {α1 , α2 }. If
α2 = 0 then (as every element of R divides 0), the gcd of {α1 , α2 } is the
gcd of {α1 }, which we have just observed is α1 . Also, α1 = α1 · 1 + α2 · 0.
So in this case we are done. Similarly, if α1 = 0 then by the same logic the
gcd of {α1 , α2 } is α2 , and α2 = α1 · 0 + α2 · 1, and we are again done. Now
suppose α1 and α2 are both nonzero.
To avoid notational confusion, we shall set θ1 = α1 and θ2 = α2 . We
may use the division algorithm in the Euclidean domain R to write

θ1 = θ2 δ1 + θ3 with θ3 = 0 or ‖θ3‖ < ‖θ2‖.

If θ3 = 0, we stop. Otherwise we continue the process, and write

θ2 = θ3 δ2 + θ4 with θ4 = 0 or ‖θ4‖ < ‖θ3‖.

If θ4 = 0, we stop. Otherwise we continue the process, and write

θ3 = θ4 δ3 + θ5 with θ5 = 0 or ‖θ5‖ < ‖θ4‖.

Keep going.


We claim this process eventually stops. If it never did, we would have an
infinite sequence θ2 , θ3 , θ4 , θ5 , . . ., with ‖θ2‖ > ‖θ3‖ > ‖θ4‖ > ‖θ5‖ > · · · .
But each ‖θi‖ is a nonnegative integer, and it is impossible to have
an infinite sequence of strictly decreasing nonnegative integers. (This is
a consequence of the Well-Ordering Principle.) So it stops at some stage.
Let that be stage k.
Thus, we see we have a sequence of divisions:
θ1 = θ2 δ1 + θ3 ,               ‖θ3‖ < ‖θ2‖,
θ2 = θ3 δ2 + θ4 ,               ‖θ4‖ < ‖θ3‖,
⋮
θk−3 = θk−2 δk−3 + θk−1 ,       ‖θk−1‖ < ‖θk−2‖,
θk−2 = θk−1 δk−2 + θk ,         ‖θk‖ < ‖θk−1‖,
θk−1 = θk δk−1 .
We claim γ = θk is the gcd of θ1 = α1 and θ2 = α2 . To prove this we
use Lemma 2.23 repeatedly:
θk = gcd(θk , 0)
= gcd(θk , 0 + θk δk−1 ) = gcd(θk , θk−1 ) = gcd(θk−1 , θk )
= gcd(θk−1 , θk + θk−1 δk−2 ) = gcd(θk−1 , θk−2 ) = gcd(θk−2 , θk−1 )
= gcd(θk−2 , θk−1 + θk−2 δk−3 ) = gcd(θk−2 , θk−3 ) = gcd(θk−3 , θk−2 )
= ··· ··· = gcd(θ2 , θ3 )
= gcd(θ2 , θ3 + θ2 δ1 ) = gcd(θ2 , θ1 ) = gcd(θ1 , θ2 ),
as required.
This shows the first condition in Definition 2.18. Now we need to show
the second condition, that γ can be written as a linear combination of α1
and α2 . In practice, as we will see from the examples we do after finishing
the proof, we work up from the bottom. But it is easier to prove this instead
by induction. In fact, we claim, and we shall prove by complete induction,
that it is possible to write each θi , and hence in particular γ = θk , as a
linear combination of θ1 = α1 and θ2 = α2 .
This claim is certainly true for θ1 , as θ1 = θ1 · 1 + θ2 · 0, and also for θ2 ,
as θ2 = θ1 · 0 + θ2 · 1.
So suppose i ≥ 3 and the claim is true for θj for all j < i. Then in
particular
θi−2 = θ1 ε1 + θ2 ε2 ,
θi−1 = θ1 ζ1 + θ2 ζ2
for some ε1 , ε2 , ζ1 , and ζ2 . We also know that
θi−2 = θi−1 δi−2 + θi ,


so, solving for θi and substituting, we have

θi = θi−2 − θi−1 δi−2
= (θ1 ε1 + θ2 ε2 ) − (θ1 ζ1 + θ2 ζ2 )δi−2
= θ1 (ε1 − ζ1 δi−2 ) + θ2 (ε2 − ζ2 δi−2 ),

and thus we have that θi is written as a linear combination of θ1 and θ2 ,


completing the inductive step, and hence proving the claim.
This concludes the proof of the n = 2 case.
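The inductive argument just given translates directly into a procedure for R = Z: carry along, with each θi, the coefficients that express it in terms of θ1 and θ2. The following Python sketch does this; the function name is hypothetical, and Python's floor division `//` is assumed as the division algorithm (so the sketch is intended for positive integers).

```python
def euclid_with_coefficients(a1, a2):
    """Return (g, x, y) with g = a1*x + a2*y, where g is a gcd of a1 and a2."""
    # Each entry is (theta, x, y) with theta = a1*x + a2*y, as in the claim
    # proved by complete induction: theta1 = a1*1 + a2*0, theta2 = a1*0 + a2*1.
    prev, cur = (a1, 1, 0), (a2, 0, 1)
    while cur[0] != 0:
        q = prev[0] // cur[0]
        # theta_i = theta_{i-2} - theta_{i-1} * delta_{i-2}, applied to all three slots
        nxt = (prev[0] - q * cur[0], prev[1] - q * cur[1], prev[2] - q * cur[2])
        prev, cur = cur, nxt
    return prev

print(euclid_with_coefficients(2268, 360))  # (36, -3, 19)
print(euclid_with_coefficients(143, 37))    # (1, -15, 58)
```

Working forward in this way avoids having to store the whole sequence of divisions before substituting back.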
Now for the inductive step. Suppose the theorem is true for {α1 , . . . , αn−1 }
and consider {α1 , . . . , αn }. We have already considered the cases n = 1 and
n = 2, so we may suppose n ≥ 3. There are two easy cases: if αn = 0,
then the situation for {α1 , . . . , αn } reduces immediately to the situation
for {α1 , . . . , αn−1 }, and if α1 = . . . = αn−1 = 0, the gcd is αn . So suppose
that neither of these is the case. Then, by the inductive hypothesis,
{α1 , . . . , αn−1 } has a gcd γ and then by the n = 2 case {αn , γ} has a gcd
γ′. But by Lemma 2.24, γ′ is then the gcd of {α1 , . . . , αn }. This verifies
the first condition of Definition 2.18.
Now for the second condition. By the inductive hypothesis we may
assume that we have written

γ = α1 β1 + α2 β2 + . . . + αn−1 βn−1

for some elements β1 , β2 , . . ., βn−1 of R, and by the n = 2 case we may
assume that we have written

γ′ = αn ζ1 + γζ2 .

Substituting, we see that

γ′ = αn ζ1 + (α1 β1 + α2 β2 + . . . + αn−1 βn−1 )ζ2
= α1 (β1 ζ2 ) + α2 (β2 ζ2 ) + . . . + αn−1 (βn−1 ζ2 ) + αn ζ1

and γ′ is a linear combination of the elements of {α1 , . . . , αn }.
Thus, the truth of the theorem for n − 1 implies its truth for n, and by
induction we are done.
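For R = Z, the inductive step can be sketched as a loop that folds in one element at a time, multiplying the old coefficients by ζ2 and appending ζ1, exactly as in the substitution above. The function names `ext_gcd` and `gcd_with_coefficients` are hypothetical, and positive integers are assumed.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = a*x + b*y (iterative form of Euclid's algorithm)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def gcd_with_coefficients(alphas):
    """Return (g, betas) with g = sum(alpha_i * beta_i), folding in one element at a time."""
    g, betas = alphas[0], [1]
    for alpha in alphas[1:]:
        # n = 2 case applied to {alpha, g}: new_g = alpha*z1 + g*z2
        new_g, z1, z2 = ext_gcd(alpha, g)
        betas = [b * z2 for b in betas] + [z1]
        g = new_g
    return g, betas

g, betas = gcd_with_coefficients([2268, 360, 552])
print(g, betas)  # 12 [45, -285, 1]
```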
We now use Euclid’s algorithm, as given in our second proof of The-
orem 2.20, to find the gcd of elements of a Euclidean domain R, and to
express the gcd as a linear combination of those elements.


Example 2.25. We begin by considering the case R = Z.


(1) Let α1 = 2268 and α2 = 360. Then
2268 = 360 · 6 + 108,
360 = 108 · 3 + 36,
108 = 36 · 3,

so we see that gcd(2268, 360) = 36. We find a linear combination that


works by solving for the gcd in the next-to-the-last equation and working
our way up, substituting at each step:

36 = 360 + (108)(−3)
= 360 + (2268 + 360(−6))(−3)
= 2268(−3) + 360(19).

(2) Let α1 = 2268, α2 = 360, and α3 = 552. We find the gcd of α1 , α2 ,
and α3 by first finding the gcd γ′ of α1 and α2 , and then finding the
gcd of γ′ and α3 . We have just done the first step. Here is the second:

552 = 36 · 15 + 12,
36 = 12 · 3,

so the gcd is 12, and furthermore, also using our work above,

12 = 552 + 36(−15)
= 552 + (2268(−3) + 360(19))(−15)
= 2268(45) + 360(−285) + 552(1).

(3) Let α1 = 143 and α2 = 37. Then

143 = 37 · 3 + 32,
37 = 32 · 1 + 5,
32 = 5 · 6 + 2,
5 = 2 · 2 + 1,
2 = 1 · 2,

so the gcd is 1, and then

1 = 5 + 2(−2)
= 5 + (32 + 5(−6))(−2)
= 32(−2) + 5(13)
= 32(−2) + (37 + 32(−1))(13)
= 37(13) + 32(−15)
= 37(13) + (143 + 37(−3))(−15)
= 143(−15) + 37(58).


In this example, we have followed the usual practice of using positive


remainders. But there is another practice, namely, using remainders whose
absolute value is as small as possible. Let us redo this example that way:

143 = 37 · 4 + (−5),
37 = −5(−7) + 2,
−5 = 2(−3) + 1,
2 = 1 · 2,

so again we find the gcd is 1 (of course), and then

1 = −5 + 2(3)
= −5 + (37 + (−5)7)(3)
= 37(3) + (−5)22
= 37(3) + (143 + 37(−4))22
= 143(22) + 37(−85).

(4) Let α1 = 227168 and α2 = 280067. Then

280067 = 227168 · 1 + 52899,


227168 = 52899 · 4 + 15572,
52899 = 15572 · 3 + 6183,
15572 = 6183 · 2 + 3206,
6183 = 3206 · 1 + 2977,
3206 = 2977 · 1 + 229,
2977 = 229 · 13,

so the gcd is 229, and then

229 = 3206 + 2977(−1)


= 3206 + (6183 + 3206(−1))(−1)
= 6183(−1) + 3206(2)
= 6183(−1) + (15572 + 6183(−2))(2)
= 15572(2) + 6183(−5)
= 15572(2) + (52899 + 15572(−3))(−5)
= 52899(−5) + 15572(17)
= 52899(−5) + (227168 + 52899(−4))(17)
= 227168(17) + 52899(−73)
= 227168(17) + (280067 + 227168(−1))(−73)
= 280067(−73) + 227168(90).

(Note, as a practical matter, that in this example we easily found that


gcd(227168, 280067) = 229, whereas it would have been quite difficult to
factor 227168 and 280067 into primes.)


(5) Let α1 = 1123456789 and α2 = 876543210. Then


1123456789 = 876543210 · 1 + 246913579,
876543210 = 246913579 · 3 + 135802473,
246913579 = 135802473 · 1 + 111111106,
135802473 = 111111106 · 1 + 24691367,
111111106 = 24691367 · 4 + 12345638,
24691367 = 12345638 · 2 + 91,
12345638 = 91 · 135666 + 32,
91 = 32 · 2 + 27,
32 = 27 · 1 + 5,
27 = 5 · 5 + 2,
5 = 2 · 2 + 1,
2 = 1 · 2,
so the gcd is 1, and then
1 = 5 + 2(−2)
= 5 + (27 + 5(−5))(−2)
= 27(−2) + 5(11)
= 27(−2) + (32 + 27(−1))(11)
= 32(11) + 27(−13)
= 32(11) + (91 + 32(−2))(−13)
= 91(−13) + 32(37)
= 91(−13) + (12345638 + 91(−135666))(37)
= 12345638(37) + 91(−5019655)
= 12345638(37) + (24691367 + 12345638(−2))(−5019655)
= 24691367(−5019655) + 12345638(10039347)
= 24691367(−5019655) + (111111106 + 24691367(−4))(10039347)
= 111111106(10039347) + 24691367(−45177043)
= 111111106(10039347) + (135802473 + 111111106(−1))(−45177043)
= 135802473(−45177043) + 111111106(55216390)
= 135802473(−45177043) + (246913579 + 135802473(−1))(55216390)
= 246913579(55216390) + 135802473(−100393433)
= 246913579(55216390) + (876543210 + 246913579(−3))(−100393433)
= 876543210(−100393433) + 246913579(356396689)
= 876543210(−100393433) + (1123456789 + 876543210(−1))(356396689)
= 1123456789(356396689) + 876543210(−456790122).

(Note, as a practical matter, that this computation can easily be performed
with only the use of an arbitrary precision calculator. It is also
clear that this method can be implemented on a computer to find quickly
and easily the gcd of huge numbers and to represent the gcd as a linear
combination of those numbers.)
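As a rough illustration of that last remark, here is a Python sketch of Euclid's algorithm in Z that supports both remainder conventions used in part (3) of this example and counts the divisions performed. Python integers have arbitrary precision, so no special library is needed. The function name `euclid_steps` and its `least_absolute` flag are hypothetical.

```python
def euclid_steps(a, b, least_absolute=False):
    """Return (gcd, number of divisions) for integers a and b."""
    steps = 0
    while b != 0:
        q, r = divmod(a, b)  # Python's remainder has the sign of b, with |r| < |b|
        if least_absolute and 2 * abs(r) > abs(b):
            q, r = q + 1, r - b  # the neighbouring quotient gives |r| <= |b|/2
        a, b = b, r
        steps += 1
    return abs(a), steps

print(euclid_steps(143, 37))                       # (1, 5)
print(euclid_steps(143, 37, least_absolute=True))  # (1, 4)
print(euclid_steps(1123456789, 876543210)[0])      # 1

# The linear combination found by hand in part (5):
assert 1123456789 * 356396689 + 876543210 * (-456790122) == 1
```

The division counts for 143 and 37 agree with the five and four lines of division written out in part (3).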


Example 2.26.


(1) Now we do an example with R = O(√−1). In finding the quotient
(and remainder) at each step, we follow the strategy of the proof of
Theorem 2.8: if (a + b√D)/(c + d√D) = e + f√D with e, f in Q, we
let the quotient be s + t√D where s and t are integers closest to e and
f , respectively (and then the remainder is forced).
Let α1 = 24 + 6i and α2 = 23 − i. Then
24 + 6i = (23 − i)(1) + (1 + 7i)
(as (24 + 6i)/(23 − i) = (546 + 162i)/530),
23 − i = (1 + 7i)(−3i) + (2 + 2i)
(as (23 − i)/(1 + 7i) = (16 − 162i)/50),
1 + 7i = (2 + 2i)(2 + i) + (−1 + i)
(as (1 + 7i)/(2 + 2i) = (16 + 12i)/8),
2 + 2i = (−1 + i)(−2i),
so the gcd is −1 + i, and then
−1 + i = (1 + 7i) + (2 + 2i)(−2 − i)
= (1 + 7i) + ((23 − i) + (1 + 7i)(3i))(−2 − i)
= (23 − i)(−2 − i) + (1 + 7i)(4 − 6i)
= (23 − i)(−2 − i) + ((24 + 6i) + (23 − i)(−1))(4 − 6i)
= (24 + 6i)(4 − 6i) + (23 − i)(−6 + 5i).
(By way of further explanation, (24 + 6i)/(23 − i) = (546 + 162i)/530 =
(546/530) + (162/530)i, which is nearest to 1; (23 − i)/(1 + 7i) = (16 −
162i)/50 = (16/50) + (−162/50)i, which is nearest to −3i; and (1 + 7i)/(2 +
2i) = (16 + 12i)/8 = 2 + (3/2)i, for which 2 + i and 2 + 2i are equally near,
and we have chosen 2 + i.)

(2) In our next example, R = O(√−7). The logic of this example depends
on the proof of Theorem 2.8 in the case D = −7. Since we deferred the
proof of that case of Theorem 2.8 to Appendix C, we similarly defer
this example to Appendix C.
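The computation in part (1) can be sketched in Python with Gaussian integers stored as pairs (a, b) for a + bi. All names here are hypothetical. One assumption deserves emphasis: when a coordinate of the exact quotient lies exactly halfway between two integers, this sketch rounds up, so its quotients (and hence its final answer) may differ from a hand computation by a unit.

```python
def g_mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)

def g_sub(x, y):
    return (x[0] - y[0], x[1] - y[1])

def norm(x):
    return x[0] * x[0] + x[1] * x[1]

def nearest(n, d):
    """Integer nearest to n/d for d > 0 (exact halves round up)."""
    return (2 * n + d) // (2 * d)

def g_divmod(x, y):
    """Quotient q and remainder r with x = q*y + r and norm(r) < norm(y)."""
    (a, b), (c, d) = x, y
    n = norm(y)
    # x / y = x * conj(y) / norm(y) = ((ac + bd) + (bc - ad)i) / n
    q = (nearest(a * c + b * d, n), nearest(b * c - a * d, n))
    return q, g_sub(x, g_mul(q, y))

def g_gcd(x, y):
    while y != (0, 0):
        _, r = g_divmod(x, y)
        x, y = y, r
    return x

g = g_gcd((24, 6), (23, -1))
print(g)  # (1, -1), i.e. 1 - i = -(-1 + i)
```

The result 1 − i differs from the −1 + i found above by the unit −1, which is consistent with a gcd being determined only up to multiplication by a unit.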
Remark 2.27. Note that the gcd is only defined up to multiplication by a
unit in R (i.e., if γ is a gcd of α1 and α2 , so is γε for any unit ε of R; compare
Lemma 2.14), so it would be, strictly speaking, better to speak of “a” gcd
rather than “the” gcd. In particular, if R = Z, with units {±1}, then, for
example, −36 is also a gcd of 2268 and 360, and −36 = 2268(3) + 360(−19).
If R = O(√−1), the units are {±1, ±i}, so 1 − i = −(−1 + i), −1 − i =
i(−1 + i), and 1 + i = −i(−1 + i) are also gcd’s of 24 + 6i and 23 − i, and,
for example, 1 + i = (24 + 6i)(−6 − 4i) + (23 − i)(5 + 6i).


2.3 Ideals and Principal Ideal Domains


We are interested in investigating questions of factoring elements in an
integral domain R. With a lot of mathematical hindsight, it turns out that
instead of just looking at individual elements α of R, we should also look
at subsets of R consisting of all multiples of α. If we consider multiples of
α, we observe that

(1) the sum of any two multiples of α is a multiple of α,

(2) any multiple of a multiple of α is a multiple of α.

Again, with a lot of mathematical hindsight, it is precisely these two
properties that we wish to consider in general. So we are led to the definition
of an ideal.
Definition 2.28. An ideal I in an integral domain R is a nonempty subset
of R with the following properties:
(1) if α1 and α2 are in I, then α1 + α2 is in I,

(2) if α1 is in I and β is any element of R, then α1 β is in I.

In other words, an ideal I is a nonempty subset of R that is closed


under addition and also is closed under multiplication by any element of R
(not just by elements of I).
Our first examples of ideals consist precisely of the multiples of some
element of R.
Definition 2.29. Let α be an element of R. The principal ideal generated
by α is
Iα = {α′ | α′ = αβ for some β in R}.

Let us check that principal ideals are indeed ideals.


Lemma 2.30. Let α be an element of R. Then Iα is an ideal.

Proof: We need to check that Iα satisfies the two properties of an ideal.

Property (1): Let α1 and α2 be in Iα . Then α1 = αβ1 and α2 = αβ2
for some elements β1 and β2 of R. Then α1 + α2 = αβ1 + αβ2 =
α(β1 + β2 ), so α1 + α2 is in Iα .

Property (2): Let α1 be in Iα , and let β be any element of R. Then
α1 = αβ1 for some element β1 of R. Then α1 β = (αβ1 )β =
α(β1 β), so α1 β is in Iα .


Here are two extreme cases.


Example 2.31.

(1) I = R is an ideal. (In this case, I is called an improper ideal. Otherwise,
I is called a proper ideal.) Note that R is a principal ideal as R = I1 .
(Every element of R is a multiple of 1.)

(2) I = {0}. (In this case, I is called the zero ideal. Otherwise, I is called
a nonzero ideal.) Note that {0} is a principal ideal as {0} = I0 . (The
only multiple of 0 is 0.)

Let us return to Definition 2.29. On the one hand, we see that the ideal
generated by α is simply the set of multiples of α. On the other hand, we
see that the set of multiples of any element α of R is an ideal, and in fact a
principal ideal. So you may ask: are all ideals of R of this form? Excellent
question! But we shall defer the answer to this question for a while. Right
now, we continue with a general development of properties of ideals.
Lemma 2.32. Let I be an ideal of R. The following are equivalent:

(1) 1 is in I.

(2) I = R.

Proof:

(1) implies (2): suppose 1 is in I. Then for any element β of R, by part


(2) of Definition 2.28, 1β = β is in I.

(2) implies (1): if I = R, then certainly 1 is in I. 

Corollary 2.33. Let R be an integral domain. Then R is a field if and only


if the only ideals in R are I = {0} and I = R.

Proof: First suppose R is a field, and let I be an ideal in R. If I ≠ {0},
then I contains a nonzero element α of R. But any α ≠ 0 is a unit in R, so
there is an element β of R with αβ = 1. Then 1 = αβ is in I (by property (2)
of Definition 2.28) so, by Lemma 2.32,
I = R.
Conversely, suppose the only ideals in R are {0} and R. Let α be any
nonzero element of R. Then Iα is a nonzero ideal (as it contains α), so
Iα = R, and so 1 is in Iα . But Iα is the set of multiples of α, so 1 = αβ
for some element β of R, i.e., α is a unit. Thus, every nonzero element of
R is a unit, so, by Remark 1.7, R is a field. 


In Definition 2.29, we considered multiples of a single element. Now a


multiple of an element is the same as a linear combination of that element.
So in the following generalization of Definition 2.29, it is natural to consider
linear combinations.
Definition 2.34. Let A = {αi } be a nonempty set of elements of R.
The ideal generated by A is

IA = {linear combinations of elements of A}
= {Σ αi βi | each βi is in R, and only finitely many βi ≠ 0}.

Let us see that IA is always an ideal, and in fact that we get every ideal
this way.
Lemma 2.35.

(1) For any nonempty subset A, IA is an ideal.

(2) Every ideal I is IA for some A.

Proof:

(1) We have to verify that IA satisfies both properties of an ideal. First,
IA is closed under addition: Let α^1 and α^2 be any two elements of
IA (the superscripts are labels, not powers). Then we may write
α^1 = Σ αi βi^1 and α^2 = Σ αi βi^2 . But then
α^1 + α^2 = Σ αi (βi^1 + βi^2 ), and so we see that α^1 + α^2 is in IA . Second,
IA is closed under multiplication by any element of R: Let α be any
element of IA and let β be any element of R. Then we may write
α = Σ αi βi . But then αβ = Σ αi (βi β) is in IA .

(2) By the properties of an ideal, I = II . (That is, we may choose A = I
itself.)

In case A = {α} (i.e., if this set consists of the single element α), then
IA = Iα is nothing other than the principal ideal Iα that we have already
considered. Suppose instead, for example, that A = {α1 , α2 }. Then we
have the ideal Iα1 ,α2 , and this is not of the form Iα , so is not a priori a
principal ideal. But it may turn out that in fact Iα1 ,α2 = Iα0 for some α0 ,
i.e., that it is indeed a principal ideal. In fact, it may turn out that this
is always the case. This is a very important situation to which we give a
name.
Definition 2.36. An integral domain R is a principal ideal domain (PID)
if every ideal I in R is principal.


Recall that we have earlier defined the GCD-L property (Definition 2.18).
Proposition 2.37. Let R be an integral domain. Then R is a PID if and
only if R has the GCD-L property.

Proof: Suppose that R is a PID. Let A = {αi } be a set of elements of R,
not all zero. Consider the ideal IA . Since R is a PID, this is a principal
ideal, so IA = Iγ for some element γ of R. We claim that γ is the gcd of
A, and furthermore that γ can be written as a linear combination of the
elements of A, and this is precisely what we need to prove in order to show
that R has the GCD-L property.
We verify condition (2) of Definition 2.18 first: note that γ is in Iγ = IA ,
so by the definition of IA (Definition 2.34), γ = Σ αi βi for some {βi }, i.e.,
γ is a linear combination of the elements of A.
Now for condition (1) of Definition 2.18: note that each αi is in IA = Iγ ,
so αi = γβi^0 for some βi^0 (as every element of Iγ is a multiple of γ), i.e., γ
divides each αi . Suppose that ζ is any element of R that divides each αi ,
and write αi = ζθi for each i. Then γ = Σ αi βi = Σ (ζθi )βi = ζ(Σ θi βi ),
so ζ divides γ. Hence, by Definition 2.13, γ is a gcd of {αi }.
Conversely, suppose that R has the GCD-L property. Let I be any ideal.
If I = {0}, then I = I0 is principal, so we may suppose that I is nonzero.
Then I is a set of elements of R, not all zero, so by the GCD-L property (Definition 2.18)
there is an element γ of R such that (1) γ is the gcd of the elements of
I, and (2) γ is a linear combination of the elements of I. We claim that
I = Iγ .
First, according to (1), γ divides every element of I, i.e., every element
of I is a multiple of γ. Since, by definition, Iγ consists of all the multiples
of γ, we see that I ⊆ Iγ .
Second, (2) states that γ is a linear combination of the elements of I,
and then every multiple of γ is also a linear combination of the elements of
I, so is also in I. Again, since Iγ consists of all the multiples of γ, we see
that Iγ ⊆ I.
Thus, each of I and Iγ is a subset of the other, so these two sets must
be equal.
Thus, every ideal I of R is of the form I = Iγ for some element γ of R,
i.e., every ideal I of R is principal, and thus (Definition 2.36) R is a PID.

Corollary 2.38. Every Euclidean domain R is a PID.

Proof: This follows directly from Theorem 2.20 and Proposition 2.37. 

Let us observe that the proof of Proposition 2.37 actually showed a


more precise result than we stated. Not only did it show that every ideal


in a Euclidean ring is always a principal ideal; it in fact identified that


ideal. So we actually have the following result:

Corollary 2.39. Let R be a PID.

(1) Let A be any set of elements of R, not all zero. Let γ = gcd(A). Then
IA = Iγ .

(2) Let I be any nonzero ideal of R, and let γ = gcd(I). Then I = Iγ .

Proof: (1) is the claim in the first paragraph of the proof of Proposi-
tion 2.37, and (2) is the claim in the fourth paragraph of the proof of
Proposition 2.37. 
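Part (1) of Corollary 2.39 can be illustrated in R = Z by comparing, inside a finite window, the linear combinations of A with the multiples of gcd(A). This is only a sanity check and not a proof: the coefficient bound and the window size are assumptions chosen so that every multiple in the window is actually reached, and the function name is hypothetical.

```python
from functools import reduce
from itertools import product
from math import gcd

def combos_in_window(A, coeff_bound, window):
    """Values of sum(a*b) with |b| <= coeff_bound that land in [-window, window]."""
    values = set()
    for coeffs in product(range(-coeff_bound, coeff_bound + 1), repeat=len(A)):
        v = sum(a * b for a, b in zip(A, coeffs))
        if abs(v) <= window:
            values.add(v)
    return values

A = [12, 18]
g = reduce(gcd, A)  # 6
multiples = {k for k in range(-60, 61) if k % g == 0}
print(combos_in_window(A, coeff_bound=10, window=60) == multiples)  # True
```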

Remark 2.40. One can ask whether, conversely, every PID is a Euclidean
domain. The answer is no, but the examples are not particularly illumi-
nating (or useful), so we shall not give any.

We have just shown that any set {αi } of elements of a PID R has a
gcd γ. Recall Definition 2.15: if {αi } has a gcd of 1, then {αi } is relatively
prime.
The following is a classical and extremely useful result.

Lemma 2.41 (Euclid’s Lemma). Let R be a PID and let α be any nonzero
element of R. Let β and γ be elements of R and suppose that α divides βγ.
If α and β are relatively prime, then α divides γ.

First Proof: Since α and β are relatively prime we can write

1 = αζ + βθ

for some elements ζ and θ of R. Multiplying by γ, we find

γ = αγζ + βγθ.

Now α divides βγ, so βγ = αδ for some δ. Substituting, we find

γ = αγζ + αδθ = α(γζ + δθ)

so α divides γ. 
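The first proof is constructive once the coefficients ζ and θ are known, and in Z they can be found by Euclid's algorithm. The following Python sketch follows the proof line by line for positive integers; the function names are hypothetical.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = a*x + b*y (iterative form of Euclid's algorithm)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def euclid_lemma_witness(alpha, beta, gamma):
    """Given alpha | beta*gamma and gcd(alpha, beta) = 1, return k with gamma = alpha*k."""
    g, zeta, theta = ext_gcd(alpha, beta)       # 1 = alpha*zeta + beta*theta
    assert g == 1, "alpha and beta must be relatively prime"
    assert (beta * gamma) % alpha == 0, "alpha must divide beta*gamma"
    delta = beta * gamma // alpha               # beta*gamma = alpha*delta
    return gamma * zeta + delta * theta         # gamma = alpha*(gamma*zeta + delta*theta)

k = euclid_lemma_witness(9, 10, 63)
print(k, 9 * k)  # 7 63
```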

Second Proof: Consider the two elements αγ and βγ of R. By hypothesis,


these two elements have a gcd. Call it δ.


Clearly γ divides both αγ and βγ. Then, by the definition of a gcd, γ
divides δ. Write δ = ζγ. Now δ divides βγ, i.e., ζγ divides βγ, so ζ divides
β. Also, δ divides αγ, i.e., ζγ divides αγ, so ζ divides α. Thus, we see that
ζ divides both α and β. But α and β are assumed to be relatively prime,
i.e., to have a gcd of 1. Then, by the definition of a gcd, ζ divides 1, i.e., ζ
is a unit. Write 1 = ζ′ζ.
Finally, α clearly divides αγ, and by hypothesis α divides βγ, so, again
by the definition of a gcd, α divides δ = ζγ. But then α also divides
ζ′δ = ζ′(ζγ) = (ζ′ζ)γ = 1γ = γ, as claimed.
Here are two important applications of Euclid’s Lemma.
Corollary 2.42. Let R be a PID and let α and β be relatively prime nonzero
elements of R. Let γ be an element of R and suppose that α divides γ and
β divides γ. Then αβ divides γ.

Proof: Since α divides γ, we may write γ = αδ for some element δ of R.


Then β divides αδ, and β and α are relatively prime, so by Euclid’s Lemma,
β divides δ. Write δ = βζ for some element ζ of R. Then

γ = αδ = α(βζ) = (αβ)ζ,

so αβ divides γ. 

Corollary 2.43. Let R be a PID and let α, β, and γ be elements of R. Sup-


pose that α and β are relatively prime, and also that α and γ are relatively
prime. Then α and βγ are relatively prime.

Proof: Let δ be a gcd of α and βγ. Then δ divides α, and so gcd(δ, β)


divides gcd(α, β). But α and β are assumed to be relatively prime, so δ
and β are also relatively prime.
Now δ divides βγ, and δ and β are relatively prime, so by Euclid’s
Lemma, we conclude that δ divides γ. Thus, δ is a common divisor of α
and γ. But α and γ were assumed to be relatively prime, so δ is a unit.
Thus, we see that gcd(α, βγ) is a unit, i.e., that α and βγ are relatively
prime. 

Remark 2.44. We presented two proofs of Euclid’s Lemma (Lemma 2.41).


The first proof was Euclid’s original proof and was simpler than the second
proof. Thus, the question arises as to why we bothered to provide the
second proof. Here is the answer.
If you look at the proofs carefully, you will notice that the first proof
required us to use the GCD-L property, Definition 2.18. This definition


had two conditions, (1) and (2), and we used both of them. The second
proof, on the other hand, only used condition (1) of Definition 2.18 (that
the relevant elements of R had a gcd), but did not use condition (2) of
Definition 2.18 (that the gcd could be expressed as a linear combination of
those elements). Hence the second proof works more generally. Thus, we
see that

Euclid’s Lemma (Lemma 2.41) and its consequences (Corol-


lary 2.42 and Corollary 2.43) are true in any integral domain R
in which any two elements (not both zero) have a gcd,

whether or not that gcd can be expressed as a linear combination of those


elements.

2.4 Unique Factorization Domains


In this section we reach our first main goal, of showing that in certain
integral domains we do have unique factorization.
We must begin, however, by carefully defining our terms.

Definition 2.45. Let R be an integral domain.

(1) An element α of R is a unit if α divides 1 (i.e., if there is an element
α′ of R with αα′ = 1).

(2) Two elements α1 and α2 of R are associates if α2 = α1 β where β is a


unit of R.

(3) An element α of R is irreducible if α is not a unit and if α = βγ implies


that β is a unit (and hence that α and γ are associates) or that γ is a
unit (and hence that α and β are associates).

(4) An element α of R is prime if, whenever α divides a product βγ, either
α divides β or α divides γ.

Remark 2.46. You may have been a bit surprised by parts (3) and (4) of
this definition. The usual definition of a positive integer a being prime is
that a ≠ 1 and if a = bc for some positive integers b and c, then b = 1 or
c = 1. The generalization of that to an integral domain R is in part (3) of
Definition 2.45, but we are not calling this generalization prime. Rather,
we are calling it irreducible, and using the term prime to denote something
else, defined in part (4). Although surprising, this turns out to be the right


thing to do. As we will show (see Lemma 2.48), in the integers, or in any
integral domain with unique factorization, these two notions turn out to
be equivalent, so it does not matter whether we use notion (3) or (4) in
that case, but in general (4) is the right notion to use.
We will observe that (3) may be more practical to check. In the case of
the positive integers, to check (3) we only have to try divisors of a, and we
know any such divisor must be less than or equal to a, so there are only
finitely many numbers to check.
On the other hand, to check (4), we must look at any number d and
see if d is divisible by a. If so, we must look at any factorization d = bc
of d, and see if a divides one of the factors b or c, and here there are
infinitely many numbers to check. This difference, as well as millennia of
mathematical tradition, are the reasons the usual definition is preferred for
the positive integers.

Actually, a prime element is always irreducible, although in general, an


irreducible element may not be prime. Let us see the first of these claims
now. (We defer examples of the second.)
Lemma 2.47. Let R be an integral domain. If an element α of R is prime,
then α is irreducible.

Proof: Suppose α is prime. Let α = βγ. Then α certainly divides βγ,


so α divides β or α divides γ. Suppose α divides β, i.e., β = αδ. Then
α = βγ = (αδ)γ = α(δγ), so 1 = δγ, and hence γ divides 1, so γ is a unit.
Similarly, if α divides γ then β is a unit. Thus α is irreducible. 

Lemma 2.48. Let R be a PID. Then an element α of R is prime if and


only if α is irreducible.

Proof: In Lemma 2.47 we showed that, in any integral domain, a prime


is irreducible. Thus, we must show that here (in the case of a PID), an
irreducible is prime.
Thus, let α be an irreducible element of R and suppose α divides a
product β1 β2 . We want to show that α divides one of the factors. Let γ be
the gcd of α and β1 . (Since R is a PID, we know that gcd(α, β1 ) exists.)
Then γ certainly divides α, i.e., α = γδ. But α is irreducible, so that means
γ or δ is a unit. Suppose δ is a unit. Then α and γ are associates. Now γ
certainly divides β1 , so α divides β1 as well. On the other hand, suppose
γ is a unit. Then α and β1 are relatively prime. But then we can apply
Euclid’s Lemma (Lemma 2.41, also true because R is a PID) to conclude
that α divides β2 . Hence α is prime. 


Definition 2.49. An integral domain R is a unique factorization domain


(UFD) if every nonzero element α of R can be written essentially uniquely
as α = up1 · · · pk with u a unit and each pi irreducible, i.e.,

(1) every nonzero α can be written as α = up1 · · · pk with u a unit and


each pi irreducible, and

(2) if also α = vq1 · · · qℓ with v a unit and q1, . . . , qℓ irreducible, then ℓ = k
and, after possibly reordering, qi and pi are associates for each i.

Remark 2.50. Observe that essential uniqueness is the best we can hope
for. For example, in the integers, we have 6 = (1)(2)(3) = (−1)(−2)(3) =
(−1)(2)(−3) = (1)(−2)(−3) = (1)(3)(2) = (−1)(−3)(2) = (−1)(3)(−2) =
(1)(−3)(−2).

We need the following technical lemma in our proof of Theorem 2.52.

Lemma 2.51. Let α1 , α2 , α3 , . . ., be a sequence of nonzero elements in a


PID R such that αi is divisible by αi+1 for each i. Then there is some
integer k such that αk , αk+1 , . . ., are all associates.

Proof: Let A = {α1 , α2 , α3 , . . .}. Since R is a PID, it has the GCD-L


property by Proposition 2.37, and so A has a gcd γ and we can write γ as
a finite sum of multiples of elements of A. If k is the highest index that
appears in this sum, we claim that the terms from αk on are all associates.
Let us write γ = Σ_{i=1}^{k} αi βi for some elements βi of R. Now by assump-
tion α2 divides α1 and α3 divides α2, so α3 divides α1 as well. Continuing
in this fashion, we see that αk divides αi for each i = 1, . . . , k. Thus, αk
divides each term in the above sum for γ, so we see αk divides γ as well.
Now consider any term αℓ with ℓ ≥ k. Then, just as above, αℓ divides
αk. We have seen that αk divides γ. But γ is the gcd of A, so it divides
each αi, and in particular it divides αℓ. Thus αk and αℓ divide each other,
so are associates.

Here is a very important result.

Theorem 2.52. Every PID R is a UFD.

Proof: We must show that any nonzero element α of R has an essen-


tially unique factorization. We show this in two stages: first, that α has a
factorization, and second, that this factorization is essentially unique.
In the first stage we claim that either α is a unit or α is divisible by
some irreducible element of R.


We prove this claim: If α is a unit, we are done, so suppose α is not a


unit. If α is irreducible, we are done. If not, then α = α1 β1 , where neither
α1 nor β1 are units. If α1 is irreducible, we are done. If not, α1 = α2 β2
where neither α2 nor β2 are units, and so α = α1 β1 = α2 β2 β1 . If α2 is
irreducible we are done. If not, α2 = α3 β3 where neither α3 nor β3 are
units, and then α = α2 β2 β1 = α3 β3 β2 β1 . Continue this process. If it
eventually stops, at step k, say, then α = αk βk . . . β2 β1 , so α is divisi-
ble by the irreducible element αk . Thus to complete the proof, we need
only show the process eventually stops. Suppose not. Then we get a se-
quence α1 , α2 , α3 , . . . with each αi divisible by αi+1 . Then we can apply
Lemma 2.51 to conclude that from some point αk on, the elements are all
associates. In particular, αk and αk+1 are associates, so αk = αk+1 βk+1
with βk+1 a unit. But in our process we assumed that neither αk+1 nor
βk+1 were units. Thus, we have a contradiction if the sequence goes on
forever. This is impossible, and so we conclude that the sequence stops.
With this claim in hand, we can complete the proof of the first stage
by a very analogous argument.
We know that α is divisible by an irreducible element α1 . Write α =
α1 β1 . If β1 is a unit, we are done, so suppose not. Then β1 is divisible
by an irreducible element α2 . Write β1 = α2 β2 so α = α1 β1 = α1 α2 β2 .
If β2 is a unit we are done. If not, β2 = α3 β3 with α3 irreducible and
α = α1 α2 β2 = α1 α2 α3 β3 . Continue this process. If it eventually stops, at
step k, say, then α = α1 α2 · · · αk βk with α1 , α2 , . . . , αk irreducibles and βk
a unit, and we have our desired factorization. Thus to complete the proof,
we need only show that the process eventually stops. Suppose not. Then
we get a sequence β1, β2, β3, . . . with each βi divisible by βi+1. Then we can
apply the preceding lemma to conclude that from some point βk on, the
elements are all associates. In particular, βk = βk+1 αk+1 with αk+1 a unit.
But in our process, we assumed that αk+1 is irreducible. Thus, we have
a contradiction if the sequence goes on forever. This is impossible, and so
we conclude that the sequence stops.
We therefore see that we have completed the proof of the first stage:
every α has a factorization. Now we must prove the second stage: this
factorization is essentially unique.
Suppose there is an α with two factorizations

α = up1 · · · pk = vq1 · · · qℓ

with u and v units and p1, . . . , pk and q1, . . . , qℓ all irreducible. We now


crucially use Euclid’s Lemma (Lemma 2.41), or rather, its consequence


(Lemma 2.48), which tells us that in the PID R, every irreducible element
is prime.
Consider p1. It is a prime and divides the product vq1 · · · qℓ = (vq1)
(q2 · · · qℓ) and hence divides one of the factors vq1 or (q2 · · · qℓ). If it divides
vq1, and hence q1 (as v is a unit), fine.
Otherwise, it divides q2 · · · qℓ = q2(q3 · · · qℓ). If it divides q2, fine. Otherwise
it divides q3 · · · qℓ = q3(q4 · · · qℓ). Continue in this fashion to conclude
that p1 divides some qi. Reordering the qi’s, if necessary, we may assume
that p1 divides q1. But q1 is irreducible, so p1 and q1 must be associates,
so q1 = u′p1 for some unit u′. Then
α = up1 p2 · · · pk = vq1 q2 · · · qℓ = vu′p1 q2 · · · qℓ,
so, setting v′ = vu′,
α′ = up2 · · · pk = v′q2 · · · qℓ.
Now apply the same argument to p2 and α′ to conclude that p2 divides,
and hence is an associate of, some one of q2, . . . , qℓ, which by renumbering
we may assume is q2, and then
α′′ = up3 · · · pk = v′′q3 · · · qℓ.
Continuing in this way, we see that, until the process stops, and possibly
after reordering the qi’s, p1 is an associate of q1, p2 is an associate of q2,
. . . . We claim that when the process stops, we have used all the pi’s and
all the qi’s, so k = ℓ and we are just left with a unit on each side, proving
the theorem.
Otherwise, either we have used all the pi’s but not all the qi’s, so ℓ > k,
and we are left with
u = wqk+1 · · · qℓ
for some unit w, which is impossible, as the left-hand side u is a unit but
the right-hand side wqk+1 · · · qℓ is not, or vice versa, so k > ℓ and we are
left with
upℓ+1 · · · pk = w,
which is similarly impossible.
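The existence half of the proof is essentially a procedure: find an irreducible divisor, split it off, and repeat until only a unit remains. The sketch below carries this out in the special case R = Z, where the units are ±1. Finding the irreducible divisor by trial division is an assumption made for this illustration (the proof only asserts that such a divisor exists), and the function names are hypothetical.

```python
def smallest_irreducible_divisor(n):
    """For an integer n >= 2, return its smallest divisor d >= 2.
    Such a d has no smaller nontrivial divisor, so it is irreducible in Z."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def factor(alpha):
    """Return (u, [p1, ..., pk]) with alpha == u * p1 * ... * pk,
    where u is a unit of Z (+1 or -1) and each pi is irreducible."""
    if alpha == 0:
        raise ValueError("only nonzero elements have a factorization")
    unit = 1 if alpha > 0 else -1
    beta = abs(alpha)
    irreducibles = []
    # Second loop of the proof: alpha = p1 * beta1, beta1 = p2 * beta2, ...
    # The process stops when the remaining cofactor beta is a unit.
    while beta != 1:
        p = smallest_irreducible_divisor(beta)
        irreducibles.append(p)
        beta //= p
    return unit, irreducibles
```

In Z the process visibly stops because the cofactor strictly decreases; in a general PID that role is played by Lemma 2.51.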

We now assemble several of our previous results.

Corollary 2.53. The following integral domains are unique factorization


domains:

(1) Z, the integers,

(2) O(√D), the ring of integers in the quadratic field Q(√D), for D =
−11, −7, −3, −2, −1, 2, 3, 5, 6, 7, 11, 13, 17, 21, and 29.

Proof: Each of these integral domains is a Euclidean domain (Lemma 2.7,


and Theorem 2.8); every Euclidean domain is a PID (Corollary 2.39); and
every PID is a UFD (Theorem 2.52). 

Remark 2.54. It is natural to ask whether the converse of Theorem 2.52 is


true: is every UFD a PID? The answer to this question is no, for general
integral domains R. But it turns out to be the case that the answer is yes
for R = O(√D), i.e., that the ring of integers of a quadratic field is a UFD
if and only if it is a PID.

In the remainder of this section we will investigate UFDs more deeply.


As we saw in Lemma 2.48, in a PID, primes and irreducibles are the
same thing. This is true in the more general situation of a UFD as well.

Lemma 2.55. Let R be a UFD. Then an element α of R is prime if and


only if α is irreducible.

Proof: In Lemma 2.47 we showed that, in general, a prime is irreducible.


Thus, we must show that here (in the case of a UFD) an irreducible is
prime. Let α be irreducible and let β and γ be elements of R with α dividing
βγ. Then β and γ have factorizations into irreducibles β = up1 · · · pk and
γ = vp′1 · · · p′ℓ, so βγ has the factorization βγ = (uv)p1 · · · pk p′1 · · · p′ℓ into
irreducibles. Since α divides βγ, βγ = αδ for some δ, and then δ has a
factorization into irreducibles δ = wp′′1 · · · p′′m. Then
(uv)p1 · · · pk p′1 · · · p′ℓ = βγ = αδ = wαp′′1 · · · p′′m.
But factorization into irreducibles is essentially unique, so α is an associate
of some pi, in which case α divides β, or α is an associate of some p′i, in
which case α divides γ. Hence α is a prime.

This lemma justifies the following definition.

Definition 2.56. Let α = up1 · · · pk be as in Definition 2.49. Then this is


called the prime factorization of α and p1 , . . . , pk are the prime divisors
(or prime factors) of α.

Remark 2.57. Let α = up1 · · · pk be as in Definition 2.56. Then we may
gather associated prime divisors of α together and write α = v p1^e1 · · · pj^ej for
some positive integers e1, . . . , ej, with each pi a prime and no two distinct
pi’s associated, and with v a unit.


Recall that we introduced the greatest common divisor (gcd) in our


discussion of PIDs (Definition 2.13). Now we will reconsider this concept
in the more general situation of UFDs. Of course, since every PID is a
UFD, our discussion here will apply to PIDs as well.
The key to our discussion is the following lemma, which is very useful
in its own right.

Lemma 2.58. Let R be a UFD and let α and β be nonzero elements of R.


Then α divides β if and only if

(1) every prime p that divides α also divides β, and

(2) for every such prime p, if p^e is the highest power of p dividing α, and
if p^f is the highest power of p dividing β, then f ≥ e.

Proof: First, let us suppose that conditions (1) and (2) are satisfied. Then,
for some primes p1, . . . , pk, q1, . . . , qℓ and some exponents, and some units
u and v, we have
α = u p1^e1 p2^e2 · · · pk^ek,
β = v p1^f1 p2^f2 · · · pk^fk q1^g1 q2^g2 · · · qℓ^gℓ.

Then, setting
γ = w p1^(f1−e1) p2^(f2−e2) · · · pk^(fk−ek) q1^g1 q2^g2 · · · qℓ^gℓ
with w = vu^(−1), we have
β = αγ,
so in this situation α divides β.
On the other hand, suppose α divides β, so β = αγ.
Factor α and γ into primes, where we allow the exponents d1, . . . , dk to
be zero:

α = u p1^e1 · · · pk^ek,
γ = w p1^d1 · · · pk^dk q1^g1 · · · qℓ^gℓ.

Then, setting v = uw,

β = v p1^f1 · · · pk^fk q1^g1 · · · qℓ^gℓ

with fi = ei + di ≥ ei for each i = 1, . . . , k, and so we see that conditions


(1) and (2) are satisfied. 
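In the UFD Z, the criterion of Lemma 2.58 can be checked directly by comparing exponents. The following sketch assumes R = Z and obtains prime factorizations by trial division; the helper names are hypothetical. Primes are taken positive, so each class of associated primes has a single representative and the unit is simply ignored.

```python
from collections import Counter


def prime_exponents(n):
    """Map each positive prime p dividing the nonzero integer n
    to the exponent of the highest power of p dividing n."""
    n = abs(n)
    exponents = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            exponents[d] += 1
            n //= d
        d += 1
    if n > 1:
        exponents[n] += 1
    return exponents


def divides_by_exponents(alpha, beta):
    """Lemma 2.58 in Z: alpha divides beta iff, for every prime p dividing alpha,
    the exponent of p in beta is at least its exponent in alpha."""
    e = prime_exponents(alpha)
    f = prime_exponents(beta)
    # A Counter returns 0 for a prime that does not divide beta.
    return all(f[p] >= e_p for p, e_p in e.items())
```

For example, 12 = 2^2 · 3 divides 72 = 2^3 · 3^2 but does not divide 18 = 2 · 3^2, because the exponent of 2 in 18 is too small.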


Proposition 2.59. Let R be a UFD and let α and β be nonzero elements of


R with prime factorizations
α = u p1^e1 · · · pk^ek q1^g1 · · · qℓ^gℓ,
β = v p1^f1 · · · pk^fk r1^h1 · · · rm^hm.

Then α and β have a gcd δ, and, moreover,


δ = p1^d1 · · · pk^dk
where di = min(ei , fi ) for each i = 1, . . ., k. In case α and β have no
common prime factors, δ = 1.

Proof: By Lemma 2.58, δ divides both α and β. Furthermore, also by


Lemma 2.58, any ζ dividing both α and β must be of the form
ζ = w p1^c1 · · · pk^ck
with ci ≤ ei and ci ≤ fi , i.e., ci ≤ min(ei , fi ) = di for each i, so, once again
by Lemma 2.58, ζ divides δ. Thus, δ satisfies the properties of gcd(α, β)
(Definition 2.13). 
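Proposition 2.59 gives a recipe for the gcd: take each common prime to the smaller of its two exponents. A short sketch in the special case R = Z follows, with factorization by trial division as an assumption of the illustration and with hypothetical function names; it compares the result against Python's built-in `math.gcd`.

```python
from collections import Counter
from math import gcd, prod


def prime_exponents(n):
    """Exponents of the positive primes dividing the nonzero integer n."""
    n = abs(n)
    exponents = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            exponents[d] += 1
            n //= d
        d += 1
    if n > 1:
        exponents[n] += 1
    return exponents


def gcd_by_exponents(alpha, beta):
    """gcd of nonzero integers as the product of p**min(e_p, f_p) over the
    primes p dividing alpha; primes not dividing beta contribute p**0 = 1."""
    e = prime_exponents(alpha)
    f = prime_exponents(beta)
    # prod of an empty sequence is 1, the case of no common prime factors.
    return prod(p ** min(e_p, f[p]) for p, e_p in e.items())


def agrees_with_builtin(alpha, beta):
    """Compare with math.gcd, which returns the nonnegative gcd."""
    return gcd_by_exponents(alpha, beta) == gcd(alpha, beta)
```

With α = 360 = 2^3 · 3^2 · 5 and β = 84 = 2^2 · 3 · 7, the common primes are 2 and 3, and the recipe gives 2^2 · 3 = 12.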

Remark 2.60. Recall that the gcd is only defined up to multiplication by a


unit, so for any unit w,
δ′ = wδ = w p1^d1 · · · pk^dk
is also a gcd of α and β.

Recall that we defined two elements α and β of a PID to be relatively


prime if their gcd is 1. We use the same language in the more general
situation of a UFD here.
Note that Lemma 2.61 is the direct (word-for-word) generalization of
Euclid’s Lemma (Lemma 2.41) to the case of a UFD. But in this more
general situation we need a new proof.

Lemma 2.61. Let R be a UFD and let α be a nonzero element of R. Let


β1 and β2 be elements of R and suppose that α divides β1 β2 . If α and β1
are relatively prime, then α divides β2 .

Proof: Since α and β1 are relatively prime, they have no common prime
factors. Thus, we have prime factorizations
α = p1^e1 · · · pk^ek,
β1 = q1^g1 · · · qℓ^gℓ.


Now we are assuming that α divides β1 β2, so by Lemma 2.58 we see that
the prime factorization of β1 β2 must include p1^f1 · · · pk^fk with fi ≥ ei, for
each i. But the prime factorization of β1 β2 is the product of the prime
factorization of β1 and the prime factorization of β2. Since p1^f1 · · · pk^fk does
not appear in the prime factorization of β1, it must appear in the prime
factorization of β2, and hence α divides β2.

The following corollaries are word-for-word generalizations of Corol-


lary 2.42 and Corollary 2.43 to the case of a UFD.
Corollary 2.62. Let R be a UFD and let α and β be relatively prime nonzero
elements of R. Let γ be an element of R and suppose that α divides γ and
β divides γ. Then αβ divides γ.

Proof: Once we have the generalization of Euclid’s Lemma 2.41 to UFDs


in Lemma 2.61, the identical proof of Corollary 2.42 works for UFDs, so
we may just quote that proof.
But we will give a second (albeit longer) proof that uses prime factor-
ization directly: Again, since α and β are relatively prime, they have no
common prime factors, so we have prime factorizations
α = p1^e1 · · · pk^ek,
β = q1^g1 · · · qℓ^gℓ.

Then, by Lemma 2.58, applied first to α and γ and then to β and γ, we see
that the prime factorization of γ must contain every pi^ei and every qj^gj, so
it contains p1^e1 · · · pk^ek q1^g1 · · · qℓ^gℓ and hence αβ divides γ.


Corollary 2.63. Let R be a UFD and let α, β, and γ be elements of R. Sup-


pose that α and β are relatively prime, and also that α and γ are relatively
prime. Then α and βγ are relatively prime.

Proof: Once we have the generalization of Euclid’s Lemma 2.41 to UFDs


in Lemma 2.61, the identical proof of Corollary 2.43 works for UFDs, so
we may just quote that proof.
But again we will give a proof that uses prime factorization directly:
Let α have prime factorization

α = p1^e1 · · · pk^ek.

Since α and β are relatively prime, no pi appears in the prime factor-


ization of β, and since α and γ are relatively prime, no pi appears in the
prime factorization of γ. Hence no pi appears in the prime factorization of
βγ, from which we conclude that α and βγ are relatively prime. 


2.5 Nonunique Factorization: The Case D < 0



So far we have devoted our efforts to proving that O(√D) is a UFD for
various values of D. Now we will concentrate our efforts on showing that
O(√D) is not a UFD for other values of D. In fact, we will show that this
happens for infinitely many values of D.
As the reader will see, our results are much more complete for negative
values of D than they are for positive values of D. This reflects our present
state of knowledge: much more is known when D < 0 than is known when
D > 0.
We will be discussing primes and irreducible elements in both O(√D)
and Z. To keep these straight, in Lemma 2.64 and Corollary 2.65 we
will refer to primes in Z as ordinary primes. Note in Lemma 2.64 and
Corollary 2.65 that D may be negative or positive.
The following result will help us recognize irreducible elements of O(√D).

Lemma 2.64. Let R = O(√D) and let p be an ordinary prime.

(1) If α is an element of R with ‖α‖ = |N(α)| = p, then α is irreducible.

(2) Suppose that R does not have an element β with ‖β‖ = p. If γ is an
element of R with ‖γ‖ = pp′ with p′ also an ordinary prime (perhaps
p′ = p), then γ is irreducible. In particular, if R does not have an
element β with ‖β‖ = p, then p is irreducible.

Proof:

(1) Suppose α = β1 β2. Then

p = ‖α‖ = ‖β1 β2‖ = ‖β1‖ · ‖β2‖,

so either ‖β1‖ = 1, and β1 is a unit (Lemma 2.12), or ‖β2‖ = 1, and
β2 is a unit. Thus α is irreducible.

(2) Suppose γ = β1 β2. Then

pp′ = ‖γ‖ = ‖β1 β2‖ = ‖β1‖ · ‖β2‖,

and we cannot have ‖β1‖ = p, ‖β2‖ = p′, or vice versa, as R has no
elements of norm p. Hence, as in part (1), either ‖β1‖ = 1 and β1 is a
unit, or ‖β2‖ = 1 and β2 is a unit. Thus γ is irreducible. Finally, note
that ‖p‖ = p², so by setting γ = p, we conclude that p is irreducible.




The following corollary will be our basic tool in showing that O(√D) is
not a UFD. We will be using it or its consequences in all of the examples in
this section and in the next section. So once again this provides an example
of working hard enough with a single idea and being able to go a long way
with it. We use it so often that we give it a name (although this name is
not standard mathematical language).

Corollary 2.65 (The Non-UFD Test). Let R = O(√D).
If there is some ordinary prime p such that

(1) R does not have an element β with ‖β‖ = p, and

(2) R has an element α that is not divisible by p, but with ‖α‖ divisible
by p,

then R is not a UFD.

Proof: Write ‖α‖ = pq, for some q > 1. By Lemma 2.3, ‖α‖ = |αᾱ|, so
‖α‖ = pq gives
αᾱ = ±pq.
By Lemma 2.64, p is irreducible. Now p does not divide α, by assumption,
and then p does not divide ᾱ either. Thus, p divides the product αᾱ
without dividing either factor, so p is not a prime. Now in a UFD, every
irreducible is prime (Lemma 2.55), so R cannot be a UFD.
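As a concrete instance of the test, take D = −5, p = 2, and α = 1 + √−5, whose norm is 6; this corresponds to the two factorizations 6 = 2 · 3 = (1 + √−5)(1 − √−5). The sketch below checks both hypotheses by brute force. It assumes that elements of O(√−5) are exactly the a + b√−5 with a and b integers (as D ≡ 3 (mod 4)), represents such an element by the pair (a, b), and uses hypothetical helper names.

```python
D = -5  # D ≡ 3 (mod 4), so elements are a + b*sqrt(D) with a, b integers


def norm(a, b):
    """The norm |a^2 - D*b^2| of a + b*sqrt(D)."""
    return abs(a * a - D * b * b)


def has_element_of_norm(n):
    """Search for a + b*sqrt(D) of norm n. Since D < 0 the norm is
    a^2 + |D|*b^2, so |a| and |b| are at most sqrt(n); the box below is enough."""
    bound = int(n ** 0.5) + 1
    return any(
        norm(a, b) == n
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
    )


def divisible_by(p, a, b):
    """For D ≡ 2 or 3 (mod 4): p divides a + b*sqrt(D) iff p | a and p | b."""
    return a % p == 0 and b % p == 0


p = 2
alpha = (1, 1)  # alpha = 1 + sqrt(-5)

hypothesis_1 = not has_element_of_norm(p)
hypothesis_2 = (not divisible_by(p, *alpha)) and norm(*alpha) % p == 0
not_a_ufd = hypothesis_1 and hypothesis_2
```

Both hypotheses hold here, so the test applies to O(√−5).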

Remark 2.66. It is easy to tell when p divides α. Suppose p is odd and
α = a + b√D or (a + b√D)/2. Then α/p = a/p + (b/p)√D or ((a/p) +
(b/p)√D)/2, so p divides α if and only if p divides a and p divides b. The
case p = 2 is a little more complicated. If D ≡ 2 or 3 (mod 4), then 2 divides
α = a + b√D if and only if a and b are both even. If D ≡ 1 (mod 4), then
2 divides α = a + b√D with a and b integers if and only if either a and b
are both even or both odd integers. (This justifies the claim in the proof
of Corollary 2.65 that if p does not divide α = a + b√D, then p does not
divide ᾱ = a − b√D either.)
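The divisibility rules in this remark all follow from one check: p divides α exactly when α/p is again an element of O(√D). A sketch of that check with exact rational arithmetic follows. It assumes D is squarefree, writes an element as x + y√D with rational x and y, and uses the description of O(√D) recalled in this chapter (integer coordinates, together with half-odd-integer coordinates when D ≡ 1 (mod 4)); the function names are hypothetical.

```python
from fractions import Fraction


def in_ring(x, y, D):
    """Is x + y*sqrt(D) in O(sqrt(D))? Here x, y are Fractions, D squarefree."""
    if x.denominator == 1 and y.denominator == 1:
        return True
    if D % 4 == 1:
        # (a + b*sqrt(D))/2 with a and b both odd integers
        a, b = 2 * x, 2 * y
        return (
            a.denominator == 1 and b.denominator == 1
            and a.numerator % 2 == 1 and b.numerator % 2 == 1
        )
    return False


def p_divides(p, x, y, D):
    """Does the ordinary prime p divide alpha = x + y*sqrt(D) in O(sqrt(D))?"""
    return in_ring(Fraction(x) / p, Fraction(y) / p, D)
```

For example, 2 does not divide 1 + √−5, but 2 does divide 1 + √−7, since (1 + √−7)/2 lies in O(√−7).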

We begin by studying the rings of integers in imaginary quadratic fields,
i.e., O(√D) for D < 0.
In order to use Corollary 2.65, we have to come up with a suitable prime
p. In fact, p = 2 often works.

Lemma 2.67. If D < 0 and D ≠ −1, −2, or −7, then O(√D) does not
have an element of norm 2.


Proof: We divide the proof into two cases: (1) D ≡ 2 or 3 (mod 4), and (2)
D ≡ 1 (mod 4).

(1) Suppose D ≡ 2 or 3 (mod 4). Then any element β of O(√D) is of the
form β = a + b√D with a and b integers. Then

‖β‖ = |N(β)| = |a² − b²D| = |a² + (−D)b²| = a² + (−D)b².

So, in order for there to be an element β with ‖β‖ = 2, the equation
a² + (−D)b² = 2 must have a solution with a and b integers. But if
D < −2 (so −D > 2), it does not.

(2) Suppose D ≡ 1 (mod 4). Then any element β of O(√D) is of the form
(a) β = a + b√D with a and b integers or (b) β = (a + b√D)/2 =
(a/2) + (b/2)√D with a and b odd integers. In case (a) the argument
is exactly the same as above, and ‖β‖ = 2 has no solutions. In case (b)

‖β‖ = |N(β)| = |(a/2)² − (b/2)²D| = |(a/2)² + (−D)(b/2)²|
= a²/4 + (−D)b²/4.

So, in order for there to be an element β with ‖β‖ = 2, the equation
a²/4 + (−D)b²/4 = 2, i.e., a² + (−D)b² = 8, must have a solution with a
and b odd integers. But if D ≠ −7 (so −D = 3 or −D > 7), it does not.
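The two equations in this proof are easy to search exhaustively, since for D < 0 the unknowns are bounded. The sketch below, with hypothetical function names, looks for integer solutions of a² + (−D)b² = 2 and, when D ≡ 1 (mod 4), odd solutions of a² + (−D)b² = 8, for squarefree negative D.

```python
def is_squarefree(n):
    """True if no square d*d with d >= 2 divides the nonzero integer n."""
    n = abs(n)
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def has_norm_two(D):
    """For squarefree D < 0, does O(sqrt(D)) contain an element of norm 2?"""
    # Elements a + b*sqrt(D): a^2 + (-D)*b^2 = 2 forces |a|, |b| <= 1.
    for a in range(-2, 3):
        for b in range(-2, 3):
            if a * a - D * b * b == 2:
                return True
    if D % 4 == 1:
        # Elements (a + b*sqrt(D))/2 with a, b odd: a^2 + (-D)*b^2 = 8
        # forces |a| <= 2, and |b| <= 1 since -D >= 3; odd values up to 3 suffice.
        for a in range(-3, 4, 2):
            for b in range(-3, 4, 2):
                if a * a - D * b * b == 8:
                    return True
    return False


def values_with_norm_two(lower):
    """Squarefree D with lower <= D < 0 for which a norm-2 element exists."""
    return [D for D in range(-1, lower - 1, -1)
            if is_squarefree(D) and has_norm_two(D)]
```

Running `values_with_norm_two(-60)` picks out only the three excluded values of the lemma.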

Theorem 2.68. Let D < 0, D ≡ 1, 2, 3, 6, or 7 (mod 8), and D ≠ −1, −2,
or −7. Then R = O(√D) is not a UFD.

Proof: We begin by noting that, by Lemma 2.67, in no case does R have
an element β with ‖β‖ = 2. To proceed further, we divide the proof into
three cases: (1) D ≡ 2 or 6 (mod 8), (2) D ≡ 3 or 7 (mod 8), and (3)
D ≡ 1 (mod 8).

(1) In this case D ≡ 2 (mod 4). Let

α = √D.
Then
‖α‖ = −D.
Thus α is not divisible by 2 but ‖α‖ is divisible by 2 (as in this case
D is even). Hence, by Corollary 2.65 (the Non-UFD Test), R is not a
UFD.


(2) In this case D ≡ 3 (mod 4). Let

α = 1 + √D.

Then

‖α‖ = 1 − D.

Thus α is not divisible by 2 but ‖α‖ is divisible by 2 (as in this case
D is odd, so 1 − D is even). Hence, by Corollary 2.65 (the Non-UFD
Test), R is not a UFD.

(3) In this case D ≡ 1 (mod 8). Let

α = (1 + √D)/2.

Then

‖α‖ = (1 − D)/4.

Thus α is not divisible by 2 but ‖α‖ is divisible by 2 (as in this case
D ≡ 1 (mod 8), so 1 − D is divisible by 8 and so (1 − D)/4 is even).
Hence, by Corollary 2.65 (the Non-UFD Test), R is not a UFD.
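The three choices of α in this proof depend only on D modulo 8, so they can be tabulated mechanically. The following sketch, with hypothetical names, encodes α = (a + b√D)/den as a triple (a, b, den), selects the triple used in each case, and computes its norm so that the evenness claimed in the proof can be checked.

```python
def witness(D):
    """Return (a, b, den) encoding the element (a + b*sqrt(D))/den
    chosen in the proof of Theorem 2.68 for the residue of D mod 8."""
    r = D % 8  # Python's % is nonnegative, also for negative D
    if r in (2, 6):
        return (0, 1, 1)   # alpha = sqrt(D)
    if r in (3, 7):
        return (1, 1, 1)   # alpha = 1 + sqrt(D)
    if r == 1:
        return (1, 1, 2)   # alpha = (1 + sqrt(D))/2
    raise ValueError("D must be congruent to 1, 2, 3, 6, or 7 mod 8")


def norm(D, a, b, den):
    """The norm |a^2 - D*b^2| / den^2 of (a + b*sqrt(D))/den.
    The division is exact for the triples returned by witness."""
    return abs(a * a - D * b * b) // (den * den)


def witness_norm_is_even(D):
    """Check that the norm of the chosen element is divisible by 2."""
    return norm(D, *witness(D)) % 2 == 0
```

For D = −5 this gives α = 1 + √−5 of norm 6, and for D = −15 it gives α = (1 + √−15)/2 of norm 4.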

Thus, we see that we have complete information for D < 0 and D ≡ 1,
2, 3, 6, or 7 (mod 8). We showed in Corollary 2.53 that O(√D) is a UFD
for D = −1, −2, or −7, and we just showed in Theorem 2.68 that O(√D)
is not a UFD for any other such value of D.
Now, we need to investigate the case D < 0 and D ≡ 5 (mod 8). Here
we will only be able to get partial (but very suggestive) information. Again,
our key tool will be Corollary 2.65 (the Non-UFD Test), but we will not
be able to choose p = 2 in order to apply it. In fact, the value of p will
depend on D. Actually, our results will apply more generally than the case
D ≡ 5 (mod 8), so for many values of D we will have a second proof that
O(√D) is not a UFD.
First, we make an observation that will make it easy for us to tell when
one of the hypotheses of Corollary 2.65 (the Non-UFD Test) is satisfied.


Lemma 2.69. Let D < 0.

(1) If D ≡ 2 or 3 (mod 4) and p is a prime with p < |D|, then R = O(√D)
does not have an element β with ‖β‖ = p.

(2) If D ≡ 1 (mod 4) and p is a prime with p < |D|/4, then R = O(√D)
does not have an element β with ‖β‖ = p.

Proof:

(1) Let β be an element of R. Then β = a + b√D where a and b are
integers. Suppose ‖β‖ = p. Then

p = ‖β‖ = a² − b²D = a² + b²(−D)

and since p < |D| we must have b = 0, giving p = a², which is impossible.

(2) Let β be an element of R. Then (a) β = a + b√D where a and b
are integers or (b) β = (a + b√D)/2 where a and b are odd integers.
Suppose ‖β‖ = p. The argument in case (a) is the same as above. In
case (b),

p = ‖β‖ = (a² − b²D)/4 = a²/4 + b²(−D)/4,

which is similarly impossible if p < |D|/4.



Using this lemma we may show that in many cases O(√D) is not a
UFD.

Proposition 2.70. Let D < 0, and D ≡ 1 (mod 4).

(1) If |D| is composite, then R = O(√D) is not a UFD.

(2) If m = (1 − D)/4 is composite, then R = O(√D) is not a UFD.

Proof:

(1) We claim that if D < 0, D ≡ 1 (mod 4), and |D| is composite, then D
has a prime factor p with p < |D|/4.
(To see this, note that if D < 0, D ≡ 1 (mod 4), and |D| is composite,
then |D| ≥ 15. If |D| = 15, then D is divisible by 3 and 3 < 15/4.
Otherwise |D| > 15. Let p be the smallest prime dividing |D|. Then
|D|/p ≠ 1, so |D|/p is divisible by some prime p′ ≥ p. We now argue


by contradiction. Suppose p ≥ |D|/4. Then |D| = p(|D|/p) ≥ pp′ ≥
p² ≥ (|D|/4)² = |D|²/16, so |D| ≤ 16, a contradiction, as |D| > 15 and
|D| is odd.)
By Lemma 2.69 we see that hypothesis (1) of Corollary 2.65 holds. Let

α = √D.

Then
‖α‖ = |D|,
so we see hypothesis (2) of Corollary 2.65 holds as well. Hence, by
Corollary 2.65 (the Non-UFD Test), we conclude that O(√D) is not a
UFD.

(2) If m is composite then m is divisible by a prime p < m and then
certainly p < |D|/4, so, by Lemma 2.69, hypothesis (1) of Corollary 2.65
holds. Let
α = (1 + √D)/2.
Then
‖α‖ = |m|,
so hypothesis (2) of Corollary 2.65 holds as well, and again, by
Corollary 2.65 (the Non-UFD Test), we conclude that O(√D) is not a
UFD.


Corollary 2.71. For D ≡ 5 (mod 8), D < 0, and |D| < 1000, O(√D) is
not a UFD except (possibly) for the cases D = −3, −11, −19, −43, −67,
and −163 and the cases D = −211, −283, −331, −523, −547, −691, −787,
and −907.

Proof: For every value of D with D ≡ 5 (mod 8), D < 0, and |D| < 1000,
except for those listed in the statement of the corollary, |D| is composite
or m = (1 − D)/4 is composite, so by Proposition 2.70, O(√D) is not a
UFD.
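The case check behind this corollary is a finite computation: run through the squarefree D < 0 with D ≡ 5 (mod 8) and keep those for which neither |D| nor m = (1 − D)/4 is composite. A sketch follows; the function names are hypothetical, and primality is tested by trial division, which is ample for |D| < 1000.

```python
def is_prime(n):
    """Trial-division primality test for integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_composite(n):
    """A composite number is an integer greater than 1 that is not prime."""
    return n > 1 and not is_prime(n)


def is_squarefree(n):
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def survivors_of_proposition_2_70(limit):
    """Squarefree D < 0 with D ≡ 5 (mod 8) and |D| < limit that are NOT
    excluded by Proposition 2.70, i.e. |D| and m are both non-composite."""
    survivors = []
    for abs_d in range(3, limit, 8):      # |D| ≡ 3 (mod 8) iff D ≡ 5 (mod 8)
        if not is_squarefree(abs_d):
            continue
        m = (1 + abs_d) // 4              # m = (1 - D)/4, an integer here
        if is_composite(abs_d) or is_composite(m):
            continue
        survivors.append(-abs_d)
    return survivors
```

Calling `survivors_of_proposition_2_70(1000)` returns the values that Proposition 2.70 alone cannot decide.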

Actually, using the same idea, we can get a stronger test than that in
Proposition 2.70, and use it to strengthen Corollary 2.71.

Proposition 2.72. Let D < 0 and D ≡ 1 (mod 4).



(1) If there is a nonnegative even integer a < (1/4)√(D² + 16D) with a² −
D composite, then O(√D) is not a UFD.


(2) If there is a nonnegative odd integer a < (1/2)√(D² + 4D) with (a² −
D)/4 composite, then O(√D) is not a UFD.

Proof:

(1) Let

α = a + √D.

Then
‖α‖ = a² − D.

Now a < (1/4)√(D² + 16D) implies, by simple algebra, that a² − D <
(D/4)², so a² − D must have a prime factor p less than |D|/4. Thus, by
Lemma 2.69, we see that hypothesis (1) of Corollary 2.65 holds. Since
a is even, a² − D is odd, so p is odd and hence α is not divisible by p.
Thus, we see that hypothesis (2) of Corollary 2.65 holds as well. Hence
we conclude that, by Corollary 2.65 (the Non-UFD Test), O(√D) is
not a UFD.

(2) Let

α = (a + √D)/2,

and note that α is in O(√D) as a is odd. Then

‖α‖ = (a² − D)/4.

Then, similarly, a < (1/2)√(D² + 4D) implies that (a² − D)/4 < (D/4)²,
so (a² − D)/4 must have a prime factor p less than |D|/4, and by
Lemma 2.69 hypothesis (1) of Corollary 2.65 holds. Certainly α
is not divisible by p. Thus, we see that hypothesis (2) of Corollary 2.65
holds as well. Hence we conclude that, by Corollary 2.65 (the Non-UFD
Test), O(√D) is not a UFD.

Remark 2.73. Proposition 2.72 is a direct generalization of Proposition 2.70.


Setting a = 0 in part (1) of Proposition 2.72 recovers part (1) of Proposi-
tion 2.70, and setting a = 1 in part (2) of Proposition 2.72 recovers part
(2) of Proposition 2.70.

Corollary 2.74. For D ≡ 5 (mod 8), D < 0, and |D| < 1000, O(√D) is not
a UFD except (possibly) for the cases D = −3, −11, −19, −43, −67, and
−163.


Proof: We use Proposition 2.72 to decide some of the cases of D we could
not handle with Proposition 2.70, as follows: for D = −211, −283, −331,
−523, −547, −691, or −787, we choose a = 2, and for D = −907, we choose
a = 4.
Here the value of a given in the above list is a value of a that can be used
in Proposition 2.72 to show that O(√D) is not a UFD. (For example, if
D = −211, α = 2 + √−211 has ‖α‖ = 215, a composite number divisible
by the prime p = 5, and if D = −907, α = 4 + √−907 has ‖α‖ = 923,
a composite number divisible by the prime p = 13.)
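Finding a usable value of a is a short search. The sketch below, with hypothetical function names, looks for the smallest nonnegative a satisfying either part of Proposition 2.72. The two size conditions are rewritten without square roots as 16(a² − D) < D² for even a and 4(a² − D) < D² for odd a, which are equivalent to the stated bounds and avoid floating-point arithmetic.

```python
def is_composite(n):
    """True if the integer n > 1 has a divisor d with 1 < d < n."""
    if n < 4:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return True
        d += 1
    return False


def smallest_a(D):
    """For D < 0 with D ≡ 1 (mod 4), return the smallest nonnegative a that
    satisfies part (1) (a even) or part (2) (a odd) of Proposition 2.72,
    or None if no a within the stated bounds works."""
    a = 0
    # The odd-case bound 4*(a^2 - D) < D^2 is the weaker one, so it limits the loop.
    while 4 * (a * a - D) < D * D:
        n = a * a - D
        if a % 2 == 0:
            # Part (1): need a^2 - D < (D/4)^2 and a^2 - D composite.
            if 16 * n < D * D and is_composite(n):
                return a
        else:
            # Part (2): a odd and D ≡ 1 (mod 4) make n divisible by 4.
            if is_composite(n // 4):
                return a
        a += 1
    return None
```

For D = −211 the search stops at a = 2 (norm 215 = 5 · 43), and for D = −907 it stops at a = 4 (norm 923 = 13 · 71). A result of None means only that this particular test is inconclusive for that D.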

In fact, Proposition 2.72 is a very effective way of showing that O( D)
is not a UFD.

Corollary 2.75. For D ≡ 5 (mod 8), D < 0, and |D| < 1,000,000,000,
O(√D) is not a UFD except (possibly) for the cases D = −3, −11, −19,
−43, −67, and −163.

Proof: A computer computation using Proposition 2.72 rules out all values
of D except for those in the statement of the corollary. (As a matter of
curiosity, the largest value of a we have to consider for D in this range is
a = 11, which occurs for D = −543,764,323.)
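
A search of this kind can be sketched in a few lines of Python. This is not the computation referred to in the proof, only an illustration over the much smaller range |D| < 1000. It rests on our reading of the criterion, which is an assumption: for even a take α = a + √D, for odd a take α = (a + √D)/2, and look for a prime factor p of the norm of α with p < |D|/4. The function names and the cutoff a_max are ours.

```python
def smallest_prime_factor(n):
    """Return the smallest prime factor of an integer n >= 2."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def is_squarefree(n):
    """Return True if no square of a prime divides the positive integer n."""
    f = 2
    while f * f <= n:
        if n % (f * f) == 0:
            return False
        f += 1
    return True


def witness(D, a_max=12):
    """Search a = 0, ..., a_max for a pair (a, p) as described above.

    Assumes D < 0 and D ≡ 5 (mod 8). Returns None if no pair is found.
    """
    for a in range(a_max + 1):
        # Even a: alpha = a + sqrt(D); odd a: alpha = (a + sqrt(D))/2.
        norm = a * a - D if a % 2 == 0 else (a * a - D) // 4
        if norm < 2:
            continue
        p = smallest_prime_factor(norm)
        if 4 * p < -D:  # p < |D|/4
            return a, p
    return None


def exceptions(limit):
    """Square-free D ≡ 5 (mod 8) with -limit < D < 0 for which no pair is found."""
    return [D for D in range(-3, -limit, -8)
            if is_squarefree(-D) and witness(D) is None]


print(exceptions(1000))
```

Under these assumptions, witness(-907) finds a = 4 and p = 13, in agreement with the proof of Corollary 2.74.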

Remark 2.76. What about the exceptions in Corollary 2.75? It turns out
that O(√D) is a UFD for the cases D = −3, −11, −19, −43, −67, and
−163. Given the large range of values of D, the reader may (strongly)
suspect that these are the only such values of D for which O(√D) is a
UFD. This turns out to be true, and is a very deep (and famous) theorem.

2.6 Nonunique Factorization: The Case D > 0


Now we turn to studying rings of integers in real quadratic fields, i.e.,
O(√D) for D > 0. Actually, the methods we derive here will also apply to
values of D with D < 0, giving a second proof of the results in those cases.
Again our strategy is the same, but the details are harder to carry out, and
our results are much less general. But once again it is Corollary 2.65 (the
Non-UFD Test) that is our basic tool, which we use in deriving the criteria
in Theorems 2.78, 2.82, and 2.87. Once again, in order to use it we begin
by proving the nonexistence of elements of norm 2.


Lemma 2.77. Suppose that

(1) D is divisible by a prime p congruent to 5 (mod 8), or

(2) D is divisible by a prime p congruent to 3 (mod 8) and D is also
divisible by a prime p′ congruent to 7 (mod 8).

Then O(√D) does not have an element β with ‖β‖ = 2.

Proof: Set q = p if condition (1) holds and set q = pp′ if condition (2)
holds.
Once again, we divide the proof into two cases: (1) D ≡ 2 or 3 (mod 4),
and (2) D ≡ 1 (mod 4).

(1) Suppose D ≡ 2 or 3 (mod 4). Then any element β of O(√D) is of the
form β = a + b√D with a and b integers. Suppose β = a + b√D with
‖β‖ = 2. Then

a² − Db² = ±2,

so

a² ≡ ±2 (mod D),

and hence

a² ≡ ±2 (mod q).

But it is a result from number theory (see Corollary B.37(2) in the
situation of condition (1) and Corollary B.38 in the situation of condition
(2)) that the congruence a² ≡ ±2 (mod q) does not have a solution.

(2) Suppose D ≡ 1 (mod 4). Then any element β of O(√D) is of the form
(2a) β = a + b√D with a and b integers or (2b) β = (a + b√D)/2 =
(a/2) + (b/2)√D with a and b odd integers. In case (2a) the argument
is exactly the same as above. We consider case (2b). Suppose β =
(a + b√D)/2 with ‖β‖ = 2. Then

(a² − Db²)/4 = ±2,

giving the equation

a² − Db² = ±8,

so

a² ≡ ±8 (mod D)

and hence

a² ≡ ±8 (mod q)


and again this congruence has no solution. (Since 8 = 2² · 2, the
congruence a² ≡ ±8 (mod q) has a solution if and only if the congruence
a² ≡ ±2 (mod q) has a solution, and it does not.)
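
The number-theoretic fact used in this proof can be spot-checked by brute force for small q. The Python sketch below is only an illustration (the function names are ours): it tests whether either of the congruences x² ≡ 2 (mod q) and x² ≡ −2 (mod q) has a solution by trying every residue.

```python
def has_square_root_mod(c, q):
    """Return True if x^2 ≡ c (mod q) has a solution x."""
    return any((x * x - c) % q == 0 for x in range(q))


def two_is_blocked(q):
    """Return True if neither x^2 ≡ 2 nor x^2 ≡ -2 (mod q) is solvable."""
    return not has_square_root_mod(2, q) and not has_square_root_mod(-2, q)


# q = p for primes p ≡ 5 (mod 8), and q = p p' with p = 3, p' = 7.
for q in (5, 13, 29, 3 * 7):
    print(q, two_is_blocked(q))
```

For a single prime congruent to 3 or 7 (mod 8) the check fails (for example, x² ≡ −2 (mod 3) and x² ≡ 2 (mod 7) are both solvable), which is why condition (2) of the lemma needs both primes.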

Theorem 2.78. Let D ≡ 1, 2, 3, 6, or 7 (mod 8), and suppose that D
is divisible by a prime p ≡ 5 (mod 8) or that D is divisible by a prime
p ≡ 3 (mod 8) and a prime p′ ≡ 7 (mod 8). Then R = O(√D) is not a
UFD.

Proof: We begin by noting that, by Lemma 2.77, in no case does R have
an element β with ‖β‖ = 2. The remainder of the proof is identical to the
proof of Theorem 2.68, so instead of repeating it we simply refer the reader
back to that proof.

Now let us give some examples of Theorem 2.78. Before we do so, we
want to quote the Chinese Remainder Theorem (Lemma B.17 or
Theorem B.18), which states that a pair of congruences

x ≡ a₁ (mod b₁),
x ≡ a₂ (mod b₂),

with b₁ and b₂ relatively prime, always has a solution, and that this solution
is unique (mod b₁b₂), or in other words, that this pair of congruences is
equivalent to a single congruence

x ≡ a₃ (mod b₁b₂)

for some a₃. (There are various methods for finding a₃. If b₁ and b₂ are
small, trial and error is as good as any.) We will use this in the statement
of our results.
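
The trial-and-error method just mentioned can be sketched as follows. This is a minimal Python illustration, with the function name crt_pair chosen by us; it simply tries every residue modulo b₁b₂ and is only sensible for small moduli.

```python
from math import gcd


def crt_pair(a1, b1, a2, b2):
    """Solve x ≡ a1 (mod b1), x ≡ a2 (mod b2) by trial and error.

    Requires b1 and b2 relatively prime. Returns the unique a3 with
    0 <= a3 < b1*b2 satisfying both congruences.
    """
    if gcd(b1, b2) != 1:
        raise ValueError("moduli must be relatively prime")
    for x in range(b1 * b2):
        if x % b1 == a1 % b1 and x % b2 == a2 % b2:
            return x


# Example with moduli 5 and 8: x ≡ 0 (mod 5) and x ≡ 1 (mod 8).
print(crt_pair(0, 5, 1, 8))
```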

Example 2.79. We wish to apply Theorem 2.78. First, we consider
condition (1) of that theorem. To apply this condition, we need to find primes
p with p ≡ 5 (mod 8). The first few of these primes are p = 5, p = 13, and
p = 29.

(1) Consider p = 5. We want D to be divisible by p = 5, i.e.,
D ≡ 0 (mod 5). We first apply this to the case D ≡ 1 (mod 8), so we have
the pair of simultaneous congruences

D ≡ 0 (mod 5) and D ≡ 1 (mod 8),


which is equivalent to the single congruence

D ≡ 25 (mod 40).

We next apply this to the case D ≡ 2 (mod 8), so we have the pair of
simultaneous congruences

D ≡ 0 (mod 5) and D ≡ 2 (mod 8),

which is equivalent to the single congruence

D ≡ 10 (mod 40).

Proceeding in this way, we find that the pair of congruences

D ≡ 0 (mod 5) and D ≡ 1, 2, 3, 6, or 7 (mod 8)

is equivalent to the single congruence

D ≡ 10, 15, 25, 30, or 35 (mod 40).

So for these values of D, by Theorem 2.78, we have that O(√D) is not
a UFD. Remember that D must be square-free, and so the first few
such positive values of D are D = 10, 15, 30, 35, 55, 65, 70, 95, 105,
110, 115, 130, 145, 155, 170, 185, 190, and 195.

(2) Consider p = 13. Then we have the pair of simultaneous congruences

D ≡ 0 (mod 13) and D ≡ 1, 2, 3, 6, or 7 (mod 8),

which is equivalent to the single congruence

D ≡ 26, 39, 65, 78, or 91 (mod 104).

So for these values of D, by Theorem 2.78, we have that O(√D) is not
a UFD. Remembering that D must be square-free, the first few such
positive values of D are D = 26, 39, 65, 78, 91, 130, 143, 182, and 195.

(3) Consider p = 29. Then we have the pair of simultaneous congruences

D ≡ 0 (mod 29) and D ≡ 1, 2, 3, 6, or 7 (mod 8),

which is equivalent to the single congruence

D ≡ 58, 87, 145, 174, or 203 (mod 232).

So for these values of D, by Theorem 2.78, we have that O(√D) is not
a UFD.


(4) Next, we consider condition (2) of Theorem 2.78. To apply this
condition we need to find a pair of primes p and p′ with p ≡ 3 (mod 8)
and p′ ≡ 7 (mod 8). The first such pair is p = 3 and p′ = 7. Then we
have the pair of simultaneous congruences

D ≡ 0 (mod 21) and D ≡ 1, 2, 3, 6, or 7 (mod 8),

which is equivalent to the single congruence

D ≡ 42, 63, 105, 126, or 147 (mod 168).

So for these values of D, by Theorem 2.78, we have that O(√D) is not
a UFD.
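
Lists of this kind can be generated mechanically. The following Python sketch is an illustration only (the function names are ours): it enumerates the square-free values of D up to a bound that are divisible by a given q and lie in one of the residue classes 1, 2, 3, 6, 7 (mod 8). Here q stands for the prime p under condition (1), or for the product pp′ under condition (2).

```python
def is_squarefree(n):
    """Return True if no square of a prime divides the positive integer n."""
    f = 2
    while f * f <= n:
        if n % (f * f) == 0:
            return False
        f += 1
    return True


def theorem_2_78_values(q, bound):
    """Square-free D <= bound divisible by q with D ≡ 1, 2, 3, 6, or 7 (mod 8)."""
    return [D for D in range(q, bound + 1, q)
            if D % 8 in (1, 2, 3, 6, 7) and is_squarefree(D)]


print(theorem_2_78_values(5, 195))
print(theorem_2_78_values(13, 195))
print(theorem_2_78_values(21, 200))
```

Filtering on D mod 8 directly avoids having to combine the congruences first; the Chinese Remainder Theorem guarantees that the two descriptions agree.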

Just as in the D < 0 case, we applied Corollary 2.65 (the Non-UFD
Test) with p = 2 to prove Theorem 2.78. Again, as in the D < 0 case, in
order to proceed further we wish to apply Corollary 2.65 (the Non-UFD
Test) more generally. Again, we will not just be able to choose p = 2 in
order to apply it, but the value of p will depend on D. Note the similarity
of the following lemma (Lemma 2.80) to Lemma 2.77, and note also the
similarity of the proofs.

Lemma 2.80. Let D be divisible by an odd prime p₂, and suppose
furthermore that there is an odd prime p₁ such that the congruence x² ≡
±p₁ (mod p₂) does not have a solution. Then O(√D) does not have an
element of norm p₁.

Proof: Once again we divide the proof into two cases: (1) D ≡ 2 or
3 (mod 4), and (2) D ≡ 1 (mod 4).

(1) Suppose D ≡ 2 or 3 (mod 4). Then any element β of O(√D) is of the
form β = a + b√D with a and b integers. Suppose β = a + b√D with
‖β‖ = p₁. Then

a² − Db² = ±p₁,

so

a² ≡ ±p₁ (mod D),

and hence

a² ≡ ±p₁ (mod p₂).

But by hypothesis this congruence has no solution.


(2) Suppose D ≡ 1 (mod 4). Then any element β of O(√D) is of the form
(a) β = a + b√D with a and b integers or (b) β = (a + b√D)/2 =
(a/2) + (b/2)√D with a and b odd integers. In case (a), the argument is
exactly the same as above. We consider case (b). Suppose β = (a + b√D)/2
with ‖β‖ = p₁. Then

(a² − Db²)/4 = ±p₁,

giving the equation

a² − Db² = ±4p₁,

so

a² ≡ ±4p₁ (mod D),

and hence

a² ≡ ±4p₁ (mod p₂),

and again this congruence has no solution. (As in the proof of Lemma 2.77,
since 4p₁ = 2²p₁, a² ≡ ±4p₁ (mod p₂) has a solution if and only if
a² ≡ ±p₁ (mod p₂) has a solution, and by hypothesis it does not.)

Remark 2.81. Let p₂ be an odd prime and let q be any integer relatively
prime to p₂. If p₂ ≡ 3 (mod 4), then, by Corollary B.34, exactly one of
the congruences x² ≡ q (mod p₂) and x² ≡ −q (mod p₂) has a solution,
so the hypothesis of Lemma 2.80 is never satisfied. If p₂ ≡ 1 (mod 4),
then, by Corollary B.34, either both of the congruences x² ≡ q (mod p₂)
and x² ≡ −q (mod p₂) have a solution, or neither does, so in this case the
hypothesis in Lemma 2.80 that x² ≡ ±p₁ (mod p₂) does not have a solution
is equivalent to the simpler hypothesis that x² ≡ p₁ (mod p₂) does not have
a solution.
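
The dichotomy in this remark is easy to observe numerically. The Python sketch below is an illustration only, with function names of our choosing: for a given odd prime it records, over all q relatively prime to it, which combinations of solvability of x² ≡ q and x² ≡ −q actually occur.

```python
def is_square_mod(c, p):
    """Return True if x^2 ≡ c (mod p) has a solution x."""
    return any((x * x - c) % p == 0 for x in range(p))


def solvability_patterns(p):
    """Set of pairs (x^2 ≡ q solvable, x^2 ≡ -q solvable) over q = 1, ..., p - 1."""
    return {(is_square_mod(q, p), is_square_mod(-q, p)) for q in range(1, p)}


print(solvability_patterns(7))   # a prime ≡ 3 (mod 4)
print(solvability_patterns(13))  # a prime ≡ 1 (mod 4)
```

For 7 only the mixed patterns occur (exactly one congruence solvable), while for 13 only the matching patterns occur (both solvable or neither).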

Theorem 2.82. Let D be divisible by a prime p₂ ≡ 1 (mod 4), and suppose
furthermore that there is an odd prime p₁ such that

(1) the congruence x² ≡ p₁ (mod p₂) does not have a solution, and

(2) the congruence x² ≡ D (mod p₁) has a solution.

Then O(√D) is not a UFD.

Proof: By Lemma 2.80 and Remark 2.81, hypothesis (1) implies that O(√D)
does not have an element of norm p₁.


We also claim that O(√D) has an element α not divisible by p₁ but with
‖α‖ divisible by p₁. By hypothesis (2), there is an a with a² ≡ D (mod p₁).
Let

α = a + √D.

Clearly α is not divisible by p₁ as α/p₁ = (a + √D)/p₁ is not in O(√D).
(Note that p₁ is odd.) But then, by hypothesis (2),

‖α‖ = |a² − D|

is divisible by p₁. Thus, by Corollary 2.65 (the Non-UFD Test), O(√D) is
not a UFD.
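
To see the construction in this proof on a concrete case, take D = 10, p₂ = 5, and p₁ = 3: here 5 divides 10, 5 ≡ 1 (mod 4), and x² ≡ 3 (mod 5) has no solution. The Python sketch below is an illustration only (the function name is ours); it searches for an a with a² ≡ D (mod p₁) and reports the norm |a² − D| of α = a + √D.

```python
def non_ufd_witness(D, p1):
    """Find a in [0, p1) with a^2 ≡ D (mod p1).

    Returns (a, |a^2 - D|), where |a^2 - D| is the norm of a + sqrt(D),
    or None if hypothesis (2) fails, i.e., no such a exists.
    """
    for a in range(p1):
        if (a * a - D) % p1 == 0:
            return a, abs(a * a - D)
    return None


print(non_ufd_witness(10, 3))
```

With D = 10 and p₁ = 3 the search gives a = 1, so α = 1 + √10 has norm 9, which is divisible by 3 although α itself is not.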

Now let us give some examples of the use of Theorem 2.82. Once again
we will formulate our results with the help of the Chinese Remainder
Theorem.

Example 2.83. We wish to apply Theorem 2.82. For this we need to find
primes p₂ with p₂ ≡ 1 (mod 4). The first such primes are p₂ = 5 and
p₂ = 13.

(1) Consider p₂ = 5. We want to find odd primes p₁ with x² ≡ p₁ (mod 5)
not having a solution. Since the squares (mod 5) are 0, 1, and 4, this
means that p₁ ≡ 2 or 3 (mod 5). But also p₁ ≡ 1 (mod 2) (as p₁ is
odd), so by the Chinese Remainder Theorem, the condition on p₁ in
this case is p₁ ≡ 3 or 7 (mod 10). The first such primes are p₁ = 3 and
p₁ = 7.

(a) Consider p₁ = 3. We want to find values of D with x² ≡ D (mod p₁)
having a solution. Since the squares (mod 3) are 0 and 1, this
means that D ≡ 0 or 1 (mod 3). Also, D is divisible by p₂ = 5,
i.e., D ≡ 0 (mod 5). So we have the pair of simultaneous
congruences

D ≡ 0 (mod 5) and D ≡ 0 or 1 (mod 3),

which is equivalent to the single congruence

D ≡ 0 or 10 (mod 15).

So for these values of D, by Theorem 2.82, we have that O(√D)
is not a UFD. Remembering that D must be square-free, the first
few such positive values of D are D = 10, 15, 30, 55, 70, 85, 105,
115, 130, 145, 165, 190, and 195.


(b) Consider p₁ = 7. We want to find values of D with x² ≡ D (mod p₁)
having a solution. Since the squares (mod 7) are 0, 1, 2, and 4, this
means that D ≡ 0, 1, 2, or 4 (mod 7). So we have the pair of
simultaneous congruences

D ≡ 0 (mod 5) and D ≡ 0, 1, 2, or 4 (mod 7),

which is equivalent to the single congruence

D ≡ 0, 15, 25, or 30 (mod 35).

So for these values of D, by Theorem 2.82, we have that O(√D)
is not a UFD. Remembering that D must be square-free, the first
few such positive values of D are D = 15, 30, 35, 65, 70, 85, 95,
105, 130, 155, 165, 170, and 190.

(2) Consider p₂ = 13. We want to find odd primes p₁ with x² ≡ p₁ (mod 13)
not having a solution. Since the squares (mod 13) are 0, 1, 3, 4, 9, 10,
and 12, this means that p₁ ≡ 2, 5, 6, 7, 8, or 11 (mod 13). But also
p₁ ≡ 1 (mod 2), so by the Chinese Remainder Theorem, the condition
on p₁ in this case is p₁ ≡ 5, 7, 11, 15, 19, or 21 (mod 26). The first
such primes are p₁ = 5 and p₁ = 7.

(a) Consider p₁ = 5. We want to find values of D with x² ≡ D (mod p₁)
having a solution. Since the squares (mod 5) are 0, 1, and 4, this
means that D ≡ 0, 1, or 4 (mod 5). Also, D ≡ 0 (mod 13). So we
have the pair of simultaneous congruences

D ≡ 0 (mod 13) and D ≡ 0, 1, or 4 (mod 5),

which is equivalent to the single congruence

D ≡ 0, 26, or 39 (mod 65).

So for these values of D, by Theorem 2.82, we have that O(√D)
is not a UFD. Remembering that D must be square-free, the first
few such positive values of D are D = 26, 39, 65, 91, 130, and 195.

(b) Consider p₁ = 7. We want to find values of D with x² ≡ D (mod p₁)
having a solution. Since the squares (mod 7) are 0, 1, 2, and 4, this
means that D ≡ 0, 1, 2, or 4 (mod 7). So we have the pair of
simultaneous congruences

D ≡ 0 (mod 13) and D ≡ 0, 1, 2, or 4 (mod 7),


which is equivalent to the single congruence

D ≡ 0, 39, 65, or 78 (mod 91).

So for these values of D, by Theorem 2.82, we have that O(√D)
is not a UFD. Remembering that D must be square-free, the first
few such positive values of D are D = 39, 65, 78, 91, 130, and 182.
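
The lists in this example can be reproduced by a direct search. The Python sketch below is an illustration only; the function names are ours, and the primality of p₁ and p₂ is assumed rather than checked. For a given pair (p₁, p₂) it verifies hypothesis (1) of Theorem 2.82 and then lists the square-free multiples D of p₂ up to a bound for which x² ≡ D (mod p₁) has a solution.

```python
def is_squarefree(n):
    """Return True if no square of a prime divides the positive integer n."""
    f = 2
    while f * f <= n:
        if n % (f * f) == 0:
            return False
        f += 1
    return True


def is_square_mod(c, p):
    """Return True if x^2 ≡ c (mod p) has a solution x."""
    return any((x * x - c) % p == 0 for x in range(p))


def theorem_2_82_values(p1, p2, bound):
    """Square-free D <= bound with p2 | D and x^2 ≡ D (mod p1) solvable.

    Assumes p1 is an odd prime and p2 is a prime (not checked here).
    """
    if p2 % 4 != 1 or is_square_mod(p1, p2):
        raise ValueError("hypothesis (1) of Theorem 2.82 is not satisfied")
    return [D for D in range(p2, bound + 1, p2)
            if is_squarefree(D) and is_square_mod(D, p1)]


print(theorem_2_82_values(3, 5, 195))
print(theorem_2_82_values(7, 13, 182))
```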

Remark 2.84. In Example 2.83, we used the conditions in Theorem 2.82
in the order they naturally arose: first we chose p₂, and then, for a given
choice of p₂, we chose p₁. But, with the help of the Law of Quadratic
Reciprocity (Theorem B.40), we can make the choice in the other order.
The Law of Quadratic Reciprocity applies to the system of simultaneous
congruences

x² ≡ p₁ (mod p₂),
x² ≡ p₂ (mod p₁),

where p₁ and p₂ are distinct odd primes. It states that if at least one of
p₁ and p₂ is congruent to 1 (mod 4), then either both of these congruences
have a solution or neither does, while if both p₁ and p₂ are congruent to
3 (mod 4), then exactly one of these congruences has a solution. Since in
Theorem 2.82 we require that p₂ ≡ 1 (mod 4), we are in the first of these
cases, and so we may replace hypothesis (1) in Theorem 2.82 by hypothesis
(1′):

(1′) The congruence x² ≡ p₂ (mod p₁) does not have a solution.

We see how to use this in the next example.
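
The statement of the Law of Quadratic Reciprocity quoted in this remark can be checked on small primes by brute force. The Python sketch below is an illustration only, with function names of our choosing; it compares the solvability of x² ≡ p₁ (mod p₂) with that of x² ≡ p₂ (mod p₁).

```python
def is_square_mod(c, p):
    """Return True if x^2 ≡ c (mod p) has a solution x."""
    return any((x * x - c) % p == 0 for x in range(p))


def same_solvability(p1, p2):
    """True if x^2 ≡ p1 (mod p2) and x^2 ≡ p2 (mod p1) are both solvable or both not."""
    return is_square_mod(p1, p2) == is_square_mod(p2, p1)


# At least one prime ≡ 1 (mod 4): the two congruences behave alike.
print(same_solvability(3, 5), same_solvability(3, 13), same_solvability(3, 17))
# Both primes ≡ 3 (mod 4): exactly one congruence is solvable.
print(same_solvability(3, 7), same_solvability(7, 11))
```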

Example 2.85. We wish to apply Theorem 2.82 with hypothesis (1)
replaced by hypothesis (1′). For this we need to find an odd prime p₁. The
first such prime is p₁ = 3.

(1) Consider p₁ = 3. In order to satisfy hypothesis (2) of Theorem 2.82
we need to find values of D such that x² ≡ D (mod p₁) has a
solution. Since the squares (mod 3) are 0 and 1, this means that D ≡
0 or 1 (mod 3). In order to satisfy hypothesis (1′) of Theorem 2.82 we
need to find values of p₂ such that x² ≡ p₂ (mod p₁) does not have
a solution. Since the squares (mod 3) are 0 and 1, this means that
p₂ ≡ 2 (mod 3). Since we are also requiring p₂ ≡ 1 (mod 4), we
apply the Chinese Remainder Theorem to conclude that we must have
p₂ ≡ 5 (mod 12). The first such primes are p₂ = 5 and p₂ = 17.


(a) Consider p₂ = 5. This is the same as case (1a) of Example 2.83, so
yields nothing new.

(b) Consider p₂ = 17. We proceed by exactly the same logic as before.
We want to find values of D with

D ≡ 0 or 1 (mod 3) and D ≡ 0 (mod 17),

and, by the Chinese Remainder Theorem, this pair of simultaneous
congruences is equivalent to the single congruence

D ≡ 0 or 34 (mod 51).

So for these values of D, by Theorem 2.82 we have that O(√D)
is not a UFD. Remembering that D must be square-free, the first
few such positive values of D are D = 34, 51, 85, 102, and 187.

Remark 2.86. The techniques in Example 2.83 and Example 2.85 produce the
same pairs (p₁, p₂) and hence the same values of D, but they produce them
in a different order. Depending on circumstances, either one may be more
convenient to use.

We now give an example of a rather trickier application of Corollary 2.65
(the Non-UFD Test) that enables us to show that O(√D) is not a UFD in
some additional cases.

Theorem 2.87. Let D be congruent to 2 (mod 8) and suppose that D is
divisible by a prime p ≡ 3 (mod 8). Then O(√D) is not a UFD.

Proof: We shall show that O(√D) does not have an element of norm p.
We prove this by contradiction.
Suppose β = a + b√D with ‖β‖ = p. Then |a² − Db²| = p, so a² − Db² =
±p and hence a² = ±p + Db². Since the right-hand side of this equation
is divisible by p, the left-hand side must be divisible by p as well, and
so a = pc for some c. Substituting into this equation and dividing each
term by p, we obtain the equation pc² = ±1 + db², where d = D/p. Since
D ≡ 2 (mod 8) and p ≡ 3 (mod 8), we see that d ≡ 6 (mod 8). Thus, this
equation yields the congruence

3c² ≡ ±1 + 6b² (mod 8).

But, as you can easily check, for any integer x, x² ≡ 0, 1, or 4 (mod 8).
Substituting these possibilities for c² and b², we see that this congruence
has no solution, which is a contradiction.
Let α = √D. Then α is not divisible by p, but ‖α‖ = |D|
is divisible by p. Hence, by Corollary 2.65 (the Non-UFD Test), O(√D) is
not a UFD.
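
The "easy check" in this proof is a finite computation, since only the residues of b and c modulo 8 matter. The Python sketch below is an illustration only (the function name is ours); it lists every triple (b, c, sign) for which 3c² ≡ sign + 6b² (mod 8), where sign is +1 or −1.

```python
def congruence_solutions():
    """All (b, c, sign) with 3*c^2 ≡ sign + 6*b^2 (mod 8), b and c taken mod 8."""
    return [(b, c, sign)
            for b in range(8)
            for c in range(8)
            for sign in (1, -1)
            if (3 * c * c - (sign + 6 * b * b)) % 8 == 0]


# Squares modulo 8, and the (empty) list of solutions of the congruence.
print(sorted({x * x % 8 for x in range(8)}))
print(congruence_solutions())
```

The left-hand side only takes the values 0, 3, and 4 (mod 8), while the right-hand side only takes the values 1, 5, and 7, so the list is empty.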


D (q₁, q₂) D (q₁, q₂) D (q₁, q₂) D (q₁, q₂)


10 (2, 5) 155 (2, 5) 274 (3, 137) 394 (2, 197)
15 (2, 5) 159 (2, 53) 282 (2, 141) 395 (2, 5)
26 (2, 13) 165 (3, 5) 285 (3, 5) 399 (2, 21)
30 (2, 5) 170 (2, 5) 286 (2, 13) 402 (3, 402)
34 (3, 17) 174 (2, 29) 287 (7, 41) 403 (2, 13)
35 (2, 5) 178 (3, 89) 290 (2, 5) 406 (2, 29)
39 (2, 13) 182 (2, 13) 291 (5, 97) 407 (2, 37)
42 (2, 21) 183 (2, 61) 295 (2, 5) 410 (2, 5)
51 (3, 17) 185 (2, 5) 298 (2, 149) 411 (3, 137)
55 (2, 5) 186 (2, 93) 299 (2, 13) 415 (2, 5)
58 (2, 29) 187 (3, 17) 303 (2, 101) 418 (11, 418)
65 (2, 5) 190 (2, 5) 305 (2, 5) 426 (2, 213)
66 (3, 66) 194 (5, 97) 310 (2, 5) 427 (2, 61)
70 (2, 5) 195 (2, 5) 314 (2, 157) 429 (5, 13)
74 (2, 37) 202 (2, 101) 318 (2, 53) 430 (2, 5)
78 (2, 13) 203 (2, 29) 319 (2, 29) 435 (2, 5)
82 (3, 41) 205 (3, 5) 323 (7, 17) 438 (7, 73)
85 (3, 5) 210 (2, 5) 327 (2, 109) 442 (2, 13)
87 (2, 29) 215 (2, 5) 330 (2, 5) 445 (2, 5)
91 (2, 13) 218 (2, 109) 335 (2, 5) 447 (2, 149)
95 (2, 5) 219 (5, 73) 339 (3, 113) 451 (3, 41)
102 (3, 17) 221 (2, 13) 345 (2, 5) 455 (2, 5)
105 (2, 5) 222 (2, 37) 346 (2, 173) 458 (2, 229)
106 (2, 53) 226 (3, 113) 354 (3, 354) 462 (2, 21)
110 (2, 5) 230 (2, 5) 355 (2, 5) 465 (2, 5)
111 (2, 37) 231 (2, 21) 357 (3, 17) 466 (3, 233)
114 (3, 114) 235 (2, 5) 362 (2, 181) 470 (2, 5)
115 (2, 5) 238 (3, 17) 365 (5, 73) 471 (2, 157)
119 (5, 17) 246 (3, 41) 366 (2, 61) 474 (2, 237)
122 (2, 61) 247 (2, 13) 370 (2, 5) 481 (2, 13)
123 (3, 41) 255 (2, 5) 371 (2, 53) 482 (11, 241)
130 (2, 5) 258 (3, 258) 374 (5, 17) 483 (2, 21)
138 (2, 69) 259 (2, 37) 377 (2, 29) 485 (5, 97)
143 (2, 13) 265 (2, 5) 385 (2, 5) 493 (3, 17)
145 (2, 5) 266 (19, 266) 386 (5, 193) 494 (2, 13)
146 (5, 73) 267 (3, 89) 390 (2, 5) 498 (3, 498)
154 (2, 77) 273 (2, 13) 391 (3, 17)

Table 2.1. The values of D between 2 and 499 for which our methods show that
O(√D) is not a UFD.

Example 2.88. We wish to apply Theorem 2.87. For this we need primes
congruent to 3 (mod 8). The first such primes are p = 3 and p = 11.


(1) Consider p = 3. Then the hypotheses of Theorem 2.87 give the pair of
simultaneous congruences

D ≡ 2 (mod 8) and D ≡ 0 (mod 3),

which, by the Chinese Remainder Theorem, is equivalent to the single
congruence

D ≡ 18 (mod 24).

So for these values of D, by Theorem 2.87 we have that O(√D) is not
a UFD. Remembering that D must be square-free, the first few such
positive values of D are D = 42, 66, 114, 138, and 186.

(2) Consider p = 11. Then the hypotheses of Theorem 2.87 give the pair
of simultaneous congruences

D ≡ 2 (mod 8) and D ≡ 0 (mod 11),

which, by the Chinese Remainder Theorem, is equivalent to the single
congruence

D ≡ 66 (mod 88).

So for these values of D, by Theorem 2.87 we have that O(√D) is not
a UFD. Remembering that D must be square-free, the first few such
positive values of D are D = 66, 154, 330, 418, and 506.

Example 2.89. Table 2.1 is a table of values of D between 2 and 499 for
which we can show that O(√D) is not a UFD by using Theorem 2.78,
Theorem 2.82, or Theorem 2.87. For each value of D we give a pair (q₁, q₂)
that provides the argument. (In case q₁ = 2, we are using Theorem 2.78
with q₂ = q, as in Example 2.79. In case q₁ is an odd prime, we are using
Theorem 2.82, as in Example 2.83 or Example 2.85, or Theorem 2.87, as
in Example 2.88. In this last case we have simply set q₂ = D. For many
values of D, there is more than one pair (q₁, q₂) that works. In those cases,
we have simply chosen one.)

2.7 Summing Up
In the preceding sections of this chapter, we have shown that certain
integral domains O(√D) are or are not unique factorization domains. In this
section we will sum up our work and also report on some interesting results
that are beyond our ability to prove here.


We will state the results for the cases of imaginary quadratic fields
(D < 0) and real quadratic fields (D > 0) separately.
First, imaginary quadratic fields.

Theorem 2.90. Let D < 0, and let R = O(√D).

(1) If D = −11, −7, −3, −2, or −1, then R is a UFD.

(2) If D ≡ 1, 2, 3, 6, or 7 (mod 8) and D ≠ −1, −2, or −7, then R is
not a UFD.

(3) If D ≡ 5 (mod 8) and |D| is composite, then R is not a UFD.

Proof: (1) is Lemma 2.9; (2) is Theorem 2.68; and (3) is part of Proposi-
tion 2.70. 

Note that this theorem leaves an infinite number of cases open, those with
D ≡ 5 (mod 8), |D| prime. Some of these cases we have dealt with in
Proposition 2.72 (applied in Corollary 2.75).
The following is a very deep theorem.

Theorem 2.91. There are exactly nine values of D < 0 for which R =
O(√D) is a UFD. They are D = −1, −2, −3, −7, −11, −19, −43, −67,
and −163.

Next, real quadratic fields.



Theorem 2.92. Let D > 0, and let R = O(√D).

(1) If D = 2, 3, 5, 6, 7, 11, 13, 17, 21, or 29, then R is a UFD.

(2) If one of the following conditions holds,

(a) D ≡ 1, 3, 6, or 7 (mod 8) and (i) D is divisible by a prime
congruent to 5 (mod 8) or (ii) D is divisible by a prime congruent to
3 (mod 8) and by a prime congruent to 7 (mod 8); or

(b) D ≡ 2 (mod 8) and (i) D is divisible by a prime congruent to
5 (mod 8) or (ii) D is divisible by a prime congruent to 3 (mod 8),

then R is not a UFD.

Proof: (1) is Lemma 2.9 and (2) is Theorem 2.78 combined with Theo-
rem 2.87. 


In Theorem 2.82 we were able to handle other values of D. See also
Example 2.79, Example 2.83, Example 2.85, and Example 2.88.
Note that the results for real quadratic fields are much less complete
than the results for imaginary quadratic fields. We have the following
conjecture, which has been open for 200 years:

Conjecture 2.93 (Gauss). There are an infinite number of values of D > 0
for which O(√D) is a UFD.

Remark 2.94. There is an effective procedure, due to Gauss, to decide
whether O(√D) is a UFD for any given value of D. To be precise, it
computes the class number of O(√D), and O(√D) is a UFD precisely
when the class number is 1.

Remark 2.95. Calculation shows that R = O(√D) is a UFD for the
following positive values of D < 100: D = 2, 3, 5, 6, 7, 11, 13, 14, 17, 19, 21,
22, 23, 29, 31, 33, 37, 38, 41, 43, 46, 47, 53, 57, 59, 61, 62, 67, 69, 71, 73,
77, 83, 86, 89, 93, 94, and 97.

2.8 Exercises
Exercise 2.1. Complete the proof of Lemma 2.7. (You must prove Lemma 2.7
in the cases not explicitly done in the text, i.e., in the cases where a ≤ 0
and b > 0; where a ≥ 0 and b < 0; and where a ≤ 0 and b < 0. You may
do so by proving each of these cases from scratch, adapting the proof of
the case a ≥ 0 and b > 0 given in the text to these other cases, but even
better would be a proof that simply reduces these other cases to the case
a ≥ 0 and b > 0 and uses the fact that we know Lemma 2.7 is true in that
case.)

Exercise 2.2. Lemma 2.7 shows that for any integer a and any nonzero
integer b, there exist integers q and r with a = bq + r and −|b| + 1 ≤ r ≤
|b| − 1. Show that for any integer a and any nonzero integer b, there exist
unique integers q and r with a = bq + r and 0 ≤ r ≤ |b| − 1.

Note by Lemma 2.14 that elements of an integral domain R will not in
general have a unique gcd. Thus in the following problems we cannot write
LHS = RHS (where LHS (respectively RHS) stands for left-hand side
(respectively right-hand side) of the equation). We thus write LHS ≅ RHS
where by ≅ we mean can be chosen to be equal to. Also by Lemma 2.14,
we see that the choice involves multiplication by some unit of R.


Exercise 2.3. Prove the following properties of a gcd of elements in an
integral domain R:

(a) for any c, gcd(a, ca + b) ≅ gcd(a, b),

(b) for any c ≠ 0, gcd(ac, bc) ≅ c gcd(a, b),

(c) if c divides both a and b, then gcd(a/c, b/c) ≅ gcd(a, b)/c,

(d) if c is relatively prime to b, then gcd(ac, b) ≅ gcd(a, b).

In Exercises 2.4–2.11 R is assumed to be a PID. Do these exercises by
using the results of Sections 2.1–2.3.

Exercise 2.4. Let α, β, γ, and δ be elements of R. If α divides γ and β
divides δ, show that gcd(α, β) divides gcd(γ, δ).

Exercise 2.5. Let α and β be elements of R. Show that α and β are
relatively prime if and only if α² and β² are relatively prime.

Exercise 2.6. Let α, β, and γ be nonzero elements of R. Show that

gcd(α, β, γ) ≅ gcd(gcd(α, β), γ).

(Hence the gcd of a finite set of elements of R can be found by successively
finding the gcd of a pair of elements. In case R is a Euclidean domain this
can be done by using Euclid’s algorithm.)

Analogously to the gcd of elements of R, we can define an lcm of
elements of R as follows:

Let R be an integral domain and let {αᵢ} be a finite set of
elements of R, not all of which are zero. Then an element λ of
R is a least common multiple (lcm) of {αᵢ}, λ = lcm({αᵢ}), if

(a) each αᵢ divides λ,

(b) if ζ is any element of R that is divisible by each αᵢ, then λ
divides ζ.

Exercise 2.7. Prove the analog of Lemma 2.14:

(a) if λ is an lcm of {αᵢ} and ε is any unit of R, then λ′ = λε is also an
lcm of {αᵢ},


(b) if λ and λ′ are any two lcm’s of {αᵢ}, then λ′ = λε for some unit ε of
R.

Exercise 2.8. Let α and β be elements of R, not both of which are zero.
Suppose that α and β are relatively prime. Show that αβ is an lcm of α
and β.

Exercise 2.9. More generally, let α and β be any two elements of R, not
both of which are zero. Let γ be a gcd of α and β. Show that λ = αβ/γ is
an lcm of α and β. Thus we see that if γ is a gcd of α and β, and λ is an
lcm of α and β, then γλ ≅ αβ.

Exercise 2.10. Let α, β, and γ be any nonzero elements of R. Show that

lcm(α, β, γ) ≅ lcm(lcm(α, β), γ).

(Hence the lcm of a finite set of elements of R can be found by successively
finding the lcm of a pair of elements. In case R is a Euclidean domain this
can be done by using Euclid’s algorithm to find gcd’s and then using the
result of Exercise 2.9.)

Exercise 2.11. Use the result of the preceding exercise and induction to
show that every finite set of elements of R, not all of which are zero, has
an lcm.

Exercise 2.12. Suppose that R is a UFD. In the notation of Proposition 2.59,
let

μ = p_1^{j_1} ··· p_k^{j_k} q_1^{g_1} ··· q_ℓ^{g_ℓ} r_1^{h_1} ··· r_m^{h_m},

where j_i = max(e_i, f_i) for each i = 1, . . . , k. Show that μ is an lcm
of α and β. (In particular, α and β have an lcm.)

Exercises 2.3′–2.11′. Do Exercises 2.3–2.11 for a UFD R by using the
results of Section 2.4. Exercises 2.3′–2.11′ are easier than Exercises 2.3–2.11,
but that is because we are using more background: the prime factorization
of elements of R.

Exercise 2.13. Give an example of an infinite set of elements in Z, not all
of which are zero, that does not have an lcm.

In the case R = Z, the gcd of a set of elements is only defined up to
multiplication by ±1. We make the convention that in this case, we choose
the gcd of a set of elements to be positive. Similarly, we choose the lcm of
a set of elements to be positive.


Exercise 2.14. Let {a₁, . . . , aₙ} be a set of integers, and let r be a rational
number such that aᵢr is an integer, i = 1, . . . , n.

(a) Suppose that {a₁, . . . , aₙ} is relatively prime. Show that in fact r is an
integer.

(b) More generally, let {a₁, . . . , aₙ} have a gcd of d. Show that dr is an
integer.

Exercise 2.15. Let {a₁/b₁, . . . , aₙ/bₙ} be a set of fractions in lowest terms.
(By this we mean that aᵢ and bᵢ are relatively prime, for each i.)

(a) Suppose that {b₁, . . . , bₙ} is pairwise relatively prime. Let ℓ = b₁ · · · bₙ,
and let mᵢ = ℓ/bᵢ, i = 1, . . . , n. Let k = a₁m₁ + . . . + aₙmₙ. Show that
k and ℓ are relatively prime. (Note that

(a₁/b₁) + . . . + (aₙ/bₙ) = k/ℓ,

so this shows that k/ℓ is a fraction in lowest terms.)

(b) Give an example of the following: a set of fractions in lowest terms
{a₁/b₁, . . . , aₙ/bₙ} with {b₁, . . . , bₙ} relatively prime (but not pairwise
relatively prime), with ℓ = lcm(b₁, . . . , bₙ), mᵢ = ℓ/bᵢ, i = 1, . . . , n,
k = a₁m₁ + . . . + aₙmₙ, and k and ℓ not relatively prime. (Thus, in
this case k/ℓ is not a fraction in lowest terms.)

Exercise 2.16. In each case, find the gcd of the following set of integers,
and express the gcd as a linear combination of those integers:

(a) {19, 61},

(b) {195, 37},

(c) {391, 833},

(d) {12345, 54321},

(e) {65, 175, 233},

(f) {1591, 1887, 2193}.

Exercise 2.17. Find the lcm of each of the sets of integers in Exercise 2.16.

Exercise 2.18. Let R = O(√−1). Set i = √−1. In each case, find a gcd
of the following sets of elements of R, and express that gcd as a linear
combination of those elements:


(a) {13, 75},

(b) {5 + i, 7 + 2i},

(c) {17, 29 + 3i},

(d) {3 + 2i, 2 + i},

(e) {1 + i, 1 − i},

(f) {5 + 5i, 14 + 8i, 9 + 7i}.

Exercise 2.19.

(a) Show that each of the following factorizations is a factorization into
irreducibles in O(√−14):

247 = (11 + 3√−14) · (11 − 3√−14) = 13 · 19,
1745 = (39 + 4√−14) · (39 − 4√−14) = 5 · 349.

(b) Show that each of the following factorizations is a factorization into
irreducibles in O(√−17):

121 = 11 · 11 = (2 + 3√−17) · (2 − 3√−17),
474 = (7 + 5√−17) · (7 − 5√−17)
    = 2 · (13 + 2√−17) · (13 − 2√−17).

(c) Show that each of the following factorizations is a factorization into
irreducibles in O(√−23):

489 = 3 · 163 = (11 + 4√−23) · (11 − 4√−23),
2049 = 3 · 683 = (41 + 4√−23) · (41 − 4√−23).

(d) Show that each of the following factorizations is a factorization into
irreducibles in O(√−26):

1563 = (17 + 7√−26) · (17 − 7√−26) = 3 · 521,
8449 = (5 + 18√−26) · (5 − 18√−26) = 7 · 17 · 71,
763 = 7 · 109 = (23 + 3√−26) · (23 − 3√−26),
136577 = 7 · 109 · 179 = (23 + 3√−26) · (23 − 3√−26) · 179
       = (369 + 4√−26) · (369 − 4√−26).


Exercise 2.20. Show that the following factorizations are factorizations into
irreducibles in O(√D) for the appropriate value of D:

(a) 55 = (9 + √26) · (9 − √26) = 5 · 11,

(b) −95 = (3 + 2√26) · (3 − 2√26) = −5 · 19,

(c) −21 = (3 + √30) · (3 − √30) = −3 · 7,

(d) 49 = (13 + 2√30) · (13 − 2√30) = 7 · 7,

(e) 35 = (7 + √34) · (7 − √34) = 5 · 7,

(f) −25 = (3 + √34) · (3 − √34) = −5 · 5,

(g) 14 = (7 + √35) · (7 − √35) = 2 · 7,

(h) −26 = (17 + 3√35) · (17 − 3√35) = −2 · 13,

(i) −21 = (7 + √70) · (7 − √70) = −3 · 7,

(j) −65 = (3 + √74) · (3 − √74) = −5 · 13,

(k) −77 = (1 + √78) · (1 − √78) = −7 · 11,

(l) −51 = (6 + √87) · (6 − √87) = −3 · 17.

Exercise 2.21.

(a) Consider the following factorizations in O(√−2):

51 = 3 · 17 = (1 + 5√−2) · (1 − 5√−2) = (7 + √−2)(7 − √−2).

These look like three distinct factorizations into irreducibles, which
would contradict O(√−2) being a UFD. Show that in fact they are
not factorizations into irreducibles. Find a factorization of 51 into
irreducibles, and show how it yields these three factorizations of 51.

(b) Consider the following factorizations in O(√−19):

35 = 5 · 7 = (4 + √−19) · (4 − √−19)
   = ((11 + √−19)/2) · ((11 − √−19)/2).

These look like three distinct factorizations into irreducibles, which
would contradict O(√−19) being a UFD. Show that in fact they are
not factorizations into irreducibles. Find a factorization of 35 into
irreducibles, and show how it yields these three factorizations of 35.


(c) Consider the following factorizations in O(√6):

6 = √6 · √6 = 2 · 3.

These look like two distinct factorizations into irreducibles, which would
contradict O(√6) being a UFD. Show that in fact they are not factorizations
into irreducibles. Find a factorization of 6 into irreducibles,
show how it yields these two factorizations of 6, and use it to construct
another factorization of 6.

(d) Consider the following factorizations in O(√19):

−75 = (1 + 2√19) · (1 − 2√19) = −3 · 5 · 5.

These look like two distinct factorizations into irreducibles, which would
contradict O(√19) being a UFD. Show that in fact they are not factorizations
into irreducibles. Find a factorization of −75 into irreducibles,
show how it yields these two factorizations of −75, and use it to construct
another factorization of −75.

Exercise 2.22.

(a) Let p be an odd prime. Set D = −2p. Show that

2p = −√D · √D = 2 · p

are two factorizations of 2p into irreducibles in O(√D). (This applies
to D = −6, −10, −14, −22, −26, −34, . . . .)

(b) Let p be an odd prime and suppose that D = 1 − 2p is square-free.
Show that

2p = (1 − √D)(1 + √D) = 2 · p

are two factorizations of 2p into irreducibles in O(√D). (This applies
to D = −5, −13, −21, −33, −37, . . . .)

(c) Let p and q be primes with pq ≡ 3 (mod 4) and suppose that D = 1 − pq
is square-free. Show that

pq = (1 − √D)(1 + √D) = p · q

are two factorizations of pq into irreducibles in O(√D). (This applies
to D = −14, −34, −38, −94, −118, −142, . . . .)


(d) Let p ≥ 17 and q ≥ 17 be primes with pq ≡ 1 (mod 4) and suppose
that D = (1 − pq)/4 is square-free. Show that

pq = (1 − 2√D)(1 + 2√D) = p · q

are two factorizations of pq into irreducibles in O(√D). (This applies
to D = −109, −123, −157, . . . .)

Exercise 2.23. Verify that Corollary 2.71 is true.

Exercise 2.24. Verify that Corollary 2.74 is true.

Exercise 2.25. Verify that Corollary 2.75 is true. (This requires the use of
a computer.)

Exercise 2.26. Do the analogue of Example 2.83 for p2 = 17, 29.

Exercise 2.27. Do the analogue of Example 2.85 for p2 = 17, 29.

Exercise 2.28.

(a) Check the correctness of Table 2.1.

(b) Extend this table to values of D between 500 and 1000.

Exercise 2.29. Let R be an integral domain. Show that

‖f(X)‖ = deg(f(X)) for f(X) ≠ 0

is a norm on R[X]. (Here deg denotes the degree of a polynomial.) Note
that the norm is not defined for the 0 polynomial.

Exercise 2.30. With the above definition of ‖·‖ on R[X], show that, for
any two nonzero polynomials f(X) and g(X),

(a) ‖f(X)g(X)‖ = ‖f(X)‖ + ‖g(X)‖;

(b) ‖f(X) + g(X)‖ ≤ max(‖f(X)‖, ‖g(X)‖).

(The reason for not defining the norm of the 0 polynomial in R[X] is
to make these relations hold. If we defined the norm of the 0 polynomial
to be 0, as you might think, we would have to make exceptions to (a) and
(b) in order to make them hold.)


Exercise 2.31. Let R be a field. Show that R[X] is a Euclidean domain
with norm ‖·‖ as defined above. As a consequence, we conclude that R[X]
is a PID and hence a UFD.

Exercise 2.32.

(a) Show that Z[X] is not a Euclidean domain with the above norm.

(b) More generally, let R be any integral domain that is not a field. Show
that R[X] is not a Euclidean domain with the above norm.

In the case R = Q[X], the gcd of a set of elements is only defined up
to multiplication by a nonzero rational number. We make the convention
that in this case, we choose the gcd of a set of elements to be monic. A
monic polynomial is one in which the coefficient of the highest power of
X is equal to 1. Similarly, we choose the lcm of a set of elements to be monic.

Exercise 2.33. Find the gcd of each of the following sets of polynomials in
Q[X], and express the gcd as a linear combination of those polynomials:

(a) {X^2 + X + 1, X + 1};

(b) {X^3 + X + 1, X + 2};

(c) {X^3 + 2X^2 + X + 2, X^4 + 5X^2 + 4}.

Exercise 2.34. Consider the following integral domain:

R = {f(X) = a_0 + a_2 X^2 + a_3 X^3 + . . . + a_n X^n in Q[X]},

i.e., R consists of those polynomials in Q[X] that do not have an “X” term.

(a) Show that X^2 and X^3 are irreducible but not prime elements of R.

(b) This implies that R is not a UFD. Find an explicit element of R that
has two distinct factorizations into irreducibles. (You may have already
done (b) in doing (a).)

(c) Find a pair of elements of R that does not have a gcd.

Exercise 2.35. Let R = Z[X]. Let

I = {f(X) = a_n X^n + . . . + a_0 | a_0 is even},

i.e., I consists of those polynomials whose constant term is even.


(a) Show that I is an ideal of R.

(b) Show that I = I_{2,X}.

(c) Show that I is not a principal ideal of R.

Thus, we see that Z[X] is not a PID. (It turns out that Z[X] is a UFD,
so Z[X] gives an example of a UFD that is not a PID.)


Chapter 3

The Gaussian Integers


In this chapter we will investigate O(√−1), known as the Gaussian integers.
We recall that

O(√−1) = {a + b√−1 | a and b are integers}
= {a + bi | a and b are integers},

and we have shown that O(√−1) is a UFD. We will begin by proving a
justly famous theorem of Fermat: every prime congruent to 1 (mod 4) can
be written as a sum of squares of two positive integers, p = x^2 + y^2,
uniquely up to the order of the summands. (For example, 5 = 2^2 + 1^2,
13 = 3^2 + 2^2, 17 = 4^2 + 1^2, 29 = 5^2 + 2^2, 37 = 6^2 + 1^2, 41 = 5^2 + 4^2.)
We shall present three proofs of this theorem in the first section of this
chapter.
The first proof, due to Euler, is believed to be Fermat’s original proof.
(Fermat did not write his proof down, but left a hint as to his approach.)
This is the longest and most difficult proof. (Actually, the proof we write
down is a variant of Euler’s proof, looking ahead to Chapter 4.)
The second proof is a twentieth-century proof due to Thue. It is of
medium length and difficulty.
The third proof uses the fact that the Gaussian integers are a UFD,
and, given that fact, is short and easy!
The fact that O(√−1) is a UFD gives us unique factorization into
primes, but by itself does not tell us what the primes are (or how to find a
factorization). Using Fermat’s theorem, we can concretely and completely
answer this question, and we do so in the second section of this chapter.


3.1 Fermat’s Theorem


We wish to represent numbers as sums of two squares. One case is easy to
rule out.
Lemma 3.1. No integer N ≡ 3 (mod 4) can be written as a sum of two
squares.

Proof: First note that if z is even, z = 2k, then z^2 = 4k^2 = 4(k^2) ≡
0 (mod 4), while if z is odd, z = 2k + 1, then z^2 = 4k^2 + 4k + 1 =
4(k^2 + k) + 1 ≡ 1 (mod 4).
Now consider x^2 + y^2. If x and y are both even, then x^2 ≡ 0 (mod 4) and
y^2 ≡ 0 (mod 4), so x^2 + y^2 ≡ 0 + 0 = 0 (mod 4). If x is even and y is odd,
then x^2 ≡ 0 (mod 4) and y^2 ≡ 1 (mod 4), so x^2 + y^2 ≡ 0 + 1 = 1 (mod 4).
Similarly, if x is odd and y is even, x^2 + y^2 ≡ 1 (mod 4). Finally, if x and
y are both odd, then x^2 ≡ 1 (mod 4) and y^2 ≡ 1 (mod 4), so x^2 + y^2 ≡
1 + 1 = 2 (mod 4). Thus, in no case can we have x^2 + y^2 ≡ 3 (mod 4). □
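
The case analysis in this proof can also be checked mechanically. The sketch below (the function name is invented for this illustration) lists every residue that x^2 + y^2 can take modulo 4; since squares modulo 4 depend only on x and y modulo 4, it is enough to try x, y in {0, 1, 2, 3}.

```python
def residues_of_two_squares(modulus=4):
    """Return the set of values of (x^2 + y^2) mod `modulus` over all residues x, y."""
    return {(x * x + y * y) % modulus
            for x in range(modulus)
            for y in range(modulus)}

residues = residues_of_two_squares()
print(sorted(residues))  # 3 does not appear
```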

Now for the numbers we wish to rule in. We begin with an observation:
if each of two numbers is representable as a sum of two squares, so is their
product. For example, since 5 = 2^2 + 1^2 and 13 = 3^2 + 2^2, we can conclude
that 65 is also a sum of two squares, and in fact 65 = 8^2 + 1^2 = 7^2 + 4^2.
That this is true is a simple algebraic fact.
Lemma 3.2. If m = a^2 + b^2 and n = c^2 + d^2, then

mn = e^2 + f^2

where e = ac − bd and f = ad + bc.

Proof: We simply compute

e^2 + f^2 = (ac − bd)^2 + (ad + bc)^2
= (a^2 c^2 − 2acbd + b^2 d^2) + (a^2 d^2 + 2adbc + b^2 c^2)
= a^2 c^2 + b^2 d^2 + a^2 d^2 + b^2 c^2
= (a^2 + b^2)(c^2 + d^2),

proving the result. □
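
As an illustration of Lemma 3.2, here is a small sketch of the formula e = ac − bd, f = ad + bc. The function name compose and the pair-of-integers encoding of a representation are conventions chosen for this example only.

```python
def compose(rep_m, rep_n):
    """Given (a, b) with m = a^2 + b^2 and (c, d) with n = c^2 + d^2,
    return (e, f) = (ac - bd, ad + bc), so that mn = e^2 + f^2."""
    (a, b), (c, d) = rep_m, rep_n
    return (a * c - b * d, a * d + b * c)

# 5 = 2^2 + 1^2 and 13 = 3^2 + 2^2 give 65 = 4^2 + 7^2
print(compose((2, 1), (3, 2)))
# Using 5 = 2^2 + (-1)^2 instead gives 65 = 8^2 + 1^2
print(compose((2, -1), (3, 2)))
```

Changing the sign of b yields the second representation of 65 mentioned in the text, which is why the sign of b matters in what follows.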

It is convenient to introduce the following language.


Definition 3.3. The representation of mn as a sum of squares,

mn = e^2 + f^2

with e = ac − bd and f = ad + bc, is obtained from the representations
m = a^2 + b^2 and n = c^2 + d^2 of m and n as sums of squares by composition.

Fermat’s proof contained several brilliant ideas. The first was to turn
this lemma around and to show that if m and mn are sums of two squares,
then, under proper conditions, so is n. The second was not to try to show
directly that n is a sum of two squares, but instead to find some multiple
mn of n that is, and to apply the first idea. Of course, to make that work
Fermat had to know that m is a sum of two squares and for that he applied
his “method of descent,” a variant of mathematical induction.
Now let us look at Fermat’s proof precisely. We will break it up into a
number of steps.

Lemma 3.4.

(1) If p is a prime congruent to 1 (mod 4) then the congruence
x^2 + y^2 ≡ 0 (mod p) has a solution other than x ≡ y ≡ 0 (mod p).

(2) If p is a prime congruent to 3 (mod 4) then the congruence
x^2 + y^2 ≡ 0 (mod p) only has the solution x ≡ y ≡ 0 (mod p).

Proof:

(1) In this case we know, by Corollary B.33, that −1 is a quadratic residue
(mod p), i.e., there is an integer a with a^2 ≡ −1 (mod p). Then, setting
x = a and y = 1,

x^2 + y^2 = a^2 + 1^2 ≡ −1 + 1 ≡ 0 (mod p).

(2) We prove this by contradiction. Suppose x^2 + y^2 ≡ 0 (mod p) with
x ≢ 0 (mod p). (The case y ≢ 0 (mod p) is handled in the same way, with
the roles of x and y interchanged.) Then, by Theorem B.11, there is an
integer a with ax ≡ 1 (mod p). Then (ax)^2 ≡ 1^2 ≡ 1 (mod p), so we have

x^2 + y^2 ≡ 0 (mod p)
a^2 (x^2 + y^2) ≡ 0 (mod p)
(ax)^2 + (ay)^2 ≡ 0 (mod p)
1 + (ay)^2 ≡ 0 (mod p)
(ay)^2 ≡ −1 (mod p),

which would show −1 is a quadratic residue (mod p). But, by Corollary B.30,
we know that is not the case. □
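
For small primes, both parts of Lemma 3.4 can be observed directly by exhaustive search. This is only a numerical sanity check, not a substitute for the proof; the function name is hypothetical.

```python
def nontrivial_solutions(p):
    """List all (x, y) with 0 <= x, y < p, (x, y) != (0, 0),
    and x^2 + y^2 ≡ 0 (mod p)."""
    return [(x, y)
            for x in range(p)
            for y in range(p)
            if (x, y) != (0, 0) and (x * x + y * y) % p == 0]

print(nontrivial_solutions(5))   # nonempty, since 5 ≡ 1 (mod 4)
print(nontrivial_solutions(7))   # empty, since 7 ≡ 3 (mod 4)
```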


Lemma 3.5. (Fermat). Suppose N = s^2 + t^2 is a sum of two squares, and
suppose p is a prime divisor of N that has a representation p = a^2 + b^2
as a sum of two squares. Then M = N/p has a representation M =
c^2 + d^2 such that the given representation N = s^2 + t^2 is obtained from the
representations p = a^2 + b^2 and M = c^2 + d^2 by composition or from the
representations p = a^2 + (−b)^2 and M = c^2 + d^2 by composition.

Proof: Let us begin by composing the representations N = s^2 + t^2 and
p = a^2 + (−b)^2, to obtain the representation

Np = (sa + tb)^2 + (−sb + ta)^2

and by composing the representations N = s^2 + t^2 and p = a^2 + b^2, to
obtain the representation

Np = (sa − tb)^2 + (sb + ta)^2.

Now p divides N, by assumption, so p divides Nb^2. Certainly, p divides
t^2 p. Hence p divides the difference t^2 p − Nb^2. But

t^2 p − Nb^2 = t^2 (a^2 + b^2) − (s^2 + t^2) b^2 = −s^2 b^2 + t^2 a^2 = (−sb + ta)(sb + ta).

Since p is a prime, it must divide one of the factors.

Case (I): p divides the first factor −sb + ta. Write −sb + ta = pd and
consider the first representation of Np,

Np = (sa + tb)^2 + (−sb + ta)^2.

Now p divides the left-hand side and the last term on the
right-hand side, so it must divide the first term as well, so we
may write sa + tb = pc. Then

Np = Mp^2 = (pc)^2 + (pd)^2 = p^2 c^2 + p^2 d^2,

so

M = c^2 + d^2

is a sum of two squares. Furthermore, we have just seen that

sa + tb = pc
−sb + ta = pd.


Regard this as a system of two linear equations in the two
unknowns s and t and solve. Doing so we obtain

s = ac − bd
t = ad + bc

and, referring to Definition 3.3, we see that N = s^2 + t^2 is
obtained from M = c^2 + d^2 and p = a^2 + b^2 by composition.

Case (II): p divides the second factor sb + ta. Write sb + ta = pd and
consider the second representation of Np,

Np = (sa − tb)^2 + (sb + ta)^2.

Now p divides the left-hand side and the last term on the
right-hand side, so it must divide the first term as well, so we
may write sa − tb = pc. Then once again

Np = Mp^2 = (pc)^2 + (pd)^2 = p^2 c^2 + p^2 d^2,

so

M = c^2 + d^2

is a sum of two squares. Similarly, we have just seen that

sa − tb = pc
sb + ta = pd.

Regard this as a system of two linear equations in the two
unknowns s and t and solve. Doing so we obtain

s = ac + bd
t = ad − bc

and, referring to Definition 3.3, we see that N = s^2 + t^2 is
obtained from M = c^2 + d^2 and p = a^2 + (−b)^2 by composition. □
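
The two cases of this proof translate directly into a procedure. The following is a sketch under the lemma's hypotheses (p = a^2 + b^2 is prime and divides N = s^2 + t^2; primality is assumed, not checked). The names compose and divide_out_prime are introduced here for illustration.

```python
def compose(rep_m, rep_n):
    """Composition of representations as in Definition 3.3."""
    (a, b), (c, d) = rep_m, rep_n
    return (a * c - b * d, a * d + b * c)

def divide_out_prime(s, t, a, b):
    """Given N = s^2 + t^2 and a prime p = a^2 + b^2 dividing N, return
    ((a, b'), (c, d)) with b' = b or -b and N/p = c^2 + d^2, such that
    compose((a, b'), (c, d)) == (s, t)."""
    p = a * a + b * b
    if (s * s + t * t) % p != 0:
        raise ValueError("p must divide N")
    if (-s * b + t * a) % p == 0:
        # Case (I): p divides -sb + ta, so use sa + tb = pc, -sb + ta = pd
        return (a, b), ((s * a + t * b) // p, (-s * b + t * a) // p)
    # Case (II): p divides sb + ta, so use sa - tb = pc, sb + ta = pd
    return (a, -b), ((s * a - t * b) // p, (s * b + t * a) // p)

# 65 = 8^2 + 1^2 and 5 = 2^2 + 1^2: recover a representation of 13
rep_p, rep_M = divide_out_prime(8, 1, 2, 1)
print(rep_p, rep_M, compose(rep_p, rep_M))
```

Here the example falls into Case (II), so the representation 5 = 2^2 + (−1)^2 is the one that composes with 13 = 3^2 + 2^2 to give 65 = 8^2 + 1^2.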

Theorem 3.6. (Fermat). Every prime p ≡ 1 (mod 4) can be written as a sum of
squares of positive integers, p = x^2 + y^2, and this representation is unique
up to the order of x and y.


Proof (Fermat/Euler): By Lemma 3.4, we can find integers x and y with
x^2 + y^2 ≡ 0 (mod p) and with x and y not divisible by p. Since this is
a congruence, we may assume 1 ≤ x ≤ p − 1 and 1 ≤ y ≤ p − 1. Since
(p − x)^2 ≡ x^2 (mod p) we may, replacing x by p − x if necessary, assume
that 1 ≤ x ≤ (p − 1)/2, and similarly we may assume that 1 ≤ y ≤ (p − 1)/2.
Finally, if x and y have a common factor (which is certainly not p), we may
divide x and y by that common factor, to obtain a pair of relatively prime
integers x and y with 1 ≤ x ≤ (p − 1)/2, 1 ≤ y ≤ (p − 1)/2, and x^2 + y^2 a
multiple of p. So we may assume x and y are relatively prime as well. Let
N = x^2 + y^2 and M = N/p. Then M is an integer and N = Mp. Observe
that, since 0 < x < p/2 and 0 < y < p/2,

N = x^2 + y^2 < (p/2)^2 + (p/2)^2 = p^2/2,

so M < p/2.
In particular, every prime factor of M is less than p/2. Let q be any
prime factor of M. We claim q ≢ 3 (mod 4). For suppose q ≡ 3 (mod 4).
Then N = x^2 + y^2 and N is a multiple of q, so x^2 + y^2 ≡ 0 (mod q) and
hence, by Lemma 3.4, x ≡ y ≡ 0 (mod q). In other words, x and y are each
divisible by q, which contradicts our assumption that x and y are relatively
prime.
Thus we see that, for any prime factor q of N, q = 2 or q ≡ 1 (mod 4).
Also, let us observe that we may write q = 2 as a sum of two squares,
2 = 1^2 + 1^2.
With this in hand, we prove the theorem by induction. Assume that
every prime q ≡ 1 (mod 4), q < p, can be written as a sum of two squares,
and consider p. We have written N = Mp as a sum of two squares, N =
x^2 + y^2. But

N = (2 · 2 · . . . · 2 · q_1 · q_2 · . . . · q_k)p

where we have a certain number of factors of 2 (perhaps none) and a certain
number of odd prime factors q_1, q_2, . . ., q_k (perhaps none), with q_1 < p,
q_2 < p, . . ., q_k < p. But now, since each of these factors is a sum of two
squares, we may apply Lemma 3.5 repeatedly, eliminating one factor of M
each time, until finally we obtain a representation of p as a sum of two
squares, as claimed.
Now we must show the uniqueness part of the theorem.
Suppose p = a^2 + b^2 = s^2 + t^2. Apply Lemma 3.5 with N = p to conclude
that p = s^2 + t^2 is obtained from p = a^2 + b^2 and a representation of
M = p/p = 1 by composition, or from p = a^2 + (−b)^2 and a representation
of M = p/p = 1 by composition. But obviously the only representations


of 1 are 1 = 1^2 + 0^2 = (−1)^2 + 0^2 = 0^2 + 1^2 = 0^2 + (−1)^2. Composing
a^2 + b^2 and a^2 + (−b)^2 with these yields the eight possibilities p = a^2 + b^2 =
a^2 + (−b)^2 = (−a)^2 + b^2 = (−a)^2 + (−b)^2 = b^2 + a^2 = b^2 + (−a)^2 =
(−b)^2 + a^2 = (−b)^2 + (−a)^2, and we see that in exactly two of these cases
we have p = x^2 + y^2 with x and y positive, and that these two cases differ
from each other only in that x and y are interchanged. □
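
For a concrete feel for Theorem 3.6, one can simply search for all representations of a given prime. The sketch below is a brute-force search, not the descent argument of the proof; the function name is made up for this example. It lists the pairs (x, y) with 1 ≤ x ≤ y and x^2 + y^2 = p, so uniqueness up to order means the list has exactly one entry.

```python
from math import isqrt

def two_square_representations(p):
    """Return all (x, y) with 1 <= x <= y and x^2 + y^2 == p."""
    reps = []
    for x in range(1, isqrt(p) + 1):
        y_squared = p - x * x
        y = isqrt(y_squared)
        if y >= x and y * y == y_squared:
            reps.append((x, y))
    return reps

for p in (5, 13, 17, 29, 37, 41):
    print(p, two_square_representations(p))
```

Applied to a prime congruent to 3 (mod 4), the same function returns an empty list, in line with Lemma 3.1.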

Remark 3.7. We have finished the proof by induction. Fermat phrased it
“by descent,” which is equivalent to induction. Fermat’s phraseology is as
follows: Suppose there is a prime p ≡ 1 (mod 4) that cannot be written as
a sum of two squares. In the notation of the proof, writing N as a sum of
two squares, we see that there must be some smaller prime q that cannot
be written as a sum of two squares, with q = 2 or q ≡ 1 (mod 4), for all of
the factors of M are of this form, and if each could be written as a sum of
two squares, so could p, by Lemma 3.5. Now apply the same analysis to q
instead of p to get a smaller prime q′ that cannot be written as a sum of
two squares. Continue in this fashion. Doing so, we obtain a descending
sequence p > q > q′ > q″ > . . . of primes, each of which is either 2 or
congruent to 1 (mod 4), and none of which can be written as a sum of two
squares. This sequence must terminate at the smallest such prime, which
is 2, and that is a contradiction, as 2 = 1^2 + 1^2 is a sum of two squares.

We now present our second proof, and begin with a key lemma.

Lemma 3.8. (Thue). Let p be a prime and let a be any integer relatively
prime to p. Then there are integers x_0 and y_0 with ax_0 ≡ y_0 (mod p) and

0 < |x_0| < √p, 0 < |y_0| < √p.

Proof: Let k = [√p], for convenience, and consider

S = {ax − y | 0 ≤ x ≤ k, 0 ≤ y ≤ k}.

There are k + 1 choices for x and k + 1 choices for y, so the set S has
(k + 1)^2 > p elements. Since there are only p congruence classes (mod p),
by the Pigeonhole Principle there must be two different elements of S that
are congruent (mod p),

ax_1 − y_1 ≡ ax_2 − y_2 (mod p),

and so

a(x_1 − x_2) ≡ y_1 − y_2 (mod p).


Set x_0 = x_1 − x_2 and y_0 = y_1 − y_2. Then certainly ax_0 ≡ y_0 (mod p). We
need to check that the other conditions on x_0 and y_0 are satisfied. First,
since 0 ≤ x_1 ≤ k and 0 ≤ x_2 ≤ k, the largest x_0 can be is k − 0 = k and the
smallest x_0 can be is 0 − k = −k, so |x_0| < √p, and similarly |y_0| < √p.
Also, since a is relatively prime to p, either x_0 and y_0 are both zero or
both nonzero. But they cannot both be zero, as then x_1 = x_2 and y_1 = y_2,
contradicting our choice of different elements of S. □
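
The pigeonhole argument is constructive enough to run. The following sketch records the residue of ax − y for each pair (x, y) with 0 ≤ x, y ≤ k and stops at the first collision; the function name thue_pair is an invention for this example, and the input is assumed to satisfy the hypotheses of the lemma (p prime, a not divisible by p).

```python
from math import isqrt

def thue_pair(a, p):
    """Return (x0, y0) with a*x0 ≡ y0 (mod p), 0 < |x0| < sqrt(p), 0 < |y0| < sqrt(p)."""
    k = isqrt(p)
    seen = {}  # residue of a*x - y (mod p) -> the first pair (x, y) producing it
    for x in range(k + 1):
        for y in range(k + 1):
            r = (a * x - y) % p
            if r in seen:
                x1, y1 = seen[r]
                # a*x - y ≡ a*x1 - y1, hence a*(x - x1) ≡ y - y1 (mod p)
                return (x - x1, y - y1)
            seen[r] = (x, y)
    raise ValueError("no collision found; check the hypotheses on a and p")

x0, y0 = thue_pair(5, 13)
print(x0, y0, (5 * x0 - y0) % 13)
```

Since (k + 1)^2 > p, a collision always occurs before the loops finish when the hypotheses hold, so the final raise is only a guard against bad input.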

Second Proof of Fermat’s Theorem (Thue): Let p ≡ 1 (mod 4) and choose
an integer a with a^2 ≡ −1 (mod p). (Such an integer a exists by Corollary B.33.)
Let ax_0 ≡ y_0 (mod p) with x_0 and y_0 as in Lemma 3.8. Then

a^2 x_0^2 ≡ y_0^2 (mod p)
−x_0^2 ≡ y_0^2 (mod p)
x_0^2 + y_0^2 ≡ 0 (mod p),

i.e., x_0^2 + y_0^2 is a multiple of p. But 0 < |x_0| < √p and 0 < |y_0| < √p, so
0 < x_0^2 + y_0^2 < 2p. Hence we must have x_0^2 + y_0^2 = p.
Now we must show uniqueness. Suppose p = a^2 + b^2 = c^2 + d^2 with
a, b, c, d ≥ 0. Note that we must have a and b relatively prime, as if they
had a common factor r the prime p would be divisible by r^2. Similarly, c
and d are relatively prime. Then, similarly to the proof of Lemma 3.5, we
may apply composition and take absolute values to get two representations
of p^2:

p^2 = |ac + bd|^2 + |ad − bc|^2
= |ac − bd|^2 + |ad + bc|^2.

Then (continuing as in the proof of Lemma 3.5) p divides

pd^2 − b^2 p = (a^2 + b^2)d^2 − b^2 (c^2 + d^2) = a^2 d^2 − b^2 c^2 = (ad − bc)(ad + bc),

so p must divide one of the two factors. Suppose p divides ad − bc and
consider the representation p^2 = |ac + bd|^2 + |ad − bc|^2. (If p divides
ad + bc, consider the other representation and argue analogously.) Then p
divides the left-hand side and the last term on the right-hand side, so must
divide the first term on the right-hand side as well. Write ac + bd = pu
and ad − bc = pv. Then p^2 = (pu)^2 + (pv)^2 = p^2 (u^2 + v^2), i.e., u^2 + v^2 = 1.
Since ac + bd > 0, u > 0, so we must have u = 1 and v = 0, i.e., ac + bd = p
and ad − bc = 0.


In particular, ad = bc. Then a divides the product bc. But a and b
are relatively prime, so by Euclid’s Lemma a divides c. Similarly, c divides
the product ad, and c and d are relatively prime, so by Euclid’s Lemma c
divides a. Hence a and c divide each other so a = c, and then b = d as
well. (In the other case we get u = 0 and v = 1, which yields the other
order a = d and b = c.) □
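
The existence half of Thue's argument suggests a simple way to compute a representation. The sketch below assumes p is a prime congruent to 1 (mod 4). It finds a with a^2 ≡ −1 (mod p) by brute force (a stand-in for Corollary B.33, which only asserts existence), then looks for a small x_0 whose partner ±a·x_0 (mod p) is also small. The function name is hypothetical.

```python
from math import isqrt

def sum_of_two_squares_thue(p):
    """For a prime p ≡ 1 (mod 4), return (x0, y0) with x0^2 + y0^2 == p."""
    # Brute-force search for a square root of -1 modulo p.
    a = next(a for a in range(2, p) if (a * a + 1) % p == 0)
    for x0 in range(1, isqrt(p) + 1):
        y = (a * x0) % p
        y0 = min(y, p - y)  # the representative of ±a*x0 with least absolute value
        if y0 * y0 < p:
            # x0^2 + y0^2 is a positive multiple of p smaller than 2p, so it equals p.
            return x0, y0
    raise ValueError("p must be a prime congruent to 1 (mod 4)")

for p in (5, 13, 17, 29, 37, 41):
    print(p, sum_of_two_squares_thue(p))
```

Lemma 3.8 is what guarantees that the loop finds a suitable x_0 before it runs out of candidates.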

Now we present a lightning-fast proof of both existence and uniqueness
using the fact that the ring of Gaussian integers O(√−1) is a UFD.

Third Proof of Fermat’s Theorem:

Existence: Since p ≡ 1 (mod 4), −1 is a quadratic residue (mod p), by
Corollary B.33. Thus, there is an integer x with x^2 ≡ −1 (mod p) and
then N = x^2 + 1 is divisible by p, N = Mp for some integer M. Note
x^2 + 1 = (x + i)(x − i) in O(√−1). Consider the two factorizations of N,

N = (x + i)(x − i) = Mp.

Now p divides N but it certainly does not divide either x + i or x − i
(as (x + i)/p and (x − i)/p are not in O(√−1)), so p is not a prime in
O(√−1). Now O(√−1) is a UFD (Corollary 2.53), and in a UFD primes
are irreducibles and vice versa (Lemma 2.55), so p is not irreducible in
O(√−1). Hence p has a factorization p = αβ in O(√−1) with neither α
nor β a unit. Then p^2 = ‖p‖ = ‖α‖ · ‖β‖ so ‖α‖ = ‖β‖ = p (as otherwise
‖α‖ = 1 and α is a unit, or ‖β‖ = 1 and β is a unit). Write

α = a + bi.

Then

p = ‖α‖ = αᾱ = (a + bi)(a − bi) = a^2 + b^2,

and so p is written as a sum of squares, as claimed. (Also, we find that
β = ᾱ = a − bi.)
Uniqueness: Suppose p = a^2 + b^2 = c^2 + d^2. Then

p = (a + bi)(a − bi) = (c + di)(c − di).

Now each of these four factors has norm p, and p is an ordinary prime, so
each of these factors is irreducible (Lemma 2.64). In a UFD, primes are
irreducible and vice versa (Lemma 2.55), so each of these factors is prime.
Thus, since a + bi is prime and divides the product, it must divide one
of the factors on the right-hand side.


In general, if α divides β, β = αγ, then ‖β‖ = ‖α‖ · ‖γ‖, and here α and
β each have norm p, so ‖γ‖ = 1, which implies γ is a unit (Lemma 1.14).
But by Corollary 1.15 we already know all the units in O(√−1). They are
{±1, ±i}. Thus, we have the possibilities

c + di = (a + bi)γ with γ = 1, −1, i, or −i

or

c − di = (a + bi)γ with γ = 1, −1, i, or −i.

Solving for c and d, we see that these give the possibilities (c, d) = (±a, ±b)
or (±b, ±a), where the signs can be chosen independently, so up to order
and requiring both entries positive there is a unique solution. □
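
The third proof shows that p factors in the Gaussian integers but does not say how to find the factor α. One standard way, sketched below, is to compute a greatest common divisor of p and x + i using division with remainder in the Gaussian integers (rounding the quotient to the nearest Gaussian integer). This goes slightly beyond the proof as written: it relies on the Gaussian integers having a division algorithm, and all function names and the pair encoding (a, b) for a + bi are assumptions of this example.

```python
def g_rem(alpha, beta):
    """Remainder of alpha on division by beta in the Gaussian integers,
    with the quotient rounded to the nearest Gaussian integer."""
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d                      # norm of beta
    re, im = a * c + b * d, b * c - a * d  # alpha * conj(beta)
    q0 = (2 * re + n) // (2 * n)           # nearest integer to re / n
    q1 = (2 * im + n) // (2 * n)           # nearest integer to im / n
    return (a - (q0 * c - q1 * d), b - (q0 * d + q1 * c))

def g_gcd(alpha, beta):
    """A greatest common divisor of alpha and beta (determined up to a unit)."""
    while beta != (0, 0):
        alpha, beta = beta, g_rem(alpha, beta)
    return alpha

def sum_of_two_squares_gcd(p):
    """For a prime p ≡ 1 (mod 4), return (a, b) with a^2 + b^2 == p."""
    x = next(x for x in range(2, p) if (x * x + 1) % p == 0)  # x^2 ≡ -1 (mod p)
    a, b = g_gcd((p, 0), (x, 1))
    return abs(a), abs(b)

print(sum_of_two_squares_gcd(13))
```

The gcd is a common factor of p and x + i of norm p, which plays the role of α (or one of its associates) in the proof.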

Remark 3.9. Combining the easy Lemma 3.1 and the trivial observation
2 = 1^2 + 1^2 with Theorem 3.6, we see that Fermat’s Theorem can be
rephrased as follows:
Theorem (Fermat). Let p be a prime. Then p can be written as a sum
of squares of integers, p = x^2 + y^2 for some integers x and y, if and only
if p = 2 or p ≡ 1 (mod 4), in which case x and y are essentially unique.
Here by “essentially unique” we mean unique up to sign and up to
interchanging the order of x and y.
(We mention this not because it adds anything to what we have already
proved but rather for comparison with the exercises for this chapter.)

For our purposes, Theorem 3.6 is all we need. But not only did Fermat
show which primes could be represented as sums of two squares, he also
showed which integers could be represented as sums of squares. Since this
is also an interesting result, and since the rest of the work is relatively easy,
we shall finish this section by showing that.

Theorem 3.10. (Fermat). A positive integer n can be written as the sum
of two squares if and only if for every prime q ≡ 3 (mod 4) that divides n,
the exponent of the highest power of q dividing n is even.

Proof: If n = 1 then n = 1^2 + 0^2 and we are done. For n > 1 let us factor
n into primes,

n = 2^a p_1^{b_1} · · · p_k^{b_k} q_1^{c_1} · · · q_ℓ^{c_ℓ}

where each prime p_i is congruent to 1 (mod 4) and each prime q_i is congruent
to 3 (mod 4). Set m = 2^a p_1^{b_1} · · · p_k^{b_k}, and m′ = q_1^{c_1} · · · q_ℓ^{c_ℓ}, so n = mm′.


Then 2 = 1^2 + 1^2, and by Fermat’s Theorem each p_i is a sum of two
squares, so we may repeatedly compose representations (i.e., repeatedly
apply Lemma 3.2) to obtain a representation m = u^2 + v^2. If each c_i is
even, let

z = q_1^{c_1/2} · · · q_ℓ^{c_ℓ/2}

and note that z is an integer with z^2 = m′. Then, setting x = uz and
y = vz,

x^2 + y^2 = (uz)^2 + (vz)^2 = (u^2 + v^2)z^2 = mm′ = n,

so n is a sum of two squares.

On the other hand, suppose n = x^2 + y^2 is a sum of two squares, and let
q be any prime congruent to 3 (mod 4) that divides n. Then n is a multiple
of q, so x^2 + y^2 = n gives x^2 + y^2 ≡ 0 (mod q) and, by Lemma 3.4, that
implies that x and y are each multiples of q. Let q^s and q^t be the highest
powers of q that divide x and y respectively, so x = q^s u and y = q^t v with
neither u nor v divisible by q. Let us assume s ≤ t (otherwise, simply
interchange x and y). Then

n = x^2 + y^2 = (q^s u)^2 + (q^t v)^2 = q^{2s} (u^2 + q^{2(t−s)} v^2).

There are now two possibilities: If s < t, then u^2 is not divisible by q but
q^{2(t−s)} v^2 is divisible by q, so their sum is not divisible by q. If s = t, then
u and v are not divisible by q, and q ≡ 3 (mod 4), so again, by Lemma 3.4,
u^2 + v^2 is not divisible by q. Thus, in either case the highest power of q
dividing n is q^{2s}, and its exponent 2s is even, as claimed. □
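
The criterion of Theorem 3.10 is easy to compare against a direct search for small n. In the sketch below, both function names are invented for this illustration: one tests the condition on prime exponents by trial division, and the other looks for x with n − x^2 a perfect square.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """Direct search: is n = x^2 + y^2 for some integers x, y >= 0?"""
    return any(isqrt(n - x * x) ** 2 == n - x * x for x in range(isqrt(n) + 1))

def satisfies_criterion(n):
    """Does every prime q ≡ 3 (mod 4) divide n to an even power?"""
    q = 2
    while q * q <= n:
        exponent = 0
        while n % q == 0:
            n //= q
            exponent += 1
        if q % 4 == 3 and exponent % 2 == 1:
            return False
        q += 1
    # Whatever is left is 1 or a single prime occurring to the first power.
    return n % 4 != 3

agree = all(is_sum_of_two_squares(n) == satisfies_criterion(n) for n in range(1, 501))
print(agree)
```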

3.2 Factorization into Primes


We have two goals in this section. Our first is to find the primes in the
Gaussian integers, and our second is to show how to factor an arbitrary
Gaussian integer into primes. Recall from Definition 1.11 that if α = a + bi
is a Gaussian integer, its conjugate is ᾱ = a − bi. (Note that ᾱ = α if and
only if α = a is an ordinary integer.) We also observe that if α and β are
Gaussian integers such that α divides β, then ᾱ divides β̄. This is a direct
computation: if β = αγ with α = a + bi, β = c + di, and γ = e + fi, so
that (c + di) = (a + bi)(e + fi), then also (c − di) = (a − bi)(e − fi), so
β̄ = ᾱγ̄. In particular, if β = b is an ordinary integer, then α divides β if
and only if ᾱ divides β.




Theorem 3.11. The following is a complete list of primes in O(√−1):

(1) 1 + i and its associates −1 + i, −1 − i, and 1 − i;

(2) for any ordinary prime p ≡ 1 (mod 4), write p as p = a^2 + b^2 with a
and b integers;

(a) a + bi and its associates −b + ai, −a − bi, and b − ai;

(b) a − bi and its associates b + ai, −a + bi, and −b − ai.

(3) for any ordinary prime p ≡ 3 (mod 4), p and its associates pi, −p, and
−pi.

Remark 3.12.

(1) Recall that the associates of α are αε for any unit ε of O(√−1). Since
we determined in Corollary 1.15 that the units of O(√−1) are ±1 and
±i, we have just multiplied the first prime in each list by these units
to obtain the others.

(2) Note that 1 + i and its conjugate 1 − i are associates, but if a and b
are as in Theorem 3.11(2) then a + bi and its conjugate a − bi are not
associates.

Proof: In this theorem we are making two claims: first, that every Gaussian
integer on this list is a prime, and second, that every prime in the Gaussian
integers is on this list. Both of these claims are consequences of our earlier
work.
If α = 1 + i then ‖α‖ = 2 and if α = a + bi as in (2a) or α = a − bi
as in (2b) then ‖α‖ = p. Thus, in either case ‖α‖ is an ordinary prime, so
by Lemma 2.64(1) α is irreducible. If p ≡ 3 (mod 4), then O(√−1) has no
element β of norm p (as if β = x + yi, p = ‖β‖ = x^2 + y^2, and we know
from Lemma 3.1 that this is impossible). Then, setting γ = p, we have
‖γ‖ = p^2, so γ is irreducible by Lemma 2.64(2). Finally, since O(√−1) is
a UFD, the primes and the irreducibles are the same (Lemma 2.55), so we
have proved our first claim.
Now suppose α is prime, or equivalently, irreducible. Let α = a + bi.
Then ᾱ = a − bi is also irreducible, or equivalently prime. (If we had
ᾱ = βγ then we would have α = β̄γ̄.) Let

N = ‖α‖ = αᾱ = a^2 + b^2.


Factor N into ordinary primes, N = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k}. Since α is a prime,
and it divides the product p_1^{e_1} · · · p_k^{e_k} in O(√−1), it must divide one of
the factors, so suppose it divides p_1. Write p_1 = αβ. Then ‖α‖ divides
‖p_1‖ = p_1^2, so we must have ‖α‖ = 1, p_1, or p_1^2. We cannot have ‖α‖ = 1,
as that would mean α is a unit.
Suppose ‖α‖ = p_1. Now ‖α‖ = a^2 + b^2, so that shows p_1 = a^2 + b^2.
Then p_1 must either be 2 or a prime congruent to 1 (mod 4), by Lemma 3.1,
so α is either as in (1), (2a), or (2b). (Furthermore, in this case β = ᾱ and
we have p_1 = αᾱ.)
Suppose ‖α‖ = p_1^2. Then ‖β‖ = 1, and so β is a unit, and then α and
p_1 are associates, α = εp_1 with ε a unit. Since, by assumption, α is a prime
in O(√−1), p_1 must also be a prime in O(√−1). We have just ruled out
p_1 = 2 or p_1 ≡ 1 (mod 4), as we have just factored them as αᾱ in those
cases (so they are not irreducible and hence not prime), so we must have
p_1 ≡ 3 (mod 4), and α is as in (3). □
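
Theorem 3.11 gives a simple test for whether a + bi is prime. The sketch below encodes it with two hypothetical helpers: when a and b are both nonzero, a + bi is on the list exactly when its norm a^2 + b^2 is an ordinary prime (cases (1) and (2)); when one of them is zero, it is on the list exactly when the other is, up to sign, an ordinary prime congruent to 3 (mod 4) (case (3)).

```python
def is_prime(n):
    """Trial-division primality test for ordinary integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_gaussian_prime(a, b):
    """Is a + bi one of the primes listed in Theorem 3.11?"""
    if a == 0 or b == 0:
        m = abs(a + b)  # absolute value of the nonzero coordinate (0 if both vanish)
        return is_prime(m) and m % 4 == 3
    return is_prime(a * a + b * b)

print([is_gaussian_prime(*z) for z in [(1, 1), (2, 1), (3, 0), (2, 0), (5, 0), (4, 3)]])
```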
Example 3.13. We now show how to factor Gaussian integers into primes.
There is a certain amount of trial and error involved, just as for factoring
ordinary integers into primes. But we will see that there are only finitely
many possibilities to try, so our method will eventually succeed.
We begin with some special cases.
Case 1(a). α = 2. Then α = −i(1 + i)^2. (Note −i is a unit.)
Case 1(b). α = p, a prime congruent to 1 (mod 4). By trial and error, we
find ordinary positive integers a and b with a^2 + b^2 = p, and
then
α = (a + bi)(a − bi).
(Note that there are only finitely many possibilities to check
as a is a positive integer with a < √p.) For example, 5 =
2^2 + 1^2 so 5 = (2 + i)(2 − i) in O(√−1), and 13 = 2^2 + 3^2 so
13 = (2 + 3i)(2 − 3i) in O(√−1).
Case 1(c). α = p, a prime congruent to 3 (mod 4). Then α is prime, so
its prime factorization is simply α = p. For example, 7 = 7
and 11 = 11 in O(√−1).
Case 2. α is an ordinary integer. Then factor α into ordinary primes and
use Case 1. For example, let α = 50. Then
α = 2 · 5^2 = ((−i)(1 + i)^2)((2 + i)(2 − i))^2
= −i(1 + i)^2 (2 + i)^2 (2 − i)^2.


As a second example, let α = 44. Then

α = 2^2 · 11 = ((−i)(1 + i)^2)^2 (11)
= −1(1 + i)^4 (11).

As a third example, let α = 455. Then

α = 5 · 7 · 13 = (2 + i)(2 − i)(7)(2 + 3i)(2 − 3i).

Case 3. α is an arbitrary Gaussian integer, α = a + bi. Let g be the ordinary
gcd of the ordinary integers a and b, so a = gc and b = gd with c
and d relatively prime. Then α = gc + gdi = g(c + di) = gβ where
β = c + di. We know how to factor g from Case 2, so we need
only see how to factor β. To do this, we let N = ‖β‖ = c^2 + d^2.
Then if γ is a prime dividing β, then ‖γ‖ divides ‖β‖ = N. We
know that N is not divisible by any prime p ≡ 3 (mod 4) (by
Lemma 3.4, as c and d are relatively prime). Also, if
p = u^2 + v^2 is a prime ≡ 1 (mod 4) that divides N, then either
γ = u + vi or γ̄ = u − vi divides β, but not both. (We have shown that γ
and γ̄ are both primes, and that they are not associates. Thus, γ
and γ̄ are relatively prime. If γ and γ̄ both divided β, then their
product p = γγ̄ would divide β, i.e., p would divide c + di, which
contradicts the fact that c and d are relatively prime.) Which of
γ and γ̄ works can only be determined by trial and error. But if
γ, say, works, and p^e is the highest power of p dividing ‖β‖, then
β is divisible by γ^e.

Let us look at some examples:

(1) α = 2 + 5i. Then β = α and ‖β‖ = 29 is prime, so β is prime and the
prime factorization of α is α = 2 + 5i.

(2) α = 4 + 3i. Then β = α and ‖β‖ = 25 = 5^2. Recall that 5 = 2^2 + 1^2 =
(2 + i)(2 − i), so β is divisible by either (2 + i)^2 or (2 − i)^2. The second
of these works and we see α has the prime factorization α = i(2 − i)^2.

(3) α = 21 − 12i. Then β = 7 − 4i and ‖β‖ = 65 = 5 · 13. Now 5 =
(2 + i)(2 − i) and 13 = (2 + 3i)(2 − 3i). Trial and error shows that α
has the prime factorization α = 3(2 + i)(2 − 3i).
Note that our factorizations are only unique up to units, so there are
many variants. For example, we also have the factorizations 2 = (1 +
i)(1 − i), 5 = (1 + 2i)(1 − 2i), 13 = −i(2 + 3i)(3 + 2i), etc.
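
The trial-and-error procedure of this example can be sketched in code. The version below is a simplified variant rather than a line-by-line transcription of Cases 1 to 3: it factors the norm of α into ordinary primes and, for each such prime p, tries to divide α by the Gaussian primes attached to p in Theorem 3.11. All names and the pair encoding (a, b) for a + bi are assumptions of this example; α is assumed nonzero, and the walrus operator requires Python 3.8 or later.

```python
from math import isqrt

def mul(x, y):
    """Product of two Gaussian integers given as pairs."""
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def try_divide(x, y):
    """Return x / y if y divides x in the Gaussian integers, else None."""
    n = y[0] ** 2 + y[1] ** 2
    re, im = x[0] * y[0] + x[1] * y[1], x[1] * y[0] - x[0] * y[1]  # x * conj(y)
    return (re // n, im // n) if re % n == 0 and im % n == 0 else None

def candidates(p):
    """Gaussian primes (up to associates) dividing the ordinary prime p."""
    if p == 2:
        return [(1, 1)]
    if p % 4 == 3:
        return [(p, 0)]
    u = next(u for u in range(1, isqrt(p) + 1) if isqrt(p - u * u) ** 2 == p - u * u)
    v = isqrt(p - u * u)
    return [(u, v), (u, -v)]

def factor(alpha):
    """Return (unit, primes) with alpha == unit * product(primes)."""
    primes, n, p = [], alpha[0] ** 2 + alpha[1] ** 2, 2
    while n > 1:
        if n % p != 0:
            p += 1
            continue
        for gamma in candidates(p):  # p is prime here: smaller primes were removed
            while (q := try_divide(alpha, gamma)) is not None:
                alpha = q
                primes.append(gamma)
        while n % p == 0:
            n //= p
    return alpha, primes

print(factor((4, 3)))
print(factor((21, -12)))
```

The primes returned may differ from those in the text by units. For instance, for 4 + 3i this sketch produces the unit −i and the prime 1 + 2i twice, and 1 + 2i = i(2 − i), so the result agrees with α = i(2 − i)^2 up to associates.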


Remark 3.14. Let us now shift our point of view from factoring in the
Gaussian integers and ask “what happens” to ordinary primes when we go
from the ordinary integers to the Gaussian integers. Looking at Theorem
2.1, we can see three sorts of behavior:

(1) The ordinary prime 2 is (up to a unit) the square of a prime in $\mathcal{O}(\sqrt{-1})$:
$2 = -i(1 + i)^2$. The ordinary prime 2 is said to ramify in $\mathcal{O}(\sqrt{-1})$.

(2) An ordinary prime p ≡ 1 (mod 4) is (up to a unit) a product of two
distinct (i.e., nonassociated) primes in $\mathcal{O}(\sqrt{-1})$: 5 = (2 + i)(2 − i),
13 = (2 + 3i)(2 − 3i), etc. These ordinary primes are said to split in
$\mathcal{O}(\sqrt{-1})$.

(3) An ordinary prime p ≡ 3 (mod 4) is (up to a unit) still a prime in
$\mathcal{O}(\sqrt{-1})$: 7 = 7, 11 = 11, etc. These ordinary primes are said to be
inert in $\mathcal{O}(\sqrt{-1})$.

These three kinds of behavior are typical of what happens in general.
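
The trichotomy of Remark 3.14 depends only on p modulo 4, so it is easy to tabulate. Here is a small sketch, assuming p is already known to be prime; the function names `behavior` and `two_squares` are illustrative choices, and the search for a and b is plain trial and error.

```python
from math import isqrt


def behavior(p):
    """Behavior of the ordinary prime p in the Gaussian integers."""
    if p == 2:
        return "ramifies"
    return "splits" if p % 4 == 1 else "inert"


def two_squares(p):
    """Return (a, b) with 1 <= a <= b and a*a + b*b == p, or None if impossible."""
    a = 1
    while 2 * a * a <= p:
        b = isqrt(p - a * a)
        if a * a + b * b == p:
            return (a, b)
        a += 1
    return None


for p in (2, 3, 5, 7, 11, 13, 29):
    # For split primes, (a, b) gives the factorization p = (a + bi)(a - bi).
    print(p, behavior(p), two_squares(p))
```

For an inert prime the search finds nothing, which matches the fact that such a prime has no nontrivial factorization in the Gaussian integers.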

3.3 Exercises
Compare Exercises 3.1–3.5 with Remark 3.9, and Exercises 3.6–3.10 with
Theorem 3.10. In Exercises 3.11, 3.12, and 3.13 “essentially” means up to
sign and order and in the remaining exercises “essentially” means up to
sign.

Exercise 3.1. Use the fact that $\mathcal{O}(\sqrt{-2})$ is a UFD and the fact that, for
a prime $p \neq 2$, −2 is a quadratic residue (mod p) if and only if p ≡ 1 or
3 (mod 8), as shown in Corollary B.37, to prove the following:

Let p be a prime. Then p can be written as $p = x^2 + 2y^2$ for
some integers x and y if and only if p = 2 or p ≡ 1 or 3 (mod 8),
in which case x and y are essentially unique.

Exercise 3.2.

(a) Use the fact that $\mathcal{O}(\sqrt{-3})$ is a UFD and the fact that for an odd prime
$p \neq 3$, −3 is a quadratic residue (mod p) if and only if p ≡ 1 (mod 3),
by Corollary B.41, a corollary of the Law of Quadratic Reciprocity
(Theorem B.40), to prove the following:

Let p be a prime. Then 4p can be written as $4p = x^2 + 3y^2$
for some integers x and y if and only if p is odd and p = 3 or
p ≡ 1 (mod 3), in which case x and y are essentially unique.

(b) Let $\omega = (-1 + \sqrt{-3})/2$. Observe that if $\alpha = (a + b\sqrt{-3})/2$ for odd
integers a and b, then either $\omega\alpha$ or $\bar{\omega}\alpha$ is of the form $c + d\sqrt{-3}$ with
c and d integers. Use this to improve the result of part (a) to the
following:

Let p be a prime. Then p can be written as $p = x^2 + 3y^2$ for
some integers x and y if and only if p = 3 or p ≡ 1 (mod 3),
in which case x and y are essentially unique.

Exercise 3.3.

(a) Use the fact that $\mathcal{O}(\sqrt{-7})$ is a UFD and the fact that for an odd
prime $p \neq 7$, −7 is a quadratic residue (mod p) if and only if p ≡ 1, 2,
or 4 (mod 7), by Corollary B.41, a corollary of the Law of Quadratic
Reciprocity (Theorem B.40), to prove the following:

Let p be a prime. Then 4p can be written as $4p = x^2 + 7y^2$
for some integers x and y if and only if p = 2 or p ≡ 1, 2, or
4 (mod 7), in which case x and y are essentially unique.

(b) Observe that if x and y are both odd integers, then $x^2 + 7y^2 \equiv
0 \pmod 8$, while if p is an odd prime then 4p ≡ 4 (mod 8). Use this to
improve the result of part (a) to the following:

Let p be a prime. Then p can be written as $p = x^2 + 7y^2$ for
some integers x and y if and only if p is odd and p = 7 or
p ≡ 1, 2, or 4 (mod 7), in which case x and y are essentially
unique.

Exercise 3.4.

(a) Use the fact that $\mathcal{O}(\sqrt{2})$ is a UFD and the fact that, for a prime $p \neq 2$,
2 is a quadratic residue (mod p) if and only if p ≡ 1 or 7 (mod 8), as
shown in Corollary B.37, to prove the following:

Let p be a prime. Then p can be written as $p = x^2 - 2y^2$ for
some integers x and y or −p can be written as $-p = x^2 - 2y^2$
for some integers x and y if and only if p = 2 or p ≡ 1 or
7 (mod 8).

(b) Let $\varepsilon = 1 + \sqrt{2}$. Observe that $\varepsilon\bar{\varepsilon} = -1$. Use this to improve the result
of part (a) to the following:

Let p be a prime. Then p can be written as $p = x^2 - 2y^2$ for
some integers x and y and −p can be written as $-p = x^2 - 2y^2$
for some integers x and y if and only if p = 2 or p ≡ 1 or
7 (mod 8).

Exercise 3.5.

(a) Use the fact that $\mathcal{O}(\sqrt{5})$ is a UFD, the fact that 2 is not a quadratic
residue (mod 5), and the fact that for an odd prime $p \neq 5$, 5 is a
quadratic residue (mod p) if and only if p ≡ 1 or 4 (mod 5), by the Law
of Quadratic Reciprocity (Theorem B.40), to prove the following:

Let p be a prime. Then 4p can be written as $4p = x^2 - 5y^2$
for some integers x and y or −4p can be written as $-4p =
x^2 - 5y^2$ for some integers x and y if and only if p is odd and
p = 5 or p ≡ 1 or 4 (mod 5).

(b) Let $\varepsilon_1 = (3 + \sqrt{5})/2$ and let $\varepsilon_2 = 2 + \sqrt{5}$. Observe that $\varepsilon_1\bar{\varepsilon}_1 = 1$
and that $\varepsilon_2\bar{\varepsilon}_2 = -1$. Use this to improve the result of part (a) to the
following:

Let p be a prime. Then p can be written as $p = x^2 - 5y^2$ for
some integers x and y and −p can be written as $-p = x^2 - 5y^2$
for some integers x and y if and only if p is odd and p = 5
or p ≡ 1 or 4 (mod 5).

Exercise 3.6. Prove the following:

A positive integer n can be written as $n = x^2 + 2y^2$ for some
integers x and y if and only if for every prime q ≡ 5 or 7 (mod 8)
that divides n, the highest power of q dividing n is even.

Exercise 3.7. Prove the following:

A positive integer n can be written as $n = x^2 + 3y^2$ for some
integers x and y if and only if for every prime q ≡ 2 (mod 3)
that divides n, the highest power of q dividing n is even.

Exercise 3.8. Prove the following:

A positive integer n can be written as $n = x^2 + 7y^2$ for some
integers x and y if and only if for every prime q with either
q = 2 or q odd and q ≡ 3, 5, or 6 (mod 7) that divides n, the
highest power of q dividing n is even.

Exercise 3.9. Prove the following:

A nonzero integer n can be written as $n = x^2 - 2y^2$ for some
integers x and y if and only if for every prime q ≡ 3 or 5 (mod 8)
that divides n, the highest power of q dividing n is even.

Exercise 3.10. Prove the following:

A nonzero integer n can be written as $n = x^2 - 5y^2$ for some
integers x and y if and only if for every prime q ≡ 2 or 3 (mod 5)
that divides n, the highest power of q dividing n is even.

Exercise 3.11. Let n be a positive integer. Show that the number of es-
sentially different ways of writing n as a sum of squares of two integers is
equal to the number of essentially different ways of writing 2n as a sum of
squares of two integers.

Exercise 3.12. Let p be a prime with p ≡ 1 (mod 4).

(a) For a nonnegative integer k, show that $p^k$ can be written essentially
uniquely as a sum of squares of two relatively prime integers.

(b) For a nonnegative integer k, show that $p^{2k}$ and $p^{2k+1}$ can each be
written as a sum of squares of two integers in k + 1 essentially different
ways.

Exercise 3.13.

(a) Let $p_1, \ldots, p_k$ be distinct primes, all of which are congruent to 1 modulo
4, and let n be the product $n = p_1 \cdots p_k$. Show that n can be written
as a sum of squares of two integers in $2^{k-1}$ essentially distinct ways.

(b) Let $p_1$ and $p_2$ be distinct primes, both of which are congruent to 1
modulo 4, let k be a nonnegative integer, and let $n = p_1 p_2^k$. Show that
n can be written as a sum of squares of two integers in k + 1 essentially
distinct ways.


Exercise 3.14. Let n be a positive integer. Show that the number of essen-
tially different ways of writing n in the form $x^2 + 2y^2$ for integers x and
y is equal to the number of essentially different ways of writing 2n in the
form $x^2 + 2y^2$ for integers x and y.

Exercise 3.15. Let p be a prime with p ≡ 1 or 3 (mod 8).

(a) For a nonnegative integer k, show that $p^k$ can be written essentially
uniquely in the form $x^2 + 2y^2$ for relatively prime integers x and y.

(b) For a nonnegative integer k, show that $p^{2k}$ and $p^{2k+1}$ can each be
written in the form $x^2 + 2y^2$ for integers x and y in k + 1 essentially
different ways.

Exercise 3.16.

(a) Let $p_1, \ldots, p_k$ be distinct primes, all of which are congruent to 1 or 3
modulo 8, and let n be the product $n = p_1 \cdots p_k$. Show that n can be
written in the form $x^2 + 2y^2$ for integers x and y in $2^{k-1}$ essentially
distinct ways.

(b) Let $p_1$ and $p_2$ be distinct primes, both of which are congruent to 1 or
3 modulo 8, let k be a nonnegative integer, and let $n = p_1 p_2^k$. Show
that n can be written in the form $x^2 + 2y^2$ for integers x and y in k + 1
essentially distinct ways.

Clearly, Exercises 3.14, 3.15, and 3.16 are the analogs for D = −2 of
Exercises 3.11, 3.12, and 3.13 for D = −1, and clearly, there are analogs
of these exercises for D = −3 and D = −7. There are no such analogs
for D = 5, as in this case there are infinitely many representations (a
consequence of our study of Pell’s equation in Chapter 4).
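
Before attempting proofs, it can help to test the counting statements of Exercises 3.11 to 3.16 numerically. The sketch below is only an experiment, not a solution: it counts, by brute force, the essentially different ways of writing n as x² + cy², following the convention stated at the start of the exercises (up to sign and order when c = 1, up to sign otherwise). The function name `essentially_different` and the parameter `c` are assumptions made for this illustration.

```python
from math import isqrt


def essentially_different(n, c=1):
    """Count pairs (x, y) with x, y >= 0 and x*x + c*y*y == n.

    For c == 1 the order of x and y is also ignored, so only x <= y is counted.
    """
    count = 0
    y = 0
    while c * y * y <= n:
        rest = n - c * y * y
        x = isqrt(rest)
        if x * x == rest and (c != 1 or x <= y):
            count += 1
        y += 1
    return count


print(essentially_different(65), essentially_different(130))  # n and 2n, D = -1
print(essentially_different(5 * 13 * 17))                     # three primes, D = -1
print(essentially_different(3 * 11, c=2))                     # two primes, D = -2
```

Comparing such counts with the formulas in the exercises is a quick sanity check on small cases.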


Chapter 4

Pell’s Equation

In Remark 1.17 we saw that we had to consider Pell’s equation

$$a^2 - b^2 D = 1,$$

when D is a square-free positive integer, $D \neq 1$.

In fact, we will consider this equation for any positive integer D that is
not a square (since it requires no additional work) and show that it always
has infinitely many solutions in integers a and b.
This was known to Fermat, who also developed a method for finding
solutions. For example, Fermat knew that the smallest solution to $a^2 -
61b^2 = 1$ in positive integers is

a = 1766319049, b = 226153980.
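
Python integers have arbitrary precision, so this solution can be verified directly. The sketch below also includes a naive trial-and-error search, under the assumption that we simply try b = 1, 2, 3, ... in turn; the name `naive_pell` and the cutoff `max_b` are illustrative, and the point is that such a search is hopeless for D = 61.

```python
from math import isqrt


def naive_pell(D, max_b=10**6):
    """Smallest positive solution of a*a - D*b*b == 1 with b <= max_b, else None."""
    for b in range(1, max_b + 1):
        a_squared = D * b * b + 1
        a = isqrt(a_squared)
        if a * a == a_squared:
            return (a, b)
    return None


# Fermat's solution for D = 61, checked with exact integer arithmetic.
a, b = 1766319049, 226153980
print(a * a - 61 * b * b)

# Trial and error is fine for small cases such as D = 7 or D = 13.
print(naive_pell(7), naive_pell(13))
```

For D = 61 the naive loop would need more than two hundred million iterations before reaching b = 226153980, which motivates the more systematic method developed in this chapter.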

Pell’s equation has a long history. A method for solving it, called the
cakravala method, was developed by Indian mathematicians in the seventh
to twelfth centuries. Fermat (who was certainly unaware of this work) was
the first western mathematician both to find a method and to prove that
it always works. The full theory of Pell’s equation is due to Lagrange,
using the method of continued fractions. There is also a twentieth-century
approach using Diophantine approximation. (Pell, however, had nothing
to do with it. His name is attached to this equation because of Euler’s
mistaken belief that he did.)
The approach using Diophantine approximation gives a very quick proof
that Pell’s equation always has infinitely many solutions, but it has the
serious disadvantage that it does not provide any method, other than trial
and error, for finding them.
While the method of continued fractions gives the full theory, including
an effective method for finding all solutions, it requires developing the
theory of continued fractions first, and that would be a considerable digression
for us.
Thus, we will present a variant of the cakravala method here, and fur-
thermore we will prove that it always works. This method is undoubtedly
very close to what Fermat had in mind, as you can tell from the similar-
ity between the last chapter and this one. It is also quite beautiful and
effective.

4.1 Representations and Their Composition


We let D be an arbitrary positive integer that is not a perfect square.
We shall prove all the results in this section by direct computation, but
there is a deeper reason why they are true. (When a computation produces
a particularly nice result, one should always ask why.) This reason can be
found in the exercises to this chapter.
If a and b are integers with $m = a^2 - b^2 D$, we will say that $m = a^2 - b^2 D$
is a representation of m, or that (a, b) represents m.
If (a, b) represents m, then so does $\overline{(a, b)} = (a, -b)$. We shall call $\overline{(a, b)}$
the conjugate of (a, b).
If (a, b) represents m and c is any integer, then c(a, b) = (ca, cb) repre-
sents $c^2 m$. On the other hand, if (a, b) represents m and c is any integer
dividing both a and b, then (a/c, b/c) represents $m/c^2$.
Finally, if (a, b) represents m and d = gcd(a, b), then (a/d, b/d) repre-
sents $m/d^2$. In this case we write $(a, b)_{\mathrm{red}} = (a/d, b/d)$ and call $(a, b)_{\mathrm{red}}$
the reduction of (a, b). Of course, $(a, b)_{\mathrm{red}} = (a, b)$ if and only if a and b
are relatively prime. In this case, we say (a, b) is reduced.
We call the reader’s attention to the similarity between Lemma 4.1 and
Lemma 3.2, and to the similarity between Definition 4.2 and Definition 3.3.
Lemma 4.1. Let (a, b) represent m and (c, d) represent n. Set
$$e = ac + bdD \quad\text{and}\quad f = ad + bc.$$
Then (e, f) represents mn.
Proof: We simply compute
$$\begin{aligned}
e^2 - f^2 D &= (ac + bdD)^2 - (ad + bc)^2 D \\
&= (a^2c^2 + 2abcdD + b^2d^2D^2) - (a^2d^2 + 2abcd + b^2c^2)D \\
&= a^2c^2 + b^2d^2D^2 - a^2d^2D - b^2c^2D \\
&= (a^2 - b^2D)(c^2 - d^2D) = mn. \qquad\square
\end{aligned}$$
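
A quick numerical check of Lemma 4.1 may be reassuring. The sketch below assumes the composition formula (ac + bdD, ad + bc) used in the proof of Lemma 4.3; the function names `value` and `compose` are chosen here for illustration.

```python
def value(pair, D):
    """The integer a*a - b*b*D represented by pair = (a, b)."""
    a, b = pair
    return a * a - b * b * D


def compose(p, q, D):
    """Composition (a, b) * (c, d) = (ac + bdD, ad + bc)."""
    a, b = p
    c, d = q
    return (a * c + b * d * D, a * d + b * c)


D = 7
p, q = (3, 1), (4, 1)            # represent 9 - 7 = 2 and 16 - 7 = 9
r = compose(p, q, D)
print(r, value(r, D), value(p, D) * value(q, D))
```

The last two printed numbers agree, as the lemma predicts: the composite represents the product of the two represented integers.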


Definition 4.2. The representation $mn = e^2 - f^2 D$ of mn in Lemma 4.1
is obtained from the representations $m = a^2 - b^2 D$ and $n = c^2 - d^2 D$ by
composition, denoted
$$(e, f) = (a, b) * (c, d).$$

We now give several properties of composition, which we verify by direct


computation.
Lemma 4.3.
(1) Composition is commutative:
(a, b) ∗ (c, d) = (c, d) ∗ (a, b).

(2) Composition is associative:

((a, b) ∗ (c, d)) ∗ (e, f ) = (a, b) ∗ ((c, d) ∗ (e, f )).

(3) Composition commutes with conjugation:

$$\overline{(a, b) * (c, d)} = \overline{(a, b)} * \overline{(c, d)}.$$


Proof:
(1) By definition,

(a, b) ∗ (c, d) = (ac + bdD, ad + bc)


while
(c, d) ∗ (a, b) = (ca + dbD, cb + ad)
and these are equal.
(2) By definition,
((a, b) ∗ (c, d)) ∗ (e, f )
= (ac + bdD, ad + bc) ∗ (e, f )
= (ace + bdeD + (adf + bcf )D, acf + bdf D + ade + bce)

while

(a, b) ∗ ((c, d) ∗ (e, f ))


= (a, b) ∗ (ce + df D, cf + de)
= (ace + adf D + (bcf + bde)D, acf + ade + bce + bdf D)
and these are equal.


(3) By definition,

$$\overline{(a, b) * (c, d)} = \overline{(ac + bdD,\ ad + bc)} = (ac + bdD,\ -ad - bc)$$

while

$$\overline{(a, b)} * \overline{(c, d)} = (a, -b) * (c, -d) = (ac + (-b)(-d)D,\ a(-d) + (-b)c)$$

and these are equal. □

We now combine composition and reduction into a single operation.

Definition 4.4. The reduced composition $(a, b) *_r (c, d)$ of (a, b) and (c, d) is

$$(a, b) *_r (c, d) = ((a, b) * (c, d))_{\mathrm{red}}.$$

(In other words, to obtain the reduced composition of (a, b) and (c, d), we
first compose (a, b) and (c, d) and then reduce the result.)
We now give several properties of reduced composition.

Lemma 4.5.

(1) For any (a, b),

$$(a, b) *_r \overline{(a, b)} = \pm(1, 0).$$

(2) For any (a, b) and (c, d),

$$(a, b) *_r (c, d) = (c, d) *_r (a, b).$$

(3) If (a, b) and (c, d) are reduced and $(a, b) *_r (c, d) = \pm(1, 0)$, then $(c, d) =
\pm\overline{(a, b)}$.

(4) For any (a, b), (c, d), and t > 0,

$$(a, b) *_r t(c, d) = (a, b) *_r (c, d).$$

Proof:

(1) By definition,

$$(a, b) *_r \overline{(a, b)} = ((a, b) * \overline{(a, b)})_{\mathrm{red}} = ((a, b) * (a, -b))_{\mathrm{red}} = (a^2 - b^2 D,\ 0)_{\mathrm{red}} = \pm(1, 0).$$

(2) $(a, b) *_r (c, d) = ((a, b) * (c, d))_{\mathrm{red}} = ((c, d) * (a, b))_{\mathrm{red}} = (c, d) *_r (a, b)$.

(3) If $(a, b) *_r (c, d) = \pm(1, 0)$, then (a, b) ∗ (c, d) = (e, 0) for some e, so in
particular ad + bc = 0.
Then ad = −bc, so in particular a divides −bc. Now a and b are
relatively prime, as (a, b) is assumed to be reduced. So, by Euclid’s
Lemma, a divides c, i.e., c = ka for some k. Then ad = −bc = −b(ka),
so d = −bk. Thus $(c, d) = (ka, -kb) = k(a, -b) = k\,\overline{(a, b)}$. But (c, d) is assumed
to be reduced, so k = ±1.

(4) Let (e, f) = (a, b) ∗ (c, d). Then (te, tf) = (a, b) ∗ t(c, d). Now $(e, f)_{\mathrm{red}} =
(a, b) *_r (c, d) = (e/s, f/s)$ where s = gcd(e, f), while $(te, tf)_{\mathrm{red}} =
(a, b) *_r t(c, d) = (te/s', tf/s')$ where $s' = \gcd(te, tf)$. But then we
have that $s' = \gcd(te, tf) = t \gcd(e, f) = ts$, so

$$(a, b) *_r t(c, d) = (te/s', tf/s') = (te/(ts), tf/(ts)) = (e/s, f/s) = (a, b) *_r (c, d). \qquad\square$$
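
Reduced composition is straightforward to experiment with. The following is a minimal sketch of Definition 4.4, using `math.gcd` for the reduction; the names `conj`, `compose`, `reduce_pair`, and `rcompose` are illustrative assumptions, and the pair (0, 0), which has no reduction, is not handled.

```python
from math import gcd


def conj(p):
    """Conjugate of (a, b), namely (a, -b)."""
    return (p[0], -p[1])


def compose(p, q, D):
    """Composition (a, b) * (c, d) = (ac + bdD, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c + b * d * D, a * d + b * c)


def reduce_pair(p):
    """Divide both entries by their (positive) greatest common divisor."""
    d = gcd(p[0], p[1])  # math.gcd ignores signs; it is 0 only for (0, 0)
    return (p[0] // d, p[1] // d)


def rcompose(p, q, D):
    """Reduced composition: compose first, then reduce."""
    return reduce_pair(compose(p, q, D))


print(rcompose((3, 1), conj((3, 1)), 7))   # Lemma 4.5(1) with D = 7
print(rcompose((1, 1), conj((1, 1)), 2))   # Lemma 4.5(1) with D = 2
print(rcompose((3, 1), (4, 3), 7), rcompose((3, 1), (8, 6), 7))  # Lemma 4.5(4), t = 2
```

The first two lines illustrate both signs in part (1) of the lemma, and the last line shows that scaling one factor by t = 2 does not change the reduced composition.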

Corollary 4.6.

(1) For any (a, b), (c, d), and (e, f),

$$((a, b) * (c, d)) *_r (e, f) = ((a, b) *_r (c, d)) *_r (e, f),$$

and

$$((a, b) *_r (c, d)) *_r (e, f) = (a, b) *_r ((c, d) *_r (e, f)).$$

(2) If (a, b) is reduced and $(a, b) *_r (x, y) = (a, b)$, then $(x, y)_{\mathrm{red}} = \pm(1, 0)$.

Proof:

(1) The first claim follows immediately from Lemma 4.5(4). Let

$$(a, b) * (c, d) = (g, h) \quad\text{and}\quad (a, b) *_r (c, d) = (i, j).$$

Then by definition i = g/t and j = h/t, where t = gcd(g, h). Then
(g, h) = t(i, j). Then

$$\begin{aligned}
((a, b) * (c, d)) *_r (e, f) &= (g, h) *_r (e, f) \\
&= t(i, j) *_r (e, f) \\
&= (i, j) *_r (e, f) \\
&= ((a, b) *_r (c, d)) *_r (e, f).
\end{aligned}$$


Now using this and the associativity of composition, we have

$$\begin{aligned}
((a, b) *_r (c, d)) *_r (e, f) &= ((a, b) * (c, d)) *_r (e, f) \\
&= (((a, b) * (c, d)) * (e, f))_{\mathrm{red}} \\
&= ((a, b) * ((c, d) * (e, f)))_{\mathrm{red}} \\
&= (a, b) *_r ((c, d) * (e, f)) \\
&= (a, b) *_r ((c, d) *_r (e, f)),
\end{aligned}$$

proving the second claim.

(2) Suppose $(a, b) *_r (x, y) = (a, b)$. Then

$$\begin{aligned}
\overline{(a, b)} *_r ((a, b) *_r (x, y)) &= \overline{(a, b)} *_r (a, b), \\
\overline{(a, b)} *_r ((a, b) *_r (x, y)) &= \pm(1, 0), \\
(\overline{(a, b)} *_r (a, b)) *_r (x, y) &= \pm(1, 0), \\
\pm(1, 0) *_r (x, y) &= \pm(1, 0), \\
(\pm(1, 0) * (x, y))_{\mathrm{red}} &= \pm(1, 0), \\
(x, y)_{\mathrm{red}} &= \pm(1, 0). \qquad\square
\end{aligned}$$

Let us call attention to what Corollary 4.6 says. The first claim is that
if we wish to compute the reduced composition of (a, b), (c, d), and (e, f),
we can either reduce at each stage or simply compute the composition at
each stage and reduce at the end; it makes no difference. The same holds
for any number of representations:

$$\begin{aligned}
&(\ldots(((a_1, b_1) *_r (a_2, b_2)) *_r (a_3, b_3)) *_r \ldots (a_{n-1}, b_{n-1})) *_r (a_n, b_n) \\
&\quad= (\ldots(((a_1, b_1) * (a_2, b_2)) * (a_3, b_3)) * \ldots (a_{n-1}, b_{n-1})) *_r (a_n, b_n) \\
&\quad= ((\ldots(((a_1, b_1) * (a_2, b_2)) * (a_3, b_3)) * \ldots (a_{n-1}, b_{n-1})) * (a_n, b_n))_{\mathrm{red}}.
\end{aligned}$$

The second claim is that reduced composition is associative, and the third
claim is a cancellation result for reduced composition, up to sign.
In developing this theory of composition and reduction, we should not
lose sight of our goal: to solve Pell’s equation $a^2 - b^2 D = 1$. In our
language, this is finding a representation $1 = a^2 - b^2 D$. Of course, 1
has the representations $1 = 1^2 - 0^2 D = (-1)^2 - 0^2 D$. We call these
two representations trivial representations and all other representations
of 1 nontrivial representations. So in fact, we are looking for nontrivial
representations of 1.


Let us observe that if (a, b) represents 1, or if (a, b) represents −1, then
(a, b) is automatically reduced. For if d is any common divisor of a and b,
then d divides $a^2 - b^2 D = \pm 1$, so d = ±1, i.e., a and b are relatively prime.
Our next lemma will give a key technique for finding nontrivial repre-
sentations of 1, i.e., nontrivial solutions of Pell’s equation.

Lemma 4.7. Let (a, b) and (c, d) both be reduced and suppose that (a, b) and
(c, d) represent the same integer m. Suppose also that a ≡ c (mod m) and
b ≡ d (mod m). Let

$$(e, f) = \overline{(a, b)} *_r (c, d).$$

Then (e, f) represents 1. Furthermore, if $(c, d) \neq \pm(a, b)$, then $(e, f) \neq
\pm(1, 0)$.

Proof: Let $(E, F) = \overline{(a, b)} * (c, d) = (a, -b) * (c, d) = (ac - bdD,\ ad - bc)$.
Then (E, F) represents $m^2$.
We claim that gcd(E, F) = m.
Let us begin by seeing that each of E and F is divisible by m.
First F: Since a ≡ c (mod m) and d ≡ b (mod m), ad ≡ cb (mod m),
ad − bc ≡ 0 (mod m), i.e., m divides F = ad − bc.
Next E: Since (a, b) represents m, $m = a^2 - b^2 D$, so $a^2 - b^2 D \equiv
0 \pmod m$, a(a) − b(b)D ≡ 0 (mod m). But c ≡ a (mod m) and d ≡
b (mod m), so ac − bdD ≡ 0 (mod m), i.e., m divides E = ac − bdD.
Thus we see that (E/m, F/m) represents 1. But, as we observed, this
automatically shows that (E/m, F/m) is reduced, i.e., gcd(E/m, F/m) =
1, and so m = gcd(E, F). But then

$$(e, f) = \overline{(a, b)} *_r (c, d) = (\overline{(a, b)} * (c, d))_{\mathrm{red}} = (E/m, F/m)$$

and (e, f) represents 1. Furthermore, the contrapositive of Lemma 4.5(3)
gives that if $(c, d) \neq \pm(a, b)$, then $\overline{(a, b)} *_r (c, d) \neq \pm(1, 0)$, i.e., in our case
$(e, f) \neq \pm(1, 0)$. □
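
Here is a concrete instance of Lemma 4.7, worked as a sketch with D = 7. The pairs (3, 1) and (45, 17) are an example chosen for this illustration: both represent m = 2, and their entries agree modulo 2. The helper names are assumptions, not notation from the text.

```python
from math import gcd


def value(p, D):
    """The integer a*a - b*b*D represented by p = (a, b)."""
    return p[0] * p[0] - p[1] * p[1] * D


def compose(p, q, D):
    """Composition (a, b) * (c, d) = (ac + bdD, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c + b * d * D, a * d + b * c)


def rcompose(p, q, D):
    """Reduced composition: compose, then divide out the gcd."""
    e, f = compose(p, q, D)
    g = gcd(e, f)
    return (e // g, f // g)


D = 7
p, q = (3, 1), (45, 17)
assert value(p, D) == value(q, D) == 2        # both represent m = 2
assert (p[0] - q[0]) % 2 == 0 and (p[1] - q[1]) % 2 == 0

conj_p = (p[0], -p[1])
print(compose(conj_p, q, D))                  # (E, F), which represents m^2 = 4
solution = rcompose(conj_p, q, D)
print(solution, value(solution, D))           # a nontrivial representation of 1
```

In this instance the reduced composition is a nontrivial solution of Pell's equation for D = 7.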

In the next section, we will show that we can always find (a, b) and
(c, d) satisfying the hypotheses of Lemma 4.7, and that will give us a single
nontrivial solution of Pell’s equation. But in fact, once we have a single
nontrivial solution, we have infinitely many, as we see from the next lemma.

Lemma 4.8. Let (e, f) represent 1 nontrivially (i.e., $(e, f) \neq \pm(1, 0)$). Let

$$(e, f)^2 = (e, f) * (e, f),\quad (e, f)^3 = (e, f)^2 * (e, f),\quad (e, f)^4 = (e, f)^3 * (e, f),\ \ldots.$$

Then
$$(e, f),\ (e, f)^2,\ (e, f)^3,\ (e, f)^4,\ \ldots$$
$$\overline{(e, f)},\ \overline{(e, f)}^2,\ \overline{(e, f)}^3,\ \overline{(e, f)}^4,\ \ldots$$
all represent 1 nontrivially and are all distinct.
Proof: If (e, f) represents 1, then (e, f) ∗ (e, f) represents 1 · 1, i.e., $(e, f)^2$
represents 1, and then $(e, f)^2 * (e, f)$ represents 1 · 1, i.e., $(e, f)^3$ represents 1,
etc., so $(e, f)^n$ represents 1 for every n ≥ 1. But then $\overline{(e, f)}^n$ also represents
1 for every n ≥ 1.
We must show that they are all nontrivial and all distinct.
Replacing (e, f) by (e, −f), (−e, f), or (−e, −f), if necessary, we may
assume that e > 0 and f > 0. Write $(e_k, f_k) = (e, f)^k$. We claim that
$f_1, f_2, f_3, \ldots$ are a strictly increasing sequence of positive integers (and
hence that $e_1, e_2, e_3, \ldots$ are a strictly increasing sequence of positive in-
tegers), which shows that $(e, f), (e, f)^2, (e, f)^3, \ldots$ are all nontrivial and
distinct. But also $\overline{(e, f)}^k = (e_k, -f_k)$, so all of these are nontrivial and
distinct.
We show this claim by direct computation, using induction. By as-
sumption, in case k = 1, $(e_1, f_1) = (e, f)$ and e and f are positive integers.
Now suppose $e_k$ and $f_k$ are positive integers. Then

$$(e_{k+1}, f_{k+1}) = (e_k, f_k) * (e, f) = (e_k e + f_k f D,\ e_k f + f_k e).$$

Since $e_k \ge 1$, f ≥ 1, and e ≥ 1, we see $f_{k+1} = e_k f + f_k e \ge 1 + f_k > f_k$
(and since e ≥ 1, f ≥ 1, and $f_k \ge 1$, we see $e_{k+1} = e_k e + f_k f D \ge e_k + D >
e_k$), as claimed. □
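
Lemma 4.8 is easy to watch in action. The sketch below starts from the solution (3, 2) of a² − 2b² = 1 (an example chosen here, with D = 2) and lists its first few powers under composition; the function names are illustrative.

```python
def compose(p, q, D):
    """Composition (a, b) * (c, d) = (ac + bdD, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c + b * d * D, a * d + b * c)


def powers(p, D, count):
    """Return [p, p^2, ..., p^count], where powers are taken under composition."""
    result = [p]
    while len(result) < count:
        result.append(compose(result[-1], p, D))
    return result


D = 2
for e, f in powers((3, 2), D, 4):
    # Every power represents 1, and both coordinates strictly increase.
    print((e, f), e * e - f * f * D)
```

Each listed pair is a new solution, so a single nontrivial solution already yields infinitely many.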

4.2 Solving Pell’s Equation


Rather than simply pull a rabbit out of a hat, we will proceed heuristically,
seeing how one might find a method for solving Pell’s equation. Then we
will prove that this method actually works. At the outset, our approach will
seem to have nothing to do with the last section, but we will soon see that
our method involves a sequence of (carefully chosen) reduced compositions.
We wish to solve
$$a^2 - b^2 D = 1$$
and we see that this is equivalent to (assuming a and b are positive)
$$a^2 = b^2 D + 1 = b^2 (D + 1/b^2),$$
so
$$a = b\sqrt{D + 1/b^2}, \qquad a/b = \sqrt{D + 1/b^2}.$$

Now for an arbitrary choice of b, $D + 1/b^2$ will not have a rational square
root, so a/b will not be a rational number. We want to find a value of b
for which it does, but there is no a priori guarantee that such a value of b
exists. However, we do observe that for a solution (a, b), the ratio a/b will
be close to $\sqrt{D}$, which we write as $a/b \sim \sqrt{D}$, and furthermore, as b gets
larger the ratio a/b gets closer to $\sqrt{D}$.
So let’s start with a guess for a and b that has a/b reasonably close to
$\sqrt{D}$. Indeed, let us start with a pair (a, b) and set $e = a^2 - b^2 D$, so (a, b)
represents e. We would like to get e = 1, but let’s settle for the moment for
keeping |e| small. (What “small” means turns out to be a delicate question,
but we will save that for later.)
We would like to get a better guess (A, B) and we will try to do so
(with hindsight gained from working many examples) by setting

$$B = a + bx$$

with x yet to be determined, and by choosing a suitable value of A. We
want $A/B \sim \sqrt{D}$, and we have $a/b \sim \sqrt{D}$, so we look for $A/B \sim a/b$,
i.e., $A \sim (a/b)B = (a/b)(a + bx) = a^2/b + ax$. We could try setting
$A = a^2/b + ax$, but there are two problems with this: first, $a^2/b$ is not
an integer, and second, choosing A = (a/b)B would not be making the
situation any better, just keeping it the same (as then A/B = a/b).
We solve the first problem first. Note that from $a^2 - b^2 D = 1$ we obtain
$a^2/b = bD + 1/b$. So in our “try” for A above, we replace $a^2/b$ with bD,
which is pretty close to it, to obtain

$$A = ax + bD.$$

Then we set $E = A^2 - B^2 D$ and hope that we have made things better.
Unfortunately,

$$\begin{aligned}
E = A^2 - B^2 D &= (ax + bD)^2 - (a + bx)^2 D \\
&= (a^2x^2 + 2abxD + b^2D^2) - (a^2D + 2abxD + b^2x^2D) \\
&= a^2x^2 + b^2D^2 - a^2D - b^2x^2D \\
&= (a^2 - b^2D)(x^2 - D) \\
&= e(x^2 - D).
\end{aligned}$$




While we certainly should choose x near $\sqrt{D}$, so $|x^2 - D|$ is small, this is
not an improvement, as |E| is a multiple of |e|.
However, we are not done yet! We have flexibility in the choice of x,
which we have not yet exploited.
Suppose A and B are divisible by |e|. Then A/|e| and B/|e| are integers,
and $(A/|e|)^2 - (B/|e|)^2 D = E/e^2$ is an integer with $|E/e^2| < |E|$ (unless
e = 1, in which case we are done, or e = −1, in which case, as we shall
see, we are almost done). Guided by this observation, we shall choose x
so that A and B are divisible by e, and set $a' = A/|e|$, $b' = B/|e|$, to find
that $e' = (a')^2 - (b')^2 D = E/e^2 = (x^2 - D)/e$. Now $e'$ is not necessarily
equal to 1, but it cannot be too large; in particular, $|e'| \le |x^2 - D|$. So we
should try to choose x with $|x^2 - D|$ small, i.e., with x near $\sqrt{D}$.
Let us summarize this discussion in a lemma.
Lemma 4.9. Let a and b be relatively prime, nonnegative integers and set
$e = a^2 - b^2 D$. Then there is exactly one integer x with $\sqrt{D} - |e|/2 < x <
\sqrt{D} + |e|/2$ such that a + bx ≡ 0 (mod e). With this value of x, set

$$a' = \left|\frac{ax + bD}{e}\right|, \qquad b' = \left|\frac{a + bx}{e}\right|.$$

Then $a'$ and $b'$ are relatively prime, nonnegative integers.

Proof: There are two claims to prove:

(1) We can choose exactly one value of x as above.

(2) With this value of x, $a'$ and $b'$ are relatively prime, nonnegative inte-
gers.

We prove these in turn.


First, let us note that for any positive integer i, and any real number
r, the inequality r ≤ x ≤ r + i has exactly i + 1 integer solutions if r is an
integer and exactly i integer solutions otherwise. If r is an integer, both
x = r and x = r + i are integer solutions, while if r is not an integer, neither
is an integer solution. Thus, we see that the inequality r < x < r + i
has exactly i − 1 integer solutions if r is an integer and exactly i solutions
otherwise.
In our situation, $\sqrt{D}$ is not a rational number, so $r = \sqrt{D} - |e|/2$ is
certainly not an integer, and, setting i = |e|, we have $r + i = \sqrt{D} + |e|/2$,
so the inequality $\sqrt{D} - |e|/2 < x < \sqrt{D} + |e|/2$ has exactly i = |e| integer
solutions. They must be consecutive integers $x_0, x_0 + 1, \ldots, x_0 + |e| - 1$ for
some $x_0$.
With this in mind, we get to work. We claim that e and b are relatively
prime. For suppose there was a prime p that divided both e and b. Then
p would divide $e + b^2 D = a^2$, and so p would divide a, contradicting our
hypothesis that a and b are relatively prime.
Now, since e and b are relatively prime, the congruence

$$a + by \equiv 0 \pmod e$$

has exactly one solution $y_0$ (mod e). But since $x_0, x_0 + 1, \ldots, x_0 + |e| - 1$
are |e| consecutive integers, exactly one of them is congruent to $y_0$ (mod e).
Choose that one and call it x.
Now for the second claim.
Let A = ax + bD and B = a + bx. By our choice of x, B ≡ 0 (mod e)
so $b' = |B/e|$ is a nonnegative integer. But also

$$aB - bA = a(a + bx) - b(ax + bD) = a^2 - b^2 D = e \equiv 0 \pmod e.$$

Thus, bA ≡ aB (mod e). But B ≡ 0 (mod e), so bA ≡ 0 (mod e).
However, b and e are relatively prime, so A ≡ 0 (mod e). Hence $a' = |A/e|$
is also a nonnegative integer.
It remains for us to show that $a'$ and $b'$ are relatively prime. Let $d' =
\gcd(a', b')$. We want to prove $d' = 1$.
Now $d'$ divides $a'$ so $d'e$ divides A, and $d'$ divides $b'$ so $d'e$ divides B.
Since $d'e$ divides both A and B, $d'e$ divides

$$aB - bA = a(a + bx) - b(ax + bD) = a^2 - b^2 D = e,$$

the identity computed above. As e is nonzero, this forces $d' = 1$,
and so $a'$ and $b'$ are relatively prime. □
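
A single application of Lemma 4.9 can be sketched in exact integer arithmetic. The assumption made below is that the interval condition √D − |e|/2 < x < √D + |e|/2 is tested through s = ⌊√(4D)⌋, which avoids floating point: the admissible x are exactly the |e| consecutive integers starting at ⌊(s − |e|)/2⌋ + 1. The function name `pell_step` is illustrative.

```python
from math import isqrt


def pell_step(a, b, D):
    """One step (a, b) -> (a', b') of Lemma 4.9; returns (a', b', x).

    Assumes D is a positive nonsquare and a, b are relatively prime, nonnegative.
    """
    e = a * a - b * b * D
    m = abs(e)
    s = isqrt(4 * D)            # floor(2*sqrt(D)); 4*D is never a perfect square
    lo = (s - m) // 2 + 1       # smallest integer x with x > sqrt(D) - |e|/2
    for x in range(lo, lo + m):  # exactly the |e| integers in the open interval
        if (a + b * x) % m == 0:
            return abs(a * x + b * D) // m, abs(a + b * x) // m, x
    raise ValueError("no admissible x found; are a and b relatively prime?")


print(pell_step(1, 0, 7))   # starting pair (1, 0) for D = 7
print(pell_step(3, 1, 7))   # the next step for D = 7
print(pell_step(4, 1, 13))  # a step with |e| = 3 for D = 13
```

The linear scan over the |e| candidates mirrors the search described in the proof; it is adequate here because the method keeps |e| below 2√D.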
Now suppose a and b are as in Lemma 4.9 and set $e = a^2 - b^2 D$. Let $a'$
and $b'$ be as in Lemma 4.9 and set $e' = (a')^2 - (b')^2 D$. We said in the above
discussion that |e| should be “small.” To be precise, we want a bound on |e|
that ensures that if |e| is “small,” then $|e'|$ is also “small.” For if we cannot
ensure that, we lose control of the situation. The appropriate criterion is
a delicate one, one that is not apparent a priori, but one that makes the
proof work. We give it here.


Lemma 4.10. In the situation of Lemma 4.9, let

$$e = a^2 - b^2 D \quad\text{and}\quad e' = (a')^2 - (b')^2 D.$$

If $|e| < 2\sqrt{D}$, then $|e'| < 2\sqrt{D}$.

Proof: Following the notation of the proof of Lemma 4.9, let

$$A = ax + bD \quad\text{and}\quad B = a + bx,$$

so $a' = |A/e|$ and $b' = |B/e|$. Let $E = A^2 - B^2 D$. We compute, as before,
that

$$E = (ax + bD)^2 - (a + bx)^2 D = e(x^2 - D),$$

and so

$$e' = (A/e)^2 - (B/e)^2 D = (A^2 - B^2 D)/e^2 = (x^2 - D)/e.$$

We want to bound $|e'|$, so we must estimate $x^2 - D$. We have

$$\sqrt{D} - |e|/2 < x < \sqrt{D} + |e|/2.$$

Since $|e| < 2\sqrt{D}$, we see that the lower bound $\sqrt{D} - |e|/2$ is positive
(and so x is positive). Then

$$\begin{aligned}
(\sqrt{D} - |e|/2)^2 &< x^2 < (\sqrt{D} + |e|/2)^2, \\
D - |e|\sqrt{D} + |e|^2/4 &< x^2 < D + |e|\sqrt{D} + |e|^2/4, \\
-|e|\sqrt{D} + |e|^2/4 &< x^2 - D < |e|\sqrt{D} + |e|^2/4.
\end{aligned}$$

Now $x^2 - D$ may be positive or negative, so we must be careful about
signs. There are two cases to consider:

(1) $x^2 - D$ is positive, so $0 < x^2 - D < |e|\sqrt{D} + |e|^2/4$.

(2) $x^2 - D$ is negative, so $-|e|\sqrt{D} + |e|^2/4 < x^2 - D < 0$.

In case (1),

$$|e'| = \left|\frac{x^2 - D}{e}\right| = \frac{x^2 - D}{|e|} < \sqrt{D} + |e|/4 < \sqrt{D} + 2\sqrt{D}/4 < 2\sqrt{D},$$

where we have used our hypothesis that $|e| < 2\sqrt{D}$, and in case (2),

$$|e'| = \left|\frac{x^2 - D}{e}\right| = \frac{-(x^2 - D)}{|e|} < \sqrt{D} - |e|/4 < 2\sqrt{D}.$$

Thus, in any case $|e'| < 2\sqrt{D}$, as claimed. □

Let us assemble these two results and relate them to composition.


Corollary 4.11. Let a and b be nonnegative integers and suppose that (a, b)
is reduced and represents e with $|e| < 2\sqrt{D}$. Then there is a unique integer
x with $\sqrt{D} - |e|/2 < x < \sqrt{D} + |e|/2$ such that a + bx ≡ 0 (mod e). (Note
that x is positive.) For this value of x, let

$$(a', b') = (a, b) *_r (x, 1).$$

Then $(a', b')$ represents $e'$ with $|e'| < 2\sqrt{D}$.

Proof: We follow the notation of the proof of Lemma 4.9 and Lemma 4.10.
We have by Lemma 4.9 that

$$(A, B) = (ax + bD,\ a + bx)$$

and we compute that

$$(a, b) * (x, 1) = (ax + bD,\ a + bx),$$

so we immediately see that

$$(A, B) = (a, b) * (x, 1).$$

Then, by definition,

$$(a, b) *_r (x, 1) = ((a, b) * (x, 1))_{\mathrm{red}} = (A, B)_{\mathrm{red}} = (A/d, B/d)$$

where d = gcd(A, B). But we have seen that |e| divides both A and B, and
by the properties of the gcd,

$$d = \gcd(A, B) = |e| \gcd(A/|e|, B/|e|) = |e| \gcd(a', b') = |e|,$$

as we have seen that $a'$ and $b'$ are relatively prime. Hence

$$(a, b) *_r (x, 1) = (A/|e|, B/|e|) = (a', b')$$

as claimed.
Finally, if $|e| = |a^2 - b^2 D| < 2\sqrt{D}$, then also $|e'| = |(a')^2 - (b')^2 D| <
2\sqrt{D}$ by Lemma 4.10. □


We think of using Lemma 4.9 recursively. That is, we start with a pair
(a, b), apply Lemma 4.9 to (a, b) to get a pair $(a', b')$, apply Lemma 4.9
again to $(a', b')$ to get a pair $(a'', b'')$, . . . . This works for Lemma 4.10, or
for Corollary 4.11, in a similar way, but here we must start out with a pair
(a, b) with $e = a^2 - b^2 D$ having $|e| < 2\sqrt{D}$. It may seem difficult to find
such a pair, but in fact it is trivial: we choose (a, b) = (1, 0)!
Let us establish some notation.
Definition 4.12. Let (a, b) and $(a', b')$ be as in Lemma 4.9. We write

$$(a', b') = P(a, b).$$

If furthermore $(a'', b'') = P(a', b')$ we write $(a'', b'') = P^2(a, b)$, etc.

We let $(a_0, b_0) = (1, 0)$ and set

$$(a_i, b_i) = P^i(1, 0)$$

for every i > 0. (Here “P” stands for “Pell.”)

We also set

$$e_i = a_i^2 - b_i^2 D.$$

Corollary 4.11 gives a value of x for which $(a_{i+1}, b_{i+1}) = (a_i, b_i) *_r (x, 1)$,
and we denote that value of x by $x_{i+1}$, so

$$(a_{i+1}, b_{i+1}) = P(a_i, b_i) = (a_i, b_i) *_r (x_{i+1}, 1).$$



Remark 4.13. Note, in particular, that $x_1$ is the unique integer with $\sqrt{D} -
1/2 < x_1 < \sqrt{D} + 1/2$, i.e., $x_1$ is the integer closest to $\sqrt{D}$. Then $(a_0, b_0) =
(1, 0)$ and $(a_1, b_1) = (x_1, 1)$. Also note that each $x_i$ is positive.
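
The recursion of Definition 4.12 can be run directly. The following sketch iterates the step of Lemma 4.9 from (1, 0) and stops the first time some e_i equals 1. That the sequence reaches e_i = 1 is not established at this point in the text, so the code carries a step limit; the names `pell_step`, `solve_pell`, and `max_steps` are assumptions made for this illustration.

```python
from math import isqrt


def pell_step(a, b, D):
    """One step (a, b) -> (a', b') of Lemma 4.9 (D a positive nonsquare)."""
    m = abs(a * a - b * b * D)
    lo = (isqrt(4 * D) - m) // 2 + 1   # first integer above sqrt(D) - |e|/2
    for x in range(lo, lo + m):
        if (a + b * x) % m == 0:
            return abs(a * x + b * D) // m, abs(a + b * x) // m
    raise ValueError("no admissible x found")


def solve_pell(D, max_steps=10_000):
    """Iterate P from (1, 0); return the first (a_i, b_i), i > 0, with e_i == 1."""
    a, b = 1, 0
    for _ in range(max_steps):
        a, b = pell_step(a, b, D)
        if a * a - b * b * D == 1:
            return (a, b)
    raise RuntimeError("step limit reached before e_i = 1 appeared")


for D in (2, 7, 13):
    a, b = solve_pell(D)
    print(D, (a, b), a * a - b * b * D)
```

For D = 13 the intermediate pairs are (4, 1), (11, 3), (18, 5), (137, 38), (393, 109), with e_i = 3, 4, −1, −3, −4, before the pair (649, 180) with e_6 = 1 appears. Running the same function with D = 61 gives a way to compare against the values attributed to Fermat at the start of the chapter.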

As a practical matter, to employ Lemma 4.9 we must search among the
|e| values of x with $\sqrt{D} - |e|/2 < x < \sqrt{D} + |e|/2$ for the one value of x
that makes a + bx ≡ 0 (mod e). But if we employ this method recursively,
we may use the following relation.
Lemma 4.14. In the above notation,

$$x_{i+1} \equiv -x_i \pmod{e_i}.$$

Proof: We are looking for a value of $x_{i+1}$ that in particular satisfies the
congruence $a_i + b_i x_{i+1} \equiv 0 \pmod{e_i}$, a congruence with a unique solution
mod $e_i$. But we know that

$$a_i = (a_{i-1} x_i + b_{i-1} D)/|e_{i-1}| \quad\text{and}\quad b_i = (a_{i-1} + b_{i-1} x_i)/|e_{i-1}|$$

and we then compute that

$$a_i + b_i(-x_i) = -b_{i-1}(x_i^2 - D)/|e_{i-1}| = \pm b_{i-1} e_i \equiv 0 \pmod{e_i},$$

as in the proof of Lemma 4.10 we computed that $e_i = (x_i^2 - D)/e_{i-1}$. Hence
$x_{i+1} \equiv -x_i \pmod{e_i}$ is a solution to this congruence. But since this con-
gruence has a unique solution, this must be the solution. □
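
Lemma 4.14 removes the search altogether: the next x is the unique integer in the admissible window that is congruent to −x_i modulo |e_i|. The sketch below uses only this congruence to choose x, and asserts at each step that A and B are indeed divisible by |e|, which serves as a runtime check of the lemma. The names `window_start` and `pell_run` are illustrative, and the window is computed from ⌊√(4D)⌋ as an integer-only reformulation of the inequality in Lemma 4.9.

```python
from math import isqrt


def window_start(D, m):
    """Smallest integer x with x > sqrt(D) - m/2, for a positive nonsquare D."""
    return (isqrt(4 * D) - m) // 2 + 1


def pell_run(D, steps):
    """Return [(x_i, a_i, b_i, e_i)] for i = 1..steps, choosing x_i via Lemma 4.14."""
    a, b, e = 1, 0, 1
    x_prev = 0                      # e_0 = 1, so the congruence mod 1 is vacuous
    history = []
    for _ in range(steps):
        m = abs(e)
        lo = window_start(D, m)
        x = lo + (-x_prev - lo) % m  # the one x in [lo, lo + m) with x = -x_prev mod m
        A, B = a * x + b * D, a + b * x
        assert A % m == 0 and B % m == 0, "Lemma 4.14 would be violated"
        a, b = A // m, B // m
        e = a * a - b * b * D
        history.append((x, a, b, e))
        x_prev = x
    return history


for row in pell_run(13, 6):
    print(row)
```

For D = 13 the chosen values are x = 4, 5, 3, 4, 5, 3, and the sixth step gives the pair (649, 180) with e = 1.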

It may seem that we have simply found a sequence of pairs $(a_1, b_1)$,
$(a_2, b_2)$, $(a_3, b_3)$, . . . with $|e_1| < 2\sqrt{D}$, $|e_2| < 2\sqrt{D}$, $|e_3| < 2\sqrt{D}$, . . ., while
what we are looking for is a nontrivial solution to Pell’s equation, i.e., a
pair $(a, b) \neq \pm(1, 0)$ with $e = a^2 - b^2 D = 1$, so this is not good enough.
But in fact it is! In fact, we have found a solution (actually, infinitely
many solutions) to Pell’s equation!

Theorem 4.15. Let D be any positive integer that is not a perfect square.
Then Pell’s equation a2 − b2 D = 1 has infinitely many solutions.

Proof: To begin with, we claim that the pairs of integers

$$(a_0, b_0),\ (a_1, b_1),\ (a_2, b_2),\ \ldots$$

are all distinct. To see this, suppose we choose any two nonnegative integers $s < t$. Recall that, by Corollary 4.11, and by Corollary 4.6(1),

$$(a_{s+1}, b_{s+1}) = (a_s, b_s) \ast_r (x_{s+1}, 1)$$

$$\begin{aligned}
(a_{s+2}, b_{s+2}) &= (a_{s+1}, b_{s+1}) \ast_r (x_{s+2}, 1) \\
&= ((a_s, b_s) \ast_r (x_{s+1}, 1)) \ast_r (x_{s+2}, 1) \\
&= (a_s, b_s) \ast_r ((x_{s+1}, 1) \ast_r (x_{s+2}, 1))
\end{aligned}$$

and continuing in this way we see that

$$(a_t, b_t) = (a_s, b_s) \ast_r (X, Y)$$

where

$$(X, Y) = (x_{s+1}, 1) \ast_r (x_{s+2}, 1) \ast_r \cdots \ast_r (x_t, 1).$$

Note that each $x_i > 0$, so $Y \neq 0$ (by the formula for composition) and hence $(X, Y) \neq \pm(1, 0)$. Then, by the contrapositive of Corollary 4.6(2), $(a_t, b_t) \neq (a_s, b_s)$.
Now we shall show that Pell's equation has a single nontrivial solution.

For each $i$, let $\bar{a}_i \equiv a_i \pmod{e_i}$ with $0 \le \bar{a}_i < |e_i|$ and let $\bar{b}_i \equiv b_i \pmod{e_i}$ with $0 \le \bar{b}_i < |e_i|$. Consider the triples of integers

$$(e_0, \bar{a}_0, \bar{b}_0),\ (e_1, \bar{a}_1, \bar{b}_1),\ (e_2, \bar{a}_2, \bar{b}_2),\ \ldots$$

Let $k = [2\sqrt{D}]$. Since $|e_i| < 2\sqrt{D}$ for every $i$, there are at most $2k$ possibilities for $e_i$ (as $e_i$ is a nonzero integer between $-k$ and $k$). Also, there are at most $k$ possibilities for $\bar{a}_i$ (as $0 \le \bar{a}_i \le |e_i| - 1 \le k - 1$) and at most $k$ possibilities for $\bar{b}_i$ (as $0 \le \bar{b}_i \le |e_i| - 1 \le k - 1$). So there are at most $2k^3$ possibilities for $(e_i, \bar{a}_i, \bar{b}_i)$. But this sequence has infinitely many terms. So, by the Pigeonhole Principle, they cannot all be distinct, i.e., there must be a pair of nonnegative integers $s$ and $t$ with $s < t$ and $(e_s, \bar{a}_s, \bar{b}_s) = (e_t, \bar{a}_t, \bar{b}_t)$. Then $e_s = e_t$; call that common value $e$.
Now on the one hand $a_s \equiv a_t \pmod{e}$ and $b_s \equiv b_t \pmod{e}$, but on the other hand, as we saw in the first part of the proof, $(a_s, b_s) \neq (a_t, b_t)$. Thus we may apply Lemma 4.7 to conclude that, setting

$$(\alpha, \beta) = \overline{(a_s, b_s)} \ast_r (a_t, b_t),$$

then $(\alpha, \beta)$ represents 1, i.e.,

$$\alpha^2 - \beta^2 D = 1$$

and $(\alpha, \beta) \neq \pm(1, 0)$, so $(\alpha, \beta)$ is a nontrivial solution to Pell's equation.


Now that we have a single nontrivial solution to Pell’s equation, we can
get infinitely many.
The easiest way to do so is to simply quote Lemma 4.8, which tells us that the "powers" of $(\alpha, \beta)$, i.e.,

$$(\alpha, \beta),\quad (\alpha, \beta)^2 = (\alpha, \beta) \ast (\alpha, \beta),\quad (\alpha, \beta)^3 = (\alpha, \beta)^2 \ast (\alpha, \beta),\quad \ldots$$

are all distinct solutions of Pell's equation.


Alternatively, instead of appealing to Lemma 4.8, we may simply modify our proof that there is one nontrivial solution. Namely, since there are only finitely many possibilities for $(e_i, \bar{a}_i, \bar{b}_i)$ and there are infinitely many terms in this sequence, the Pigeonhole Principle tells us that there is some term that is repeated infinitely often. That is, there is a sequence of nonnegative integers $s_0 < s_1 < s_2 < \cdots$ with

$$(e_{s_0}, \bar{a}_{s_0}, \bar{b}_{s_0}) = (e_{s_1}, \bar{a}_{s_1}, \bar{b}_{s_1}) = (e_{s_2}, \bar{a}_{s_2}, \bar{b}_{s_2}) = \cdots.$$

Then, setting

$$(\alpha_i, \beta_i) = \overline{(a_{s_0}, b_{s_0})} \ast_r (a_{s_i}, b_{s_i})$$

for each $i = 1, 2, 3, \ldots$, we see that, by the same argument as above,

$$(\alpha_1, \beta_1),\ (\alpha_2, \beta_2),\ (\alpha_3, \beta_3),\ \ldots$$

are all distinct nontrivial solutions of Pell's equation. □
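The pigeonhole argument in the proof can be turned into a small search. The following Python sketch is one possible reading of that argument, not a prescribed implementation: the function names are ours, $D$ is assumed to be a positive non-square integer, and the reduced composition with the conjugate pair is computed by dividing the ordinary composition by $|e|$, which Lemma 4.7 is assumed to justify.

```python
from math import isqrt

def pell_step(a, b, D):
    """One step (a, b) -> P(a, b) of the method (illustrative helper)."""
    m = abs(a * a - D * b * b)
    r = isqrt(4 * D)                       # floor(2*sqrt(D)), D non-square
    for x in range((r - m + 2) // 2, (r + m) // 2 + 1):
        if (a + b * x) % m == 0:
            return (a * x + b * D) // m, (a + b * x) // m
    raise ValueError("no admissible x found")

def solve_by_pigeonhole(D):
    """Return a nontrivial solution (alpha, beta) of alpha^2 - D*beta^2 = 1."""
    seen = {}                              # triple (e, a mod |e|, b mod |e|) -> (a, b)
    a, b = 1, 0
    while True:
        e = a * a - D * b * b
        m = abs(e)
        key = (e, a % m, b % m)
        if key in seen:                    # a repeated triple: indices s < t
            sa, sb = seen[key]
            # conjugate of (sa, sb) composed with (a, b), then divided by |e|
            alpha = (sa * a - sb * b * D) // m
            beta = (sa * b - sb * a) // m
            return abs(alpha), abs(beta)
        seen[key] = (a, b)
        a, b = pell_step(a, b, D)
```

In practice the first repeated triple is $(1, 0, 0)$, so the search stops the first time some $e_t$ with $t > 0$ equals 1.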

4.3 Numerical Examples and Further Results


In this section we give a number of examples of the method we described
in the last section, and describe some of its properties. We also give a
few “tricks” for speeding up the computation. First, we have a number
of tables giving the results of the method for various values of D. The
values of D were chosen to provide an illustrative sample of the phenomena
encountered.
For each value of $D$, the table gives $a_i$, $b_i$, $e_i = a_i^2 - D b_i^2$, $\bar{a}_i \equiv a_i \pmod{e_i}$, $\bar{b}_i \equiv b_i \pmod{e_i}$, and $x_i$, where

$$(a_{i+1}, b_{i+1}) = (a_i, b_i) \ast_r (x_{i+1}, 1), \quad\text{with } (a_0, b_0) = (1, 0).$$

We observe that, in each case, the sequence $e_0, e_1, e_2, \ldots$ is periodic. This leads us to make the following definition.

Definition 4.16. For a fixed value of $D$, the period of the sequence $\{e_i\}$, i.e., the smallest positive value of $k$ for which $e_{i+k} = e_i$ for every $i \ge 0$, is called the small period of $D$, denoted by $k = p(D)$.

In particular, we see that $1 = e_0 = e_{p(D)} = e_{2p(D)} = \cdots$ so that $(a_{ip(D)}, b_{ip(D)})$ represents 1 for every $i$, giving a sequence of solutions to Pell's equation.
But we observe that the sequence of triples $(e_0, \bar{a}_0, \bar{b}_0)$, $(e_1, \bar{a}_1, \bar{b}_1)$, $(e_2, \bar{a}_2, \bar{b}_2)$, . . . is also periodic, and so we make another definition.

Definition 4.17. For a fixed value of $D$, the period of the sequence $\{(e_i, \bar{a}_i, \bar{b}_i)\}$, i.e., the smallest positive value of $k$ for which $(e_{i+k}, \bar{a}_{i+k}, \bar{b}_{i+k}) = (e_i, \bar{a}_i, \bar{b}_i)$ for every $i \ge 0$, is called the large period of $D$, denoted by $k = P(D)$.

Note that in Tables 4.1–4.8 we have given at least one large period.

We also give in Table 4.9 the values of $p(D)$ and $P(D)$ for all positive integers $D < 100$ that are not perfect squares.
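The two periods can be read off from a computed stretch of the sequence. The Python sketch below is a heuristic illustration, not a proof of periodicity: it reports the smallest shift $k$ that is consistent with the first $n + 1$ terms and that is observed at least twice, so $n$ must be chosen to be at least twice the expected large period. The function names are ours, and $D$ is assumed to be a positive non-square integer.

```python
from math import isqrt

def triples(D, n):
    """Triples (e_i, a_i mod |e_i|, b_i mod |e_i|) for i = 0..n."""
    out, a, b = [], 1, 0
    r = isqrt(4 * D)                      # floor(2*sqrt(D)), D non-square
    for _ in range(n + 1):
        e = a * a - D * b * b
        m = abs(e)
        out.append((e, a % m, b % m))
        # the unique admissible x in the window of Lemma 4.9
        x = next(x for x in range((r - m + 2) // 2, (r + m) // 2 + 1)
                 if (a + b * x) % m == 0)
        a, b = (a * x + b * D) // m, (a + b * x) // m
    return out

def observed_period(seq):
    """Smallest k <= len(seq)//2 with seq[i] == seq[i+k] wherever both exist."""
    n = len(seq)
    for k in range(1, n // 2 + 1):
        if all(seq[i] == seq[i + k] for i in range(n - k)):
            return k
    return None                           # not enough terms were computed

def periods(D, n):
    """Observed (small period, large period) from the first n + 1 terms."""
    t = triples(D, n)
    return observed_period([e for e, _, _ in t]), observed_period(t)
```

With the row counts used in Tables 4.1 and 4.2, `periods(13, 12)` and `periods(19, 16)` give the pairs listed for $D = 13$ and $D = 19$ in Table 4.9.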


$i$  $a_i$  $b_i$  $e_i$  $\bar{a}_i$  $\bar{b}_i$  $x_i$
0 1 0 1 0 0
1 4 1 3 1 1 4
2 11 3 4 3 3 5
3 18 5 −1 0 0 3
4 137 38 −3 2 2 4
5 393 109 −4 1 1 5
6 649 180 1 0 0 3
7 4936 1369 3 1 1 4
8 14159 3927 4 3 3 5
9 23382 6485 −1 0 0 3
10 177833 49322 −3 2 2 4
11 510117 141481 −4 1 1 5
12 842401 233640 1 0 0 3

Table 4.1. D = 13.

Remark 4.18. Recall we defined

$$(a_{i+1}, b_{i+1}) = \left(\frac{a_i x_{i+1} + b_i D}{|e_i|},\ \frac{a_i + b_i x_{i+1}}{|e_i|}\right) = (a_i, b_i) \ast_r (x_{i+1}, 1).$$

Suppose we had not used absolute values and instead had defined

$$(a_{i+1}, b_{i+1}) = \left(\frac{a_i x_{i+1} + b_i D}{e_i},\ \frac{a_i + b_i x_{i+1}}{e_i}\right).$$

Then we would have had $(a_{i+1}, b_{i+1}) = (a_i, b_i) \ast_r (x_{i+1}, 1)$ or $(a_{i+1}, b_{i+1}) = (a_i, b_i) \ast_r -(x_{i+1}, 1)$. This would not have changed the small period $p(D)$, but might have changed the large period. Call the new value $P'(D)$. Then it could only have changed the large period by at most a factor of two, i.e., $P'(D) = P(D)$, $P'(D) = 2P(D)$, or $P'(D) = (1/2)P(D)$, and all three possibilities occur.

Remark 4.19. Since $1 = e_0 = e_{P(D)} = e_{2P(D)} = \cdots$ we see that $p(D)$ divides $P(D)$ (and also $p(D)$ divides $P'(D)$).

Now we give some results that enable us to speed up our search.


$i$  $a_i$  $b_i$  $e_i$  $\bar{a}_i$  $\bar{b}_i$  $x_i$
0 1 0 1 0 0
1 4 1 −3 1 1 4
2 13 3 −2 1 1 5
3 61 14 −3 1 2 5
4 170 39 1 0 0 4
5 1421 326 −3 2 2 4
6 4433 1017 −2 1 1 5
7 20744 4759 −3 2 1 5
8 57799 13260 1 0 0 4
9 483136 110839 −3 1 1 4
10 1507207 345777 −2 1 1 5
11 7052899 1618046 −3 1 2 5
12 19651490 4508361 1 0 0 4
13 164264819 37684934 −3 2 2 4
14 512445947 117563163 −2 1 1 5
15 2397964916 550130881 −3 2 1 5
16 6681448801 1532829480 1 0 0 4

Table 4.2. D = 19.

$i$  $a_i$  $b_i$  $e_i$  $\bar{a}_i$  $\bar{b}_i$  $x_i$
0 1 0 1 0 0
1 5 1 4 1 1 5
2 9 2 −3 0 2 3
3 32 7 −5 2 2 6
4 55 12 1 0 0 4
5 527 115 4 3 3 5
6 999 218 −3 0 2 3
7 3524 769 −5 4 4 6
8 6049 1320 1 0 0 4
9 57965 12649 4 1 1 5
10 109881 23978 −3 0 2 3
11 387608 84583 −5 3 3 6
12 665335 145188 1 0 0 4
13 6375623 1391275 4 3 3 5
14 12085911 2637362 −3 0 2 3
15 42633356 9303361 −5 1 1 6
16 73180801 15969360 1 0 0 4

Table 4.3. D = 21.


$i$  $a_i$  $b_i$  $e_i$  $\bar{a}_i$  $\bar{b}_i$  $x_i$
0 1 0 1 0 0
1 5 1 −4 1 1 5
2 16 3 −5 1 3 7
3 27 5 4 3 1 3
4 70 13 −1 0 0 5
5 727 135 4 3 3 5
6 2251 418 5 1 3 7
7 3775 701 −4 3 1 3
8 9801 1820 1 0 0 5
9 101785 18901 −4 1 1 5
10 315156 58523 −5 1 3 7
11 528527 98145 4 3 1 3
12 1372210 254813 −1 0 0 5
13 14250627 2646275 4 3 3 5
14 44124091 8193638 5 1 3 7
15 73997555 13741001 −4 3 1 3
16 192119201 35675640 1 0 0 5
17 1995189565 370497401 −4 1 1 5
18 6177687896 1147167843 −5 1 3 7
19 10360186227 1923838285 4 3 1 3
20 26898060350 4994844413 −1 0 0 5
21 279340789727 51872282415 4 3 3 5
22 864920429531 160611691658 5 1 3 7
23 1450500069335 269351100901 −4 3 1 3
24 3765920568201 699313893460 1 0 0 5
25 39109705751345 7262490035501 −4 1 1 5
26 121095037822236 22486783999963 −5 1 3 7
27 203080369893127 37711077964425 4 3 1 3
28 527255777608490 97908939928813 −1 0 0 5
29 5475638145978027 1016800477252555 4 3 3 5
30 16954170215542571 3148310371686478 5 1 3 7
31 28432702285107115 5279820266120401 −4 3 1 3
32 73819574785756801 13707950903927280 1 0 0 5

Table 4.4. D = 29.


$i$  $a_i$  $b_i$  $e_i$  $\bar{a}_i$  $\bar{b}_i$  $x_i$
0 1 0 1 0 0
1 6 1 5 1 1 6
2 11 2 −3 2 2 4
3 39 7 2 1 1 5
4 206 37 −3 2 1 5
5 863 155 −6 5 5 7
6 1520 273 1 0 0 5
7 17583 3158 5 3 3 6
8 33646 6043 −3 1 1 4
9 118521 21287 2 1 1 5
10 626251 112478 −3 1 2 5
11 2623525 471199 −6 1 1 7
12 4620799 829920 1 0 0 5
13 53452314 9600319 5 4 4 6
14 102283829 18370718 −3 2 2 4
15 360303801 64712473 2 1 1 5
16 1903802834 341933083 −3 2 1 5
17 7975515137 1432444805 −6 5 5 7
18 14047227440 2522956527 1 0 0 5
19 162495016977 29184966602 5 2 2 6
20 310942806514 55846976677 −3 1 1 4
21 1095323436519 196725896633 2 1 1 5
22 5787559989109 1039476459842 −3 1 2 5
23 24245563392955 4354631736001 −6 1 1 7
24 42703566796801 7669787012160 1 0 0 5

Table 4.5. D = 31.


$i$  $a_i$  $b_i$  $e_i$  $\bar{a}_i$  $\bar{b}_i$  $x_i$
0 1 0 1 0 0
1 8 1 6 2 1 8
2 23 3 7 2 3 10
3 61 8 9 7 8 11
4 99 13 −1 0 0 7
5 1546 203 −6 4 5 8
6 4539 596 −7 3 1 10
7 12071 1585 −9 2 1 11
8 19603 2574 1 0 0 7
9 306116 40195 6 2 1 8
10 898745 118011 7 1 5 10
11 2390119 313838 9 7 8 11
12 3881493 509665 −1 0 0 7
13 60612514 7958813 −6 4 5 8
14 177956049 23366774 −7 5 4 10
15 473255633 62141509 −9 2 1 11
16 768555217 100916244 1 0 0 7
17 12001583888 1575885169 6 2 1 8
18 35236196447 4626739263 7 4 6 10
19 93707005453 12304332620 9 7 8 11
20 152177814459 19981925977 −1 0 0 7
21 2376374222338 312033222275 −6 4 5 8
22 6976944852555 916117740848 −7 6 2 10
23 18554460335327 2436320000269 −9 1 1 11
24 30131975818099 3956522259690 1 0 0 7

Table 4.6. D = 58.


$i$  $a_i$  $b_i$  $e_i$  $\bar{a}_i$  $\bar{b}_i$  $x_i$
0 1 0 1 0 0
1 8 1 3 2 1 8
2 39 5 −4 3 1 7
3 164 21 −5 4 1 9
4 453 58 5 3 3 6
5 1523 195 4 3 3 9
6 5639 722 −3 2 2 7
7 29718 3805 −1 0 0 8
8 469849 60158 −3 1 2 8
9 2319527 296985 4 3 1 7
10 9747957 1248098 5 2 3 9
11 26924344 3447309 −5 4 4 6
12 90520989 11590025 −4 1 1 9
13 335159612 42912791 3 2 2 7
14 1766319049 226153980 1 0 0 8
15 27925945172 3575550889 3 2 1 8
16 137863406811 17651600465 −4 3 1 7
17 579379572416 74181952749 −5 1 4 9
18 1600275310437 204894257782 5 2 2 6
19 5380205503727 688864726095 4 3 3 9
20 19920546704471 2550564646598 −3 2 2 7
21 104982939026082 13441687959085 −1 0 0 8
22 1659806477712841 212516442698762 −3 1 2 8
23 8194049449538123 1049140525534725 4 3 1 7
24 34436004275865333 4409078544837662 5 3 2 9
25 95113963378057876 12178095108978261 −5 1 1 6
26 319777894410038961 40943363871772445 −4 1 1 9
27 1183997614262097968 151595360378111519 3 2 2 7
28 6239765965720528801 798920165762330040 1 0 0 8

Table 4.7. D = 61.


$i$  $a_i$  $b_i$  $e_i$  $\bar{a}_i$  $\bar{b}_i$  $x_i$
0 1 0 1 0 0
1 10 1 9 1 1 10
2 19 2 −3 1 2 8
3 124 13 −3 1 1 10
4 849 89 −10 9 9 11
5 1574 165 1 0 0 9
6 30755 3224 9 2 2 10
7 59936 6283 −3 2 1 8
8 390371 40922 −3 2 2 10
9 2672661 280171 −10 1 1 11
10 4954951 519420 1 0 0 9
11 96816730 10149151 9 4 4 10
12 188678509 19778882 −3 1 2 8
13 1228887784 128822443 −3 1 1 10
14 8413535979 881978219 −10 9 9 11
15 15598184174 1635133995 1 0 0 9
16 304779035285 31949524124 9 8 8 10
17 593959886396 62263914253 −3 2 1 8
18 3868538353661 405533009642 −3 2 2 10
19 26485808589231 2776467153241 −10 1 1 11
20 49103078824801 5147401296840 1 0 0 9
21 959444306260450 100577091793201 9 7 7 10
22 1869785533696099 196006782289562 −3 1 2 8
23 12178157508437044 1276617785530573 −3 1 1 10
24 83377317025363209 8740317716424449 −10 9 9 11
25 154576476542289374 16204017647318325 1 0 0 9
26 3020330371328861315 316616653015472624 9 5 5 10
27 5886084266115433256 617029288383626923 −3 2 1 8
28 38336835968021460851 4018792383317234162 −3 2 2 10
29 262471767510034792701 27514517394837012211 −10 1 1 11
30 486606699052048124551 51010242406356790260 1 0 0 9

Table 4.8. D = 91.


$D$  $p(D)$  $P(D)$    $D$  $p(D)$  $P(D)$    $D$  $p(D)$  $P(D)$


2 2 2 37 2 2 69 6 24
3 1 1 38 2 2 70 4 8
5 2 2 39 2 2 71 6 24
6 2 2 40 2 2 72 2 2
7 2 2 41 6 12 73 10 30
8 1 1 42 2 2 74 6 36
10 2 2 43 6 36 75 2 4
11 2 2 44 4 8 76 8 48
12 2 2 45 4 4 77 4 8
13 6 6 46 8 96 78 2 4
14 2 2 47 2 2 79 2 2
15 1 1 48 1 1 80 1 1
17 2 2 50 2 2 82 2 2
18 2 2 51 2 2 83 2 2
19 4 8 52 4 4 84 2 2
20 2 2 53 8 8 85 8 8
21 4 16 54 4 16 86 6 72
22 4 8 55 4 8 87 2 2
23 2 2 56 2 2 88 4 4
24 1 1 57 4 24 89 10 10
26 2 2 58 8 24 90 2 2
27 2 2 59 4 16 91 5 30
28 4 8 60 2 4 92 6 180
29 8 8 61 14 28 93 6 12
30 2 2 62 2 2 94 10 120
31 6 24 63 1 1 95 2 4
32 2 4 65 2 2 96 2 4
33 2 4 66 2 2 97 12 180
34 2 2 67 8 48 98 2 2
35 1 1 68 2 2 99 1 1

Table 4.9. Small and large periods for values of D between 2 and 99.


Lemma 4.20.

(1) Suppose $(a, b)$ represents $-1$. Then

$$(a, b) \ast (a, b) = (a, b) \ast_r (a, b)$$

represents 1.

(2) Suppose that $(a, b)$ represents $\pm 2$. Then

$$(a, b) \ast_r (a, b) = \tfrac{1}{2}((a, b) \ast (a, b))$$

represents 1.

Proof:

(1) This is immediate.

(2) If $(a, b)$ represents $\pm 2$ then it must be reduced. Then $(a, b) \ast (a, b)$ represents 4. But

$$\begin{aligned}
(a, b) \ast (a, b) &= (a^2 + b^2 D,\ 2ab) \\
&= (a^2 - b^2 D + 2b^2 D,\ 2ab) \\
&= (\pm 2 + 2b^2 D,\ 2ab) \\
&= 2(\pm 1 + b^2 D,\ ab),
\end{aligned}$$

so $(a, b) \ast_r (a, b) = (\pm 1 + b^2 D,\ ab)$ represents 1. □

Lemma 4.21. Let $(a, b)$ and $(c, d)$ both be reduced and suppose that, for some integer $m$, $(a, b)$ represents $m$ and $(c, d)$ represents $-m$. Suppose also that $a \equiv c \pmod{m}$ and $b \equiv d \pmod{m}$. Then $\overline{(a, b)} \ast_r (c, d)$ represents $-1$.

Proof: This is almost identical to the proof of Lemma 4.7. $(E, F) = \overline{(a, b)} \ast (c, d)$ represents $-m^2$ and then $\gcd(E, F) = |m|$, so $(e, f) = \frac{1}{|m|}(\overline{(a, b)} \ast (c, d)) = \overline{(a, b)} \ast_r (c, d)$ represents $-1$. □

We also make the trivial but useful observation that if $(a, b)$ represents $m$, then $(a, -b)$, $(-a, b)$, and $(-a, -b)$ also represent $m$.
Now let us look at some examples (and we refer the reader to Tables 4.1–4.8).


Example 4.22.

(1) $D = 13$: $(a_0, b_0) = (1, 0)$ with $e_0 = 1$ and we compute $(a_1, b_1) = (4, 1)$ with $e_1 = 3$, $(a_2, b_2) = (11, 3)$ with $e_2 = 4$, and $(a_3, b_3) = (18, 5)$ with $e_3 = -1$. Then $(a_3, b_3) \ast_r (a_3, b_3) = (649, 180) = (a_6, b_6)$ represents 1.

(2) $D = 19$: $(a_0, b_0) = (1, 0)$ with $e_0 = 1$ and we compute $(a_1, b_1) = (4, 1)$ with $e_1 = -3$, and $(a_2, b_2) = (13, 3)$ with $e_2 = -2$. Then $(a_2, b_2) \ast_r (a_2, b_2) = (170, 39) = (a_4, b_4)$ represents 1.

(3) $D = 29$: $(a_0, b_0) = (1, 0)$ with $e_0 = 1$ and we compute $(a_1, b_1) = (5, 1)$ with $e_1 = -4$, $(a_2, b_2) = (16, 3)$ with $e_2 = -5$, and $(a_3, b_3) = (27, 5)$ with $e_3 = 4$. We cannot apply Lemma 4.21 directly because, while $e_3 = -e_1$ and $\bar{b}_3 = \bar{b}_1$, $\bar{a}_3 \neq \bar{a}_1$. But if we replace $(a_1, b_1)$ by $(-5, 1)$ then $-5 \equiv 3 \pmod{4}$ and we can use Lemma 4.21. Then $\overline{(-5, 1)} \ast_r (27, 5) = (-70, -13)$, so $(70, 13) = (a_4, b_4)$ represents $-1$, and then $(70, 13) \ast (70, 13) = (9801, 1820) = (a_8, b_8)$ represents 1.

(4) $D = 61$: $(a_0, b_0) = (1, 0)$ with $e_0 = 1$ and we compute $(a_1, b_1) = (8, 1)$ with $e_1 = 3$, $(a_2, b_2) = (39, 5)$ with $e_2 = -4$, $(a_3, b_3) = (164, 21)$ with $e_3 = -5$, and $(a_4, b_4) = (453, 58)$ with $e_4 = 5$. Although $e_4 = -e_3$ we cannot apply Lemma 4.21 (even after a sign change). But $(a_5, b_5) = (1523, 195)$ with $e_5 = 4$, so using $(a_2, b_2)$ and $(a_5, b_5)$ and a sign change, we find that $\overline{(39, -5)} \ast_r (1523, 195) = (29718, 3805) = (a_7, b_7)$ represents $-1$ and then $(29718, 3805) \ast (29718, 3805) = (1766319049, 226153980) = (a_{14}, b_{14})$ represents 1.

(5) $D = 91$: $(a_0, b_0) = (1, 0)$ with $e_0 = 1$ and we compute $(a_1, b_1) = (10, 1)$ with $e_1 = 9$, $(a_2, b_2) = (19, 2)$ with $e_2 = -3$, and $(a_3, b_3) = (124, 13)$ with $e_3 = -3$. Then, using $(a_2, b_2)$ and $(a_3, b_3)$ and a sign change, we find that $\overline{(19, -2)} \ast_r (124, 13) = (1574, 165) = (a_5, b_5)$ represents 1.
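The shortcuts in these examples are easy to check mechanically. The Python sketch below is illustrative only: the helper names are ours, and it assumes that reduced composition means composing two pairs and then dividing out the greatest common divisor of the result, which agrees with every computation in Example 4.22.

```python
from math import gcd

def compose(p, q, D):
    """(a, b) * (c, d) = (ac + bdD, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c + b * d * D, a * d + b * c)

def conj(p):
    """Conjugate pair: (a, b) -> (a, -b)."""
    return (p[0], -p[1])

def compose_reduced(p, q, D):
    """Composition followed by division by the gcd (assumed meaning of *_r)."""
    a, b = compose(p, q, D)
    g = gcd(a, b)
    return (a // g, b // g)

def represents(p, D):
    """The integer a^2 - b^2 D represented by (a, b)."""
    return p[0] ** 2 - D * p[1] ** 2

# Lemma 4.20(1) with D = 13: squaring a pair that represents -1.
print(compose_reduced((18, 5), (18, 5), 13))            # (649, 180)
# Lemma 4.20(2) with D = 19: squaring a pair that represents -2.
print(compose_reduced((13, 3), (13, 3), 19))            # (170, 39)
# Lemma 4.21 with D = 29, after the sign change (5, 1) -> (-5, 1).
print(compose_reduced(conj((-5, 1)), (27, 5), 29))      # (-70, -13)
```

Each shortcut replaces several steps of the basic recursion by a single composition, which is why the examples reach a solution of Pell's equation after computing only the first few rows of each table.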


4.4 Units in $\mathcal{O}(\sqrt{D})$
In this section we assume that $D$ is a square-free positive integer, $D \neq 1$. Our objective is to find all units of $R = \mathcal{O}(\sqrt{D})$.

Definition 4.23. Let $D$ be a square-free positive integer, $D \neq 1$, and let $R = \mathcal{O}(\sqrt{D})$. Among all units $\varepsilon = c + d\sqrt{D}$ of $R$, let $\varepsilon_0$ be defined as follows: $\varepsilon_0 = c_0 + d_0\sqrt{D}$ where $d_0$ is the minimum positive value of $d$ and $c_0$ is positive. In almost all cases this determines $c_0$ uniquely, but if not, we choose $c_0$ to be the smallest positive value of $c$ for our given choice of $d_0$. The unit $\varepsilon_0$ is called the fundamental unit of $R$.

Let us observe that this definition makes sense. First note that, since Pell's equation $x^2 - y^2 D = 1$ has a nontrivial solution, there is a unit $c + d\sqrt{D}$ of $R$ with $c$ and $d$ both positive (and in fact, there are infinitely many such units). Next note that $\varepsilon = c + d\sqrt{D} = (a + b\sqrt{D})/2$ where $a$ and $b$ are both even integers, if $D \equiv 2$ or $3 \pmod{4}$, or where $a$ and $b$ are either both even integers or both odd integers, if $D \equiv 1 \pmod{4}$, and in each case the possible values of $b$ (and if necessary, the possible values of $a$) are well-ordered, as we require that $b$ and $a$ be positive.

Lemma 4.24. Let $D$ be a square-free positive integer, $D \neq 1$, and let $R = \mathcal{O}(\sqrt{D})$. The fundamental unit $\varepsilon_0$ of $R$ is the smallest unit of $R$ that is greater than 1.

Proof: First let us observe that it is not a priori clear that there is a smallest unit greater than 1, as there could be an infinite sequence of units $\varepsilon, \varepsilon', \varepsilon'', \ldots$ with $\varepsilon > \varepsilon' > \varepsilon'' > \ldots > 1$. But in fact there is such a smallest unit.
Note that the statement of the lemma is equivalent to the following statement:

Let $\varepsilon$ be any unit of $R$ with $\varepsilon > 1$. Then $\varepsilon \ge \varepsilon_0$.

We leave the proof of this to the exercises. □

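Theorem 4.25. Let $D$ be a square-free positive integer, $D \neq 1$, and let $R = \mathcal{O}(\sqrt{D})$. Then the units of $R$ are

$$\{\ldots, \pm\varepsilon_0^{-3}, \pm\varepsilon_0^{-2}, \pm\varepsilon_0^{-1}, \pm 1, \pm\varepsilon_0, \pm\varepsilon_0^{2}, \pm\varepsilon_0^{3}, \ldots\}.$$

Proof: Let $\varepsilon = c + d\sqrt{D}$ be any unit of $R$. Note that $\bar{\varepsilon} = c - d\sqrt{D}$, $-\varepsilon = -c - d\sqrt{D}$, and $-\bar{\varepsilon} = -c + d\sqrt{D}$ are all units of $R$. Also note that $\bar{\varepsilon} = \pm\varepsilon^{-1}$ ($+$ if $\varepsilon\bar{\varepsilon} = 1$ and $-$ if $\varepsilon\bar{\varepsilon} = -1$). Thus in order to prove the theorem, it suffices to prove that if $\varepsilon = c + d\sqrt{D}$ is any unit of $R$ with $c$ and $d$ positive, then $\varepsilon = \varepsilon_0^k$ for some positive integer $k$.
Since $\varepsilon = c + d\sqrt{D}$ with $c$ and $d$ positive, we see that $\varepsilon > 1$. Also, $\varepsilon_0 > 1$, so the sequence $1, \varepsilon_0, \varepsilon_0^2, \varepsilon_0^3, \ldots$ of nonnegative powers of $\varepsilon_0$ is a (strictly) increasing sequence that diverges to $+\infty$. Hence there is some positive integer $k$ with $\varepsilon_0^{k-1} < \varepsilon \le \varepsilon_0^k$. But then $1 < \varepsilon' \le \varepsilon_0$ where $\varepsilon' = \varepsilon/\varepsilon_0^{k-1}$.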

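$D$    $\varepsilon_0$    $\varepsilon_{\mathrm{Pell}}$    relation
2    $1 + \sqrt{2}$    $3 + 2\sqrt{2}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^2$
3    $2 + \sqrt{3}$    $2 + \sqrt{3}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
5    $(1 + \sqrt{5})/2$    $9 + 4\sqrt{5}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^6$
6    $5 + 2\sqrt{6}$    $5 + 2\sqrt{6}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
7    $8 + 3\sqrt{7}$    $8 + 3\sqrt{7}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
10    $3 + \sqrt{10}$    $19 + 6\sqrt{10}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^2$
11    $10 + 3\sqrt{11}$    $10 + 3\sqrt{11}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
13    $(3 + \sqrt{13})/2$    $649 + 180\sqrt{13}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^6$
14    $15 + 4\sqrt{14}$    $15 + 4\sqrt{14}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
15    $4 + \sqrt{15}$    $4 + \sqrt{15}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
17    $4 + \sqrt{17}$    $33 + 8\sqrt{17}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^2$
19    $170 + 39\sqrt{19}$    $170 + 39\sqrt{19}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
21    $(5 + \sqrt{21})/2$    $55 + 12\sqrt{21}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^3$
22    $197 + 42\sqrt{22}$    $197 + 42\sqrt{22}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
23    $24 + 5\sqrt{23}$    $24 + 5\sqrt{23}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
33    $23 + 4\sqrt{33}$    $23 + 4\sqrt{33}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
34    $35 + 6\sqrt{34}$    $35 + 6\sqrt{34}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$
37    $6 + \sqrt{37}$    $73 + 12\sqrt{37}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^2$
141    $95 + 8\sqrt{141}$    $95 + 8\sqrt{141}$    $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$

Table 4.10. Relation between $\varepsilon_0$ and $\varepsilon_{\mathrm{Pell}}$ for selected values of $D$.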

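But $\varepsilon'$ is a unit as $(\varepsilon')^{-1} = \varepsilon^{-1}\varepsilon_0^{k-1}$. In other words, $\varepsilon'$ is a unit that is greater than 1 and less than or equal to $\varepsilon_0$, so by Lemma 4.24 we must have $\varepsilon' = \varepsilon_0$, which gives $\varepsilon = \varepsilon_0^k$. □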


Remark 4.26. For a given value of $D$, let $\varepsilon_{\mathrm{Pell}}$ be the unit $\varepsilon_{\mathrm{Pell}} = a + b\sqrt{D}$ obtained from the smallest solution to Pell's equation $a^2 - b^2 D = 1$ in positive integers $a$ and $b$. Then sometimes $\varepsilon_{\mathrm{Pell}}$ is the fundamental unit $\varepsilon_0$ of $\mathcal{O}(\sqrt{D})$ and sometimes not. Table 4.10 is a table of $\varepsilon_0$, $\varepsilon_{\mathrm{Pell}}$, and the relation between them for selected values of $D$.
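For small $D$ the relation between $\varepsilon_0$ and $\varepsilon_{\mathrm{Pell}}$ can be explored by brute force. The Python sketch below is an illustration under stated assumptions, not the method of this chapter: the function names are ours, $D$ is assumed to be a square-free integer greater than 1, and every unit is written as $(a + b\sqrt{D})/2$ and stored as the pair of numerators $(a, b)$, so that units correspond to solutions of $a^2 - Db^2 = \pm 4$ (with $a$ and $b$ even unless $D \equiv 1 \pmod{4}$).

```python
from math import isqrt

def fundamental_unit(D):
    """Numerators (a, b) of eps_0 = (a + b*sqrt(D))/2 as in Definition 4.23."""
    step = 1 if D % 4 == 1 else 2          # b must be even unless D = 1 (mod 4)
    b = step
    while True:
        for n in (-4, 4):                  # norm -1 first: it gives the smaller a
            t = D * b * b + n
            a = isqrt(t)
            if a > 0 and a * a == t:
                return a, b
        b += step

def multiply(p, q, D):
    """Product of (a + b*sqrt(D))/2 and (c + d*sqrt(D))/2, as numerators over 2."""
    (a, b), (c, d) = p, q
    return (a * c + D * b * d) // 2, (a * d + b * c) // 2

def pell_from_unit(D):
    """Return ((a, b), k) with eps_Pell = a + b*sqrt(D) = eps_0**k."""
    eps0 = fundamental_unit(D)
    p, k = eps0, 1
    # Walk through the powers of eps_0 until one has integer coordinates and norm +1.
    while not (p[0] % 2 == 0 and p[1] % 2 == 0 and p[0] ** 2 - D * p[1] ** 2 == 4):
        p, k = multiply(p, eps0, D), k + 1
    return (p[0] // 2, p[1] // 2), k
```

Because the units greater than 1 are exactly the positive powers of $\varepsilon_0$, the first such power that has integer coordinates and norm 1 is $\varepsilon_{\mathrm{Pell}}$. The search over $b$ grows quickly with the size of $\varepsilon_0$, so this is only practical for small examples such as those in Table 4.10.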

4.5 Exercises
In Exercises 4.1 and 4.2, $D$ is a positive integer that is not a perfect square. In the remaining exercises, $D$ is a square-free positive integer, $D \neq 1$.

Exercise 4.1.

(a) Use our method to find a solution of Pell's equation $a^2 - b^2 D = 1$ for various small values of $D$.


(b) In particular, use our method to find, by hand computation, the smallest solution $a = 1766319049$ and $b = 226153980$ of Pell's equation $a^2 - 61b^2 = 1$ in positive integers $a$ and $b$. (Compare Table 4.7 and Example 4.22.) If Fermat could do this computation by hand, so can you.

Exercise 4.2. Write a computer program to implement our method of solving Pell's equation for relatively small values of $a$, $b$, and $D$, and verify parts of Tables 4.1–4.9.

Exercise 4.3. The Archimedes cattle problem was a famous problem posed by Archimedes. Search the Internet for this problem. (You will find lots of references.) The heart of this problem is solving Pell's equation $a^2 - 4729494b^2 = 1$. Write a computer program to implement our method of solving Pell's equation that can handle relatively large values of $a$, $b$, and $D$ and apply it to the case $D = 4729494$. Show that at step 60 it produces the following solution:

a = 109931986732829734979866232821433543901088049,
b = 50549485234315033074477819735540408986340.

The next few problems explain why many of the results in Section 4.1 are true, by relating these results to arithmetic in the field $\mathbb{Q}(\sqrt{D})$. Given a pair of rational numbers $(a, b)$, let $(a, b)$ correspond to the element $\alpha = a + b\sqrt{D}$ of $\mathbb{Q}(\sqrt{D})$. We write this as $(a, b) \leftrightarrow \alpha = a + b\sqrt{D}$.

Exercise 4.4.

(a) Show that if $(a, b) \leftrightarrow \alpha$ then $\overline{(a, b)} \leftrightarrow \bar{\alpha}$, so conjugation of representations corresponds to conjugation in $\mathbb{Q}(\sqrt{D})$.

(b) Show that if $(a, b) \leftrightarrow \alpha$ and $(c, d) \leftrightarrow \beta$, then $(a, b) \ast (c, d) \leftrightarrow \alpha\beta$. Thus, composition of representations (as defined in Definition 4.2) corresponds to multiplication in $\mathbb{Q}(\sqrt{D})$.

Exercise 4.5. Show that the fact that composition of representations is commutative, associative, and commutes with conjugation follows from the fact that multiplication in $\mathbb{Q}(\sqrt{D})$ is commutative, associative, and commutes with conjugation, thereby proving Lemma 4.3.

Exercise 4.6.

(a) If $(a, b) \leftrightarrow \alpha$, show that $(a, b)$ represents $N(\alpha)$, the norm of $\alpha$.

(b) Use the fact that $N(\alpha\beta) = N(\alpha)N(\beta)$ for any two elements $\alpha$ and $\beta$ of $\mathbb{Q}(\sqrt{D})$ to prove Lemma 4.1.

Exercise 4.7. Let $(a, b) \leftrightarrow \alpha$. Observe that, if $\alpha \neq \pm 1$, then the powers of $\alpha$ are all distinct. Use this to prove Lemma 4.8.

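Exercise 4.8. We have often had occasion to compute $\overline{(a, b)} \ast (c, d)$. If $(a, b) \leftrightarrow \alpha$ and $(c, d) \leftrightarrow \beta$, show that $\overline{(a, b)} \ast (c, d) \leftrightarrow N(\alpha)\beta/\alpha$, so composition with $\overline{(a, b)}$ corresponds to multiplication by $N(\alpha)/\alpha$.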

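Exercise 4.9. If $(a, b) \leftrightarrow \alpha$, $(x, 1) \leftrightarrow \chi$, and $(a', b') \leftrightarrow \alpha'$, show that $\alpha' = \pm\chi/\bar{\alpha}$, so reduced composition of $(x, 1)$ with $(a, b)$ corresponds (up to sign) to division of $\chi$ by $\bar{\alpha}$. (Compare Corollary 4.11.)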

The next few problems give the proof of Lemma 4.24.

Exercise 4.10. Let $\varepsilon = a + b\sqrt{D}$ be a unit of $R = \mathcal{O}(\sqrt{D})$. Observe that $|\bar{\varepsilon}| = 1/|\varepsilon|$. Use this observation to prove that, for a unit $\varepsilon = a + b\sqrt{D}$ of $R$, $\varepsilon > 1$ if and only if $a$ and $b$ are both positive.

Exercise 4.11. Let $\varepsilon = a + b\sqrt{D}$ be a unit of $R = \mathcal{O}(\sqrt{D})$. Observe that $a = \pm\sqrt{e + b^2 D}$, where $e = \varepsilon\bar{\varepsilon} = \pm 1$. Use this observation to prove that if $\varepsilon_1 = a_1 + b_1\sqrt{D}$ and $\varepsilon_2 = a_2 + b_2\sqrt{D}$ are units of $\mathcal{O}(\sqrt{D})$, with $a_1$, $b_1$, $a_2$, and $b_2$ all positive, and with $b_1 < b_2$, or with $b_1 = b_2$ and $a_1 < a_2$, then $\varepsilon_1 < \varepsilon_2$.

Exercise 4.12. Show that the only case in which it is possible to have distinct units $\varepsilon_1 = a_1 + b\sqrt{D}$ and $\varepsilon_2 = a_2 + b\sqrt{D}$ with $a_1$, $b$, and $a_2$ all positive is for $D = 5$ when $\varepsilon_1 = (1 + \sqrt{5})/2$ and $\varepsilon_2 = (3 + \sqrt{5})/2$, or vice versa.

Exercise 4.13. Use the results of Exercises 4.10 and 4.11 to prove Lemma 4.24.

Exercise 4.14. Let $R = \mathcal{O}(\sqrt{D})$. Suppose that $D \equiv 1 \pmod{8}$. Show that every unit $\varepsilon$ of $R$ is of the form $\varepsilon = a + b\sqrt{D}$ with $a$ and $b$ integers. (Of course, this is automatically true for $D \equiv 2$, 3, 6, or 7 (mod 8) as in those cases, every element of $R$ is of that form. In case $D \equiv 5 \pmod{8}$, all units of $R$ may or may not be of this form, as we see from Table 4.10.)

Exercise 4.15. Let $R = \mathcal{O}(\sqrt{D})$. Suppose that $D \equiv 3$, 6, or 7 (mod 8). Show that every unit $\varepsilon$ of $R$ has norm $N(\varepsilon) = 1$.


Exercise 4.16.

(a) Verify the entries in Table 4.10.

(b) Extend this table to include all values of D between 26 and 47.

In the next two exercises we adopt the notation of Remark 4.26.

Exercise 4.17. Let $R = \mathcal{O}(\sqrt{D})$.

(a) Suppose that $D \equiv 3$, 6, or 7 (mod 8). Show that $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$.

(b) Suppose that $D \equiv 1$ or 2 (mod 8). Show that $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^k$ for $k = 1$ or 2.

(c) Suppose that $D \equiv 5 \pmod{8}$. Show that $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^k$ for $k = 1$, 2, 3, or 6. (All of these possibilities occur, as illustrated in Table 4.10.)

Exercise 4.18. Let $R = \mathcal{O}(\sqrt{D})$.

(a) Suppose that $a^2 - b^2 D = -4$ has a solution in odd integers $a$ and $b$. Show that $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^6$, and conversely.

(b) Suppose that $a^2 - b^2 D = -4$ does not have a solution in odd integers $a$ and $b$, but that $a^2 - b^2 D = 4$ has a solution in odd integers $a$ and $b$. Show that $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^3$, and conversely.

(c) Suppose that $a^2 - b^2 D = -4$ does not have a solution in odd integers $a$ and $b$, and that $a^2 - b^2 D = 4$ does not have a solution in odd integers $a$ and $b$, but that $a^2 - b^2 D = -1$ has a solution in integers $a$ and $b$. Show that $\varepsilon_{\mathrm{Pell}} = \varepsilon_0^2$, and conversely.

(d) Suppose that $a^2 - b^2 D = -4$ does not have a solution in odd integers $a$ and $b$, that $a^2 - b^2 D = 4$ does not have a solution in odd integers $a$ and $b$, and that $a^2 - b^2 D = -1$ does not have a solution in integers $a$ and $b$. Show that $\varepsilon_{\mathrm{Pell}} = \varepsilon_0$, and conversely.


Chapter 5

Towards Algebraic Number Theory

In Chapters 1 through 4 we considered the quadratic fields $K = \mathbb{Q}(\sqrt{D})$ and the rings of integers $R = \mathcal{O}(\sqrt{D})$, and we kept our discussion as elementary
as possible while still proving interesting results. But this consideration
was just the tip of an iceberg. Our aim in this chapter is to indicate how
these results generalize. The field of mathematics that these generalize to
is known as algebraic number theory. As a matter of historical fact, our
development here parallels the development of algebraic number theory.
Historically, quadratic fields were considered first, and their investigation
motivated the investigation of more general algebraic number fields.
Our goal here is to provide a guide to some of the high points of algebraic
number theory, especially those most related to the topics we have studied.
In order to do so, we will have to introduce some more advanced concepts
than we have done so far, so this chapter will require more background of
the reader. Also, as this is a guide and not a complete treatment, we will
not prove the more advanced results we state. But we hope this chapter will
motivate the reader to go on and study algebraic number theory further.
However, we will revisit quadratic fields and prove a number of results,
and do a number of examples, in them. These will provide concrete exam-
ples of this general theory, as well as being very interesting in themselves.
There is one notational point that we should mention up front. We
will be intensively studying ideals in this chapter. Standard mathematical
notation, which we shall adopt below, gives parentheses special meaning
in the context of ideals. Parentheses are of course also the standard way
to group items for multiplication. In order to avoid confusion, we shall
instead, throughout this chapter, except in the last section, use square
brackets to group items for multiplication.


5.1 Algebraic Numbers and Algebraic Integers

We begin by defining algebraic numbers.

Definition 5.1. A complex number $\alpha$ is algebraic if $\alpha$ is a root of a nonzero polynomial $f(X)$ with rational coefficients. Otherwise $\alpha$ is transcendental.

It follows from properties of polynomials that if $\alpha$ is algebraic, there is a unique monic polynomial of lowest degree having $\alpha$ as a root. (A polynomial $f(X)$ is monic if the coefficient of the highest power of $X$ is 1.) We call this polynomial the minimum polynomial of $\alpha$ and denote it by $m_\alpha(X)$.

Example 5.2.

(1) If $r$ is a rational number then $m_r(X) = X - r$, a polynomial of degree 1. More interestingly, $m_{\sqrt{2}}(X) = X^2 - 2$ and $m_{\sqrt{3}}(X) = X^2 - 3$. Also, $m_{\sqrt{2}\sqrt{3}}(X) = X^2 - 6$ and $m_{\sqrt{2}+\sqrt{3}}(X) = X^4 - 10X^2 + 1$. Finally, $m_{\sqrt[3]{2}}(X) = X^3 - 2$.

(2) It is relatively easy to show that there are transcendental numbers, but not so easy to show that any particular number is transcendental. It is a famous theorem of Hermite that $e$ is transcendental, a famous theorem of Lindemann that $\pi$ is transcendental, and a case of the famous Gelfond-Schneider theorem that $2^{\sqrt{2}}$ is transcendental.

Let $K$ be any subfield of the field of complex numbers $\mathbb{C}$. Then $K$ is a vector space over $\mathbb{Q}$ and its dimension is called the degree of $K$, denoted $\deg_{K/\mathbb{Q}}$.

Definition 5.3. A subfield $K$ of $\mathbb{C}$ is an algebraic number field if its degree $\deg_{K/\mathbb{Q}}$ is finite.

At first glance it may not be obvious what this has to do with $K$ being "algebraic," but here is the connection.

Lemma 5.4. Let $K$ be an algebraic number field. Then every element $\alpha$ of $K$ is an algebraic number.

Proof: Let $n = \deg_{K/\mathbb{Q}}$ and consider $\{1, \alpha, \ldots, \alpha^n\}$. This is a set of $n + 1$ elements in an $n$-dimensional vector space, so it must be linearly dependent. Thus, there are rational numbers $a_0, a_1, \ldots, a_n$, not all zero, with $a_n\alpha^n + \ldots + a_1\alpha + a_0 = 0$, and then $\alpha$ is a root of the nonzero polynomial $f(X) = a_n X^n + \ldots + a_1 X + a_0$. □

Now that we know what an algebraic number field is, we need to know
what an algebraic integer is.

Definition 5.5. Let $K$ be an algebraic number field. An element $\alpha$ of $K$ is an algebraic integer if $m_\alpha(X)$ is a polynomial with integer coefficients.

Lemma 5.6. Let $K$ be an algebraic number field. If $\alpha$ and $\beta$ are algebraic integers in $K$, then $\alpha + \beta$ and $\alpha\beta$ are algebraic integers in $K$.

Note that this lemma is far from obvious. It is a nontrivial fact that if $m_\alpha(X)$ has integer coefficients and $m_\beta(X)$ has integer coefficients, then $m_{\alpha+\beta}(X)$ and $m_{\alpha\beta}(X)$ have integer coefficients, but it turns out to be true.
But having this lemma we may make the following definition.

Definition 5.7. The ring of integers $\mathcal{O}(K)$ of $K$ is

$$\mathcal{O}(K) = \{\alpha \text{ in } K \mid \alpha \text{ is an algebraic integer}\}.$$

Example 5.8. Referring back to Example 5.2, we see that $\sqrt{2}$, $\sqrt{3}$, $\sqrt{2}\sqrt{3}$, $\sqrt{2} + \sqrt{3}$, and $\sqrt[3]{2}$ are all algebraic integers.

Remark 5.9. For any algebraic number field $K$, $\mathcal{O}(K) \cap \mathbb{Q} = \mathbb{Z}$. For if $r$ is in $\mathbb{Q}$, then $m_r(X) = X - r$, and this polynomial has integer coefficients if and only if $r$ is in $\mathbb{Z}$.

Remark 5.10. It is important to distinguish between the integers in $\mathbb{Q}$ and the (algebraic) integers in $K$. We shall always use the term "integer" to refer to an element of $\mathbb{Z}$ and the term "algebraic integer" to refer to an element of $\mathcal{O}(K)$. Also, we will always use the term "prime" to refer to a prime number in $\mathbb{Z}$. We should point out that it is often the case in the literature that, in order to emphasize the distinction, the elements of $\mathbb{Z}$ are called the "rational integers" and the primes in $\mathbb{Z}$ are called the "rational primes."

Lemma 5.11. Let $K$ be an algebraic number field and let $\alpha$ be any element of $K$. Then there is a nonzero integer $m$ such that $m\alpha$ is an algebraic integer.

Proof: Let $m_\alpha(X) = X^d + a_{d-1}X^{d-1} + \ldots + a_1 X + a_0$ and let each $a_i$ be the rational number $a_i = r_i/s_i$. Let $m = \operatorname{lcm}(s_0, \ldots, s_{d-1})$ and let $\beta = m\alpha$. Then $\alpha = \beta/m$, so $0 = m_\alpha(\alpha) = m_\alpha(\beta/m) = [\beta/m]^d + a_{d-1}[\beta/m]^{d-1} + \ldots + a_1[\beta/m] + a_0$. Multiplying through by $m^d$ we see that $\beta^d + a_{d-1}m\beta^{d-1} + \ldots + a_1 m^{d-1}\beta + a_0 m^d = 0$, so $m_\beta(X) = X^d + a_{d-1}mX^{d-1} + \ldots + a_1 m^{d-1}X + a_0 m^d$ and all of the coefficients of $m_\beta(X)$ are integers. □
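The construction in this proof is entirely explicit, and a short Python sketch makes the bookkeeping concrete. The function name `clear_denominators` is ours, and the input is assumed to be the list of rational coefficients $a_0, \ldots, a_{d-1}$ of a monic polynomial, lowest degree first; the sketch does not itself check that the polynomial is a minimum polynomial.

```python
from fractions import Fraction
from math import lcm

def clear_denominators(coeffs):
    """Given [a_0, ..., a_{d-1}] for the monic polynomial X^d + ... + a_0,
    return (m, [c_0, ..., c_{d-1}]) where m is the lcm of the denominators
    and X^d + c_{d-1} X^{d-1} + ... + c_0 is the polynomial satisfied by m*alpha."""
    d = len(coeffs)
    m = lcm(*(c.denominator for c in coeffs))
    # The coefficient of X^i is multiplied by m^(d - i), as in the proof.
    scaled = [c * m ** (d - i) for i, c in enumerate(coeffs)]
    assert all(s.denominator == 1 for s in scaled)
    return m, [int(s) for s in scaled]

# alpha = 1/sqrt(2) has minimum polynomial X^2 - 1/2.
print(clear_denominators([Fraction(-1, 2), Fraction(0)]))      # (2, [-2, 0])
# alpha = (1 + sqrt(2))/3 has minimum polynomial X^2 - (2/3)X - 1/9.
print(clear_denominators([Fraction(-1, 9), Fraction(-2, 3)]))  # (9, [-9, -6])
```

In the first example $m = 2$ and $2\alpha = \sqrt{2}$ is a root of $X^2 - 2$; in the second $m = 9$ and $9\alpha = 3 + 3\sqrt{2}$ is a root of $X^2 - 6X - 9$. The lemma only asks for some nonzero $m$, so the lcm is a convenient choice rather than the smallest possible one.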


Using this lemma we may find the additive structure of O(K).

Proposition 5.12. Let $K$ be an algebraic number field. Then $\mathcal{O}(K)$ is a free $\mathbb{Z}$-module of rank $\deg_{K/\mathbb{Q}}$.

Proof: Let $\deg_{K/\mathbb{Q}} = n$ and let $\{\alpha_1, \ldots, \alpha_n\}$ be a basis for $K$ as a vector space over $\mathbb{Q}$. Then, replacing $\alpha_i$ by $\beta_i = m_i\alpha_i$ for a suitable integer $m_i$, we have a new basis of $K$ given by $\{\beta_1, \ldots, \beta_n\}$ with each $\beta_i$ in $\mathcal{O}(K)$. Thus $\mathcal{O}(K)$ contains a free $\mathbb{Z}$-module of rank $n$.
On the other hand, if $\{\gamma_1, \ldots, \gamma_{n+1}\}$ is any set of $n + 1$ elements in $\mathcal{O}(K)$, there is a linear relation $c_1\gamma_1 + \ldots + c_{n+1}\gamma_{n+1} = 0$ with each $c_i$ in $\mathbb{Q}$. Multiplying through by the lcm of the denominators of the $c_i$'s gives a linear relation $d_1\gamma_1 + \ldots + d_{n+1}\gamma_{n+1} = 0$ with each $d_i$ in $\mathbb{Z}$, so $\mathcal{O}(K)$ does not contain a free $\mathbb{Z}$-module of rank $n + 1$.
Hence $\mathcal{O}(K)$ must be a free $\mathbb{Z}$-module of rank $n$. □

Example 5.13. An algebraic number field K of degree 2 is called a quadratic
field. Any quadratic field K must be Q(√D) for some square-free integer
D ≠ 1. We let O(√D) denote O(Q(√D)).
If D ≡ 2 or 3 (mod 4), then

O(√D) = {a + b√D | a and b are integers}.

In this case, we see that O(√D) is a free Z-module of rank 2 with basis
{1, √D}.
If D ≡ 1 (mod 4), then

O(√D) = {[a + b√D]/2 | a and b are integers, and either they are
both even or they are both odd}.

In this case, we see that O(√D) is a free Z-module of rank 2 with basis
{1, [1 + √D]/2}.
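The two cases can be checked numerically. The following Python sketch assumes the criterion that x + y√D, with x and y rational, is an algebraic integer exactly when the monic quadratic X^2 − 2xX + (x^2 − Dy^2) that it satisfies has integer coefficients. For y ≠ 0 this quadratic is the minimum polynomial, and for y = 0 the condition reduces to x being an integer. The function name is ours.

```python
from fractions import Fraction


def is_algebraic_integer(x, y, D):
    """Test whether x + y*sqrt(D) is an algebraic integer, for rational x, y
    and a square-free integer D != 1, by checking that the quadratic
    X^2 - 2x X + (x^2 - D y^2) has integer coefficients."""
    x, y = Fraction(x), Fraction(y)
    trace = 2 * x
    norm = x * x - D * y * y
    return trace.denominator == 1 and norm.denominator == 1


half = Fraction(1, 2)
print(is_algebraic_integer(half, half, 5))   # D = 5 is 1 (mod 4)
print(is_algebraic_integer(half, half, 3))   # D = 3 is 3 (mod 4)
```

With D = 5 the element [1 + √5]/2 passes the test, while with D = 3 the element [1 + √3]/2 does not, in agreement with the two descriptions of O(√D) above.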

Remark 5.14. We defined an algebraic integer α in Definition 5.5 as an
algebraic number for which the polynomial mα (X) is a monic polynomial
with integer coefficients. Of course, mα (X) is a particular polynomial with
mα (α) = 0. There is an alternate definition of an algebraic integer: α is an
algebraic integer if there is some monic polynomial with integer coefficients
f (X) with f (α) = 0. It is a theorem (a consequence of Gauss’s Lemma for
polynomials) that these two definitions are equivalent.

5.2 Ideal Theory


Let us return to the consideration of a general integral domain R. We have
defined the notion of an ideal of R in Definition 2.28, and the notion of a
principal ideal of R in Definition 2.29.
First we introduce a bit of (standard) notation. We have denoted the
principal ideal generated by α as Iα . This notation has the disadvantage
that the interesting information is relegated to the subscript. So, instead,
we will denote it by (α). In Definition 2.34, we denoted the ideal generated
by the set A = {α1 , ..., αk } by IA . This notation has the same disadvantage,
so we will adopt the same solution, denoting it instead by (α1 , . . . , αk ).
We certainly know how to multiply elements of R and we certainly
know what it means for one element of R to divide another. Guided by
this knowledge, we shall seek the analogs for ideals.
Consider the principal ideal (α) consisting of multiples of the element
α and the principal ideal (β) consisting of multiples of the element β. Let
γ = αβ be the product of these two elements, and consider the principal
ideal (γ). We would like to have a definition of multiplication of ideals that
makes (γ) = (α)(β). Let us see what that might be.
Let α′ be an arbitrary element of the principal ideal (α), so α′ = αλ
for some element λ of R. Similarly, let β′ be an arbitrary element of the
principal ideal (β), so β′ = βμ for some element μ of R. Then

α′β′ = (αλ)(βμ) = (αβ)(λμ) = γν where ν = λμ.

Thus, we conclude that α′β′ is an element of (γ). But we can also see that
every element of (γ) is of this form. For if γ′ is an arbitrary element of (γ),
then γ′ = γν for some element ν of R, and then

γ′ = γν = (αβ)ν = α(βν) (or = (αν)β)

is the product of an element of (α) and an element of (β). Guided by this,
we are tempted to define the product of two ideals as follows: Let I and
J be ideals of R. Then their product K = IJ is the ideal consisting of all
elements of R of the form α′β′ where α′ is any element of I and β′ is any
element of J. But examining the definition of an ideal (Definition 2.28) we
see that this does not quite work (although, as you can verify, it does work
for principal ideals).
Definition 2.28 tells us that if K is an ideal, it must satisfy two properties.
The second property is that if γ′ is any element of K and ν is any element
of R, then γ′ν must also be an element of K. This is no problem: since
γ′ is in IJ, it is of the form γ′ = α′β′ and so γ′ν = (α′β′)ν = α′β″ where
β″ = β′ν. Then α′ is certainly an element of I and β″ is an element of J
(since J is an ideal), so γ  ν is of the required form. However, the first prop-
erty does not hold, in general. Suppose γ1 and γ2 are two elements of the
required form, γ1 = α1 β1 and γ2 = α2 β2 . Then γ3 = γ1 +γ2 = α1 β1 +α2 β2 ,
and there is no reason to expect that this sum can be written in the form
α3 β3 .
So instead we modify our definition in the simplest possible way in order
to force the first property to work as well.
Definition 5.15. Let I and J be two ideals of R. Their product is the ideal
IJ given by

IJ = {Σ αi βi | each αi is in I and each βi is in J, and the sum is finite}.

Of course, we must check that this does define an ideal, and this is
indeed the case. The proof of this is similar to the proof of Lemma 2.35.
We leave the proof of the following useful lemma to the reader.
Lemma 5.16. Let R be an integral domain and let I = (α1 , . . . , αi ) and
J = (β1 , . . . , βj ) be ideals in R.
(1) I ⊆ J if and only if αk is an element of J for each k = 1, . . . , i.
Consequently, I = J if and only if αk is an element of J for each
k = 1, . . . , i and βk is an element of I for each k = 1, . . . , j.

(2) Let K = IJ be the product of the ideals I and J. Then

K = (α1 β1 , . . . , α1 βj , α2 β1 , . . . , α2 βj , . . . , αi β1 , . . . , αi βj ).
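As an illustration of Lemma 5.16(2), the following Python sketch lists the generators of a product of two ideals in the ring of elements a + b√D with a and b integers. Elements are stored as integer pairs (a, b); this representation and the function names are our own assumptions for the example.

```python
def mul(x, y, D):
    """Multiply a + b*sqrt(D) and c + d*sqrt(D), stored as pairs (a, b), (c, d)."""
    a, b = x
    c, d = y
    return (a * c + b * d * D, a * d + b * c)


def product_generators(gens_I, gens_J, D):
    """Generators alpha_k * beta_l of IJ, in the order of Lemma 5.16(2)."""
    return [mul(x, y, D) for x in gens_I for y in gens_J]


# I = (2, 1 + sqrt(-5)) and J = (2, 1 - sqrt(-5)), with D = -5.
gens = product_generators([(2, 0), (1, 1)], [(2, 0), (1, -1)], -5)
print(gens)
```

The four generators are 4, 2 − 2√−5, 2 + 2√−5, and 6. Each is divisible by 2, and 6 − 4 = 2 lies in the product, so in this example IJ = (2).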

Remark 5.17. Note that R = (1) and that IR = I for any ideal I of R.
Also note that multiplication of ideals is certainly commutative, and that
Lemma 5.16(2) implies that multiplication of ideals is also associative.

Now let us think about the analog of divisibility. Suppose that the
element α of R divides the element β of R. Then β is a multiple of α, and
then every multiple of β is a multiple of α. In other words, the principal
ideal (α) contains the principal ideal (β). So, here we do not need to make
another definition. We simply regard the analog of divisibility as being that
the ideal I contains the ideal J, I ⊇ J. Now recall from Definition 2.45
that an element α of R is a prime if α dividing βγ implies that α divides
β or α divides γ. With this analogy and definition in mind, we can define
a prime ideal in general.

Definition 5.18. A proper ideal P of R is a prime ideal if, whenever J and
K are ideals of R with P ⊇ JK, then P ⊇ J or P ⊇ K.

We just noted that the analog of α dividing β for two ideals I and J was
that I ⊇ J. But we could have asked for something stronger. Namely, if α
divides β, then β = αγ for some γ. Thus, passing to ideals, we might ask
that the analog be that there is an ideal K with J = IK. This is indeed
stronger in general, as the next proposition shows.

Proposition 5.19. Let I and J be ideals in an integral domain R. If there
is an ideal K with J = IK, then I ⊇ J.

Proof: By Definition 5.15, J = IK consists of linear combinations of
elements of the form αβ, with α in I and β in K. But by the definition of
an ideal, every element of this form, and also every linear combination of
elements of this form, is in I. □

We need to introduce one further property of ideals.

Definition 5.20. Let I be a proper ideal in an integral domain R. If J ⊇ I
for some ideal J implies that J = I or J = R, then I is a maximal ideal.

Being maximal is a stronger condition than being prime, as the next
proposition shows.

Proposition 5.21. Let I be an ideal in an integral domain R. If I is a
maximal ideal, then I is a prime ideal.

Proof: Let I be a maximal ideal, and suppose that I ⊇ JK. We need to
show that I ⊇ J or I ⊇ K. We prove this by contradiction.
Assume this is not the case. Then there is an element α of J that is
not in I, and an element β of K that is not in I. But, since I ⊇ JK,
the element γ = αβ is in I. Let I′ be the ideal generated by α and I,
I′ = {αζ + δ | ζ in R, δ in I}. Certainly I′ ⊇ I, and indeed I′ ⊃ I, as α is
in I′ but not in I. Hence, by the definition of a maximal ideal, I′ = R. In
particular, 1 is in I′, so 1 = αζ0 + δ0 for some element ζ0 of R and some
element δ0 of I. But then

β = β(αζ0 + δ0) = (αβ)ζ0 + δ0β = γζ0 + δ0β

is an element of I by the definition of an ideal, which is a contradiction. □

5.3 Dedekind Domains


We now single out a key class of integral domains, the Dedekind domains.
The definition of a Dedekind domain involves three conditions. We will be-
gin by stating the definition and then we will explain what these conditions
mean.
Definition 5.22. An integral domain R is a Dedekind domain if
(1) every nonzero prime ideal of R is maximal;

(2) R satisfies the ascending chain condition;

(3) R is integrally closed in its quotient field.

We know what the first condition means. In Proposition 5.21 we proved
that for an integral domain R, every maximal ideal of R is prime. Here we
want the converse to be true as well: every nonzero prime ideal should be
maximal.
Lemma 5.23. Every nonzero prime ideal of Z is maximal.

Proof: We know that every ideal I of Z consists of the multiples of some
integer i, I = (i).
Suppose that I = (i), with i ≠ 0, is not maximal. Then there is a proper
ideal J of Z with J ⊃ I. Then we must have J = (j) for some integer j ≠ ±1
and j ≠ ±i. Since I ⊆ J, i = jk for some integer k, with k ≠ ±1 and k ≠ ±i.
Let K = (k). Then I = JK ⊇ JK but I ⊉ J and I ⊉ K, so I is not a
prime ideal. □

Definition 5.24. An integral domain R satisfies the ascending chain
condition (ACC), or is noetherian, if every sequence of ideals
I1 ⊂ I2 ⊂ I3 . . . of R is finite.

Lemma 5.25. Z satisfies the ascending chain condition.

Proof: Let I1 ⊂ I2 ⊂ I3 . . . be a sequence of ideals in Z. Then I1 = (i1),
I2 = (i2), I3 = (i3), . . . , with |i1| > |i2| > |i3| > . . . . Thus {|i1|, |i2|, |i3|, . . .}
is a strictly decreasing sequence of nonnegative integers, which must be
finite. □
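The argument can be watched in action. The Python sketch below builds a strictly ascending chain (i1) ⊂ (i2) ⊂ . . . in Z starting from (n), for a nonzero integer n, by dividing out the smallest prime factor at each step. The function name and this particular way of choosing the chain are our own assumptions; any chain of proper divisors would do.

```python
def ascending_chain(n):
    """Generators i1, i2, ... of a strictly ascending chain of ideals
    (i1) ⊂ (i2) ⊂ ... in Z, starting at (n) and ending at (1) = Z.
    Assumes n is a nonzero integer."""
    chain = [abs(n)]
    while chain[-1] > 1:
        cur = chain[-1]
        # Smallest divisor q >= 2 of cur; it is necessarily prime.
        p = next(q for q in range(2, cur + 1) if cur % q == 0)
        chain.append(cur // p)
    return chain


print(ascending_chain(360))
```

The generators strictly decrease in absolute value, so the chain must stop, exactly as in the proof.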

The last condition in the definition of a Dedekind domain is considerably
more subtle. To begin with, we must define the quotient field of an integral
domain R. This definition mimics the construction of the rational numbers
Q from the integers Z.

Definition 5.26. Let R be an integral domain. Its quotient field F is given
by
F = {a/b | a and b are in R, and b ≠ 0},
with a/b = c/d in F if ad = bc in R. We define addition and multiplication
in F by

a/b + c/d = [ad + bc]/[bd], [a/b][c/d] = [ac]/[bd],

and we regard R as a subset of F by identifying a with a/1.

(It is easy to check that F is a field, and that the “usual” laws of
fractions work in F. The reader may worry that the definition of F is
circular in that we are already assuming that we can divide elements of R
in defining F, but this is not the case. In the definition of F “/” is just a
symbol, but once F is defined, we can give it its usual interpretation, so
that b[a/b] = a.)

Example 5.27.

(1) If R = Z, its quotient field is Q.

(2) Let K be an algebraic number field and let R = O(K) be the ring of
integers of K. Then K is the quotient field of R. To see this, let α be
any element of K. By Lemma 5.11, there is an integer n with β = nα
in R, and then α = β/n, and n is in R by Remark 5.9.

To state the last condition properly, we need to slightly generalize the
notion of an algebraic integer. To this end, let Z′ be any subring of Q.
(Note that Z′ ⊇ Z as Z′ must contain 1.) Then an element α of an algebraic
number field K is Z′-integral if its minimum polynomial mα (X) has all of
its coefficients in Z′.

Definition 5.28. Let R be a subring of an algebraic number field K and let
Z′ = R ∩ Q. The integral closure of R in K is S = {α in K | α is Z′-integral}.
R is integrally closed in K if R = S.

Lemma 5.29. For any algebraic number field K, R = O(K) is integrally
closed in K.

Proof: By Remark 5.9, Z′ = R ∩ Q = Z, so an element α of K is Z′-integral if
and only if it is Z-integral. But this is exactly the definition of an algebraic
integer. □

This definition seems like virtually a tautology, and to understand it we
should see how it can possibly go wrong.
Example 5.30.

(1) Let K = Q(√D). Fix an integer m > 1 and let

R′ = {a + b√D | a and b are integers and b is divisible by m}.

Note that for any element β of R = O(K), β′ = 2mβ is in R′. (The
factor of 2 is to take care of the case when D ≡ 1 (mod 4).) As we have
seen in Example 5.27, for any element α of K, there is an element β
of R and an integer n with α = β/n. But then α = β′/[2mn], so K is
also the quotient field of R′. But of course in this case S = R ⊃ R′.
(This example may seem artificial but in fact is not. Let D′ = m^2 D.
Then Q(√D′) = Q(√D) and we might naturally have been led to R′
if we had considered Q(√D′).)



(2) Here is a very important example. Let K = Q(√D) where D ≡
1 (mod 4) and let

R′ = {a + b√D | a and b are integers}.

Then for any element β of R = O(K), β′ = 2β is in R′, so, by the
same argument as above, we see that K is the quotient field of R′ but
S = R ⊃ R′. Thus, while in this case one might have naively thought of
R′ as the right ring to consider, it is not.
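To see concretely why the ring of all a + b√D with a and b integers is too small when D ≡ 1 (mod 4), take D = 5. The Python sketch below stores x + y√5 as a pair (x, y) of rational numbers (our own representation, chosen for the example) and checks that φ = [1 + √5]/2 is a root of the monic integer polynomial X^2 − X − 1 although its coordinates are not integers.

```python
from fractions import Fraction

D = 5


def mul(u, v):
    """Multiply x + y*sqrt(D) by z + w*sqrt(D), stored as pairs (x, y), (z, w)."""
    (a, b), (c, d) = u, v
    return (a * c + b * d * D, a * d + b * c)


phi = (Fraction(1, 2), Fraction(1, 2))           # [1 + sqrt(5)]/2
phi_sq = mul(phi, phi)
# Value of X^2 - X - 1 at X = phi, again as a pair (rational part, sqrt part).
value = (phi_sq[0] - phi[0] - 1, phi_sq[1] - phi[1])
has_integer_coordinates = all(c.denominator == 1 for c in phi)
print(value, has_integer_coordinates)
```

Here value is (0, 0), so φ is an algebraic integer, yet φ does not have integer coordinates. Since φ = (1 + √5)/2 is a quotient of two elements with integer coordinates, it lies in the quotient field of that smaller ring, which is therefore not integrally closed.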
Definition 5.31. Let R be an integral domain with quotient field K. A
subset I of K is a fractional ideal of K if there is an element γ0 of R such
that J = γ0 I is an ideal of R, where J = γ0 I = {γ0 β | β in I}.
Observe that every ideal I of R is a fractional ideal of K, as in this case
we may choose γ0 = 1. Also observe that the definition of a fractional ideal
is a very natural one, as in this case I = [1/γ0 ]J. From this we can easily
see that if I1 and I2 are fractional ideals of K, then I1 I2 is also a fractional
ideal of K. Finally, we can easily generalize the notion of a principal ideal.
For α0 any element of K, we let (α0 ) = {α0 β | β in R} and call (α0 ) a
principal fractional ideal.
Definition 5.32. Let R be an integral domain with quotient field K. Let I
be a nonzero fractional ideal of K. Then I⁻¹ is defined by

I⁻¹ = {α in K | αI ⊆ R}.

By definition, II⁻¹ ⊆ R. The following is a key technical lemma, which
we shall not prove.

Lemma 5.33. Let R be an integral domain with quotient field K. Let I be
a nonzero fractional ideal of K. If R is a Dedekind domain, then I⁻¹ is a
fractional ideal of K with II⁻¹ = R.

We mentioned that, for ideals I, J, and K of an integral domain R,
the condition that I = JK is stronger than the condition that I ⊆ J, and
that these conditions are not in general equivalent. But they are equivalent
for Dedekind domains.

Corollary 5.34. Let R be a Dedekind domain and let I and J be ideals of
R with I ⊆ J. Then there is an ideal K of R with I = JK.

Proof: Let K = J⁻¹I. Then K is a fractional ideal of the quotient field of
R, and, since I ⊆ J, K = J⁻¹I ⊆ J⁻¹J = R, so K is in fact an ideal of R.
But then, since multiplication of fractional ideals is associative (as you can
easily check), JK = JJ⁻¹I = I. □

Here is the main general result about Dedekind domains, and the reason
why we introduced them.

Theorem 5.35. Let R be an integral domain. The following are equivalent.

(1) R is a Dedekind domain.

(2) Every nonzero ideal I of R can be factored essentially uniquely into a
product of prime ideals, i.e., I = P1^e1 P2^e2 · · · Pk^ek with each Pi a prime
ideal, with Pi ≠ Pj for i ≠ j, and with e1, e2, . . . , ek positive integers,
and if also I = Q1^f1 Q2^f2 · · · Qℓ^fℓ with each Qi a prime ideal, with Qi ≠ Qj
for i ≠ j and with f1, f2, . . . , fℓ positive integers, then ℓ = k, and, after
possible reordering, Qi = Pi and fi = ei for each i.

(In this case, we can also factor every nonzero fractional ideal essentially
uniquely as

I = P_1^{e_1} P_2^{e_2} · · · P_{k1}^{e_{k1}} P_{k1+1}^{e_{k1+1}} P_{k1+2}^{e_{k1+2}} · · · P_{k1+k2}^{e_{k1+k2}}

with the Pi’s mutually distinct prime ideals and with e_1, . . . , e_{k1} positive
integers and e_{k1+1}, . . . , e_{k1+k2} negative integers.)

Corollary 5.36. Let R be a Dedekind domain. Then R is a PID if and only
if R is a UFD.

Proof: In this proof we will be very careful to distinguish between prime
elements and prime ideals. We begin by recalling that an element of a UFD
is a prime element if and only if it is an irreducible element.
We know that every PID is a UFD. Thus, what we must prove here is
that every Dedekind domain that is a UFD is also a PID. Thus, let R be
a Dedekind domain that is a UFD. We want to show that every ideal of
R is principal. Since, by Theorem 5.35, every nonzero ideal is a product
of prime ideals, it clearly suffices to show that every nonzero prime ideal
of R is principal.
Let P be a nonzero prime ideal of R. Let α be any nonzero element of P.
Then we can factor α into a product of prime elements α = π1^e1 . . . πk^ek.
Then P ⊇ (α) = (π1)^e1 . . . (πk)^ek so by the definition of a prime ideal,
P ⊇ (πi) for some i. To simplify the notation, let π = πi. We will show
that (π) is a maximal ideal. Then, since P ⊇ (π), in fact P = (π), a
principal ideal.
Thus, let Q be any ideal of R with Q ⊃ (π). We need to show Q = R.
Choose any element ρ of Q that is not in (π). Then Q ⊇ (π, ρ) ⊃ (π).
By Corollary 5.34 there is an ideal Q′ of R with QQ′ = (π). Let σ be
any element of Q′. Then ρσ is an element of (π), so the element ρσ is
divisible by the element π. But π is a prime element that does not divide
the element ρ, so π must divide the element σ, and thus σ is an element of
the ideal (π). Thus we have

QQ′ = (π) and Q′ ⊆ (π).

We certainly have that QQ′ ⊆ Q′, so we see that we must have Q′ = (π),
and then Q(π) = (π). This readily implies that 1 is in Q, and hence
Q = R. □

5.4 Algebraic Number Fields and Dedekind Domains

We are about to achieve one of our major goals. We began by investigating
unique factorization of elements. We saw that this property holds in Z but
saw that it may or may not hold in rings of integers in particular quadratic
fields. The appropriate generalization of unique factorization is not unique
factorization of elements, but rather unique factorization of ideals, and we
are about to see that this property holds in the ring of integers of every
algebraic number field.

Theorem 5.37. Let K be an algebraic number field and let R = O(K) be
the ring of integers in K. Then R is a Dedekind domain.

Corollary 5.38 (Dedekind). Let K be an algebraic number field and let R =
O(K) be the ring of integers in K. Then every nonzero ideal I of R can
be factored essentially uniquely into a product of prime ideals.

Proof: This is immediate from Theorem 5.35 and Theorem 5.37. □

In order to prove Theorem 5.37, we shall have to introduce the notion
of the norm of an ideal of R, a notion that is important in its own right.
As a matter of notation, for a set S, we let #(S) denote the cardinality of
S, i.e., the number of elements of S.

Lemma 5.39. Let K be an algebraic number field and let R = O(K). Let
n = degK/Q.

(1) If I = (m), the principal ideal generated by the nonzero integer m, then
#(R/I) = |m|^n.

(2) If I is any nonzero ideal of R, then #(R/I) is finite.

Proof:

(1) By Proposition 5.12, we know that R is a free Z-module of rank n. Let
{α1, . . . , αn} be a basis for R. Then I has basis {mα1, . . . , mαn} and
we see that R/I is isomorphic (as a Z-module) to [Z/mZ]^n, a set of
cardinality |m|^n.

(2) We claim that I contains a nonzero integer m. To see this, choose any
nonzero element α of I and consider its minimum polynomial m_α(X). We
know that m_α(X) = X^d + a_{d−1}X^{d−1} + . . . + a_1 X + a_0 with all the
coefficients integers, and m_α(α) = 0. Note that a_0 ≠ 0 as otherwise
m_α(X) would have X as a factor, and m_α(X)/X would be a polynomial
of lower degree having α as a root. But then we can solve for a_0:

a_0 = −α^d − a_{d−1}α^{d−1} − . . . − a_1 α,

and m = a_0 is in I.
Let J = (m). Then I ⊇ J, so R/I is a quotient of R/J. But by (1),
R/J is a finite set, so R/I must be a finite set as well. □
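For a small case the quotient R/I can simply be counted. The Python sketch below works in the Gaussian integers, the ring of integers of Q(√−1), storing a + bi as the integer pair (a, b), and counts the residue classes modulo a principal ideal (α). It uses the fact from part (2) that I contains a nonzero integer, here n = αᾱ, so every class has a representative x + yi with 0 ≤ x, y < n. The function names are our own.

```python
def divides(alpha, beta):
    """True if alpha divides beta in Z[i]; elements are pairs (a, b) = a + bi."""
    a, b = alpha
    c, d = beta
    n = a * a + b * b                      # alpha * conj(alpha), a positive integer
    # beta * conj(alpha) = (ac + bd) + (ad - bc) i must be divisible by n.
    return (a * c + b * d) % n == 0 and (a * d - b * c) % n == 0


def count_residues(alpha):
    """Number of residue classes of Z[i] modulo the principal ideal (alpha)."""
    a, b = alpha
    n = a * a + b * b
    reps = []                              # one representative per class found so far
    for x in range(n):
        for y in range(n):
            if not any(divides(alpha, (x - u, y - v)) for u, v in reps):
                reps.append((x, y))
    return len(reps)


print(count_residues((3, 0)), count_residues((2, 1)))
```

For I = (3) the count is 9 = 3^2, as part (1) predicts with n = 2, and for I = (2 + i) the count is 5.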

Definition 5.40. Let K be an algebraic number field, let R = O(K), and
let I be any nonzero ideal of R. The norm ‖I‖ of the ideal I is ‖I‖ = #(R/I).

Proof of Theorem 5.37: We must verify that R satisfies the three conditions
for a Dedekind domain in Definition 5.22.

(1) Here we use some general theory. In general, an ideal I of a commutative
ring R is maximal if and only if R/I is a field, and is prime if and
only if R/I is an integral domain. But every finite integral domain is a
field. Putting these implications together, we have the following: Let
I be a nonzero prime ideal of R. Then, by Lemma 5.39, R/I is a finite
integral domain, hence a field, and hence I is a maximal ideal of R.

(2) Observe that for any two nonzero ideals I and J of R with I a proper
subset of J, R/J is a proper quotient of R/I, and hence ‖J‖ is a proper
divisor of ‖I‖. Now let I1 ⊂ I2 ⊂ I3 . . . be a sequence of ideals in R,
where we may assume that I1 is nonzero by discarding it if necessary.
Then ‖I1‖ > ‖I2‖ > ‖I3‖ > . . . is a strictly decreasing sequence of
positive integers and so must be finite.

(3) We proved this in Lemma 5.29. □

Lemma 5.41. Let K be an algebraic number field. Let

I(K) = {nonzero fractional ideals of K}

and let

IPrin (K) = {nonzero principal fractional ideals of K}.

Then I(K) is an abelian group under multiplication of fractional ideals,
and IPrin(K) is a subgroup of I(K).

Proof: Clearly, multiplication of fractional ideals is commutative and
associative, and O(K) = (1) is an identity. Inverses are given by Lemma 5.33:
the inverse in I(K) of the fractional ideal I is the fractional ideal I⁻¹ of
Definition 5.32. Clearly, the subset IPrin(K) is closed under multiplication
and taking inverses, so is a subgroup. □

Definition 5.42. Let K be an algebraic number field. The ideal class group
of K is the quotient group

C(K) = I(K)/IPrin (K).

Here is one of the fundamental theorems of algebraic number theory.

Theorem 5.43 (Minkowski). For any algebraic number field K, C(K) is
finite.

Given this theorem, we may make the following definition.

Definition 5.44. Let K be an algebraic number field. The class number of
K is

h(K) = #(C(K)),

the order of the ideal class group of K.

Corollary 5.45. Let K be an algebraic number field. The following are
equivalent:

(1) h(K) = 1.

(2) O(K) is a PID.

(3) O(K) is a UFD.

Proof: (1) and (2) are equivalent by definition, and (2) and (3) are
equivalent by Corollary 5.36. □

We have the projection map π : I(K) → C(K) and we let [I] = π(I),
and call [I] the ideal class of the ideal I. Thus the ideal class [I] of I is
trivial if and only if I is a principal ideal.
Theorem 5.43 and Corollary 5.45 tell us that, for any algebraic number
field K, even if unique factorization of elements does not hold in O(K), in
some sense it only misses by a finite amount.

Remark 5.46. There is an effective procedure for finding C(K) for any al-
gebraic number field K.

We conclude this section with a result that we record for future
reference.

Lemma 5.47. Let K be an algebraic number field and let I and J be any
two nonzero ideals of R = O(K). Then ‖IJ‖ = ‖I‖ · ‖J‖.

Proof: From abelian group theory, we know that R/I is isomorphic to the
quotient of R/IJ by I/IJ, so #(R/IJ) = #(R/I) · #(I/IJ). Since R is a
Dedekind domain, I/IJ is isomorphic to R/J. □

5.5 Prime Ideals in O(√D)

In this section we determine, and describe, all the prime ideals in R =
O(√D). This is a long procedure, and so we proceed in stages.
There is one slight complication. In most cases, an element α of R is of
the form α = a + b√D, and it is clear what we mean by saying p divides
a or p divides b. However, if D ≡ 1 (mod 4), then a and b may be
half-integers, a = a′/2 and b = b′/2 with a′ and b′ odd integers. In this case,
for p an odd prime, we will say that p divides a or b if p divides a′ or p
divides b′, and all the arguments go through unchanged. (This is because
the denominator 2 is relatively prime to p.) But the case D ≡ 1 (mod 4)
and p = 2 will require a special argument in some places.

Lemma 5.48. Let α0 be any element of R of norm p, where p is a prime.
If β is any element of R with norm n divisible by p, then β is divisible by
α0 or by ᾱ0 in R.

Proof: Let α0 = a0 + b0√D and β = c + d√D. Then a0^2 − b0^2 D = ±p gives
a0^2 ≡ b0^2 D (mod p). Note that b0 ≢ 0 (mod p). Let k0 be a solution of
k0^2 ≡ D (mod p), and note that k0 is unique up to sign. Then
a0 ≡ ±k0 b0 (mod p).
Suppose that d ≢ 0 (mod p). Then, by the same logic, c ≡ ±k0 d (mod p).
Replacing k0 by −k0 if necessary, and replacing α0 by ᾱ0, if necessary,
which switches the sign of b0, we may assume that a0 ≡ k0 b0 (mod p) and
c ≡ k0 d (mod p). Direct calculation then shows that β/α0 = ±[e + f√D]/p
where e = a0 c − b0 dD and f = a0 d − b0 c. Now e ≡ k0 b0 k0 d − b0 dD ≡
b0 d[k0^2 − D] ≡ 0 (mod p) and f ≡ k0 b0 d − k0 b0 d ≡ 0 (mod p) so β/α0 is
in R.
If d ≡ 0 (mod p) then c ≡ 0 (mod p) so β = pβ′ with β′ in R, and then
certainly β is divisible by α0 in R. □
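The lemma can be tested by brute force in a ring that is not a UFD. The Python sketch below takes D = −5 (so D ≡ 3 (mod 4) and elements are a + b√D with a and b integers, stored as pairs), the prime p = 29, and α0 = 3 + 2√−5, which has norm 9 + 20 = 29. The function names, the search bound, and the choice of example are our own assumptions.

```python
def norm(x, D):
    """a^2 - b^2 D for x = a + b*sqrt(D); the norm in the text is its absolute value."""
    a, b = x
    return a * a - b * b * D


def conj(x):
    """Conjugate a - b*sqrt(D) of a + b*sqrt(D)."""
    return (x[0], -x[1])


def divides(alpha, beta, D):
    """True if alpha divides beta among the elements a + b*sqrt(D), a, b integers."""
    a, b = alpha
    c, d = beta
    n = norm(alpha, D)
    # beta * conj(alpha) = (ac - bdD) + (ad - bc) sqrt(D) must be divisible by n.
    return (a * c - b * d * D) % n == 0 and (a * d - b * c) % n == 0


def check_lemma(alpha0, p, D, bound=15):
    """Check Lemma 5.48 for every beta = c + d*sqrt(D) with |c|, |d| <= bound."""
    for c in range(-bound, bound + 1):
        for d in range(-bound, bound + 1):
            beta = (c, d)
            if norm(beta, D) % p == 0:
                if not (divides(alpha0, beta, D) or divides(conj(alpha0), beta, D)):
                    return False
    return True


print(check_lemma((3, 2), 29, -5))
```

For instance, β = 16 + √−5 has norm 261 = 9 · 29, and β = (3 + 2√−5)(2 − √−5).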

Lemma 5.49. Let I be any ideal in R. Then I = (α) for some α in R, or
I = (g, α) for some integer g and some element α of R with ‖α‖ divisible
by g.

Proof: If I = (0) we are done, and if I is a principal ideal then I = (α) for
some α in R and we are done.
Suppose I = (n, α) for some integer n. Let g = gcd(n, ‖α‖). Then for
some integers a and b, g = na + ‖α‖b = na ± αᾱb is in I, and then I = (g, α)
and we are done.
Thus, to complete the proof we must show that every nonzero ideal
I in R is of the form I = (α) or I = (n, α). To see this, choose any
nonzero element β0 of I. Then I contains the nonzero integer β0β̄0, and
then I contains β0β̄0√D as well. Let S1 = {|n′| ≠ 0 | n′ is an integer in I}
and let S2 = {|b′| ≠ 0 | a′ + b′√D is in I}. Then S1 is a nonempty set
of positive integers and so has a smallest element n. Similarly, S2 has
a smallest element b (which may be a half-integer). Let α = a + b√D
be in I. Now if β = c + d√D is any element of I, it follows from the
division algorithm that d is a multiple of b, d = jb for some integer j. Then
m = β − jα is an integer, and it again follows from the division algorithm
that m is a multiple of n, m = ℓn for some integer ℓ. Thus we see that
β = nℓ + αj and so I = (n, α). □

Remark 5.50.

(1) It is sometimes useful to add a “redundant” generator to I. Namely, if
I = (α), then ‖α‖ = ±αᾱ is in I, and so we may write I = (‖α‖, α).

(2) We see from (1) that if I = (α) is a principal ideal, then I = (g, α) is
a principal ideal for g = ±‖α‖. On the other hand, if g = ±‖α‖, then
I = (g, α) = (±‖α‖, α) = (±αᾱ, α) = (α) is a principal ideal.

(3) If I = (g, α) as above, then any element β of I is of the form gγ + αδ,
and then ‖β‖ = |ββ̄| = |g^2 γγ̄ + g[γᾱδ̄ + γ̄αδ] + αᾱδδ̄|. Since g divides
αᾱ, we see that ‖β‖ is divisible by g for every β in I.

Proposition 5.51. Every prime ideal P of R is of the following form:

(1) P = (α0) where ‖α0‖ = p for some prime p; or

(2) P = (p) for some prime p where R does not have an element that is
not divisible by p but whose norm is divisible by p; or

(3) P = (p, α1) where p is a prime, and α1 is an element of R not divisible
by p but with ‖α1‖ divisible by p.

Proof: We prove the theorem by ruling out every ideal that is not of one
of the above forms.
Let I be an ideal of R. By Remark 5.50, we may assume I is of the
form (g, α) with g an integer dividing ‖α‖.
First, suppose g has more than one prime factor. Write g = g1 g2 with
g1 and g2 relatively prime, g1 ≠ ±1, g2 ≠ ±1. Let I1 = (g1, α) and
I2 = (g2, α). Then I1 I2 = (g, g1α, g2α, α^2) ⊆ (g, α) = I but I1 ⊈ I, as
I1 has the element g1 with ‖g1‖ = g1^2 not divisible by g, and I2 ⊈ I, as I2
has the element g2 with ‖g2‖ = g2^2 not divisible by g. Thus I is not a prime
ideal.

Thus, we must have g = ±p^j for some j ≥ 1. Suppose that j ≥ 3. Let
I1 = (p, α). Then I1^j ⊆ I, but I1 ⊈ I as I1 has the element p with ‖p‖ = p^2
not divisible by g.
Suppose that j = 2. If p^2 divides α in R, then I = (p^2) = (p)(p)
and I ⊉ (p) so I is not prime. If p divides α, write α = pβ. Then
I = (p^2, pβ) = (p)(p, β) is not prime unless (p) = R, which is impossible,
or (p, β) = R, in which case I = (p). Finally, consider the case where p does
not divide α in R. Again, let I1 = (p, α) and observe that I1^2 ⊆ I. We claim
that I1 ⊈ I, and we prove this by showing that p is not in I. We prove
this by contradiction. Suppose that p is in I. Then p = p^2 γ + αδ for some
elements γ and δ of R. But then p = p^2 γ̄ + ᾱδ̄, and then pα = p^2 γ̄α + αᾱδ̄.
But by assumption ‖α‖ is divisible by p^2, so αᾱ = p^2 m for some integer m.
Substituting, we see that pα = p^2 γ̄α + p^2 mδ̄, so α = p[γ̄α + mδ̄] is divisible
by p in R, a contradiction.
Thus, the only possibility for a prime ideal is I = (p, α) with ‖α‖
divisible by p. If α is divisible by p then I = (p), and we are in case (2).
Thus, we are left with the possibility that I = (p, α1) for some element
α1 of R with ‖α1‖ divisible by p but with α1 not divisible by p, in which
case we are in case (3). □
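Whether case (2) occurs for a given prime p can be decided by a finite search. The Python sketch below assumes D ≡ 2 or 3 (mod 4), so that elements of R are a + b√D with a and b integers. Since divisibility by p of both the element and its norm depends only on a and b modulo p, it is enough to search residues. The function name is our own.

```python
def norm_witness(p, D):
    """Search for residues (a, b) mod p, not both zero, with a^2 - b^2 D
    divisible by p. Such a pair gives an element a + b*sqrt(D) not divisible
    by p whose norm is divisible by p. Returns None if there is none, which
    is the situation of case (2). Assumes D is 2 or 3 (mod 4)."""
    for a in range(p):
        for b in range(p):
            if (a, b) != (0, 0) and (a * a - b * b * D) % p == 0:
                return (a, b)
    return None


# Gaussian integers: D = -1.
for p in (2, 3, 5, 7, 13):
    print(p, norm_witness(p, -1))
```

With D = −1 the search finds no witness for p = 3 and p = 7, so (3) and (7) are prime ideals of the form in case (2), while for p = 5 it finds 1 + 2i, of norm 5.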

Proposition 5.52. In the situation of Proposition 5.51,

(1) in case (1), (p) = P P̄;

(2) in case (2), (p^2) = P P̄;

(3) in case (3), (p) = P P̄. Also, in this case, if R has an element α0
of norm p, then P = (α0) or P = (ᾱ0), while if R does not have an
element of norm p, then P is not a principal ideal.

Proof:

(1) and (2) are obvious.

(3) Let α1 = a1 + b1√D, and note that a1^2 − b1^2 D ≡ 0 (mod p). Then
P P̄ = (p, α1)(p, ᾱ1) = (p^2, pα1, pᾱ1, α1ᾱ1) contains p[α1 + ᾱ1] = 2a1 p and
p[α1√D − ᾱ1√D] = 2b1 Dp.

Suppose p is odd. Case (a): a1 is not divisible by p. Then 2a1 is not
divisible by p, and so p is in P P̄. Case (b): a1 is divisible by p. Then b1 is
not divisible by p. If D is not divisible by p then 2b1 D is not divisible by
p, and so p is in P P̄. If D is divisible by p, then ‖α1‖ is divisible by p but
not by p^2, and so p is in P P̄.

Now suppose p = 2. Case (a): D ≡ 2 (mod 4). Then a1 is even, so
b1 is odd, and then P = (2, √D), in which case P P̄ = (4, 2√D, D) =
(2). Case (b): D ≡ 3 (mod 4). Then a1 and b1 are both odd, P P̄ =
(4, 2α1, 2ᾱ1, a1^2 − b1^2 D) and a1^2 − b1^2 D ≡ 2 (mod 4), so P P̄ = (2). Case
(c): D ≡ 1 (mod 4). Then P P̄ = (4, 2a1 + 2b1√D, 2a1 − 2b1√D, ‖α1‖) =
(4, 4a1, 2a1 − 2b1√D, ‖α1‖) = (2), as a1 and b1 are half-integers.
If R has an element α0 of norm p, then by Lemma 5.48, α1 is divisible by
α0, in which case P = (α0), or α1 is divisible by ᾱ0, in which case P = (ᾱ0).
Conversely, suppose that P is a principal ideal, P = (α) for some
element α of R. Then P̄ = (ᾱ), and so (p) = P P̄ = (αᾱ). But then
p^2 = ‖p‖ = ‖α‖ · ‖ᾱ‖ = ‖α‖^2 so ‖α‖ = p. Thus, if R does not have an
element of norm p, P cannot be a principal ideal. □

Proposition 5.53. In the situation of Proposition 5.51, for each prime p,

(1) in case (1), there are two ideals of this form if P ≠ P̄ and a unique
ideal of this form if P = P̄;

(2) in case (2), there is a unique ideal of this form;

(3) in case (3), there are two ideals of this form if P ≠ P̄ and a unique
ideal of this form if P = P̄.

Proof:

(1) Suppose α0′ is any element of R with ‖α0′‖ = p. Note by Lemma 5.48
that α0′ is divisible by α0 or by ᾱ0, and that α0 is divisible by α0′ or
by ᾱ0′, which readily implies that (α0′) = (α0) or (α0′) = (ᾱ0).

(2) This is obvious.

(3) Suppose α1′ is any element of R with ‖α1′‖ divisible by p but with α1′
not divisible by p. Then, in the notation of the proof of Lemma 5.48, we
must have α1 = a1 + b1√D with b1 ≢ 0 (mod p) and a1 ≡ kb1 (mod p),
and α1′ = a1′ + b1′√D with b1′ ≢ 0 (mod p) and a1′ ≡ k′b1′ (mod p), with
k′ ≡ ±k ≡ ±k0 (mod p). Suppose k′ ≡ k (mod p). Since p and b1 are
relatively prime, there is an integer m with b1′ ≡ b1 m (mod p). But
then a1′ ≡ kb1′ ≡ kb1 m ≡ a1 m (mod p), so α1′ − mα1 = c + d√D with
c and d both divisible by p, i.e., α1′ − mα1 = pγ for γ = [c + d√D]/p
an element of R, and then α1′ = pγ + α1 m is in (p, α1), and vice versa,
so (p, α1′) = (p, α1). If k′ ≡ −k (mod p), then, replacing α1 by ᾱ1, the
same argument shows (p, α1′) = (p, ᾱ1). □

i i

i i
i i

i i

162 5. Towards Algebraic Number Theory

Proposition 5.54. In the situation of Proposition 5.51, let $e_D = 1$ if $D \equiv 1 \pmod{4}$, and let $e_D = 2$ if $D \equiv 2$ or $3 \pmod{4}$. For any prime $p$,

(1) if $p$ does not divide $e_D D$, then $P \neq \overline{P}$ in cases (1) and (3); and

(2) if $p$ divides $e_D D$, case (2) does not occur, and $P = \overline{P}$ in cases (1) and (3).

Proof: First, suppose that $p$ is an odd prime. By adding a "redundant" generator if necessary (see Remark 5.50(1)), we may assume that we are in case (3) and that $P = (p, \alpha_1)$. Let $\alpha_1 = a_1 + b_1\sqrt{D}$. Then $P = \overline{P}$ if and only if $2a_1$ is in $P$. Let us assume that $2a_1$ is in $P$.
Now every element of $P$ has norm divisible by $p$, so $p$ must divide $\|2a_1\| = 4a_1^2$, and hence $p$ divides $a_1$. Then $p$ does not divide $b_1$, as by assumption $p$ does not divide $\alpha_1$, but by assumption $p$ divides $\|\alpha_1\| = |a_1^2 - b_1^2 D|$, so $p$ divides $D$.
Conversely, suppose $p$ divides $D$. Then, by Proposition 5.53, we may choose any suitable element $\alpha_1$ and then $P = (p, \alpha_1)$. Choose $\alpha_1 = \sqrt{D}$. Then $P = (p, \sqrt{D}) = (p, -\sqrt{D}) = \overline{P}$.
Next, suppose that $p = 2$. If $D \equiv 2 \pmod{4}$, then, just as above, $P = (2, \sqrt{D}) = (2, -\sqrt{D}) = \overline{P}$. If $D \equiv 3 \pmod{4}$, then $P = (2, 1 + \sqrt{D}) = (2, 1 - \sqrt{D}) = \overline{P}$. If $D \equiv 1 \pmod{4}$, then $\alpha_1 = a_1 + b_1\sqrt{D}$ with $a_1$ and $b_1$ half-integers, so in this case $2a_1$ is an odd integer, so, by the same argument as above, $2a_1$ is not in $P$, and hence in this case $P \neq \overline{P}$.
Finally, in every case where $p$ divides $e_D D$, we have exhibited an explicit element $\alpha_1$ not divisible by $p$ but with $\|\alpha_1\|$ divisible by $p$, excluding case (2). □

We can now get a complete description of the prime ideals in R.



Theorem 5.55. Let $R = \mathcal{O}(\sqrt{D})$. Let $e_D = 1$ if $D \equiv 1 \pmod{4}$, and let $e_D = 2$ if $D \equiv 2$ or $3 \pmod{4}$. The following is a complete listing, without duplication, of the prime ideals $P$ in $R$:

(1) For every prime $p$ dividing $e_D D$,

(a) if $R$ has an element $\alpha_0$ of norm $p$, the ideal $P = (\alpha_0)$. In this case $P$ is a principal ideal, $P = \overline{P}$, and $P^2 = (p)$;

(b) if $R$ does not have an element of norm $p$, the ideal $P = (p, \alpha_1)$ where $\alpha_1$ is not divisible by $p$ but $\|\alpha_1\|$ is divisible by $p$. In this case $P$ is not a principal ideal, $P = \overline{P}$, and $P^2 = (p)$.

(2) For every prime $p$ not dividing $e_D D$,

(a) if $R$ has an element $\alpha_1$ that is not divisible by $p$ but with $\|\alpha_1\|$ divisible by $p$, the ideal $P = (p, \alpha_1)$ and the ideal $\overline{P} = (p, \overline{\alpha}_1)$. In this case $P \neq \overline{P}$ and $P\overline{P} = (p)$; by Proposition 5.52, $P$ and $\overline{P}$ are principal ideals if and only if $R$ has an element of norm $p$;

(b) if $R$ does not have an element that is not divisible by $p$ but whose norm is divisible by $p$, the principal ideal $P = (p)$.

Proof: Assembling Propositions 5.51, 5.52, 5.53, and 5.54 gives us almost
all of this theorem. There is only one thing left to do. Proposition 5.51
shows that every prime ideal must be of the form above. To complete the
proof we must show that every ideal of the above form is indeed a prime
ideal. Thus let P be as in the statement of the theorem, and suppose that
I and J are ideals with P ⊇ IJ. We must show that P ⊇ I or P ⊇ J.
To begin with, we note that every element of P has norm divisible by p.
Also, by Lemma 5.49 we may assume that $I = (m, \beta)$ and $J = (n, \gamma)$ with $m$ dividing $\|\beta\|$ and $n$ dividing $\|\gamma\|$. Then $IJ = (mn, n\beta, m\gamma, \beta\gamma)$. Since $P \supseteq IJ$, every element of $IJ$ must have norm divisible by $p$. In particular, $\|mn\| = m^2n^2$ is divisible by $p$. Thus at least one of $m$ and $n$ is divisible by $p$. We shall assume that $m$ is divisible by $p$. (Otherwise interchange $I$ and $J$.) To proceed further, we must break the proof up into several cases. We number the cases as in the statement of the theorem.
Case (2)(b): Here $P = (p)$. Now $p$ divides $m$ and $m$ divides $\|\beta\|$, so $p$ divides $\|\beta\|$ and hence $p$ divides $\beta$. Then $P = (p) \supseteq I$.
Case (1): If $p$ divides $\beta$ then $P \supseteq (p) \supseteq I$ and we are done, so assume not. By adding a "redundant" generator if necessary, we may assume that $P = (p, \alpha)$ with $\alpha$ not divisible by $p$ but with $\|\alpha\|$ divisible by $p$. Write $\alpha = a + b\sqrt{D}$. Note that $b \not\equiv 0 \pmod{p}$ and define $k$ by $a \equiv kb \pmod{p}$. Then, as in the proof of Proposition 5.53, every element of $P$ is of the form $a' + b'\sqrt{D}$ with $a' \equiv kb' \pmod{p}$. Similarly, $\beta = c + d\sqrt{D}$ with $d \not\equiv 0 \pmod{p}$ and $c \equiv kd \pmod{p}$ or $c \equiv -kd \pmod{p}$. In the former case, again as in the proof of Proposition 5.53, $\beta$ is in $P$ and so $P \supseteq I$. In the latter case, by the same argument, $\beta$ is in $\overline{P}$ and so $\overline{P} \supseteq I$. But here $P = \overline{P}$, so $P \supseteq I$.
Case (2)(a): In the event that $D \equiv 1 \pmod{4}$, we have a preliminary step. In this event $p$ must be odd, so we may replace $\beta$ and $\gamma$ by $2\beta$ and $2\gamma$, if necessary, to ensure that $\beta = a + b\sqrt{D}$ and $\gamma = c + d\sqrt{D}$ with $a$, $b$, $c$, and $d$ integers.
By the argument for Case (1), $P \supseteq I$ except possibly in the situation that $\beta$ is in $\overline{P}$, so suppose we are in that situation. Then $n\beta$ is in $P$, as $P \supseteq IJ$. If $n$ is not divisible by $p$, that gives $\beta$ in $P$, and then $P = \overline{P}$, which is impossible. Thus, we must have $n$ divisible by $p$. Now consider the ideal $J = (n, \gamma)$. By the same argument as above, $P \supseteq J$ except possibly in the situation that $\gamma$ is in $\overline{P}$. Thus, the final situation to consider is where $\beta$ and $\gamma$ are both in $\overline{P}$. In this situation $\overline{\beta}$ and $\overline{\gamma}$ are in $P$. Then, on the one hand, $\overline{\beta}\,\overline{\gamma}$ is in $P$, as $P$ is an ideal, and, on the other hand, $\beta\gamma$ is in $P$ as $P \supseteq IJ$. Thus $\delta = \beta\gamma - \overline{\beta}\,\overline{\gamma}$ is in $P$. But direct computation shows that $\delta = e + f\sqrt{D}$ with $e \equiv 0 \pmod{p}$ and $f \equiv 4kbd \pmod{p}$. Thus $0 \equiv k[4kbd] = [2k]^2 bd \pmod{p}$. Since $b \not\equiv 0 \pmod{p}$ and $d \not\equiv 0 \pmod{p}$, we must have $k \equiv 0 \pmod{p}$ or $2 \equiv 0 \pmod{p}$. But in either of these cases $P = \overline{P}$, which we have ruled out. □

Remark 5.56. In the situation of Theorem 5.55, let p be an odd prime not
dividing D. From the proof of Lemma 5.48, we can see that if D is a
quadratic residue (mod p) then case (2)(a) occurs, while if D is a quadratic
nonresidue (mod p) then case (2)(b) occurs.

Lemma 5.57. In the situation of Theorem 5.55, let D ≡ 1 (mod 4) and let
p = 2. If D ≡ 1 (mod 8) then case (2)(a) occurs, while if D ≡ 5 (mod 8)
then case (2)(b) occurs.

Proof: If $\alpha = a + b\sqrt{D}$ with $a$ and $b$ integers and with $\|\alpha\|$ even, then either $a$ and $b$ are both even or they are both odd. In either event, $\alpha$ is divisible by 2. Thus, the only possibility for $\alpha_1$ is $\alpha_1 = a_1 + b_1\sqrt{D}$ with $a_1$ and $b_1$ both half-integers, i.e., $\alpha_1 = [a + b\sqrt{D}]/2$ with $a$ and $b$ odd integers. Using the fact that $c^2 \equiv 1 \pmod{8}$ for any odd integer $c$, we see that in this case $\|\alpha_1\| \equiv [1 - D]/4 \pmod{2}$. Thus, if $D \equiv 1 \pmod{8}$, we may choose $\alpha_1$ to be any element of $R$ of this form, and we are in case (2)(a), while if $D \equiv 5 \pmod{8}$, there is no possible choice for $\alpha_1$ and we are in case (2)(b). □

Remark 5.58. Here is a summary of all the possibilities, together with an expression of each of the prime ideals in terms of congruence conditions.

(1) $D \equiv 2$ or $3 \pmod{4}$ and

(a) $p$ is an odd prime dividing $D$:
\[ P = (p, \sqrt{D}) = \{a + b\sqrt{D} \mid a \equiv 0 \pmod{p}\}. \]

(b) $p$ is an odd prime not dividing $D$ and $D$ is a quadratic residue (mod $p$):
\[ P = (p, k_0 + \sqrt{D}) = \{a + b\sqrt{D} \mid a \equiv k_0 b \pmod{p}\}, \]
\[ \overline{P} = (p, -k_0 + \sqrt{D}) = \{a + b\sqrt{D} \mid a \equiv -k_0 b \pmod{p}\}, \]
where $k_0^2 \equiv D \pmod{p}$.

(c) $p$ is an odd prime not dividing $D$ and $D$ is a quadratic nonresidue (mod $p$):
\[ P = (p) = \{a + b\sqrt{D} \mid a \equiv b \equiv 0 \pmod{p}\}. \]

(d) $D \equiv 2 \pmod{4}$ and $p = 2$:
\[ P = (2, \sqrt{D}) = \{a + b\sqrt{D} \mid a \equiv 0 \pmod{2}\}. \]

(e) $D \equiv 3 \pmod{4}$ and $p = 2$:
\[ P = (p, 1 + \sqrt{D}) = \{a + b\sqrt{D} \mid a \equiv b \pmod{2}\}. \]

(2) $D \equiv 1 \pmod{4}$ and

(a) $p$ is an odd prime dividing $D$:
\[ P = (p, \sqrt{D}) = \{[a + b\sqrt{D}]/2 \mid a \equiv b \pmod{2} \text{ and } a \equiv 0 \pmod{p}\}. \]

(b) $p$ is an odd prime not dividing $D$ and $D$ is a quadratic residue (mod $p$):
\[ P = (p, [k_0 + \sqrt{D}]/2) = \{[a + b\sqrt{D}]/2 \mid a \equiv b \pmod{2} \text{ and } a \equiv k_0 b \pmod{p}\}, \]
\[ \overline{P} = (p, [-k_0 + \sqrt{D}]/2) = \{[a + b\sqrt{D}]/2 \mid a \equiv b \pmod{2} \text{ and } a \equiv -k_0 b \pmod{p}\}, \]
where $k_0^2 \equiv D \pmod{p}$.

(c) $p$ is an odd prime not dividing $D$ and $D$ is a quadratic nonresidue (mod $p$):
\[ P = (p) = \{[a + b\sqrt{D}]/2 \mid a \equiv b \pmod{2} \text{ and } a \equiv b \equiv 0 \pmod{p}\}. \]

(d) $D \equiv 1 \pmod{8}$ and $p = 2$:
\[ P = (2, [1 + \sqrt{D}]/2) = \{[a + b\sqrt{D}]/2 \mid a \equiv b \pmod{4}\}, \]
\[ \overline{P} = (2, [-1 + \sqrt{D}]/2) = \{[a + b\sqrt{D}]/2 \mid a \equiv -b \pmod{4}\}. \]

(e) $D \equiv 5 \pmod{8}$ and $p = 2$:
\[ P = (2) = \{[a + b\sqrt{D}]/2 \mid a \equiv b \equiv 0 \pmod{2} \text{ and } a \equiv b \pmod{4}\}. \]
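The case distinctions of Theorem 5.55, as summarized in Remark 5.58, can be decided mechanically from congruences. The following Python sketch is only an illustration, not part of the text: the function names `legendre` and `theorem_5_55_case` are hypothetical, and the code assumes without checking that $D$ is a square-free integer other than 1 and that $p$ is prime. It uses Remark 5.56 for odd $p$ and Lemma 5.57 for $p = 2$.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1, or p - 1


def theorem_5_55_case(D: int, p: int) -> str:
    """Return the case of Theorem 5.55 that the prime p falls under.

    Assumes D is square-free, D != 1, and p is prime (not checked).
    """
    e_D = 1 if D % 4 == 1 else 2  # Python's % is non-negative, even for D < 0
    if (e_D * D) % p == 0:
        return "(1)"              # P equals its conjugate and P^2 = (p)
    if p == 2:                    # only reachable when D = 1 (mod 4)
        return "(2)(a)" if D % 8 == 1 else "(2)(b)"       # Lemma 5.57
    return "(2)(a)" if legendre(D, p) == 1 else "(2)(b)"  # Remark 5.56


for D, p in [(-5, 2), (-5, 3), (-1, 3), (-7, 2), (-3, 2)]:
    print(D, p, theorem_5_55_case(D, p))
```

The function reports only which case occurs. Whether the resulting prime ideal is principal still depends on whether $R$ has an element of norm $p$.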




5.6 Examples of Ideals in $\mathcal{O}(\sqrt{D})$

In this section, we will give concrete examples of nonprincipal ideals in $\mathcal{O}(\sqrt{D})$. We will start off by giving classes of examples, and we will finish off by giving examples for particular values of $D$. We adopt the standard notation that $h(D)$ denotes the class number of the field $\mathbb{Q}(\sqrt{D})$.

Proposition 5.59. Let $R = \mathcal{O}(\sqrt{D})$.

(1) If $D \equiv 1 \pmod{8}$, let $I = (2, [1 + \sqrt{D}]/2)$.

(2) If $D \equiv 2 \pmod{4}$, let $I = (2, \sqrt{D})$.

(3) If $D \equiv 3 \pmod{4}$, let $I = (2, 1 + \sqrt{D})$.

If there is no element $\alpha$ of $R$ with $\|\alpha\| = 2$, then $I$ is not a principal ideal.

(4) In cases (2) and (3), $I^2 = (2)$, a principal ideal.

(5) In case (1), suppose that $R$ has an element $\beta$, which is not divisible by 2, with $\|\beta\| = 4$. Then $I^2 = (\beta)$, a principal ideal.

Proof: We have proved (1), (2), (3), and (4) in Theorem 5.55 and Lemma 5.57. As for (5), in this case we must have $\beta = [a + b\sqrt{D}]/2$ with $a$ and $b$ odd integers. Replacing $\beta$ by $\overline{\beta}$, if necessary, we may assume that $a \equiv b \pmod{4}$. Computation then shows that $a \equiv b \pmod{8}$ if $D \equiv 1 \pmod{16}$ and that $a \equiv b + 4 \pmod{8}$ if $D \equiv 9 \pmod{16}$.
Let $\alpha_1 = [1 + \sqrt{D}]/2$. Then $I^2 = (4, 2\alpha_1, \alpha_1^2) = (4, 2\alpha_1 - 2\alpha_1^2, \alpha_1^2) = (4, [1 - D]/2, \alpha_1^2) = (4, \alpha_1^2)$. Certainly $\beta$ divides 4, and a long but routine computation shows that $\beta$ divides $\alpha_1^2$. Thus $I^2 \subseteq (\beta)$. Another computation shows that $b\beta - \alpha_1^2 = 4c$ for some integer $c$, so $b\beta$ is in $I^2$. But certainly $4\beta$ is in $I^2$, so, as $b$ is odd, $\beta$ is in $I^2$. Thus $I^2 \supseteq (\beta)$. Hence $I^2 = (\beta)$. □

Corollary 5.60. Let $D \equiv 1$, 2, 3, 6, or $7 \pmod{8}$ and let $R = \mathcal{O}(\sqrt{D})$. Let $I$ be the ideal of Proposition 5.59.

(1) If $D < 0$ and $D \neq -1$, $-2$, or $-7$, then $I$ is a nonprincipal ideal of $R$. Consequently, $h(D) > 1$.

(2) If $D$ is divisible by a prime congruent to $5 \pmod{8}$, or if $D$ is divisible by a prime congruent to $3 \pmod{8}$ and by a prime congruent to $7 \pmod{8}$, then $I$ is a nonprincipal ideal of $R$. Consequently, $h(D) > 1$.

(3) If $D \equiv 2$, 3, 6, or $7 \pmod{8}$ and $D$ is as in (1) or (2), then $h(D)$ is even. If $D \equiv 1 \pmod{8}$, $D$ is as in (1) or (2), and $R$ has an element $\beta$ that is not divisible by 2, with $\|\beta\| = 4$, then $h(D)$ is even.


Proof: It is immediate from Lemma 2.67, Lemma 2.77, Theorem 5.55, and Proposition 5.59 that $I$ is a nonprincipal ideal of $\mathcal{O}(\sqrt{D})$, and that, if $D$ is as in (3), $I^2$ is a principal ideal. Thus, in this case, $[I]$ is an element of order 2 of $C(\mathbb{Q}(\sqrt{D}))$, and so this group has even order. □
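Example 5.61. If $D = -1$, then $\alpha = 1 + \sqrt{D}$ is an element of $\mathcal{O}(\sqrt{D})$ with $\|\alpha\| = 2$. If $D = -2$, then $\alpha = \sqrt{D}$ is an element of $\mathcal{O}(\sqrt{D})$ with $\|\alpha\| = 2$. If $D = -7$, then $\alpha = [1 + \sqrt{D}]/2$ is an element of $\mathcal{O}(\sqrt{D})$ with $\|\alpha\| = 2$.
Here are examples of real quadratic fields $\mathbb{Q}(\sqrt{D})$ having elements $\alpha$ of $\mathcal{O}(\sqrt{D})$ with $\|\alpha\| = 2$:

$D = 2$: $\alpha = \sqrt{2}$,
$D = 3$: $\alpha = 1 + \sqrt{3}$,
$D = 6$: $\alpha = 2 + \sqrt{6}$,
$D = 7$: $\alpha = 3 + \sqrt{7}$,
$D = 11$: $\alpha = 3 + \sqrt{11}$,
$D = 14$: $\alpha = 4 + \sqrt{14}$,
$D = 17$: $\alpha = [5 + \sqrt{17}]/2$,
$D = 19$: $\alpha = 13 + 3\sqrt{19}$,
$D = 22$: $\alpha = 14 + 3\sqrt{22}$,
$D = 23$: $\alpha = 5 + \sqrt{23}$,
$D = 31$: $\alpha = 39 + 7\sqrt{31}$,
$D = 33$: $\alpha = [5 + \sqrt{33}]/2$,
$D = 34$: $\alpha = 6 + \sqrt{34}$,
$D = 38$: $\alpha = 6 + \sqrt{38}$,
$D = 41$: $\alpha = [7 + \sqrt{41}]/2$,
$D = 43$: $\alpha = 59 + 9\sqrt{43}$,
$D = 46$: $\alpha = 156 + 23\sqrt{46}$,
$D = 47$: $\alpha = 7 + \sqrt{47}$,
$D = 51$: $\alpha = 7 + \sqrt{51}$,
$D = 57$: $\alpha = [7 + \sqrt{57}]/2$,
$D = 59$: $\alpha = 23 + 3\sqrt{59}$,
$D = 62$: $\alpha = 8 + \sqrt{62}$,
$D = 66$: $\alpha = 8 + \sqrt{66}$,
$D = 67$: $\alpha = 221 + 27\sqrt{67}$,
$D = 71$: $\alpha = 59 + 7\sqrt{71}$,
$D = 73$: $\alpha = [9 + \sqrt{73}]/2$,
$D = 79$: $\alpha = 9 + \sqrt{79}$,
$D = 86$: $\alpha = 102 + 11\sqrt{86}$,
$D = 89$: $\alpha = [9 + \sqrt{89}]/2$,
$D = 94$: $\alpha = 1464 + 151\sqrt{94}$,
$D = 97$: $\alpha = [69 + 7\sqrt{97}]/2$.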


Thus the values of D in this example are not covered by Corollary 5.60.
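The entries of Example 5.61 for $D > 0$ can be checked by direct arithmetic, since $\|[a + b\sqrt{D}]/e\| = |a^2 - b^2 D|/e^2$. The short Python sketch below does this; the tuple encoding `(D, a, b, e)` and the names `TABLE` and `abs_norm` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

# (D, a, b, e) encodes alpha = (a + b*sqrt(D)) / e, with e = 1 or 2.
TABLE = [
    (2, 0, 1, 1), (3, 1, 1, 1), (6, 2, 1, 1), (7, 3, 1, 1),
    (11, 3, 1, 1), (14, 4, 1, 1), (17, 5, 1, 2), (19, 13, 3, 1),
    (22, 14, 3, 1), (23, 5, 1, 1), (31, 39, 7, 1), (33, 5, 1, 2),
    (34, 6, 1, 1), (38, 6, 1, 1), (41, 7, 1, 2), (43, 59, 9, 1),
    (46, 156, 23, 1), (47, 7, 1, 1), (51, 7, 1, 1), (57, 7, 1, 2),
    (59, 23, 3, 1), (62, 8, 1, 1), (66, 8, 1, 1), (67, 221, 27, 1),
    (71, 59, 7, 1), (73, 9, 1, 2), (79, 9, 1, 1), (86, 102, 11, 1),
    (89, 9, 1, 2), (94, 1464, 151, 1), (97, 69, 7, 2),
]


def abs_norm(D, a, b, e):
    """Absolute value of the norm of (a + b*sqrt(D)) / e."""
    return Fraction(abs(a * a - b * b * D), e * e)


# Collect every entry whose norm differs from 2; the list should be empty.
bad = [row for row in TABLE if abs_norm(*row) != 2]
print(bad)
```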

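Example 5.62. Here are examples of elements $\beta$ as in Proposition 5.59. Thus, for these values of $D$, $h(D)$ is even:

$D = 65$: $\beta = [7 - \sqrt{65}]/2$, $\beta\overline{\beta} = -4$;
$D = 105$: $\beta = [11 - \sqrt{105}]/2$, $\beta\overline{\beta} = 4$;
$D = 185$: $\beta = [13 + \sqrt{185}]/2$, $\beta\overline{\beta} = -4$;
$D = 265$: $\beta = [49 - 3\sqrt{265}]/2$, $\beta\overline{\beta} = 4$;
$D = 273$: $\beta = [17 + \sqrt{273}]/2$, $\beta\overline{\beta} = 4$;
$D = 305$: $\beta = [17 + \sqrt{305}]/2$, $\beta\overline{\beta} = -4$;
$D = 345$: $\beta = [19 - \sqrt{345}]/2$, $\beta\overline{\beta} = 4$;
$D = 385$: $\beta = [59 + 3\sqrt{385}]/2$, $\beta\overline{\beta} = 4$;
$D = 465$: $\beta = [151 + 7\sqrt{465}]/2$, $\beta\overline{\beta} = 4$.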
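Elements $\beta$ of this kind can be found by a small search. Writing $\beta = [a + b\sqrt{D}]/2$ with $a$ and $b$ odd, the condition $\|\beta\| = 4$ says $a^2 - b^2 D = \pm 16$. The following Python sketch is one way to look for such a pair, not a method taken from the text; the name `find_beta` and the search bound `max_b` are hypothetical choices. It may return a different (equally valid) $\beta$ from the one listed in the example.

```python
from math import isqrt


def find_beta(D, max_b=99):
    """Search for odd a, b > 0 with a^2 - b^2 * D = 16 or -16.

    Returns the first pair (a, b) found, or None if there is none with
    b <= max_b. Intended for square-free D = 1 (mod 8).
    """
    for b in range(1, max_b + 1, 2):      # odd b only
        for s in (16, -16):               # try a^2 = D*b^2 + 16 first
            a2 = D * b * b + s
            if a2 > 0:
                a = isqrt(a2)
                if a * a == a2 and a % 2 == 1:
                    return a, b
    return None


for D in (65, 105, 185, 265, 273, 305, 345, 385, 465):
    print(D, find_beta(D))
```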

Proposition 5.63.

(1) Let $p_1$ and $p_2$ be odd primes. Suppose that $p_2 \equiv 1 \pmod{4}$ and that $p_1$ is a quadratic nonresidue (mod $p_2$).

(a) If $D$ is divisible by $p_2$ and $D$ is a quadratic residue (mod $p_1$), then $h(D) > 1$.

(b) If $D$ is divisible by $p_1p_2$, then $h(D)$ is even.

(2) Let $p$ be a prime, $p \equiv 3 \pmod{4}$. If $D$ is divisible by $p$ and $D \equiv 2 \pmod{8}$, then $h(D)$ is even.

(3) Let $p_1$, $p_2$, and $p_3$ be primes, all congruent to $3 \pmod{4}$. If $D$ is divisible by $p_1p_2p_3$, then $h(D)$ is even.

Proof: We have proved (1)(a) in Theorem 2.82. We recapitulate: since $p_2 \equiv 1 \pmod{4}$, $-p_1$ is also a quadratic nonresidue (mod $p_2$). Thus, by congruence considerations (mod $p_2$), $R = \mathcal{O}(\sqrt{D})$ does not have an element of norm $p_1$. On the other hand, since $D$ is a quadratic residue (mod $p_1$), $\alpha_1 = k + \sqrt{D}$ is an element of $R$ of norm divisible by $p_1$, where $k^2 \equiv D \pmod{p_1}$. Thus, by Theorem 5.55, $I = (p_1, \alpha_1)$ is a nonprincipal ideal of $R$. The proof of (1)(b) is almost identical, but here we choose $\alpha_1 = \sqrt{D}$ and now $I$ is a nonprincipal ideal with $I^2$ a principal ideal, by Theorem 5.55.
(2) follows similarly from the proof of Theorem 2.87.
As for (3), let $(a/p)$ be the quadratic residue symbol: $(a/p) = 1$ if $a$ is a quadratic residue (mod $p$) and $(a/p) = -1$ if $a$ is a quadratic nonresidue (mod $p$). We use two facts. First, for $p \equiv 3 \pmod{4}$, $(-a/p) = -(a/p)$, and second, by the Law of Quadratic Reciprocity, for $p \equiv q \equiv 3 \pmod{4}$, $(p/q) = -(q/p)$. Here $p$ and $q$ are primes.
We wish to show that there is some prime $p = p_1$, $p_2$, or $p_3$ with $R$ not having an element of norm $p$. Then $I = (p, \sqrt{D})$ is a nonprincipal ideal with $I^2$ a principal ideal, by Theorem 5.55 again. The argument for this breaks up into several cases, and we do two of them. (The others, which are similar, we leave for the reader.) (a) Suppose that $(p_1/p_2) = 1$ and that $(p_1/p_3) = -1$. Then $(-p_1/p_2) = -1$, and by congruence considerations (mod $p_2$) and (mod $p_3$), $R$ does not have an element of norm $p_1$. (b) Suppose that $(p_1/p_2) = 1$ and that $(p_1/p_3) = 1$. If $(p_2/p_3) = 1$, then $(p_2/p_1) = -1$ and $(-p_2/p_3) = -1$, and by congruence considerations (mod $p_1$) and (mod $p_3$), $R$ does not have an element of norm $p_2$. If $(p_2/p_3) = -1$, then $(p_3/p_1) = -1$ and $(-p_3/p_2) = -1$, and by congruence considerations (mod $p_1$) and (mod $p_2$), $R$ does not have an element of norm $p_3$. □

As we shall now see, for $D$ negative we can easily get stronger information on the ideal class group, and hence the class number, of the field $\mathbb{Q}(\sqrt{D})$.

Theorem 5.64. Let $p_1, \ldots, p_k$ be distinct primes and set $D = -p_1 \cdots p_k$. Then $C(\mathbb{Q}(\sqrt{D}))$ has a subgroup isomorphic to $(\mathbb{Z}/2\mathbb{Z})^{k-1}$. Consequently, $h(D)$ is divisible by $2^{k-1}$. If $D \equiv 3 \pmod{4}$, then $C(\mathbb{Q}(\sqrt{D}))$ has a subgroup isomorphic to $(\mathbb{Z}/2\mathbb{Z})^k$. Consequently, $h(D)$ is divisible by $2^k$.

Proof: Let $S = \{p_1, \ldots, p_k\}$. The subsets of $S$ form a group $G$ of order $2^k$ with the group operation $*$ being symmetric difference, i.e., if $T_1$ and $T_2$ are two subsets of $S$, then $T_1 * T_2 = \{p \text{ in } S \mid p \text{ in } T_1 \text{ or in } T_2 \text{ but not in both}\}$. Indeed, $G$ is isomorphic to $(\mathbb{Z}/2\mathbb{Z})^k$. Let $G_0$ be the subgroup of order 2 of $G$ given by $G_0 = \{\emptyset, S\}$. (Here $\emptyset$ denotes the empty set, as usual.)
For a subset $T$ of $S$, let $d_T$ be the product of the elements of $T$ (with $d_T = 1$ if $T = \emptyset$), and let $I_T$ be the ideal $I_T = (d_T, \sqrt{D})$. We have a map $\pi : G \to C(\mathbb{Q}(\sqrt{D}))$ given by $\pi(T) = [I_T]$. Note that $\pi$ is a homomorphism as $[I_{T_1}][I_{T_2}] = [I_{T_1 * T_2}]$.
It is straightforward to check that $I_T$ is a principal ideal if and only if $T = \emptyset$ or $T = S$. This gives that $\mathrm{Ker}(\pi) = G_0$, so $\mathrm{Im}(\pi)$, which is a subgroup of $C(\mathbb{Q}(\sqrt{D}))$, is isomorphic to $G/G_0$, which is isomorphic to $(\mathbb{Z}/2\mathbb{Z})^{k-1}$.
In case $D \equiv 3 \pmod{4}$, we have the ideal $J = (2, 1 + \sqrt{D})$. $[J]$ is an element of order 2 in $C(\mathbb{Q}(\sqrt{D}))$. The ideal $JI_T$ is not a principal ideal for any subset $T$ of $S$, so $[J]$ is not in $\mathrm{Im}(\pi)$. Thus, in this case $C(\mathbb{Q}(\sqrt{D}))$ has a subgroup isomorphic to $(\mathbb{Z}/2\mathbb{Z})^k$. □
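The divisibility statements of Theorem 5.64 can be spot-checked numerically for small $D$. The Python sketch below computes $h(D)$ for negative square-free $D$ by counting reduced positive definite binary quadratic forms of the corresponding discriminant. This is the classical method going back to Gauss and is not developed in this section, so it serves here only as an independent check. The function names are hypothetical.

```python
from math import gcd


def class_number(D):
    """h(D) for square-free D < 0, by counting reduced primitive forms
    a*x^2 + b*x*y + c*y^2 of discriminant d = b^2 - 4ac, where d = D if
    D = 1 (mod 4) and d = 4D otherwise."""
    assert D < 0
    d = D if D % 4 == 1 else 4 * D
    h, a = 0, 1
    while 3 * a * a <= -d:                # reduced forms satisfy 3a^2 <= |d|
        for b in range(-a + 1, a + 1):    # -a < b <= a
            if (b * b - d) % (4 * a) != 0:
                continue
            c = (b * b - d) // (4 * a)
            if c < a or (a == c and b < 0):
                continue                  # not reduced
            if gcd(gcd(a, b), c) == 1:
                h += 1
        a += 1
    return h


def distinct_prime_factors(n):
    """Distinct primes dividing |n|, by trial division."""
    n, primes, p = abs(n), [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


for D in (-5, -21, -26, -30, -105):
    k = len(distinct_prime_factors(D))
    bound = 2 ** k if D % 4 == 3 else 2 ** (k - 1)
    print(D, class_number(D), class_number(D) % bound == 0)
```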


We proved Theorem 5.64 by elementary means. This is just the easy half of the following much deeper theorem of Gauss:

Theorem 5.65 (Gauss). Let $D = \pm p_1 \cdots p_k$ for distinct primes $p_1, \ldots, p_k$.

(1) If $D < 0$ and $D \equiv 1$ or $2 \pmod{4}$, then $h(D)$ is divisible by $2^{k-1}$.

(2) If $D < 0$ and $D \equiv 3 \pmod{4}$, then $h(D)$ is divisible by $2^k$.

(3) If $D > 0$ and $D \equiv 1$ or $2 \pmod{4}$, then $h(D)$ is divisible by $2^{k-2}$.

(4) If $D > 0$ and $D \equiv 3 \pmod{4}$, then $h(D)$ is divisible by $2^{k-1}$.



Theorem 5.64 showed how to find quadratic fields $\mathbb{Q}(\sqrt{D})$ with $h(D)$ arbitrarily large. But in these cases $C(\mathbb{Q}(\sqrt{D}))$ consisted entirely of elements of order 2 (plus the identity). We will now show how to find quadratic fields $\mathbb{Q}(\sqrt{D})$ where $C(\mathbb{Q}(\sqrt{D}))$ contains elements of order $n$ for any $n$ (and in particular for $n$ odd).

Theorem 5.66. Let $n$ be an arbitrary positive integer and let $q > 1$ be odd. Let $a$ be a positive integer with $a$ and $q$ relatively prime. Let $D$ be the unique square-free integer defined by $b^2 D = a^2 - q^n$, where $b$ is an integer. Suppose that $q^j$ and $a + b\sqrt{D}$ do not have a common nonunit factor in $\mathcal{O}(\sqrt{D})$ for any proper factor $j$ of $n$. Let $I$ be the ideal
\[ I = (q, a + b\sqrt{D}). \]
Then $I^n = (a + b\sqrt{D})$, a principal ideal. Furthermore, $[I]$ is an element of $C(\mathbb{Q}(\sqrt{D}))$ of order $n$.

Proof: We claim that $I^k = (q^k, a + b\sqrt{D})$ for $k = 1, \ldots, n$. We prove this by induction. It is certainly true for $k = 1$. Now assume it is true for some value $k < n$. Then
\[
\begin{aligned}
I^{k+1} = I I^k &= (q, a + b\sqrt{D})(q^k, a + b\sqrt{D}) \\
&= (q^{k+1}, q(a + b\sqrt{D}), q^k(a + b\sqrt{D}), (a + b\sqrt{D})^2) \\
&= (q^{k+1}, q(a + b\sqrt{D}), (a + b\sqrt{D})^2).
\end{aligned}
\]
We claim $I^{k+1} = J = (q^{k+1}, a + b\sqrt{D})$. Clearly $I^{k+1} \subseteq J$ as each of the generators of $I^{k+1}$ is in $J$. To show the reverse inclusion we must show that each of the generators of $J$ is in $I^{k+1}$. This is evident for $q^{k+1}$, so it remains to consider $a + b\sqrt{D}$. Now $[a + b\sqrt{D}]^2 = a^2 + b^2 D + 2ab\sqrt{D} = [a^2 - b^2 D] + 2b^2 D + 2ab\sqrt{D} = q^n + 2b\sqrt{D}[a + b\sqrt{D}]$ is in $I^{k+1}$, and $q^n$ is in $I^{k+1}$ as $k + 1 \le n$, so $2b\sqrt{D}[a + b\sqrt{D}]$ is in $I^{k+1}$, as is $[\sqrt{D}][2b\sqrt{D}][a + b\sqrt{D}] = 2bD[a + b\sqrt{D}]$. Thus $I^{k+1}$ contains both $q[a + b\sqrt{D}]$ and $2bD[a + b\sqrt{D}]$. Since $q$ and $2bD$ are relatively prime integers, there are integers $x$ and $y$ with $qx + 2bDy = 1$, which implies that $a + b\sqrt{D}$ is in $I^{k+1}$, as required.
Now $I^n = (q^n, a + b\sqrt{D}) = (a + b\sqrt{D})$ as $q^n = \pm[a + b\sqrt{D}][a - b\sqrt{D}]$, so $I^n$ is a principal ideal. Thus $[I]$ has order $j$ in $C(\mathbb{Q}(\sqrt{D}))$ for some $j$ dividing $n$. We want to show $j = n$.
Suppose $j < n$, in which case $j$ is a proper factor of $n$. If $I^j$ is principal, then $I^j = (\gamma_j)$ for some element $\gamma_j$ of $\mathcal{O}(\sqrt{D})$. Then $\gamma_j$ divides both $q^j$ and $a + b\sqrt{D}$, so by hypothesis $\gamma_j$ is a unit, and then $I^j = \mathcal{O}(\sqrt{D})$. But then $I^n = (I^j)^{n/j} = \mathcal{O}(\sqrt{D})$, contradicting $I^n = (a + b\sqrt{D})$. □

Suppose that $a^2 < q^n$ in Theorem 5.66. Then $D < 0$. Note then that in any particular case we can check the hypothesis in Theorem 5.66 that $q^j$ and $a + b\sqrt{D}$ do not have a common nonunit factor in $\mathcal{O}(\sqrt{D})$ for any proper factor $j$ of $n$. For any such common factor must have norm dividing $q^n$, and, for $D$ negative, $\mathcal{O}(\sqrt{D})$ has only finitely many elements of norm $N$ for any integer $N$. But, more interestingly, we can give general conditions that ensure that this hypothesis holds.

Corollary 5.67. In the situation of Theorem 5.66, assume that $a^2 < q^n$. Let $p$ be the smallest prime factor of $n$, and set $m = n/p$. (If $n$ is prime, note that $m = 1$.) Suppose that $-D > q^m - 1$ if $D \equiv 2$ or $3 \pmod{4}$ and that $-D > 4q^m - 1$ if $D \equiv 1 \pmod{4}$. Then $[I]$ is an element of $C(\mathbb{Q}(\sqrt{D}))$ of order $n$.

Proof: Let us set $\gamma_n = a + b\sqrt{D}$, so $I^n = (\gamma_n)$. Note that $\gamma_n$ has norm $|\gamma_n\overline{\gamma}_n| = |[a + b\sqrt{D}][a - b\sqrt{D}]| = q^n$.
We prove the corollary by contradiction. Suppose that $[I]$ has order $j$ in $C(\mathbb{Q}(\sqrt{D}))$ for some proper factor $j$ of $n$. Then $I^j$ is a principal ideal, i.e., $I^j = (\gamma_j)$ for some element $\gamma_j$ of $\mathcal{O}(\sqrt{D})$. Then $I^n = (I^j)^{n/j} = (\gamma_j^{n/j})$. Thus $\gamma_n$ and $\gamma_j^{n/j}$ are associates, so they have the same norm. But the norm is multiplicative, and that implies that $\gamma_j$ has norm $q^j$.
Let $\gamma_j = c + d\sqrt{D}$. Since $a$, $b$, and $q$ are relatively prime, $d$ cannot be 0, and then $c$ cannot be 0 either. Note that $|d| \ge 1$ if $D \equiv 2$ or $3 \pmod{4}$ and $|d| \ge 1/2$ if $D \equiv 1 \pmod{4}$, and similarly for $|c|$. Now $\gamma_j$ has norm $q^j = c^2 - d^2 D = c^2 + d^2(-D)$, so we see that $q^j \ge 1 - D$ if $D \equiv 2$ or $3 \pmod{4}$ and that $q^j \ge (1 - D)/4$ if $D \equiv 1 \pmod{4}$. But, as a little algebra shows (using $j \le m$), this contradicts our hypothesis on $q$ and $D$. □


Example 5.68. Here are some instances of this corollary:

(1) $1^2 + 26 = 3^3$, so if $I = (3, 1 + \sqrt{-26})$, then $[I]$ has order 3 in $C(\mathbb{Q}(\sqrt{-26}))$, and hence $h(-26)$ is divisible by 3.

(2) $2^2 + 23 = 3^3$, so if $I = (3, 2 + \sqrt{-23})$, then $[I]$ has order 3 in $C(\mathbb{Q}(\sqrt{-23}))$, and hence $h(-23)$ is divisible by 3.

(3) $5^2 + 2^2 \cdot 14 = 3^4$, so if $I = (3, 5 + 2\sqrt{-14})$, then $[I]$ has order 4 in $C(\mathbb{Q}(\sqrt{-14}))$, and hence $h(-14)$ is divisible by 4.

(4) $14^2 + 47 = 3^5$, so if $I = (3, 14 + \sqrt{-47})$, then $[I]$ has order 5 in $C(\mathbb{Q}(\sqrt{-47}))$, and hence $h(-47)$ is divisible by 5.

(5) $116^2 + 3^2 \cdot 241 = 5^6$, so if $I = (5, 116 + 3\sqrt{-241})$, then $[I]$ has order 6 in $C(\mathbb{Q}(\sqrt{-241}))$, and hence $h(-241)$ is divisible by 6.

(6) $46^2 + 71 = 3^7$, so if $I = (3, 46 + \sqrt{-71})$, then $[I]$ has order 7 in $C(\mathbb{Q}(\sqrt{-71}))$, and hence $h(-71)$ is divisible by 7.

(7) $80^2 + 161 = 3^8$, so if $I = (3, 80 + \sqrt{-161})$, then $[I]$ has order 8 in $C(\mathbb{Q}(\sqrt{-161}))$, and hence $h(-161)$ is divisible by 8.

(8) $139^2 + 362 = 3^9$, so if $I = (3, 139 + \sqrt{-362})$, then $[I]$ has order 9 in $C(\mathbb{Q}(\sqrt{-362}))$, and hence $h(-362)$ is divisible by 9.
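Each instance above rests on an identity $a^2 - b^2 D = q^n$ together with the inequality of Corollary 5.67, and both are easy to confirm by machine. The Python sketch below does so; the tuple layout `(a, b, D, q, n)` and the function names are hypothetical, and the code assumes $n \ge 2$ and does not check that $D$ is square-free.

```python
from math import gcd

# (a, b, D, q, n) with a^2 - b^2 * D = q^n, as listed in Example 5.68.
EXAMPLES = [
    (1, 1, -26, 3, 3), (2, 1, -23, 3, 3), (5, 2, -14, 3, 4),
    (14, 1, -47, 3, 5), (116, 3, -241, 5, 6), (46, 1, -71, 3, 7),
    (80, 1, -161, 3, 8), (139, 1, -362, 3, 9),
]


def smallest_prime_factor(n):
    """Smallest prime dividing n; assumes n >= 2."""
    p = 2
    while n % p:
        p += 1
    return p


def satisfies_corollary_5_67(a, b, D, q, n):
    """Check the numerical hypotheses of Theorem 5.66 and Corollary 5.67."""
    if a * a - b * b * D != q ** n:        # defining identity
        return False
    if q % 2 == 0 or gcd(a, q) != 1:       # q odd, a and q relatively prime
        return False
    m = n // smallest_prime_factor(n)
    bound = 4 * q ** m - 1 if D % 4 == 1 else q ** m - 1
    return a * a < q ** n and -D > bound


print(all(satisfies_corollary_5_67(*ex) for ex in EXAMPLES))
```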

Remark 5.69.

(1) We saw in Theorem 5.55 that $I\overline{I}$ is a principal ideal for every prime ideal $I$ of $\mathcal{O}(\sqrt{D})$, and hence this holds for any ideal of $\mathcal{O}(\sqrt{D})$. Thus, for any ideal $I$, $[\overline{I}] = [I]^{-1}$ in $C(\mathbb{Q}(\sqrt{D}))$.

(2) More precisely, let $I = (g, \alpha)$ with $\|\alpha\|$ divisible by $g$ but with $g$ and $\alpha$ having no common integer divisor other than $\pm 1$. We saw in Theorem 5.55 that if $g = p$ is a prime, then $I\overline{I} = (p)$, and hence $I\overline{I} = (g)$ for an arbitrary integer $g$.

Now we use the work we have done to examine the situation in $\mathcal{O}(\sqrt{D})$ for specific values of $D$.

Example 5.70. Throughout this example $R = \mathcal{O}(\sqrt{D})$ and $K = \mathbb{Q}(\sqrt{D})$. The assertions here can all be verified by direct computation and we shall omit the details.


(1) $D = -5$: Here we return to our original example of nonunique factorization. Recall the factorizations into irreducible elements:
\[ 6 = 2 \cdot 3 = [1 + \sqrt{-5}][1 - \sqrt{-5}]. \]
This corresponds to the factorizations into principal ideals:
\[ (6) = (2)(3) = (1 + \sqrt{-5})(1 - \sqrt{-5}). \]
But this is not a counterexample to unique factorization into prime ideals as the ideals $(2)$, $(3)$, $(1 + \sqrt{-5})$, and $(1 - \sqrt{-5})$ are not prime ideals in $R$. Let $I_1 = (2, 1 + \sqrt{-5})$ and $I_2 = (3, 1 + \sqrt{-5})$. Note $I_1$ and $I_2$ are not principal ideals as $R$ does not have any elements of norm 2 or 3. Then,
\[ (2) = I_1^2, \quad (3) = I_2\overline{I}_2, \quad (1 + \sqrt{-5}) = I_1 I_2, \quad (1 - \sqrt{-5}) = I_1\overline{I}_2, \]
and this gives the following factorization of the ideal $(6)$ into a product of prime ideals:
\[ (6) = I_1^2 I_2 \overline{I}_2. \]
It is known that $h(-5) = 2$ and so $[I_1]$ (or $[I_2]$ or $[\overline{I}_2]$) is the nontrivial element of $C(K)$.
(2) $D = 6$: We have the following factorizations in $R$:
\[ 6 = 2 \cdot 3 = [\sqrt{6}]^2. \]
This corresponds to the factorization of principal ideals:
\[ (6) = (2)(3) = (\sqrt{6})^2. \]
While this superficially looks a lot like the example we have just given for $D = -5$, in fact it could not be more different. The elements 2, 3, and $\sqrt{6}$ of $R$ are not irreducible. To be precise, we have the following factorizations into irreducible elements:
\[ 2 = -[2 + \sqrt{6}][2 - \sqrt{6}], \quad 3 = [3 + \sqrt{6}][3 - \sqrt{6}], \quad \sqrt{6} = -[2 - \sqrt{6}][3 + \sqrt{6}]. \]
Then the element 6 has the factorization into irreducible elements:
\[ 6 = [2 - \sqrt{6}]^2 [3 + \sqrt{6}]^2. \]
If we let $I_1 = (2 + \sqrt{6}) = \overline{I}_1$ and $I_2 = (3 + \sqrt{6}) = \overline{I}_2$, then $(2) = I_1^2$ and $(3) = I_2^2$. We thus obtain the following factorization of the ideal $(6)$ into a product of prime ideals:
\[ (6) = I_1^2 I_2^2. \]
But in this case $R$ is a UFD by Corollary 2.53, and so the above factorization of 6 into irreducible elements is essentially unique. (The elements $2 + \sqrt{6}$ and $2 - \sqrt{6}$ are associates, as are the elements $3 + \sqrt{6}$ and $3 - \sqrt{6}$.) The factorization of $(6)$ into prime ideals just reflects this factorization of elements. Finally, as $R$ is a UFD, $h(6) = 1$.

(3) $D = -23$: We have the following factorizations:
\[ 27 = 3^3 = [2 + \sqrt{-23}][2 - \sqrt{-23}]. \]
This suggests that we should consider $I = (3, 2 + \sqrt{-23})$, a prime ideal. Note that $I$ is a nonprincipal ideal as $R$ has no elements of norm 3. Then, by Theorem 5.66 (and Remark 5.69),
\[ I\overline{I} = (3), \quad I^3 = (2 + \sqrt{-23}), \quad \overline{I}^3 = (2 - \sqrt{-23}). \]
Thus, we see that we have the following factorization of the ideal $(27)$ into a product of prime ideals:
\[ (27) = I^3 \overline{I}^3. \]
It is known that $h(-23) = 3$ and, by Example 5.68(2), $[I]$ is an element of $C(K)$ of order 3, so $[I]$ is a generator of this group.

(4) $D = -26$: We have the following factorizations:
\[ 27 = 3^3 = [1 + \sqrt{-26}][1 - \sqrt{-26}]. \]
This suggests that we should consider $I = (3, 1 + \sqrt{-26})$, a prime ideal. Note that $I$ is a nonprincipal ideal as $R$ has no elements of norm 3. Then, by Theorem 5.66 (and Remark 5.69),
\[ I\overline{I} = (3), \quad I^3 = (1 + \sqrt{-26}), \quad \overline{I}^3 = (1 - \sqrt{-26}). \]
Thus, we see that we have the following factorization of the ideal $(27)$ into a product of prime ideals:
\[ (27) = I^3 \overline{I}^3. \]
By Example 5.68(1), $[I]$ is an element of $C(K)$ of order 3, and hence so is $[\overline{I}] = [I]^{-1}$. Let $J = (2, \sqrt{-26})$. Then $J$ is not a principal ideal but $J^2 = (2)$ is a principal ideal, by Theorem 5.55, so $[J]$ is an element of $C(K)$ of order 2. Let $K = \overline{I}J = (6, 2 + \sqrt{-26})$. Then $[K]$ is an element of $C(K)$ of order 6. It is known that $h(-26) = 6$, so $[K]$ is a generator of this group.


(5) $D = -47$: We have the following factorizations:
\[ 243 = 3^5 = [14 + \sqrt{-47}][14 - \sqrt{-47}]. \]
This suggests that we should consider $I = (3, 14 + \sqrt{-47})$, a nonprincipal prime ideal. Then $I\overline{I} = (3)$ and $I^5 = (14 + \sqrt{-47})$, so $(243) = (3)^5 = I^5 \overline{I}^5$. By Example 5.68(4), $[I]$ is an element of $C(K)$ of order 5. It is known that $h(-47) = 5$, so $[I]$ is a generator of this group.
We also have the factorizations
\[ 16807 = 7^5 = [128 + 3\sqrt{-47}][128 - 3\sqrt{-47}], \]
leading to the ideal $J = (7, 128 + 3\sqrt{-47})$, and a similar argument shows that $[J]$ is an element of $C(K)$ of order 5. Thus, $[J]$ must be a power of $[I]$. Calculation shows that $I^2 J = (-4 + \sqrt{-47})$, a principal ideal, so $[J] = [I]^{-2} = [I]^3$ in $C(K)$.

(6) $D = -239$: We have the following factorizations:
\[ 243 = 3^5 = [2 + \sqrt{-239}][2 - \sqrt{-239}], \]
leading to the ideal $I = (3, 2 + \sqrt{-239})$, with $[I]$ an element of order 5 in $C(K)$.
We also have the factorizations
\[ 4913 = 17^3 = [33 + 4\sqrt{-239}][33 - 4\sqrt{-239}], \]
leading to the ideal $J = (17, 33 + 4\sqrt{-239})$, with $[J]$ an element of order 3 in $C(K)$.
Then $K = \overline{I}J = (51, 4 + \sqrt{-239})$ gives an element $[K]$ of order 15 in $C(K)$. It is known that $h(-239) = 15$, so $[K]$ is a generator of this group.

(7) $D = 79$: We have the following factorizations:
\[ -15 = -3 \cdot 5 = [8 - \sqrt{79}][8 + \sqrt{79}], \]
leading to the ideals $I = (3, 8 - \sqrt{79})$ and $J = (5, 8 - \sqrt{79})$. It can be shown that $R$ has no elements of norm 3 or 5, from which it follows that $I$ and $J$ are not principal ideals. $I^3 = (17 + 2\sqrt{79})$, a principal ideal, and $IJ = (-8 + \sqrt{79})$, a principal ideal. It is known that $h(79) = 3$, so $[I]$ is a generator of this group, and $[J] = [I]^{-1}$.


(8) $D = 235$: We have the following factorizations:
\[ 234 = 2 \cdot 3^2 \cdot 13 = -[1 + \sqrt{235}][1 - \sqrt{235}], \]
leading to the ideals $I_1 = (3, 1 + \sqrt{235})$, $I_2 = (13, 1 + \sqrt{235})$, and $J = (2, 1 + \sqrt{235})$. Note that $I_1$, $I_2$, and $J$ are not principal ideals, as, since $\pm 3$, $\pm 13$, and $\pm 2$ are quadratic nonresidues (mod 5), $R$ does not have elements of norm 3, 13, or 2.
Then $I_1\overline{I}_1 = (3)$, $I_2\overline{I}_2 = (13)$, and $J^2 = (2)$. (Recall that $J = \overline{J}$.) More interestingly, $I_1^2 I_2 J = (1 + \sqrt{235})$ and so $\overline{I}_1^2\, \overline{I}_2 J = (1 - \sqrt{235})$. In any event, we have the factorization into prime ideals:
\[ (234) = I_1^2\, \overline{I}_1^2\, I_2\, \overline{I}_2\, J^2. \]
Let $K = I_1 J = (6, 1 + \sqrt{235})$. Then $K^3 = (34 - 2\sqrt{235})$, and $\|34 - 2\sqrt{235}\| = 216 = 6^3$. It can be shown that $R$ does not have an element of norm 6, from which it follows that $K$ is not a principal ideal, and hence $[K]$ has order 3 in $C(K)$. $J^2 = (2)$, so $[J]$ has order 2 in $C(K)$. This implies that $[I_1]$ has order 6 in $C(K)$ (and calculation shows that $I_1^6 = (67 + 4\sqrt{235})$). It is known that $h(235) = 6$, so $[I_1]$ is a generator of this group.

(9) $D = 401$: We have the following factorizations:
\[ 400 = 2^4 \cdot 5^2 = -[1 + \sqrt{401}][1 - \sqrt{401}], \]
leading to the ideal $I = (5, 1 + \sqrt{401})$. It can be shown that $R$ has no element of norm 5, from which it follows that $I$ is not a principal ideal. $I^5 = (22 - 3\sqrt{401})$, a principal ideal. It is known that $h(401) = 5$, so $[I]$ is a generator of this group.

(10) $D = 483$: We have the following factorizations:
\[ 483 = 3 \cdot 7 \cdot 23 = [\sqrt{483}]^2, \]
leading to the ideals $I = (3, \sqrt{483})$ and $J = (2, 1 + \sqrt{483})$. Then $K = IJ = (6, 3 + \sqrt{483})$. Congruence considerations (mod 483) show that $R$ does not have any elements of norm 2, 3, or 6, so $I$, $J$, and $K$ are all nonprincipal ideals. On the other hand, $I^2 = (3)$, $J^2 = (2)$, and $K^2 = (6)$ are all principal ideals. Thus, $\{[(1)], [I], [J], [K]\}$ is a subgroup of $C(K)$ isomorphic to $[\mathbb{Z}/2\mathbb{Z}]^2$. It is known that $h(483) = 4$, so this subgroup is in fact all of $C(K)$.


Remark 5.71. As we have observed, for $D$ negative, it is a finite process to check whether $\mathcal{O}(\sqrt{D})$ has an element of norm $n$ for any particular value of $n$. In case $D > 0$, we sometimes have been able to rule out this possibility through congruence considerations. The theory of continued fractions, which we do not develop here (it is long, though elementary), enables us to decide this for any $n$ with $|n|$ relatively small compared to $D$. (We alluded to this in parts (7), (8), and (9) of Example 5.70.)
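For $D$ negative, the finite process mentioned in Remark 5.71 amounts to solving $x^2 + |D|y^2 = N$ (or $= 4N$ when $D \equiv 1 \pmod{4}$, to allow half-integers) in integers, and only finitely many $y$ can occur. A minimal Python sketch follows; the function name `has_element_of_norm` is hypothetical, and the code assumes $D$ is a negative square-free integer and $N$ a positive integer.

```python
from math import isqrt


def has_element_of_norm(D, N):
    """Decide whether O(sqrt(D)), D < 0 square-free, has an element of norm N.

    Elements are (x + y*sqrt(D)) / e with e = 2 (and x, y of the same
    parity) when D = 1 (mod 4), and e = 1 otherwise, so the norm equation
    is x^2 - D*y^2 = e^2 * N.
    """
    assert D < 0 and N > 0
    half_integers = D % 4 == 1
    target = 4 * N if half_integers else N
    y = 0
    while -D * y * y <= target:           # only finitely many y to try
        rest = target + D * y * y         # candidate value of x^2
        x = isqrt(rest)
        if x * x == rest and (not half_integers or (x - y) % 2 == 0):
            return True
        y += 1
    return False


# No elements of norm 2 or 3 in O(sqrt(-5)), as used in Example 5.70(1).
print(has_element_of_norm(-5, 2), has_element_of_norm(-5, 3))
```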

Remark 5.72. For any particular value of D, h(D) can be computed from
Dirichlet’s class number formula, which we shall not present here. The
nature of this formula, however, is not such that we can draw general con-
clusions about the behavior or properties of h(D). As you might imagine,
the formula is more complicated for positive values of D than for nega-
tive values of D. The class number formula evidently yields an integer for
D < 0, but it is not a priori evident that this number is positive. The class
number formula evidently yields a real number for D > 0, but it is not
even a priori evident that this real number is an integer!

Remark 5.73. Tables of the values of h(D) for all D, positive and negative,
with |D| < 500, can be found in the book Number Theory by Z. I. Bore-
vich and I. R. Shafarevich, Academic Press, New York, 1966. (Note the
following error in the tables: the correct value for h(−485) is 20.)

Remark 5.74. We have the following three conjectures of Gauss:

Conjecture. There are exactly nine imaginary quadratic fields $\mathbb{Q}(\sqrt{D})$ with $h(D) = 1$. They are $\mathbb{Q}(\sqrt{D})$ for $D = -1$, $-2$, $-3$, $-7$, $-11$, $-19$, $-43$, $-67$, and $-163$.

Conjecture. For any $N$, there are only finitely many imaginary quadratic fields $\mathbb{Q}(\sqrt{D})$ with $h(D) = N$.

Conjecture. There are infinitely many real quadratic fields $\mathbb{Q}(\sqrt{D})$ with $h(D) = 1$.

The first two of these conjectures are major twentieth-century theo-
rems. The history of their proof is an interesting one, and we refer the
interested reader to the article “Gauss’ Class Number Problem for Imag-
inary Quadratic Fields,” by D. Goldfeld, Bull. Amer. Math. Soc. 13
(1985), 23–37.
The last of these conjectures is still wide open.
The historically minded reader may wonder about the attribution of
various theorems and conjectures in this section to Gauss. Gauss’s work
appeared in his monumental book Disquisitiones Arithmeticae, published
in 1801, while algebraic number theory had its beginnings in the mid and
late nineteenth century. The explanation is that Gauss investigated binary
quadratic forms. With the advent of algebraic number theory, it was real-
ized that his work could be reinterpreted in the context of the ideal theory
of quadratic fields.
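
Gauss's binary quadratic forms also give a practical way to compute h(D) when D < 0. The sketch below is only an illustration: it assumes the classical correspondence (not developed in this section) between ideal classes of Q(√D) and reduced primitive forms ax² + bxy + cy² whose discriminant b² − 4ac equals the field discriminant, taken to be D when D ≡ 1 (mod 4) and 4D otherwise. The function names are hypothetical.

```python
from math import gcd


def field_discriminant(D):
    """Discriminant of Q(sqrt(D)) for square-free D (assumed, not checked)."""
    return D if D % 4 == 1 else 4 * D


def class_number(D):
    """Count reduced primitive forms (a, b, c) of the field discriminant, D < 0.

    A form is reduced when -a < b <= a <= c, with b >= 0 whenever a == c.
    """
    disc = field_discriminant(D)
    count = 0
    a = 1
    # For a reduced form, |disc| = 4ac - b^2 >= 3a^2, which bounds a.
    while 3 * a * a <= -disc:
        for b in range(-a + 1, a + 1):
            if (b * b - disc) % (4 * a) != 0:
                continue
            c = (b * b - disc) // (4 * a)
            if c < a or (a == c and b < 0):
                continue
            if gcd(gcd(a, abs(b)), c) == 1:
                count += 1
        a += 1
    return count


# The nine values of D listed in the first conjecture above.
CLASS_NUMBER_ONE = [-1, -2, -3, -7, -11, -19, -43, -67, -163]
```

Under these assumptions, `class_number(D)` returns 1 for each D in `CLASS_NUMBER_ONE`, while `class_number(-5)` returns 2, in line with the failure of unique factorization in O(√−5).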

5.7 Behavior of Ideals in Algebraic Number Fields


In this section, we consider “what happens” to prime ideals in Z when we
consider them as ideals in O(K) for some algebraic number field K. We
will make this rather vague statement clear below, but first we need some
preliminaries. We let R = O(K). Also, we let $n = \deg_{K/\mathbb{Q}}$.
It is easy to check that if I is any fractional ideal of Q, then
$RI = \{\sum_j \alpha_j i_j \mid \alpha_j \text{ in } R,\ i_j \text{ in } I\}$ is a fractional ideal of K.
In this situation we write $I_K$ for RI.

Lemma 5.75. Let P be any prime ideal of R. Then P divides $(p)_K$ for
exactly one prime number p.

Proof: It is easy to check that for any two integers $m_1$ and $m_2$,
$(m_1 m_2)_K = (m_1)_K (m_2)_K$. Now consider P. By the proof of Lemma 5.39, we know
that P contains an integer m. Let m have prime factorization
$m = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$. Then P divides
$(m)_K = ((p_1)_K)^{a_1} ((p_2)_K)^{a_2} \cdots ((p_k)_K)^{a_k}$.
By the definition of a prime ideal, this implies that P divides $(p_i)_K$ for
some i. Suppose that P divides $(p_j)_K$ as well, for some $p_j \neq p_i$. Then $p_i$
is in P and $p_j$ is in P. Since $p_i$ and $p_j$ are distinct primes, there are
integers x and y with $1 = p_i x + p_j y$, so 1 is in P,
and then P = R, a contradiction. □
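
The last step of this proof rests on writing 1 as an integer combination of two distinct primes. A minimal sketch of how such x and y can be found with the extended Euclidean algorithm follows; the function name `bezout` is illustrative and does not come from the text.

```python
def bezout(a, b):
    """Return (g, x, y) with a*x + b*y == g, where g = gcd(a, b) for a, b > 0."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


# Two distinct primes are relatively prime, so g == 1 and 1 = 3*x + 7*y.
g, x, y = bezout(3, 7)
```

Any ideal containing both 3 and 7 therefore contains 3x + 7y = 1, which is exactly the contradiction used above.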

In the situation of Lemma 5.75, we say that P lies over p. Thus we see
that every prime ideal of R lies over some prime number p. Our object
in this section is to investigate the prime ideals lying over an arbitrary
prime number p. To this end, we consider any prime number p. By unique
factorization of ideals in R, we have

$$(p)_K = P_1^{e_1} P_2^{e_2} \cdots P_g^{e_g},$$

with each $P_i$ a prime ideal. Note that $\|(p)_K\| = p^n$ and $\|P_i\|$ divides
$\|(p)_K\|$, so we must have $\|P_i\| = p^{f_i}$ for some $f_i$. The integer $e_i$ is called
the ramification index of $P_i$ and the integer $f_i$ is called the residue class
field degree of $P_i$. (Note that $R/P_i$ is a field as $P_i$ is a prime ideal and hence
a maximal ideal of the Dedekind domain R.) Here is our basic result.


Theorem 5.76. In the above situation,

$$n = \sum_{i=1}^{g} e_i f_i.$$

Furthermore, if K is a Galois extension of Q, then all of the $e_i$'s are
equal, and all of the $f_i$'s are equal. Let e be the common value of the $e_i$'s
and f be the common value of the $f_i$'s. Thus, in the case of a Galois
extension K of Q,

$$n = efg.$$

Proof: By Lemma 5.39,

$$\#((p)_K) = p^n,$$

and by the multiplicativity of the norm (Lemma 5.47),

$$\#(P_1^{e_1} P_2^{e_2} \cdots P_g^{e_g}) = \#(P_1)^{e_1} \cdot \#(P_2)^{e_2} \cdots \#(P_g)^{e_g} = p^{f_1 e_1} p^{f_2 e_2} \cdots p^{f_g e_g}.$$

Comparing the exponents of p yields the first claim of the theorem.
If K is a Galois extension of Q, the Galois group leaves $(p)_K$ invariant
and permutes the $P_i$'s transitively, so the $e_i$'s and the $f_i$'s are all equal,
yielding the second claim of the theorem. □

Example 5.77. Let us examine the possible behavior for a quadratic field
K. Here n = 2 so we have the following possibilities:

(1) g = 1, e1 = 2, f1 = 1,
(2a) g = 2, e1 = e2 = 1, f1 = f2 = 1,
(2b) g = 1, e1 = 1, f1 = 2.

We have numbered the cases as above as they correspond to the cases with
the same number in Theorem 5.55. (In Theorem 5.55 we divided case (1)
into cases (1a) and (1b) depending on whether P was a principal ideal, but
that distinction is not relevant here.) In case (1), p is said to ramify in K.
In case (2a), p is said to split in K. In case (2b), p is said to be inert in K.
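
The list of possibilities above is just the set of ways to write n = 2 as a sum of products e_i f_i. As a rough illustration, the following sketch enumerates the numerically possible patterns for any n; it only does the bookkeeping of Theorem 5.76 and says nothing about which patterns actually occur for a given field. The function name is hypothetical.

```python
def splitting_types(n):
    """List all multisets of (e, f) pairs with sum(e * f) == n."""
    pairs = [(e, f) for e in range(1, n + 1)
             for f in range(1, n + 1) if e * f <= n]
    results = []

    def extend(remaining, start, chosen):
        if remaining == 0:
            results.append(tuple(chosen))
            return
        # Reuse pairs only from index `start` on, so each multiset appears once.
        for idx in range(start, len(pairs)):
            e, f = pairs[idx]
            if e * f <= remaining:
                extend(remaining - e * f, idx, chosen + [(e, f)])

    extend(n, 0, [])
    return results


quadratic = splitting_types(2)
```

For n = 2 the result contains exactly three patterns: two primes with e = f = 1 (split), one prime with e = 1, f = 2 (inert), and one prime with e = 2, f = 1 (ramified).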

Note that in Theorem 5.55, we had a simple numerical criterion for
deciding whether or not we were in case (1). (The distinction between
cases (2a) and (2b) is much more subtle, as we see from Remark 5.58.)
Such a criterion for ramification in fact exists in general. To any algebraic
number field K we can associate an integer $\Delta_K$, the discriminant of K.
$\Delta_K$ can be computed directly from a knowledge of O(K).


Theorem 5.78 (Dedekind). Let K be an algebraic number field and let p be
a prime number. Let

$$(p)_K = P_1^{e_1} P_2^{e_2} \cdots P_g^{e_g}$$

as above. Then $e_i > 1$ for some i if and only if p divides $\Delta_K$. (In this
case, we say that $P_i$ is ramified.)

Example 5.79. In the case of a quadratic field K = Q(√D),

ΔK = D if D ≡ 1 (mod 4),
ΔK = 4D if D ≡ 2 or 3 (mod 4).

We see that the general theory specializes to the result we had in Theo-
rem 5.55, as, in the notation of that theorem, p divides eD D if and only if
p divides ΔK .
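
For quadratic fields the three behaviors can be told apart by a short computation. The sketch below uses the discriminant exactly as in Example 5.79 to detect ramification; to separate split from inert primes it assumes the standard quadratic-residue criterion (for odd p, whether D is a square mod p; for p = 2, whether D ≡ 1 or 5 mod 8). That criterion is a well-known fact brought in for illustration, and whether it is phrased this way in Theorem 5.55 is not visible from this section. The function names are hypothetical.

```python
def field_discriminant(D):
    """Discriminant of Q(sqrt(D)) for square-free D, as in Example 5.79."""
    return D if D % 4 == 1 else 4 * D


def behavior(D, p):
    """Classify the prime p in Q(sqrt(D)); D square-free and p prime are assumed."""
    if field_discriminant(D) % p == 0:
        return "ramified"
    if p == 2:
        # Here D is 1 mod 4, so D mod 8 is either 1 or 5.
        return "split" if D % 8 == 1 else "inert"
    # Euler's criterion: D is a nonzero square mod p iff D^((p-1)/2) = 1 mod p.
    return "split" if pow(D % p, (p - 1) // 2, p) == 1 else "inert"


def efg(kind):
    """Return (e, f, g) for each case; e * f * g is always n = 2."""
    return {"ramified": (2, 1, 1), "split": (1, 1, 2), "inert": (1, 2, 1)}[kind]
```

For instance, with D = −5 this reports that 2 and 5 ramify, 3 and 7 split, and 11 is inert.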

In fact, even for quadratic fields the situation can be very intricate. We
close this section by citing the following theorem:

Theorem 5.80. Let m be any integer and let $S_i$, $S_r$, and $S_s$ be any three
finite disjoint sets of primes. Then there are infinitely many positive values
of D, and infinitely many negative values of D, for which, setting
K = Q(√D),

(1) h(K) is divisible by m;

(2) all the primes in $S_i$ are inert in K;

(3) all the primes in $S_r$ ramify in K;

(4) all the primes in $S_s$ split in K.

5.8 Ideal Elements


One last question about ideals faces us—a linguistic one. Where does
the name “ideal” come from? The answer to this question itself involves
some interesting mathematics. The theory of ideals was first developed by
Kummer in his studies of algebraic number fields. Kummer discovered that
any ideal I in an algebraic number field K becomes a principal ideal upon
passing to a larger algebraic number field K′, and he called a generator of
this principal ideal an ideal element of K.


Theorem 5.81 (Kummer). Let K be an algebraic number field and let I be
an ideal in R = O(K). Then there is an algebraic number field K′ ⊇ K
and an element α′ of R′ = O(K′) such that

I = (α′) ∩ K.

(In other words, the original ideal I consists of those elements of the
ideal (α′) that are in K.)

(So α′ is not in general an element of K, but the ideal I consists of the
multiples of α′ that are in K. Thus the name “ideal element” of K is a
good one for α′.)

Proof: We adopt a similar notation to that in the previous section. If J is
any ideal of R, we denote by $J_{K'}$ the ideal R′J of R′.
By Theorem 5.43, C(K) is a finite group. Thus [I] has finite order n,
so $I^n = (\beta)$ for some β in K. We set K′ = K(α′) where α′ is a root of the
polynomial $X^n - \beta$. Then, as ideals in K′,

$$(\alpha')^n = (\beta)_{K'} = (I_{K'})^n,$$

so, by unique factorization of ideals in K′, we have $I_{K'} = (\alpha')$. But then

$$I = I_{K'} \cap K = (\alpha') \cap K.$$ □

Later on, mathematicians realized that the notion of an ideal was an
important one, and have focused on that notion rather than on the notion
of an ideal element. (Indeed, our proof of Theorem 5.81 is certainly not
Kummer’s proof, as our proof uses Minkowski’s result Theorem 5.43, a
later development in algebraic number theory.) But let us see what the
ideal elements of one of the ideals we have considered above are.

Example 5.82. We refer to Example 5.70(1), with K = Q(√−5). We have
the ideals $I_1$ = (2, 1 + √−5) = (1 + √−5, 1 − √−5), $I_2$ = (3, 1 + √−5), and
$I_2'$ = (3, 1 − √−5).
First consider the ideal $I_1$. Let K′ be the field K′ = Q(√2, √−5). Then
we have the following factorizations of elements in O(K′):

$$1 + \sqrt{-5} = \sqrt{2} \cdot \frac{\sqrt{2} + \sqrt{-10}}{2},$$

$$1 - \sqrt{-5} = \sqrt{2} \cdot \frac{\sqrt{2} - \sqrt{-10}}{2}.$$

It may seem obvious that these are factorizations, but there is something we
need to check: we need to check that all the factors are algebraic integers.


(Recall that an algebraic integer is the root of a monic polynomial with
integer coefficients.) Now √2 is certainly an algebraic integer, as it is a root
of the monic polynomial $X^2 - 2$. But (√2 + √−10)/2 and (√2 − √−10)/2
are also algebraic integers as they are both roots of the monic polynomial
$X^4 + 4X^2 + 9$. Since 1 + √−5 and 1 − √−5 are both multiples of √2, it
then readily follows that √2 is an ideal generator of $I_1$.
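
The claims about $I_1$ are easy to sanity-check numerically. The following is only a floating-point sketch, so it tests the identities up to a small tolerance rather than proving them; the function name and the tolerance are choices made here for illustration.

```python
import cmath

SQRT2 = cmath.sqrt(2)
SQRT_M5 = cmath.sqrt(-5)       # the root i*sqrt(5)
SQRT_M10 = SQRT2 * SQRT_M5     # sqrt(-10), chosen consistently with the above


def quartic(x):
    """Evaluate X^4 + 4X^2 + 9 at x."""
    return x ** 4 + 4 * x ** 2 + 9


def check_first_ideal(tol=1e-9):
    """Check both factorizations and that both cofactors are roots of the quartic."""
    plus = (SQRT2 + SQRT_M10) / 2
    minus = (SQRT2 - SQRT_M10) / 2
    return (abs(SQRT2 * plus - (1 + SQRT_M5)) < tol
            and abs(SQRT2 * minus - (1 - SQRT_M5)) < tol
            and abs(quartic(plus)) < tol
            and abs(quartic(minus)) < tol)
```

The exact reason behind the numerical agreement is that the square of (√2 + √−10)/2 equals −2 + √−5, so adding 2 and squaring again gives −5.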
Next consider the ideals $I_2$ and $I_2'$. Now let K′ be the field K′ =
Q(√3, √−5). Then we have the following factorizations of elements in
O(K′):

$$3 = [\sqrt{3}]^2,$$

$$1 + \sqrt{-5} = \sqrt{3} \cdot \frac{\sqrt{3} + \sqrt{-15}}{3},$$

$$1 - \sqrt{-5} = \sqrt{3} \cdot \frac{\sqrt{3} - \sqrt{-15}}{3}.$$

Again we need to check that all the factors are algebraic integers. Now √3
is an algebraic integer, as it is a root of the monic polynomial $X^2 - 3$, and
(√3 + √−15)/3 and (√3 − √−15)/3 are also algebraic integers, as they are
both roots of the monic polynomial $X^4 + 8X^2 + 4$. It then readily follows
that √3 is an ideal generator of both $I_2$ and $I_2'$.

Remark 5.83. Let us begin with an arbitrary algebraic number field K.
Since C(K) is a finite group, it easily follows that there is a field in which
every ideal of O(K) becomes principal, and in fact a smallest such field
K′. But this does not imply that every ideal of O(K′) is principal, as
there may be new ideals of O(K′) that we have to consider. So we may
have to pass to another smallest field K″ in order that all the ideals of
O(K′) become principal, etc. It is natural to ask whether this procedure
must always eventually stop, and the answer is no. It is known that this
procedure may go on forever, i.e., that we may obtain an infinite sequence
of fields K ⊂ K′ ⊂ K″ ⊂ . . . related in this way.

5.9 Dirichlet’s Unit Theorem


To explain Dirichlet’s Unit Theorem we need to take a more abstract
view of algebraic number fields. We will briefly recapitulate a bit of field
theory. First we define an (abstract) algebraic number field (compare
Definition 5.3).


Definition 5.84. A field K is an algebraic number field if it is a finite
extension of Q, i.e., K is an extension of Q whose degree $\deg_{K/\mathbb{Q}}$, the
dimension of K as a vector space over Q, is finite.
Lemma 5.85. Let K be an algebraic number field. Then every element of
K is algebraic, i.e., is a root of a polynomial in Q[X].
Proof: Exactly the same as the proof of Lemma 5.4. 
In the following lemma, (f (X)) denotes the ideal of F[X] generated by
f (X).
Lemma 5.86. Let F be a field and let f (X) be an irreducible polynomial of
degree n in F[X]. Then E = F[X]/(f (X)) is a field and is an extension of
F of degree n.
Proof: Although part of this lemma follows from general considerations,
we will give a concrete proof of all of it. Let $f(X) = a_n X^n + \ldots + a_0$
and let π : F[X] → F[X]/(f(X)) = E be the projection. Set x = π(X).
Then $f(x) = a_n x^n + \ldots + a_0 = \pi(a_n X^n + \ldots + a_0) = \pi(f(X)) = 0$ in
E = F[X]/(f(X)). Thus x is a root of f(X) in E.
We claim that $S = \{x^{n-1}, x^{n-2}, \ldots, x, 1\}$ is a basis for E as a vector
space over F. First, let us see that S spans E: Consider an arbitrary
element y of E. Then y = π(g(X)) for some polynomial g(X). By the
division algorithm for polynomials we may write g(X) uniquely as g(X) =
f(X)q(X) + r(X) where either r(X) = 0 or r(X) is a polynomial of degree
at most n − 1, $r(X) = b_{n-1} X^{n-1} + \ldots + b_0$. But then y = π(g(X)) =
g(x) = f(x)q(x) + r(x) = 0 · q(x) + r(x) = r(x) = $b_{n-1} x^{n-1} + \ldots + b_0$ is in
the span of S. Next, let us see that S is linearly independent: Suppose that
some nontrivial linear combination $c_{n-1} x^{n-1} + \ldots + c_0 \cdot 1$ of elements of S
is 0. Then x is a root of $h(X) = c_{n-1} X^{n-1} + \ldots + c_0$. Then (recalling that
F[X] is a Euclidean domain), x is a root of the nonzero polynomial k(X) =
gcd(f(X), h(X)). Thus, k(X) cannot be a constant polynomial. But k(X)
divides both f(X) and h(X), so k(X) is a nonconstant divisor of f(X) of
degree at most n − 1, contradicting the hypothesis that f(X) is irreducible.
Now we claim that E is a field. Let y be an arbitrary nonzero element of
E. Then, as above, y = g(x) for some nonzero polynomial g(X) that is not
divisible by f(X). Since f(X) is assumed irreducible, 1 = gcd(f(X), g(X)),
and then 1 = f(X)s(X) + g(X)t(X) for some polynomials s(X) and t(X).
But then 1 = f(x)s(x) + g(x)t(x) = 0 · s(x) + y t(x) = y t(x), so y is invertible
in E. □
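
The last paragraph of the proof is constructive: the extended Euclidean algorithm produces t(X), and t(x) is the inverse of y. The following sketch carries this out for F = Q, under the assumption that f is irreducible; polynomials are represented as coefficient lists with the constant term first, and all function names are illustrative rather than taken from the text.

```python
from fractions import Fraction


def trim(p):
    """Convert coefficients to Fractions and drop trailing zeros."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def sub(a, b):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([x - y for x, y in zip(a, b)])


def mul(a, b):
    a, b = trim(a), trim(b)
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)


def divmod_poly(g, f):
    """Return (q, r) with g = f*q + r and r = 0 or deg r < deg f."""
    g, f = trim(g), trim(f)
    if not f:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(g) - len(f) + 1, 0)
    r = g
    while len(r) >= len(f):
        c = r[-1] / f[-1]
        k = len(r) - len(f)
        q[k] = c
        for i, y in enumerate(f):
            r[i + k] -= c * y
        r = trim(r)
    return trim(q), r


def inverse_mod(g, f):
    """Return t with g*t = 1 in Q[X]/(f), assuming f irreducible and f not dividing g."""
    # Invariant: r0 = t0*g and r1 = t1*g modulo f.
    r0, r1 = trim(f), divmod_poly(g, f)[1]
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, sub(t0, mul(q, t1))
    if len(r0) != 1:
        raise ValueError("g is not invertible modulo f")
    return divmod_poly([c / r0[0] for c in t0], f)[1]
```

For example, with f(X) = X² − 2 the inverse of 1 + x is −1 + x, which reflects the identity (1 + √2)(−1 + √2) = 1.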
Corollary 5.87. Every algebraic number field K can be obtained by a finite
succession of extensions as in Lemma 5.86, beginning with Q.


Proof: Let $\{\alpha_1, \ldots, \alpha_n\}$ be a basis for K as a vector space over Q. Set
$K_0 = \mathbb{Q}$. Now $\alpha_1$ is a root of an irreducible polynomial $f_1(X)$ in $K_0[X]$.
Set $K_1 = K_0[X]/(f_1(X))$. Now $\alpha_2$ is a root of an irreducible polynomial
$f_2(X)$ in $K_1[X]$. Set $K_2 = K_1[X]/(f_2(X))$. Continue, to obtain $K_n = K$. □

We now consider embeddings of algebraic number fields in C. An embedding
of K in C is an isomorphism ϕ from K to a subfield of C. We say that ϕ is
a real embedding if its image is a subfield of R and a complex embedding
otherwise.

Theorem 5.88. Let K be an algebraic number field with $\deg_{K/\mathbb{Q}} = n$. Let
r be the number of real embeddings of K in C and let s be the number of
pairs of conjugate complex embeddings of K in C. Then

r + 2s = n.

Proof: It is a theorem that every finite extension of Q can be obtained by a
single application of Lemma 5.86, i.e., K = Q[X]/(f(X)) for an irreducible
polynomial f (X) in Q[X] of degree n. Let x in K be as in the proof of
Lemma 5.86. Then K is spanned by the powers of x, so any embedding ϕ
is determined by ϕ(x). Furthermore, f (x) = 0. Then, for any embedding
ϕ of K, we must have 0 = ϕ(0) = ϕ(f (x)) = f (ϕ(x)). In other words, ϕ(x)
must be a root of f (X) in C. It is easy to check that we may obtain an
embedding by choosing ϕ(x) to be any root of f (X) in C.
It is a theorem that every irreducible polynomial in Q[X] has distinct
roots. The Fundamental Theorem of Algebra tells us that every polynomial
of degree n in C[X], and hence every polynomial of degree n in Q[X], has
n roots in C. Also, as is well known, the complex (i.e., nonreal) roots of
every polynomial in R[X], and hence of every polynomial in Q[X], occur in
conjugate pairs. Assembling these facts, we see that f (X) factors in C[X]
as

$$f(X) = (X - \rho_1) \cdots (X - \rho_r)(X - \sigma_1)(X - \bar{\sigma}_1) \cdots (X - \sigma_s)(X - \bar{\sigma}_s)$$

for distinct real numbers $\rho_1, \ldots, \rho_r$ and distinct complex (i.e., nonreal)
numbers $\sigma_1, \bar{\sigma}_1, \ldots, \sigma_s, \bar{\sigma}_s$ with r + 2s = n. As we have observed, we obtain
an embedding ϕ of K from each choice ϕ(x) = $\rho_i$, $\sigma_i$, or $\bar{\sigma}_i$, and these are
all the embeddings. □

Theorem 5.88 says that, given an “abstract” algebraic number field of de-
gree n, as we have defined it here, there are n ways to identify it with a
“concrete” algebraic number field, as we have previously defined it.


Example 5.89.

(1) Let D be a square-free integer, D ≠ 1, and let $K = \mathbb{Q}[X]/(X^2 - D)$.
Then $X^2 - D = (X - \sqrt{D})(X + \sqrt{D})$, and so we obtain embeddings of
K in C by ϕ(x) = √D and ϕ(x) = −√D. We thus see:

for K a real quadratic field, i.e., D > 0, r = 2 and s = 0;

for K an imaginary quadratic field, i.e., D < 0, r = 0 and s = 1.

(2) Let K be obtained in two stages: $K_1 = \mathbb{Q}[X]/(X^2 - 2)$ and
$K = K_1[X]/(X^2 - 3)$. In fact K can be obtained in one stage as
$K = \mathbb{Q}[X]/(X^4 - 10X^2 + 1)$. Then
$X^4 - 10X^2 + 1 = (X - (\sqrt{2} + \sqrt{3}))(X - (\sqrt{2} - \sqrt{3}))(X - (-\sqrt{2} + \sqrt{3}))(X - (-\sqrt{2} - \sqrt{3}))$,
and so we obtain embeddings of K in C by ϕ(x) = √2 + √3, √2 − √3,
−√2 + √3, or −√2 − √3. In this case r = 4 and s = 0.

(3) Let $K = \mathbb{Q}[X]/(X^3 - 2)$. Then
$X^3 - 2 = (X - \sqrt[3]{2})(X - \omega\sqrt[3]{2})(X - \omega^2\sqrt[3]{2})$,
where $\omega = (-1 + i\sqrt{3})/2$ is a primitive cube root of 1, and so
we obtain embeddings of K in C by ϕ(x) = $\sqrt[3]{2}$, $\omega\sqrt[3]{2}$, or $\omega^2\sqrt[3]{2}$. In
this case r = 1 and s = 1.
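
The counts in Example 5.89 can be reproduced numerically by listing the roots of each polynomial and sorting them into real and nonreal ones. The sketch below works in floating point with a tolerance chosen here for illustration; the helper names are not from the text.

```python
import cmath
import math


def is_root(coeffs, z, tol=1e-9):
    """Evaluate a polynomial (highest-degree coefficient first) at z by Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = acc * z + c
    return abs(acc) < tol


def signature(roots, tol=1e-9):
    """Return (r, s): the number of real roots and of conjugate pairs of nonreal roots."""
    r = sum(1 for z in roots if abs(complex(z).imag) < tol)
    return r, (len(roots) - r) // 2


s2, s3 = math.sqrt(2), math.sqrt(3)
cube = 2 ** (1 / 3)
omega = cmath.exp(2j * cmath.pi / 3)

real_quadratic = [s2, -s2]                                   # X^2 - 2
imaginary_quadratic = [cmath.sqrt(-5), -cmath.sqrt(-5)]      # X^2 + 5
biquadratic = [s2 + s3, s2 - s3, -s2 + s3, -s2 - s3]         # X^4 - 10X^2 + 1
pure_cubic = [cube, omega * cube, omega ** 2 * cube]         # X^3 - 2
```

Calling `signature` on the four lists gives (2, 0), (0, 1), (4, 0), and (1, 1), and in each case r + 2s equals the degree of the polynomial, as Theorem 5.88 requires.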

Observe that the units of an algebraic number field K form a group
under multiplication. Here is the result to which we have been heading.

Theorem 5.90 (Dirichlet). Let K be an algebraic number field of degree n,
with r real embeddings and s pairs of conjugate complex embeddings. Let
$U_K$ be the group of units in K. Then $U_K$ is isomorphic to $R_K \times F_K$, where
$R_K$ is the finite cyclic group of roots of 1 in K, and $F_K$ is a free abelian
group of rank r + s − 1.

We call a set of generators for $F_K$ a fundamental system of units for K.

Example 5.91.

(1) Let K = Q(√D) for D < 0 be an imaginary quadratic field. From
Example 5.89(1) we see that r + s − 1 = 0, so that $U_K$ is isomorphic to
the finite group $R_K$. For D = −1, $R_K$ = {±1, ±i} is a group of order
4. For D = −3, $R_K = \{\pm 1, \pm\omega, \pm\omega^2\}$ is a group of order 6. Otherwise,
$R_K$ = {±1} is a group of order 2.

(2) Let K = Q(√D) for D > 0 be a real quadratic field. Then $R_K$ = {±1}
is a group of order 2. From Example 5.89(1) we see that r + s − 1 = 1,
so that $F_K$ is isomorphic to Z. Let $\varepsilon_0$ be a fundamental unit of K.
If $\varepsilon_0 = a + b\sqrt{D}$, then a and b are integers, or perhaps half-integers
if D ≡ 1 (mod 4), with $a^2 - b^2 D = \pm 1$, so a and b do not immediately give a
solution of Pell’s equation. But some power $\varepsilon_{\mathrm{Pell}}$ of $\varepsilon_0$ does, and so we
recover our previous description of the structure of solutions of Pell’s
equation (and in particular the fact that Pell’s equation always has
infinitely many solutions) as a special case of this theorem.

(3) Let K = Q(√2, √3). Then $R_K$ = {±1} is a group of order 2. From
Example 5.89(2) we see that r + s − 1 = 3, so that $F_K$ is isomorphic to
$\mathbb{Z}^3$. We can easily obtain three elements of $F_K$ as follows: K ⊃ Q(√2)
with fundamental unit $\varepsilon_1 = 1 + \sqrt{2}$, K ⊃ Q(√3) with fundamental unit
$\varepsilon_2 = 2 + \sqrt{3}$, and K ⊃ Q(√6) with fundamental unit $\varepsilon_3 = 5 + 2\sqrt{6}$.
(Note that $\varepsilon_2$ and $\varepsilon_3$ give solutions to Pell’s equation. $\varepsilon_1$ does not, but
$\varepsilon_1^2$ does.) These elements generate a free subgroup F′ of $F_K$ of rank 3, so
F′ must be a subgroup of $F_K$ of finite index, but a priori need not be all
of $F_K$. In fact, it is not. It is known that {1 + √2, √2 + √3, (√2 + √6)/2}
is a fundamental system of units for K, and it is then easy to check
that F′ is a subgroup of $F_K$ of index 4.

(4) Let $K = \mathbb{Q}(\sqrt[3]{2})$. Then $R_K$ = {±1} is a group of order 2. From
Example 5.89(3) we see that r + s − 1 = 1, so that $F_K$ is isomorphic
to Z. $1 + \sqrt[3]{2} + (\sqrt[3]{2})^2$ is a fundamental unit of K (and
$(1 + \sqrt[3]{2} + (\sqrt[3]{2})^2)(-1 + \sqrt[3]{2}) = 1$). Similarly, $K' = \mathbb{Q}(\omega\sqrt[3]{2})$ has fundamental
unit $1 + \omega\sqrt[3]{2} + (\omega\sqrt[3]{2})^2$ and $K'' = \mathbb{Q}(\omega^2\sqrt[3]{2})$ has fundamental unit
$1 + \omega^2\sqrt[3]{2} + (\omega^2\sqrt[3]{2})^2$.
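
Several of the numerical claims in parts (3) and (4) can be checked directly. The sketch below is a floating-point check only, and the variable and function names are choices made here for illustration. It confirms that two of the units in the fundamental system square to ε₃ and ε₂ respectively, which is what lies behind the index 4 = 2 · 2.

```python
import math

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)


def pell_norm(a, b, D):
    """Return a^2 - b^2 * D, the norm of a + b*sqrt(D)."""
    return a * a - b * b * D


# Norms of eps1 = 1 + sqrt(2), eps2 = 2 + sqrt(3), eps3 = 5 + 2*sqrt(6),
# and of eps1^2 = 3 + 2*sqrt(2).
norms = [pell_norm(1, 1, 2), pell_norm(2, 1, 3), pell_norm(5, 2, 6), pell_norm(3, 2, 2)]

eps2 = 2 + s3
eps3 = 5 + 2 * s6
eta2 = s2 + s3          # second unit of the fundamental system
eta3 = (s2 + s6) / 2    # third unit of the fundamental system

squares_match = math.isclose(eta2 ** 2, eps3) and math.isclose(eta3 ** 2, eps2)

# Part (4): (1 + c + c^2)(-1 + c) = c^3 - 1 = 1 for c the real cube root of 2.
c = 2 ** (1 / 3)
cubic_unit_product = (1 + c + c * c) * (-1 + c)
```

Here `norms` is [−1, 1, 1, 1], so ε₁ has norm −1 while ε₁², ε₂, and ε₃ give solutions of Pell's equation, and `cubic_unit_product` agrees with 1 up to rounding.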
2 3

5.10 Exercises

Exercise 5.1. Let R = O(√D).

(a) Let I = (α1 , . . . , αk ) with gcd(α1 , . . . , αk ) relatively prime. Show


that I = (1). (Of course, (1) = R.)

(b) More generally, let I = (α1 , . . . , αk ) and suppose there is an element


α of R with α dividing each αi and with α = gcd(α1 , . . . , αk ).
Show that I = (α).

Exercise 5.2. Let R = O(√D). Let $m_1$ and $m_2$ be relatively prime integers
and let α be any element of R with α dividing $m_1 m_2$. Show that
$(m_1, \alpha)(m_2, \alpha) = (\alpha)$.




Exercise 5.3. In each case, it is known that C(Q(√D)) is a cyclic group of
the given order h(D). Find an ideal I of O(√D) with [I] a generator of
C(Q(√D)).

(a1) D = −103, h(D) = 5        (a2) D = −127, h(D) = 5
(b1) D = −71, h(D) = 7         (b2) D = −151, h(D) = 7
(c1) D = −199, h(D) = 9        (c2) D = −367, h(D) = 9
(d1) D = −74, h(D) = 10        (d2) D = −86, h(D) = 10
(e1) D = −167, h(D) = 11       (e2) D = −271, h(D) = 11
(f1) D = −191, h(D) = 13       (f2) D = −263, h(D) = 13
(g1) D = −101, h(D) = 14       (g2) D = −134, h(D) = 14
(h1) D = −439, h(D) = 15       (h2) D = −751, h(D) = 15
(i1) D = −293, h(D) = 18       (i2) D = −335, h(D) = 18
(j1) D = −743, h(D) = 21       (j2) D = −1931, h(D) = 21
(k1) D = −461, h(D) = 30       (k2) D = −509, h(D) = 30
(l1) D = −1031, h(D) = 35      (l2) D = −2087, h(D) = 35
(m1) D = −794, h(D) = 42       (m2) D = −1046, h(D) = 42

Exercise 5.4. In a Dedekind domain R, we can define the gcd and lcm of
ideals similarly to our definition of the gcd and lcm of elements in Chapter 2:
If I and J are ideals of R, then G = gcd(I, J) if G is an ideal
of R dividing both I and J, and if any other ideal dividing
I and J divides G. Similarly, if I and J are ideals of R, then
L = lcm(I, J) if L is an ideal of R divisible by both I and J, and
if any other ideal divisible by I and J is divisible by L. Also,
two ideals I and J of R are relatively prime if gcd(I, J) = (1).

(a) Show that G = gcd(I, J) and L = lcm(I, J) always exist.

(b) Show that G = I + J = {i + j | i in I, j in J} and that L = I ∩ J.

(c) Show that I and J are relatively prime if and only if they have no
common prime ideal factor. In this case, show that L = IJ.

(d) Express G and L in terms of the factorizations of I and J into prime
ideals.

(e) Show that GL = IJ.

Exercise 5.5. For I an ideal of R, define a ≡ b (mod I) if a − b is an
element of I. Show that the Chinese Remainder Theorem holds for ideals
in a Dedekind domain R: if I and J are relatively prime ideals, then the
simultaneous congruences x ≡ a (mod I) and x ≡ b (mod J) have a solution
for any a and b, and this solution is unique (mod IJ).


Exercise 5.6. Let I and J be ideals in a Dedekind domain R. Show that
R/J is isomorphic to I/IJ.

Exercise 5.7. Use the descriptions of the ideals P in Remark 5.58 to show
that these ideals are maximal and hence prime, thus providing another
proof of Theorem 5.55.

Exercise 5.8. Let K be an algebraic number field with $\deg_{K/\mathbb{Q}} = n$. Let α
be an algebraic integer in K. It follows from field theory that the degree
d of the polynomial $m_\alpha(X)$ divides n. Let this polynomial have constant
term a. The norm $\|\alpha\|$ of the element α is defined to be $|a^{n/d}|$. We have
defined the norm of an ideal of O(K) in general in Definition 5.40, and
it is a theorem that for a principal ideal (α), $\|(\alpha)\| = \|\alpha\|$. Verify that
with this definition of the norm of an element, this theorem is true when
K = Q(√D).

Exercise 5.9. Let K′ be an algebraic number field that is an extension of
the algebraic number field K. Let R = O(K) and let R′ = O(K′).

(a) Let $I_0$ be a fractional ideal of K such that R′$I_0$ = R′. Show that
$I_0$ = R.

(b) Let I be any ideal of R. Show that R′I ∩ R = I.

Exercise 5.10. Let R = Z[X], the ring of polynomials in the variable “X”
with integer coefficients. R is known to be a UFD. Note that parts (a)
and (b) below show that R is not a Dedekind domain.

(a) Show that (2) and (X) are prime ideals of R.

(b) Show that (2, X) is a maximal ideal of R.

(c) Find a pair of ideals I and J such that J ⊃ I but such that there is no
ideal K with I = JK.

(d) Find an ideal of R that cannot be written as a product of prime ideals.
(This is true even though every element of R can be written as a
product of prime elements.)

(e) Let I = (2, X). Show that, for any positive integer n, the minimum
number of elements in a generating set for $I^n$ is n + 1. In particular,
no power of I is principal.




Exercise 5.11. Let R = {a + b√−3 | a and b are integers}. We observed
in Example 5.30(2) that R is not integrally closed in its quotient field.
Observe that we have the factorizations 4 = [1 + √−3][1 − √−3] = 2 · 2 in
R.

(a) Show that (1 + √−3), (1 − √−3), and (2) are not prime ideals in R.

(b) Show that the ideal (2, 1 + √−3) is not invertible.

(c) Find an example of ideals $I_1$, $I_2$, and J in R with $I_1 \neq I_2$ but with
$I_1 J = I_2 J$.

Exercise 5.12. Let R be the ring of “polynomials in positive fractional
powers of X.” That is, R consists of expressions such as
$2 + 3X^{1/4} + 5X^{2/3} + 7X^3$
with addition and multiplication given by the usual rules of exponents.

(a) Show that R does not satisfy the ascending chain condition (ACC).

(b) Find a nonzero proper ideal I of R with $I^2 = I$.


Appendix A

Mathematical Induction

Mathematical induction is an extremely powerful and useful proof
technique. In this appendix we describe it and its variants, and draw some of
its particularly useful consequences.

A.1 Mathematical Induction and Its Equivalents


Mathematical induction is nothing other than a formalized version of
dominoes.
Suppose we have an infinite collection of dominoes, numbered 1, 2, 3,
. . . . Suppose that the first domino falls. Also, suppose that the dominoes
are arranged so that whenever a domino falls, the next one will also fall. What
will be the result? Clearly, it is that they will all fall.
To formalize this, let F (n) be the proposition

F (n): Domino n falls.

Then we are assuming that

(1) F (1) and

(2) if F (n), then F (n + 1);

and we derive from this the conclusion

F (n) for all positive integers n.

Stated in this way, there is clearly nothing special about F (n), and we
may substitute any proposition. This leads us to the principle of mathe-
matical induction.


Axiom A.1 (The Principle of Mathematical Induction). Let P (n) be any propo-
sition about positive integers. Suppose that

(1) P (1) is true;

(2) for each positive integer n, if P (n) is true, then P (n + 1) is true.

Then P (n) is true for every positive integer n.

Let us emphasize that the point of (2) is that we do not need to prove
P(n), but rather that we can assume P(n) and use it to prove P(n + 1). (In
a typical proof, at the point we invoke the truth of P(n), we often state
“By the inductive hypothesis . . . .”)
As a practical matter, when using mathematical induction to prove a
proposition, it is usually the case that verifying (1) is easy but verifying
(2) takes work. Occasionally it is the case that verifying (2) is easy but
verifying (1) takes work. Rarely it is the case that verifying both (1) and
(2) takes work. (And it is virtually never the case that verifying both (1)
and (2) is easy; that would be getting something for nothing.)
There is a variant of mathematical induction called complete induction.
Again, we will first introduce it in the context of dominoes, so again let
us suppose we have an infinite number of dominoes numbered 1, 2, 3, . . . .
Again, let us suppose that the first domino falls. But now let us suppose
that the dominoes are arranged a bit differently, so that it is not necessarily
the case that if domino n falls, then domino n+1 falls, for every n. Suppose
it is instead the case that if dominoes 1 through n all fall, then domino n + 1
falls, for every n. What will happen? Again, all the dominoes will fall.
There is another possibility. Assume the first domino falls, and the
arrangement of the dominoes is that domino n+1 is not necessarily knocked
down by domino n, but by domino k for some k between 1 and n, for every
n. What will happen? Again, all the dominoes will fall. But in this
third situation, we generally do not know which domino k will knock down
domino n + 1. So we handle that lack of information by simply assuming
that we are back in the second case, that is, if dominoes 1 through n all
fall (and hence the mysterious domino k falls, whatever k may happen to
be), domino n + 1 will also fall, for every n.
There are some other possibilities as well, where the first domino falls,
and where domino n + 1 falls if some (perhaps known, perhaps mysterious)
combination of the preceding n dominoes falls. But again, we can handle
this by simply assuming that we are back in the second case, that is, if
dominoes 1 through n all fall, domino n + 1 will also fall, for every n.


This second case, translating from dominoes falling to general propositions,
gives us complete induction.

Axiom A.2 (The Principle of Complete Induction). Let P (n) be any propo-
sition about positive integers. Suppose that

(1) P (1) is true;

(2) for each positive integer n, if P (k) is true for all integers k between 1
and n, then P (n + 1) is true.

Then P (n) is true for every positive integer n.
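
Complete induction is the proof-side counterpart of a recursion that is allowed to call itself on any smaller value, not just on the immediate predecessor. As an illustration in the spirit of this book, the following sketch factors a positive integer into primes: the case of n relies on the cases of two smaller numbers d and n/d, and we do not know in advance which ones. The function name is hypothetical.

```python
from math import isqrt


def prime_factors(n):
    """Return the prime factors of n >= 1 in increasing order, with multiplicity."""
    if n == 1:
        return []                      # base case: the empty product
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            # Inductive step: both d and n // d lie strictly between 1 and n,
            # so the claim is already available for each of them.
            return sorted(prime_factors(d) + prime_factors(n // d))
    return [n]                         # no proper divisor, so n itself is prime


example = prime_factors(360)
```

That the recursion terminates and returns a correct factorization for every n is exactly a statement proved by complete induction: assuming it for all k with 1 ≤ k ≤ n gives it for n + 1.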

These two variants of induction are logically equivalent. However, in
order to prove this it is convenient to introduce a third variant, well-ordering,
which is important in its own right.
On the face of it, well-ordering looks very different from induction, but
it turns out to be equivalent to it.
First, the domino version: Suppose we have an infinite number of domi-
noes numbered 1, 2, 3, . . ., not all of which have fallen down. Then among
the dominoes that are still standing, there is one with the smallest num-
ber. Stated slightly more formally, in this case among the set S consisting
of the dominoes that are still standing, there is a domino that has the
smallest number. Translated into general terms, this gives the statement
of well-ordering.

Axiom A.3 (The Well-Ordering Principle). Any nonempty subset S of the
set of positive integers has a smallest element.

We now show that all three of these variants of induction are logically
equivalent. The proof of this is rather subtle and tricky.

Theorem A.4. The following are equivalent:

(1) the Principle of Mathematical Induction,

(2) the Principle of Complete Induction, and

(3) the Well-Ordering Principle.

Proof: As is common in this situation, we produce a “round-robin” proof,
showing that (1)⇒(2)⇒(3)⇒(1).
(1)⇒(2): We need to be clear about what we must prove. We are
supposing that the hypotheses of complete induction are satisfied, and we
need to show that we can use the method of mathematical induction to
obtain the conclusion of complete induction.
So suppose the hypotheses of complete induction are satisfied for a
proposition P (n):
(1) P (1) is true.
(2) If P (k) is true for every integer k with 1 ≤ k ≤ n, then P (n + 1) is
true, for every n.
Let us consider a new proposition Q(n). Q(n) is the proposition
Q(n) : P (k) is true for every integer k with 1 ≤ k ≤ n.
Note that P(1) and Q(1) say the same thing, so our assumption that
P(1) is true gives
(1′) Q(1) is true.
Also the hypothesis of (2) becomes Q(n) is true, and the conclusion
of (2) is that P(n + 1) is true. But we are assuming Q(n) is true, so we
know that P(k) is true for every k with 1 ≤ k ≤ n, and that, together
with the fact that P(n + 1) is true, shows that P(k) is true for every k
with 1 ≤ k ≤ n + 1. But that assertion is Q(n + 1). Thus we see that our
assumption (2) gives
(2′) For each positive integer n, Q(n) implies Q(n + 1).
But now we may apply mathematical induction to (1′) and (2′) to conclude
that Q(n) holds for every positive integer n. So for every positive
integer n,
P(k) is true for every integer k with 1 ≤ k ≤ n.
In particular, choosing k = n, we see that
P(n) is true for every positive integer n,
and that is what we wanted to prove.
(2)⇒(3): Here we are supposing that the hypotheses of well-ordering
are satisfied, and we need to show that we can use the method of complete
induction to obtain the conclusion of well-ordering.
Actually, we will put the precise statement of well-ordering aside for
the moment and deal with a related statement instead. Let S(n) be the
proposition
S(n) : Every subset T of the set of positive integers
that contains the integer n has a smallest element.


First we observe that

(1) S(1) is true, for if the set T contains the positive integer 1, then 1 is
certainly the smallest positive integer in T (as 1 is the smallest positive
integer, period).
Next we claim that

(2) if S(k) is true for every integer k with 1 ≤ k ≤ n, then S(n + 1) is
true. Suppose that T contains the positive integer n + 1. There are two
possibilities: either n + 1 is the smallest positive integer in T, or it is
not. In the first case, T certainly has a smallest element, namely n + 1.
In the second case, T contains some positive integer k with k < n + 1,
i.e., with 1 ≤ k ≤ n. But then, since S(k) is assumed to be true, we
can again conclude that T has a smallest element.
Thus, both hypotheses of complete induction are satisfied and we can
apply complete induction to conclude

S(n) is true for every positive integer n.

Now let us return to well-ordering per se. We are given a nonempty
subset S of the set of positive integers. To say that S is nonempty
means that it contains some positive integer n₀. We have just concluded
that S(n) is true for every positive integer n, so in particular it
is true for the integer n = n₀. But S(n₀) states that every set T that
is a subset of the set of positive integers and that contains n₀ has a
smallest element. But our set S is such a set, so we conclude that S
has a smallest element, and that is what we wanted to prove.

(3) ⇒(1): Here we are supposing that the hypotheses of mathematical


induction are satisfied, and we need to show that we can use the well-
ordering principle to obtain the conclusion of mathematical induction.

So suppose that P (n) is a proposition about the positive integer n, and we


have that

(1) P (1) is true, and

(2) if P (n) is true, then P (n + 1) is true, for every n.

We want to conclude that P (n) is true for every n.


Our proof in this case is a proof by contradiction. Suppose it is not the
case that P (n) is true for every n. Then P (n) is false for at least one n.


Thus if we define the set S by


S = {n | P (n) is false},

then S is a nonempty set. We now apply the well-ordering principle to
conclude that S has a least element n0. There are two cases: (1′) n0 = 1
or (2′) n0 > 1.
Case (1′): n0 = 1. Since n0 is in S, by the definition of S we have that
P (1) is false. But this directly contradicts hypothesis (1) of mathematical
induction, which states that P (1) is true.
Case (2′): n0 > 1. Since n0 is in S, by the definition of S we have that
P (n0 ) is false. But in this case n0 − 1 is a positive integer and, since n0
is the smallest positive integer in S, we see that n0 − 1 is not in S. Then,
by the definition of S, we have that P (n0 − 1) is true. But now we can
apply hypothesis (2) of mathematical induction: since P (n0 − 1) is true, so
is P ((n0 − 1) + 1), i.e., P (n0 ) is true, which is again a contradiction.
Thus, our assumption that the conclusion of induction is false leads
(in any case) to a contradiction, so we conclude that the conclusion of
induction is true, that P (n) is true for every positive integer n, and that is
what we wanted to prove. 
Remark A.5. We should observe that we have not proved mathematical
induction. In fact, it is one of the basic properties that define the structure
of the positive integers, and, as we have indicated, we take it as an axiom.

A.2 Consequences of Mathematical Induction


In this section, we draw two of the most often used consequences of math-
ematical induction. Here is the first one:
Theorem A.6. Let a1 > a2 > a3 > . . . be a strictly decreasing sequence of
positive integers. Then this sequence is finite.
Proof: Let S = {a1 , a2 , a3 , . . .} be the set of all the elements of the se-
quence. Then, by the Well-Ordering Principle, S has a smallest element,
which is ak for some k. We claim the sequence stops at ak .
We prove this by contradiction. Suppose not. Then S contains the next
term ak+1 . But ak > ak+1 since the sequence is strictly decreasing. This
contradicts ak being the smallest element of S. Hence this supposition is
false and the sequence stops at ak .
Thus, the sequence has a finite number (to be precise, k, but we don’t
know what k is) of terms. 
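
Theorem A.6 is often used to show that a procedure must stop. As a small illustration (the example and the name `remainder_chain` are ours, not taken from the text), the following Python sketch records the successive divisors in repeated division with remainder. Each remainder is strictly smaller than the divisor it came from, so the recorded terms form a strictly decreasing sequence of positive integers, and the loop must end.

```python
def remainder_chain(a, b):
    """Return the positive terms b > r1 > r2 > ... obtained by repeatedly
    dividing with remainder, starting from a divided by b (b > 0)."""
    chain = []
    while b > 0:
        chain.append(b)      # record the current positive term
        a, b = b, a % b      # for b > 0, Python guarantees 0 <= a % b < b
    return chain


chain = remainder_chain(1071, 462)
print(chain)
```

The loop cannot run forever precisely because the list it builds is strictly decreasing and consists of positive integers.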


The second consequence is the Pigeonhole Principle. The Pigeonhole


Principle is an intuitively obvious principle that gets its name from an old
(but not obsolete) technology. Here it is:

Suppose a postal employee is sorting letters into pigeonholes,


and there are more letters than pigeonholes. Then some pi-
geonhole must contain more than one letter.

We have said that the pigeonhole principle is intuitively obvious. But


of course that does not mean that it does not require proof. We state it
precisely and prove it now, as a consequence of mathematical induction.

Theorem A.7 (The Pigeonhole Principle). If m objects are sorted into n cat-
egories, and m > n, then at least one category contains more than one
object.

Proof: We prove this by induction on n. Let P (n) be the statement of the


theorem for a given value of n. We verify the hypotheses of induction:

(1) P (1) is true: If there is only one category, then all m > 1 objects are
in that category.

(2) P (n) implies P (n + 1): Suppose m > n + 1 objects are sorted into n + 1
categories. Pick a category. If that category has k > 1 objects, we
are done. So suppose not. Then it has k = 0 or 1 objects. Consider
the remaining n categories and the remaining m − k objects. If k = 0,
m−k = m > n+1 > n, while if k = 1, m−k = m−1 > (n+1)−1 = n,
so in either event m − k > n and by the inductive hypothesis at least
one of the remaining categories contains more than one object. So in
any case P (n + 1) is true.

Then, by induction, we conclude that P (n) is true for every positive inte-
ger n. 
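
A proof by induction covers every n at once, but it can be reassuring to check small cases by brute force. The following Python sketch (the function names are ours, chosen for illustration) runs through every way of sorting m labelled objects into n categories for a few small values with m > n and confirms that some category always receives more than one object. It is only a sanity check, not a substitute for the proof.

```python
from collections import Counter
from itertools import product


def some_category_has_two(assignment):
    """assignment[i] is the category of object i; True if some category
    receives more than one object."""
    return any(count > 1 for count in Counter(assignment).values())


def pigeonhole_holds(m, n):
    """Check every way of sorting m labelled objects into n categories."""
    return all(some_category_has_two(a) for a in product(range(n), repeat=m))


# All pairs (m, n) with 1 <= n <= 4 and n < m <= 6 for which the check passes.
checked = [(m, n) for n in range(1, 5) for m in range(n + 1, 7)
           if pigeonhole_holds(m, n)]
print(len(checked))
```

When m = n the conclusion can fail (put one object in each category), which is why the hypothesis m > n is needed.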

The pigeonhole principle has a couple of variants, which are proved


similarly.

Theorem A.8. If m objects are sorted into n categories and m < n, then at
least one category is empty.

Proof: We prove this by complete induction on m. Let P (m) be the state-


ment of the theorem for a given value of m. We verify the hypotheses of
complete induction:


(1) P (1) is true: Pick a category. If that category is empty, we are done.
Otherwise, the single object is in that category. In that case, every
other category is empty.

(2) P (k) for every k with 1 ≤ k ≤ m implies P (m + 1): Suppose m + 1 < n
objects are sorted into n categories. Pick a category. If that category is
empty, we are done. So suppose not. Then it contains j ≥ 1 objects.
Consider the remaining n − 1 categories and the remaining m′ = m + 1 − j
objects. Since m + 1 < n, m′ = m + 1 − j < n − j ≤ n − 1. If m′ = 0, every
remaining category is empty. Otherwise 1 ≤ m′ ≤ m and, by the inductive
hypothesis, at least one of the remaining categories is empty. So in any
case P (m + 1) is true.

Thus, by complete induction, we conclude that P (m) is true for every


positive integer m. 

Theorem A.9. Suppose that n objects are sorted into n categories. The
following are equivalent.

(1) Every category contains at most one object.

(2) Every category contains at least one object.

(3) Every category contains exactly one object.

Proof: If n = 1, then the single object is in the single category, and all
three statements are true, and so are equivalent.
Assume henceforth that n ≥ 2. We begin by observing that (3) is
logically equivalent to the conjunction of (1) and (2). Hence (3) implies (1) and (3) implies
(2). If we show that (1) implies (2), that will show that (1) implies (3),
and if we show that (2) implies (1), that will show that (2) implies (3). So
we must prove these two implications.
(1) implies (2): We prove the contrapositive: not-(2) implies not-(1).
Suppose (2) is false and some category is empty. Then the remaining
n − 1 categories contain n objects, so by the original pigeonhole principle
(Theorem A.7) some category must contain more than one object, and (1)
is false.
(2) implies (1): We prove the contrapositive: not-(1) implies not-(2).
Suppose (1) is false and some category contains j > 1 objects. Then the
remaining n − 1 categories contain m = n − j < n − 1 objects, so by the
above variant on the pigeonhole principle (Theorem A.8), some category
must be empty, and (2) is false. (We may have m = 0, but then all of the
other categories are empty and (2) is certainly false.)
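
The equivalence in Theorem A.9 can also be checked exhaustively for small n. The Python sketch below (names are ours and purely illustrative) evaluates the three statements for every way of sorting n labelled objects into n categories and confirms that, for each sorting, they are either all true or all false.

```python
from collections import Counter
from itertools import product


def statements(assignment, n):
    """Evaluate statements (1), (2), (3) of Theorem A.9 for one sorting,
    where assignment[i] is the category (0..n-1) of object i."""
    counts = Counter(assignment)
    sizes = [counts[c] for c in range(n)]      # objects in each category
    at_most_one = all(s <= 1 for s in sizes)   # statement (1)
    at_least_one = all(s >= 1 for s in sizes)  # statement (2)
    exactly_one = all(s == 1 for s in sizes)   # statement (3)
    return at_most_one, at_least_one, exactly_one


def equivalent_for(n):
    """True if (1), (2), (3) agree for every sorting of n objects into n categories."""
    return all(len(set(statements(a, n))) == 1
               for a in product(range(n), repeat=n))


results = {n: equivalent_for(n) for n in range(1, 6)}
print(results)
```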


A.3 Exercises
Exercise A.1. A geometric progression with first term a and ratio r is a
sequence of the form $a, ar, ar^2, ar^3, \ldots$. Show that, for $r \neq 1$, the sum
of the first n terms of this progression is $a(r^n - 1)/(r - 1)$, i.e., show that

$$\sum_{i=0}^{n-1} a r^i = a\,\frac{r^n - 1}{r - 1}.$$

(Of course, if r = 1, then the progression is constant and the sum of its
first n terms is an.)

Exercise A.2. A decomposition of a positive integer n is a way of writing
n as a sum of positive integers in order. Let d(n) be the number of
decompositions of n. For example, d(4) = 8 as 4 has the decompositions
4, 3 + 1, 1 + 3, 2 + 2, 2 + 1 + 1, 1 + 2 + 1, 1 + 1 + 2, 1 + 1 + 1 + 1.
Show that, for every positive integer n, $d(n) = 2^{n-1}$. (Hint: show that
d(n) = 1 + d(1) + d(2) + . . . + d(n − 1) for n ≥ 2.)

Exercise A.3. Let $S = \{x_0, x_1, x_2, \ldots\}$ be a set of positive integers with
$x_0 = 1$ and $x_{i-1} < x_i \le 2x_{i-1}$ for i ≥ 1.
(a) Show that every positive integer n can be written as a sum of distinct
elements of S.

(b) We might call this a “sub-binary” expansion of n because, if $x_i = 2^i$ for
all i (in which case S = {1, 2, 4, . . .}), this is just the binary expansion of n.
Show that in this case the expansion of n is unique for every positive
integer n, but that in every other case there are positive integers n
whose expansion is not unique.

Exercise A.4.

(a) Let $x = m_1 + \sqrt{m_1^2 - 1}$ for some positive integer $m_1$. Show that,
for every n ≥ 1, $x^n = m_n + \sqrt{m_n^2 - 1}$ for some positive integer
$m_n$. (For example, if $x = 2 + \sqrt{3}$, then $x^2 = 7 + 4\sqrt{3} = 7 + \sqrt{48}$,
$x^3 = 26 + 15\sqrt{3} = 26 + \sqrt{675}$, . . ..)

(b) More generally, let $x = m_1 + \sqrt{m_1^2 - N}$ for some positive integer $m_1$
and some integer N. Show that, for every n ≥ 1, $x^n = m_n + \sqrt{m_n^2 - N^n}$
for some positive integer $m_n$. (For example, if $x = 2 + \sqrt{6}$, in which case
N = −2, then $x^2 = 10 + 4\sqrt{6} = 10 + \sqrt{96}$, $x^3 = 44 + 18\sqrt{6} = 44 + \sqrt{1944}$,
. . ..)


Exercise A.5. A characteristic of degree d is defined to be a sequence of 2d


integers (m1 , . . . , md , n1 , . . . , nd ) with each mi and each ni either 0 or 1. A
characteristic is even or odd according as m1 n1 + . . .+ md nd is even or odd.
Let e(d) be the number of even characteristics of degree d and let o(d) be
the number of odd characteristics of degree d. (For example, e(1) = 3 as
there are 3 even characteristics of degree 1, namely (0, 0), (0, 1), and (1, 0);
while o(1) = 1 as there is 1 odd characteristic of degree 1, namely (1, 1).)

(a) Show that e(d + 1) = 3e(d) + o(d) and that o(d + 1) = 3o(d) + e(d) for
every positive integer d. (Hint: think about extending a characteristic
of degree d − 1 to a characteristic of degree d.)

(b) Show that $e(d) = 2^{d-1}(2^d + 1)$ and $o(d) = 2^{d-1}(2^d - 1)$ for every positive
integer d.

Exercise A.6. Fix positive integers r, s, and t and define a sequence by

a1 = r, an+1 = (s + 1)an + t for n ≥ 1.

(a) Suppose r = s = t = 1, so the sequence is defined by $a_1 = 1$,
$a_{n+1} = 2a_n + 1$. Show that $a_n = 2^n - 1$ for every n.

(b) Suppose r = s = 1, so the sequence is defined by $a_1 = 1$,
$a_{n+1} = 2a_n + t$. Show that $a_n = (t + 1)2^{n-1} - t$ for every n.

(c) Suppose r = t = 1, so the sequence is defined by $a_1 = 1$,
$a_{n+1} = (s + 1)a_n + 1$. Show that $a_n = ((s + 1)^n - 1)/s$ for every n.

(d) Suppose s = t = 1, so the sequence is defined by $a_1 = r$,
$a_{n+1} = 2a_n + 1$. Show that $a_n = (r + 1)2^{n-1} - 1$ for every n.

(e) Experiment with arbitrary values of r, s, and t and come up with, and
prove, a formula for an in general.

Exercise A.7. The Fibonacci numbers are defined by

f1 = 1, f2 = 1, fn+2 = fn + fn+1 for n ≥ 1.

The first few Fibonacci numbers are given by

n 1 2 3 4 5 6 7 8 9 10
fn 1 1 2 3 5 8 13 21 34 55

(a) Show that f1 + f2 + . . . + fn = fn+2 − 1.


(b) Show that f1 + f3 + . . . + f2n−1 = f2n .

(c) Show that f2 + f4 + . . . + f2n = f2n+1 − 1.

(d) Show that $(3/2)^{n-2} < f_n < 2^{n-2}$ for n ≥ 4.

Exercise A.8. The Fibonacci numbers can be generalized as follows. Choose


a and b nonzero and let

g1 = a, g2 = b, gn+2 = gn + gn+1 for n ≥ 1.

(a) Show that g1 + g2 + . . . + gn = gn+2 − b.

(b) Show that g1 + g3 + . . . + g2n−1 = g2n − (b − a).

(c) Show that g2 + g4 + . . . + g2n = g2n+1 − a.

Exercise A.9. The generalized Fibonacci numbers with a = 1 and b = 3 are
known as the Lucas numbers. The first few Lucas numbers are given by:

n 1 2 3 4 5 6 7 8 9 10
ℓn 1 3 4 7 11 18 29 47 76 123

Observe that $\ell_1 = f_1$. Show that $\ell_n = 2f_{n-1} + f_n$ and that
$f_n = (2/5)\ell_{n-1} + (1/5)\ell_n$ for every n ≥ 2.

Exercise A.10. Define sequences {p0 , p1 , p2 , . . .} and {q0 , q1 , q2 , . . .} by

p0 = 1, pn = pn−1 + 2qn−1 for n ≥ 1,


q0 = 0, qn = pn−1 + qn−1 for n ≥ 1.

The first few of these are given by

n 0 1 2 3 4 5 6
pn 1 1 3 7 17 41 99
qn 0 1 2 5 12 29 70

(a) Observe that p0 = 1, p1 = 1, q0 = 0, and q1 = 1. Show that pn =


pn−2 + 2pn−1 for n ≥ 2, and that qn = qn−2 + 2qn−1 for n ≥ 2.

(b) Show that pn = qn−1 + qn and that qn = (1/2)pn−1 + (1/2)pn .

(c) Show that $p_n^2 - 2q_n^2 = (-1)^n$.


Exercise A.11. Define sequences {p0 , p1 , p2 , . . .} and {q0 , q1 , q2 , . . .} by

p0 = 1, pn = 2pn−1 + 3qn−1 for n ≥ 1,


q0 = 0, qn = pn−1 + 2qn−1 for n ≥ 1.

The first few of these are given by


n 0 1 2 3 4 5 6
pn 1 2 7 26 97 362 1351
qn 0 1 4 15 56 209 780

(a) Observe that p0 = 1, p1 = 2, q0 = 0, and q1 = 1. Show that pn =


−pn−2 + 4pn−1 for n ≥ 2, and that qn = −qn−2 + 4qn−1 for n ≥ 2.

(b) Show that pn = −qn−1 + 2qn and that qn = −(1/3)pn−1 + (2/3)pn .

(c) Show that $p_n^2 - 3q_n^2 = 1$.

Exercise A.12. Define sequences {p0 , p1 , p2 , . . .} and {q0 , q1 , q2 , . . .} by

p0 = 1, pn = 2pn−1 + 5qn−1 for n ≥ 1,


q0 = 0, qn = pn−1 + 2qn−1 for n ≥ 1.

The first few of these are given by


n 0 1 2 3 4 5 6
pn 1 2 9 38 161 682 2889
qn 0 1 4 17 72 305 1292

(a) Observe that p0 = 1, p1 = 2, q0 = 0, and q1 = 1. Show that pn =


pn−2 + 4pn−1 for n ≥ 2, and that qn = qn−2 + 4qn−1 for n ≥ 2.

(b) Show that pn = qn−1 + 2qn and that qn = (1/5)pn−1 + (2/5)pn .

(c) Show that $p_n^2 - 5q_n^2 = (-1)^n$.

Exercise A.13. Define sequences {p0 , p1 , p2 , . . .} and {q0 , q1 , q2 , . . .} by

p0 = 1, pn = 5pn−1 + 12qn−1 for n ≥ 1,


q0 = 0, qn = 2pn−1 + 5qn−1 for n ≥ 1.

The first few of these are given by


n 0 1 2 3 4 5 6
pn 1 5 49 485 4801 47525 470449
qn 0 2 20 198 1960 19402 192060


(a) Observe that p0 = 1, p1 = 5, q0 = 0, and q1 = 2. Show that pn =


−pn−2 + 10pn−1 for n ≥ 2, and that qn = −qn−2 + 10qn−1 for n ≥ 2.

(b) Show that pn = −(1/2)qn−1 + (5/2)qn and that qn = −(1/12)pn−1 +


(5/12)pn .

(c) Show that $p_n^2 - 6q_n^2 = 1$.

Exercise A.14. Fix a rational number D that is not a perfect square. Choose
nonzero rational numbers a and b and set $N = a^2 - b^2 D$. Define sequences
{p0 , p1 , p2 , . . .} and {q0 , q1 , q2 , . . .} by

p0 = 1, pn = apn−1 + bDqn−1 for n ≥ 1,


q0 = 0, qn = bpn−1 + aqn−1 for n ≥ 1.

(a) Observe that p0 = 1, p1 = a, q0 = 0, and q1 = b. Show that pn =


−N pn−2 + 2apn−1 for n ≥ 2, and that qn = −N qn−2 + 2aqn−1 for n ≥
2.

(b) Show that pn = (−N qn−1 +aqn )/b and that qn = (−N pn−1 +apn )/(bD).

(c) Show that $p_n^2 - D q_n^2 = N^n$.
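
Before attempting a proof, it may help to experiment numerically. The following Python sketch (function names are ours, for illustration only) generates the sequences of Exercise A.14 in exact rational arithmetic, starting from p0 = 1 and q0 = 0, and tests the identity of part (c) for the first few n. With (a, b, D) = (2, 1, 3) it should reproduce the table in Exercise A.11. A numerical check of this kind suggests what is true but does not prove it.

```python
from fractions import Fraction


def pq_sequences(a, b, D, count):
    """Return ([p_0, ..., p_{count-1}], [q_0, ..., q_{count-1}]) for
    p_n = a*p_{n-1} + b*D*q_{n-1} and q_n = b*p_{n-1} + a*q_{n-1},
    starting from p_0 = 1, q_0 = 0, using exact rational arithmetic."""
    a, b, D = Fraction(a), Fraction(b), Fraction(D)
    p, q = [Fraction(1)], [Fraction(0)]
    for _ in range(count - 1):
        p_prev, q_prev = p[-1], q[-1]
        p.append(a * p_prev + b * D * q_prev)
        q.append(b * p_prev + a * q_prev)
    return p, q


def norm_identity_holds(a, b, D, count=10):
    """Check p_n^2 - D*q_n^2 == N^n, with N = a^2 - b^2*D, for all n < count."""
    N = Fraction(a) ** 2 - Fraction(b) ** 2 * Fraction(D)
    p, q = pq_sequences(a, b, D, count)
    return all(p[n] ** 2 - Fraction(D) * q[n] ** 2 == N ** n
               for n in range(count))


p, q = pq_sequences(2, 1, 3, 7)
print([int(x) for x in p])   # compare with the p_n row tabulated in Exercise A.11
print([int(x) for x in q])   # compare with the q_n row tabulated in Exercise A.11
print(norm_identity_holds(2, 1, 3))
```

Using `Fraction` rather than floating point keeps the comparison exact, which matters for rational choices such as a = b = 1/2 in Exercise A.15.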

Exercise A.15. Let D = 5, a = 1/2, b = 1/2, and define sequences
{p0 , p1 , p2 , . . .} and {q0 , q1 , q2 , . . .} as in Exercise A.14. Show that ℓn = 2pn and
fn = 2qn for every n ≥ 1, where ℓn is the n-th Lucas number and fn is the
n-th Fibonacci number.

Exercise A.16. An Egyptian fraction decomposition is a way of writing
a fraction a/b as a sum of fractions with numerator 1 and distinct
denominators. For example, 3/5 has an Egyptian fraction decomposition
3/5 = 1/2 + 1/10 and 3/7 has an Egyptian fraction decomposition 3/7 =
1/3 + 1/11 + 1/231. Show that every fraction a/b with a and b positive
integers and a < b has an Egyptian fraction decomposition. (Hint: if
a = 1 we are done. Otherwise, let c be the smallest positive integer with
1/c < a/b and let a′/b′ = a/b − 1/c. Show that a′/b′ < 1/c and that a′ < a.
Then argue by complete induction on a.) Note that an Egyptian fraction
decomposition can never be unique. For we have the algebraic identity
$\frac{1}{x} = \frac{1}{x+1} + \frac{1}{x(x+1)}$ and we may apply that here to get, for example,
1/2 = 1/3 + 1/6 = 1/3 + 1/7 + 1/42 = 1/3 + 1/7 + 1/43 + 1/1806 = . . . .
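
The hint describes a procedure that can be carried out mechanically. Here is a minimal Python sketch of it (the function name `egyptian` is ours, and we assume the fraction is reduced to lowest terms at each step, which `Fraction` does automatically): repeatedly subtract the largest unit fraction that is strictly smaller than what remains, stopping as soon as the remainder is itself a unit fraction.

```python
from fractions import Fraction


def egyptian(a, b):
    """Greedy decomposition of a/b (0 < a < b) into distinct unit fractions,
    following the hint; returns the list of denominators in increasing order."""
    rest = Fraction(a, b)
    denominators = []
    while rest.numerator != 1:
        # smallest positive integer c with 1/c < rest, i.e. with c > 1/rest
        c = rest.denominator // rest.numerator + 1
        denominators.append(c)
        rest -= Fraction(1, c)
    denominators.append(rest.denominator)   # the final unit fraction
    return denominators


print(egyptian(3, 5))
print(egyptian(3, 7))
```

The loop terminates because the numerator of the reduced remainder strictly decreases at each step, which is exactly the situation covered by Theorem A.6.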

Exercise A.17. You may be familiar with magic squares. A magic square
is a square array with the sums of its rows, columns, and diagonals all the


same. Let us define an anti-magic square to be a square array with the
sums of its rows, columns, and diagonals all different. Show that there does
not exist an n-by-n anti-magic square, for any n, all of whose entries are
−1, 0, or 1.

Exercise A.18. Let n be an arbitrary positive integer. Let S = {a1 , . . . , an }


be any set of n distinct integers between 1 and 2n − 2. Show that S has
two elements ai and aj with ai + aj = 2n − 1.

Exercise A.19. Let n be an arbitrary positive integer. Let S = {a1 , . . . , an+1 }
be any set of n + 1 distinct integers between 1 and 2n. Show that S has
two elements ai and aj that are consecutive integers.


Appendix B

Congruences

B.1 The Notion of Congruence


At its core, congruence is just shorthand for a simple relationship between
numbers. But it is so pervasive, and so useful, that it has metamorphosed
into a point of view. Here is the basic definition.

Definition B.1. Let n be a nonzero integer. Two integers x and a are


congruent modulo n, written

x ≡ a (mod n)

if their difference x − a is divisible by n.

Thus, for example, 40 ≡ 12 (mod 7) as 40 − 12 = 28 is divisible by 7,


65 ≡ 0 (mod 13) as 65 − 0 = 65 is divisible by 13, 9 ≡ −2 (mod 11) as
9 − (−2) = 11 is divisible by 11, and 123456789 ≡ 987654321 (mod 9) as
123456789 − 987654321 = −864197532 = 9(−96021948) is divisible by 9.
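
Definition B.1 translates directly into a one-line test. The following Python sketch (the function name `is_congruent` is ours) checks the four examples above by testing whether n divides x − a.

```python
def is_congruent(x, a, n):
    """True if x ≡ a (mod n), i.e. if n divides x - a (Definition B.1)."""
    if n == 0:
        raise ValueError("the modulus n must be nonzero")
    return (x - a) % n == 0


examples = [(40, 12, 7), (65, 0, 13), (9, -2, 11), (123456789, 987654321, 9)]
for x, a, n in examples:
    print(x, a, n, is_congruent(x, a, n))
```

Testing the difference x − a, rather than comparing `x % n` with `a % n`, mirrors the definition and works equally well when n is negative.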
Let us draw an immediate consequence of this definition.

Lemma B.2. For a fixed a, the integers x satisfying the congruence

x ≡ a (mod n)

are the integers x = a + nk for any integer k.

Proof: On the one hand, if x is of the form x = a + nk, then x − a = nk,


which is certainly divisible by n, so x ≡ a (mod n).
On the other hand, if x ≡ a (mod n), then x − a is divisible by n, so
x − a = nk for some integer k, and hence x = a + nk. 


Thus, for example, the integers x with x ≡ 0 (mod 2) are the integers
of the form x = 2k, i.e., the even integers, and the integers x with x ≡
1 (mod 2) are the integers of the form x = 1 + 2k, i.e., the odd integers.
We think of two integers that are congruent modulo n as being equiv-
alent in a certain way, or, technically speaking, that congruence modulo n
is an equivalence relation. That is the content of the next proposition.

Proposition B.3.

(1) For any integer a, a ≡ a (mod n).

(2) For any two integers a and b, if a ≡ b (mod n), then b ≡ a (mod n).

(3) For any three integers a, b, and c, if a ≡ b (mod n) and b ≡ c (mod n),
then a ≡ c (mod n).

Proof:

(1) a − a = 0 is divisible by n, so a ≡ a (mod n).

(2) If a ≡ b (mod n), then a − b is divisible by n, so a − b = nk for some


k. But then b − a = −nk = n(−k), so b − a is divisible by n, and then
b ≡ a (mod n).

(3) If a ≡ b (mod n), then a − b = nk1 for some k1 . If b ≡ c (mod n), then
b − c = nk2 for some k2 . But then

a − c = (a − b) + (b − c) = nk1 + nk2 = n(k1 + k2 ),

so a − c is divisible by n, and then a ≡ c (mod n). 

Next we shall see that congruence is compatible with three of the


four basic arithmetic operations—addition, subtraction, and multiplica-
tion. The situation with division is more complicated, and we defer it to
the next section. (See Lemma B.10 and Lemma B.13.)

Proposition B.4. Suppose that a1 ≡ b1 (mod n) and a2 ≡ b2 (mod n). Then

a1 + a2 ≡ b1 + b2 (mod n),
a1 − a2 ≡ b1 − b2 (mod n),
a1 a2 ≡ b1 b2 (mod n).


Proof: Since a1 ≡ b1 (mod n), a1 = b1 + nk1 for some k1 , by Lemma B.2.


Since a2 ≡ b2 (mod n), a2 = b2 + nk2 for some k2 , also by Lemma B.2.
Then

a1 + a2 = (b1 + nk1 ) + (b2 + nk2 ) = b1 + b2 + n(k1 + k2 ),

so, again by Lemma B.2,

a1 + a2 ≡ b1 + b2 (mod n),

and also

a1 − a2 = (b1 + nk1 ) − (b2 + nk2 ) = b1 − b2 + n(k1 − k2 ),

so, again by Lemma B.2,

a1 − a2 ≡ b1 − b2 (mod n),

and, finally,

a1 a2 = (b1 + nk1 )(b2 + nk2 ) = b1 b2 + n(k1 b2 + k2 b1 + nk1 k2 ),

so, once again by Lemma B.2,

a1 a2 ≡ b1 b2 (mod n). 
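
To see Proposition B.4 at work on concrete numbers, here is a small Python sketch (the helper names and the sample values are ours). It takes two pairs of congruent integers and tests the three conclusions. The last line illustrates why division is treated separately: a common factor cannot always be cancelled.

```python
def is_congruent(x, a, n):
    """True if n divides x - a, i.e. x ≡ a (mod n)."""
    return (x - a) % n == 0


def compatible(a1, b1, a2, b2, n):
    """Given a1 ≡ b1 and a2 ≡ b2 (mod n), test the three conclusions of
    Proposition B.4: sum, difference, and product."""
    if not (is_congruent(a1, b1, n) and is_congruent(a2, b2, n)):
        raise ValueError("hypotheses of Proposition B.4 are not satisfied")
    return (is_congruent(a1 + a2, b1 + b2, n),
            is_congruent(a1 - a2, b1 - b2, n),
            is_congruent(a1 * a2, b1 * b2, n))


# 40 ≡ 12 (mod 7) and 23 ≡ -5 (mod 7)
print(compatible(40, 12, 23, -5, 7))

# Division is different: 2*3 ≡ 2*0 (mod 6), yet 3 and 0 are not congruent mod 6.
print(is_congruent(2 * 3, 2 * 0, 6), is_congruent(3, 0, 6))
```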

Next we have the following result, which states, in fancier language, that
for a positive integer n, the integers 0, 1, . . ., n − 1 are a complete set of
representatives of the congruence classes modulo n. But we state the result in a much more
down-to-earth way. Note, however, that in order to prove this result we
need to use our work in Section 2.1.
But even before we state it, we observe that this encapsulates the usual
result of division. When we divide the integer x by the positive integer n,
we get a quotient (which we do not care about here) and a remainder. We
can always get a remainder between 0 and n − 1, and when we impose this
restriction on the remainder, it is unique.

Theorem B.5. Let n be a positive integer. For any integer x, the congruence

x ≡ a (mod n)

holds for exactly one a among the integers 0, 1, 2, . . ., n − 1.


Proof: First we will show that x ≡ a (mod n) for some such integer a
between 0 and n − 1, and then we will show that only one such integer a
works.
We begin by applying Lemma 2.7, which states that the integers are a
Euclidean domain, and we refer to Definition 2.5 to see what that means.
Doing so, we see that, given x, there are integers k0 and a0 with

x = a0 + nk0 and |a0 | < |n|.

Now there are two possibilities:

(1) a0 ≥ 0. If this is true then we have 0 ≤ a0 < n, so a0 is one of the


integers 0, 1, 2, . . ., n−1. We set a = a0 and x ≡ a (mod n) as required.

(2) a0 < 0. If this is true then we have −n < a0 < 0, so a0 is one of


the integers −(n − 1), −(n − 2), . . . , −1. In this case, x = a0 + nk0 =
(a0 + n) + n(k0 − 1), so x ≡ a0 + n (mod n). We set a = a0 + n, and we
see that x ≡ a (mod n) and furthermore that a is one of the integers 1,
2, . . . , n − 1, so again a is as required.

Thus, we have that the congruence is true for some a between 0 and n − 1.
Now we must show that there is only one such a. We do this by assuming
there is another value a′ and showing in fact it must just be a.
So suppose

x ≡ a (mod n) and x ≡ a′ (mod n)

with both a and a′ among the integers 0, 1, . . ., n − 1. On the one hand,
−(n − 1) ≤ a′ − a, as a′ − a is smallest when a′ is as small as possible and
a is as big as possible, i.e., when a′ = 0 and a = n − 1. On the other hand,
a′ − a ≤ n − 1, as a′ − a is largest when a′ is as big as possible and
a is as small as possible, i.e., when a′ = n − 1 and a = 0. Putting these
together, we see that −(n − 1) ≤ a′ − a ≤ n − 1.
But, by Proposition B.3, the two congruences x ≡ a (mod n) and x ≡
a′ (mod n) imply a′ ≡ a (mod n), i.e., that a′ − a is a multiple of n. But
the only multiple of n between −(n − 1) and (n − 1) is 0, so we see that
a′ − a = 0, i.e., that a′ = a, as required.
Actually, the restriction that n be positive was purely for convenience.
Here is the more general result.


Corollary B.6. Let n be a nonzero integer. For any integer x, the


congruence
x ≡ a (mod n)
holds for exactly one a among the integers 0, 1, 2, . . ., |n| − 1.

Proof: If n > 0 this is just Theorem B.5.


Suppose n < 0, and let n′ = −n. Then n′ > 0 and we can apply
Theorem B.5 to conclude x ≡ a (mod n′) for exactly one a among 0, 1, . . .,
n′ − 1, i.e., for exactly one a among 0, 1, . . ., |n| − 1. But x ≡ a (mod n′)
means x − a = n′k for some k, so x − a = n′k = (−n)k = n(−k) and so
x ≡ a (mod n), and vice versa.
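
In computational terms, Corollary B.6 says that every integer has a well-defined least nonnegative residue modulo n. A minimal Python sketch follows (the function name `least_residue` is ours). It relies on the documented behaviour of Python's `%` operator, whose result takes the sign of the divisor, so for a positive modulus it always lies between 0 and the modulus minus 1.

```python
def least_residue(x, n):
    """The unique a in {0, 1, ..., |n| - 1} with x ≡ a (mod n), n nonzero."""
    if n == 0:
        raise ValueError("the modulus n must be nonzero")
    # For a positive modulus, x % modulus lies in 0..modulus-1 even when x is
    # negative, so reducing modulo |n| also handles the case n < 0.
    return x % abs(n)


print(least_residue(40, 7), least_residue(-2, 11), least_residue(-2, -11))
```

Other languages may return a negative remainder for negative x, in which case one adds |n| once, just as in case (2) of the proof of Theorem B.5.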

With this in hand we can explore the relationships between congruences


to different bases.

Proposition B.7. Let n1 be any integer dividing n, and set d = n/n1 .

(1) If x ≡ a (mod n), then x ≡ a (mod n1 ).

(2) If x ≡ a (mod n1 ), then x ≡ a + n1 b (mod n) for b exactly one of the


integers 0, 1, 2, . . ., |d| − 1.

Proof:

(1) Since x ≡ a (mod n), we have that, for some k, x − a = nk = (n1 d)k =
n1 (dk), so x ≡ a (mod n1 ).

(2) By Lemma B.2, x ≡ a (mod n1 ) means that x is of the form x = a+n1 k


for some k, and vice versa.

Then certainly x ≡ a + n1 k (mod n), so if we simply claimed x ≡ a +


n1 b (mod n) for some b, we would be done. But we are further claiming
that we may choose b to be one of the integers 0, 1, . . . , |d| − 1, and
furthermore that this choice is unique. Again, we will prove this in two
stages.
First, by Corollary B.6, we have that k ≡ b (mod d) for some value of b
between 0 and |d| − 1. Then k = b + dj for some j, so

x = a + n1 k = a + n1 (b + dj) = a + n1 b + n1 dj = a + n1 b + nj

and hence
x ≡ a + n1 b (mod n).


Second, suppose x ≡ a + n1 b (mod n) and x ≡ a + n1 b′ (mod n) with both
b and b′ between 0 and |d| − 1. Then, by Proposition B.3, a + n1 b ≡
a + n1 b′ (mod n), and further, by Proposition B.4, n1 b ≡ n1 b′ (mod n).
Thus n1 b − n1 b′ = nk = n1 dk for some k, so b − b′ = dk. Thus we see
that b ≡ b′ (mod d). But from here on the proof is the same as the proof of
uniqueness in Theorem B.5: if b ≡ b′ (mod d) with both b and b′ between
0 and |d| − 1, then b = b′.

Proposition B.7 tells us what the solutions to these two congruences


actually are. But it is worthwhile to simply count the number of solutions.
Corollary B.8. Let n1 be any integer dividing n, and set d = n/n1 .
(1) The congruence x ≡ a (mod n) has a unique solution (mod n1 ).

(2) The congruence x ≡ a (mod n1 ) has |d| solutions (mod n).

Proof: Part (1) of this corollary follows immediately from part (1) of Propo-
sition B.7, and part (2) of this corollary follows immediately from part (2)
of Proposition B.7. 

We should be precise here about what we are counting. For example,


if x ≡ 0 (mod 10), then x ≡ 0 (mod 2), so we have a unique solution
(mod 2), but there are an infinite number of integers that are solutions,
i.e., x = 0, ±10, ±20, ±30, . . . . Similarly, if x ≡ 0 (mod 2) then x ≡ 0, 2, 4,
6, or 8 (mod 10), so we have five solutions (mod 10), but again an infinite
number of integer solutions, i.e., x = 0, ±2, ±4, ±6, ±8, ±10, . . . .
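
The list of solutions in part (2) of Proposition B.7 is easy to generate. The following Python sketch (the function name `lifts` is ours, and for simplicity we assume n is positive) returns the |d| residues modulo n that satisfy x ≡ a (mod n1), and reproduces the five solutions (mod 10) found above.

```python
def lifts(a, n1, n):
    """Residues modulo n (with n > 0) of the solutions of x ≡ a (mod n1),
    where n1 divides n: a + n1*b reduced mod n for b = 0, ..., |d| - 1."""
    if n <= 0 or n1 == 0 or n % n1 != 0:
        raise ValueError("need n > 0 and n1 a nonzero divisor of n")
    d = n // n1
    return sorted((a + n1 * b) % n for b in range(abs(d)))


print(lifts(0, 2, 10))
print(lifts(1, 3, 12))
```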
Here is another consequence of Proposition B.7.
Corollary B.9. Let n1 be any integer dividing n and consider the system of
simultaneous congruences

x ≡ a1 (mod n1 ),
x ≡ a (mod n).

If a1 ≡ a (mod n1 ), this system has a solution, and this solution is unique
(mod n). If a1 ≢ a (mod n1 ), this system does not have a solution.

Proof: The second congruence forces the solution to be x ≡ a (mod n) so


by Proposition B.7 it has a unique solution (mod n), and we need only see
whether this is also a solution (mod n1 ).
By Proposition B.7(1), if x ≡ a (mod n) then x ≡ a (mod n1 ), so if
a1 ≡ a (mod n1 ), then by Proposition B.3 we also have x ≡ a1 (mod n1 ),
and both congruences are satisfied.


Conversely, suppose both congruences are satisfied. Then x ≡ a1 (mod n1 )


and x ≡ a (mod n), and again by Proposition B.7(1), the second of these
congruences implies x ≡ a (mod n1 ), so by Proposition B.3 again we must
have a1 ≡ a (mod n1 ). Then taking the contrapositive gives the second
claim. 
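
Corollary B.9 amounts to a simple compatibility test. As a hedged illustration (the function name `solve_nested` is ours, and we assume n is positive), the following Python sketch returns the unique solution modulo n when the two congruences are compatible and `None` when they are not.

```python
def solve_nested(a1, n1, a, n):
    """Solve x ≡ a1 (mod n1) and x ≡ a (mod n), where n1 divides n and n > 0.
    Returns the unique solution modulo n, or None if the system has none."""
    if n <= 0 or n1 == 0 or n % n1 != 0:
        raise ValueError("need n > 0 and n1 a nonzero divisor of n")
    if (a1 - a) % n1 != 0:     # a1 is not congruent to a modulo n1
        return None
    return a % n               # the second congruence alone determines x mod n


print(solve_nested(1, 2, 7, 10))
print(solve_nested(0, 2, 7, 10))
```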
In general, we can ask about congruences with respect to two different
bases n1 and n2 . Corollary B.9 is one extreme, where one of the bases
divides the other. The other extreme is where the two bases are relatively
prime, i.e., have no common factor, a notion that is defined precisely in
Chapter 2. (See Definition 2.15.) Here we have the Chinese Remainder
Theorem, which we state and prove in the next section. Suffice it to say
now that in this case, the two congruences are completely independent of
each other. More precisely, if n1 and n2 are relatively prime, the simulta-
neous congruences x ≡ a1 (mod n1 ) and x ≡ a2 (mod n2 ) always have a
solution, regardless of the values of a1 and a2 , and furthermore this solution
is unique (mod n1 n2 ). (See Lemma B.17, where this is stated for a pair of
congruences, and the Chinese Remainder Theorem itself, Theorem B.18,
where this is stated for any number of congruences.)

B.2 Linear Congruences


In this section we will consider linear congruences, i.e., congruences of the
form
ax + b0 ≡ b1 (mod n).
We note immediately that this congruence is equivalent to
ax ≡ b (mod n)
where b = b1 − b0 , so we will just consider congruences of this form. We will
proceed in three stages. First, we will consider congruences of this form
with a and n relatively prime. Second, we will consider congruences of this
form in general. Third, we will consider systems of congruences, where we
will derive the Chinese Remainder Theorem.
We begin with a key lemma.
Lemma B.10. Suppose that a and n are relatively prime. If
ax1 ≡ ax2 (mod n),
then
x1 ≡ x2 (mod n).


Proof: By definition, ax1 ≡ ax2 (mod n) means that n divides ax1 − ax2 =
a(x1 − x2 ). By assumption, a and n are relatively prime, so we may apply
Euclid’s Lemma (Lemma 2.41) to conclude that n divides x1 − x2 , which
means that x1 ≡ x2 (mod n). 

Theorem B.11. Suppose that a and n are relatively prime. Then for any b,
the congruence
ax ≡ b (mod n)
has a solution, and this solution is unique (mod n).

First Proof: For simplicity, let us first assume that n is positive.


We prove this by contradiction. Assume that there is a value of b
for which the congruence ax ≡ b (mod n) does not have a solution. By
Theorem B.5, there is an integer b0 between 0 and n−1 with b0 ≡ b (mod n),
and then the congruence ax ≡ b0 (mod n) also does not have a solution.
(For any solution of this latter congruence would also be a solution of our
original congruence, by Proposition B.3.)
For each i = 0, 1, . . ., n − 1, let ci be given by

ci ≡ a · i (mod n) and 0 ≤ ci ≤ n − 1.

(Note that this indeed uniquely defines ci , by Corollary B.6.)


Let
C = {c0 , c1 , . . . , cn−1 }.
On the one hand, C has a total of n elements. On the other hand, let us
ask how many possible values there are for each ci . A priori, there are n
possible values, namely 0, 1, . . ., n − 1. But by assumption, no ci can be
equal to b0 . So (at least) one of these values is excluded, and so in fact
there are fewer than n possible values for each ci . Hence, by the Pigeonhole
Principle, the ci ’s cannot all be distinct. Thus, for some i1 ≠ i2 ,
with both between 0 and n − 1, ci1 = ci2 . But this gives

a · i1 ≡ a · i2 (mod n) with 0 ≤ i1 , i2 ≤ n − 1.

But by Lemma B.10, since a and n are relatively prime by hypothesis,


this gives
i1 ≡ i2 (mod n) with 0 ≤ i1 , i2 ≤ n − 1,
which is impossible by (the last part of the proof of) Theorem B.5.
Hence the congruence ax ≡ b (mod n) has a solution (and that solution
is the value of i for which ci = b0 ).


If n is negative, apply the above argument with n replaced by |n| = −n


to conclude that ax ≡ b (mod −n) has a solution. But then ax ≡ b (mod n)
as well (compare Corollary B.6).
Now we must show that the solution is unique (mod n). Suppose we
have solutions ax ≡ b (mod n) and ax′ ≡ b (mod n). Then ax ≡ ax′ (mod n)
by Proposition B.4, and then x ≡ x′ (mod n) by Lemma B.10.

Second Proof: This proof is more involved than our first proof but has the
virtue of applying to more general situations (to any PID, in the language
of Chapter 2).
We are assuming that a and n are relatively prime, i.e., have a gcd of
1, so from Lemma 2.7, Definition 2.18, and Theorem 2.20, we know that
there are integers a′ and n′ with

aa′ + nn′ = 1,

so aa′ − 1 = −nn′ = n(−n′) is divisible by n, and so

aa′ ≡ 1 (mod n).

Set x = a′b. Then

ax = a(a′b) = (aa′)b ≡ 1(b) = b (mod n),

so x = a′b is a solution of our congruence, and indeed so is any x ≡
a′b (mod n), by Proposition B.4.
The proof of uniqueness of the solution (mod n) is the same as in the
first proof. 
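
The second proof is constructive once we can actually find integers a′ and n′ with aa′ + nn′ = 1. One standard way to do so is the extended Euclidean algorithm. The Python sketch below is our own illustration under that assumption (the function names are hypothetical, and this is not necessarily how Chapter 2 organizes the computation): it finds a′ and then returns x = a′b reduced modulo |n|.

```python
def extended_gcd(a, n):
    """Return (g, s, t) with g = gcd(a, n) >= 0 and a*s + n*t == g."""
    old_r, r = a, n
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        # invariant: a*old_s + n*old_t == old_r and a*s + n*t == r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:                      # normalise the sign of the gcd
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def solve_linear_congruence(a, b, n):
    """Solve a*x ≡ b (mod n) for a and n relatively prime (n nonzero);
    returns the solution x with 0 <= x <= |n| - 1."""
    if n == 0:
        raise ValueError("the modulus n must be nonzero")
    g, a_prime, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError("a and n must be relatively prime")
    return (a_prime * b) % abs(n)      # x = a'b, as in the second proof


print(solve_linear_congruence(7, 2, 10))
```

Unlike trial and error, this approach takes a number of steps that grows only with the number of digits of n, so it remains practical when n is large.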

Let us rephrase this theorem from another point of view.

Corollary B.12. Suppose that a and n are relatively prime. Then for any
b, the congruence
ax ≡ b (mod n)

is equivalent to the congruence

x ≡ c (mod n)

for some value of c.


Furthermore, c ≡ a′b (mod n) where a′ is such that aa′ ≡ 1 (mod n).


Proof: By Theorem B.11, the congruence ax ≡ b (mod n) has a solution
x ≡ c (mod n) for a unique value of c (mod n). Thus, the congruence
ax ≡ b (mod n) holds if and only if the congruence x ≡ c (mod n) holds,
i.e., these two congruences are equivalent.
Setting b = 1 in Theorem B.11, we see that the congruence ax ≡
1 (mod n) has a unique solution x ≡ a′ (mod n). Setting c = a′b, we
have
ac ≡ a(a′b) ≡ (aa′)b ≡ 1b ≡ b (mod n),

i.e., if c = a′b then ac ≡ b (mod n), as claimed. □

Now let us ask how to go about solving a congruence ax ≡ b (mod n), with
a and n relatively prime, in practice.
First, let us suppose n is relatively small. Then we can directly apply
Corollary B.12. The congruence ax ≡ b (mod n) is equivalent to x ≡
c (mod n), and so there are n possibilities for the congruence class of c,
namely integers 0, 1, 2, . . ., n − 1, and we may simply proceed by trial
and error. For example, suppose we wish to solve the congruence 7x ≡
2 (mod 10). Then we need only try x = 0, 1, 2, . . ., 9. We see, in order,
that 7 · 0 = 0 ≡ 0 (mod 10); 7 · 1 = 7 ≡ 7 (mod 10); 7 · 2 = 14 ≡ 4 (mod 10);
7 · 3 = 21 ≡ 1 (mod 10); 7 · 4 = 28 ≡ 8 (mod 10); 7 · 5 = 35 ≡ 5 (mod 10);
7 · 6 = 42 ≡ 2 (mod 10). Thus the solution to our congruence is x ≡
6 (mod 10).
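
The trial-and-error search just described can be sketched in a few lines of Python. This is only an illustration: the function name solve_by_trial is our own choice, and we assume n is a positive integer. It returns every x among 0, 1, . . ., n − 1 that works, so it makes no use of the hypothesis that a and n are relatively prime.

```python
def solve_by_trial(a, b, n):
    """Return every x in 0, 1, ..., n-1 with a*x ≡ b (mod n); n must be positive."""
    return [x for x in range(n) if (a * x - b) % n == 0]


print(solve_by_trial(7, 2, 10))  # the example from the text: [6]
```

The search costs n multiplications, which is why it is only reasonable for small n.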
This is fine if we want to solve a single congruence ax ≡ b (mod n). But
if we want to be able to solve multiple congruences with the same a and
n (i.e., ax ≡ b1 (mod n), ax ≡ b2 (mod n), ax ≡ b3 (mod n), etc.) there
is a better way to proceed. In the notation of Corollary B.12, we first
find the solution x = a′ of the congruence ax ≡ 1 (mod n), and then the
solution of the congruence ax ≡ b (mod n) is given by x ≡ a′b (mod n).
For example, suppose we want to solve the congruences 5x ≡ b (mod 14)
for different values of b. We first solve 5x ≡ 1 (mod 14) by trial and error,
letting x = 0, 1, 2, . . ., 13. We see, in order, that 5 · 0 = 0 ≡ 0 (mod 14);
5 · 1 = 5 ≡ 5 (mod 14); 5 · 2 = 10 ≡ 10 (mod 14); 5 · 3 = 15 ≡ 1 (mod 14).
Thus a′ = 3. Then the solution to 5x ≡ 2 (mod 14) is x ≡ 3 · 2 = 6 (mod 14);
the solution to 5x ≡ 3 (mod 14) is x ≡ 3 · 3 = 9 (mod 14); the solution to
5x ≡ 4 (mod 14) is x ≡ 3 · 4 = 12 (mod 14); the solution to 5x ≡ 5 (mod 14)
is x ≡ 3 · 5 = 15 ≡ 1 (mod 14) (this one is obvious); the solution to
5x ≡ 6 (mod 14) is x ≡ 3 · 6 = 18 ≡ 4 (mod 14); etc.
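
The idea of computing the inverse once and reusing it for every b can be sketched as follows. The name inverse_by_trial is ours, and the search assumes that n is a small positive integer.

```python
def inverse_by_trial(a, n):
    """Return the x in 1, ..., n-1 with a*x ≡ 1 (mod n), or None if there is none."""
    for x in range(1, n):
        if (a * x) % n == 1:
            return x
    return None


a_inv = inverse_by_trial(5, 14)
# One multiplication per right-hand side b, instead of a new search each time.
solutions = {b: (a_inv * b) % 14 for b in range(2, 7)}
print(a_inv, solutions)
```

Only the first step is a search; each further congruence with the same a and n costs a single multiplication and reduction.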
Obviously this method is practical only if n is small. But in fact we
have already derived a method for finding a′, which works effectively for
any n. This method is Euclid’s algorithm, developed in Section 2.2. For
example, consider the congruence

37x ≡ 93 (mod 143).

In Example 2.25(3) we applied Euclid’s algorithm to obtain

1 = 37(58) + 143(−15),

so if n = 143 and a = 37, then a′ = 58, and our congruence has the solution

x ≡ 58 · 93 = 5394 = 143 · 37 + 103 ≡ 103 (mod 143).

Of course, we live in an age where 143 is a very small number for a
computer. But for n large, Euclid’s algorithm is much more efficient than
trial and error, and for n very large it is the only practical method, even
for a computer. Here is an example with larger numbers. Consider the
congruence

876543210x ≡ 555555556 (mod 1123456789).

In Example 2.25(5) we applied Euclid’s algorithm to obtain

1 = 1123456789(356396689) + 876543210(−456790122),

so if n = 1123456789 and a = 876543210, then a′ ≡ −456790122
(mod 1123456789). We could simply choose a′ = −456790122 (and this
would work perfectly well), but we would like to find a value of a′ between 1
and 1123456789, so we choose a′ = −456790122 + 1123456789 = 666666667.
Then our congruence has the solution

x ≡ 666666667 · 555555556 = 370370370851851852
  = 1123456789 · 329670330 + 481481482
  ≡ 481481482 (mod 1123456789).
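
For readers who wish to experiment, here is a minimal sketch of this method in Python. The names extended_gcd and solve_coprime are our own, and we assume a and n are positive integers. The first function runs Euclid’s algorithm while keeping track of the coefficients s and t with as + nt = gcd(a, n); the second multiplies b by the resulting inverse of a.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and a*s + b*t == g, for positive a, b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def solve_coprime(a, b, n):
    """Solve a*x ≡ b (mod n) when gcd(a, n) = 1; return the x with 0 <= x < n."""
    g, a_inv, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError("a and n must be relatively prime")
    return (a_inv * b) % n


print(extended_gcd(37, 143))       # coefficients for 37 and 143
print(solve_coprime(37, 93, 143))  # the first example above
```

The number of division steps grows only with the number of digits of n, which is what makes the method practical for very large n.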

Now we consider congruences ax ≡ b (mod n) with a and n not relatively
prime. We begin by observing that in this case the conclusion of
Lemma B.10 is false. For example, 6 · 4 ≡ 6 · 9 (mod 10) but 4 ≢ 9 (mod 10).
Similarly, Theorem B.11 (and hence Corollary B.12) is false. For example,
6x ≡ 7 (mod 10) does not have a solution (as, substituting x = 0, 1, . . ., 9,
we see that none of these work) while the congruence 6x ≡ 8 (mod 10) has
more than one solution (mod 10) (as, substituting x = 0, 1, . . ., 9, we see
that x = 3 and x = 8 both work).
We will proceed in parallel with our previous work. In fact, setting
d = 1 in Lemma B.13, Theorem B.14, and Corollary B.15 will recover
Lemma B.10, Theorem B.11, and Corollary B.12. But, actually, we will
use our previous work to prove these new results.

Lemma B.13. Suppose that a and n have gcd d, and set n1 = n/d. If

ax1 ≡ ax2 (mod n),

then
x1 ≡ x2 (mod n1 ).

Proof: Set a1 = a/d. By definition, ax1 ≡ ax2 (mod n) means that n
divides ax1 − ax2 = a(x1 − x2), i.e., that dn1 divides da1(x1 − x2). But then
n1 divides a1(x1 − x2) = a1 x1 − a1 x2. In other words, a1 x1 ≡ a1 x2 (mod n1).
But, by Lemma 2.16, a1 and n1 are relatively prime, so we may apply
Lemma B.10 to conclude that x1 ≡ x2 (mod n1). □

Theorem B.14. Let d = gcd(a, n) and set n1 = n/d. Consider the
congruence

ax ≡ b (mod n).

(1) If b is divisible by d then this congruence has a solution, and this
solution is unique (mod n1).

(2) If b is not divisible by d then this congruence does not have a solution.

Proof: Let us write a = da1 and n = dn1 . We know that a1 and n1 are
relatively prime by Lemma 2.16.

(1) Suppose that b is divisible by d, and write b = db1. Then our original
congruence is
da1 x ≡ db1 (mod dn1),
i.e., da1 x − db1 is divisible by dn1, so da1 x − db1 = (dn1)k for some k.
But then d(a1 x − b1) = da1 x − db1 = (dn1)k = d(n1 k), so a1 x − b1 =
n1 k, and hence
a1 x ≡ b1 (mod n1).
But a1 and n1 are relatively prime, so we may apply Theorem B.11 to
conclude this congruence has a unique solution (mod n1).
To be precise, we have shown that any solution of the congruence ax ≡
b (mod n) must be a solution of the congruence a1 x ≡ b1 (mod n1). But
we must also show that any solution of a1 x ≡ b1 (mod n1) is indeed
a solution of ax ≡ b (mod n). So suppose that a1 x ≡ b1 (mod n1).
Then a1 x − b1 = n1 k for some k. But then ax − b = (da1)x − db1 =
d(a1 x − b1) = d(n1 k) = (dn1)k = nk, and so ax ≡ b (mod n).

(2) We prove this by proving the contrapositive: if the congruence ax ≡
b (mod n) has a solution, then b is divisible by d. So suppose this
congruence has a solution, and let x0 be any solution. Then ax0 ≡
b (mod n), so ax0 − b = nk for some k. Then b = ax0 − nk. Now a is
divisible by d and n is divisible by d, so each term on the right-hand
side is divisible by d. Hence b is divisible by d as well. □

Corollary B.15. Let d = gcd(a, n) and consider the congruence

ax ≡ b (mod n).

Suppose that b is divisible by d. Write a = da1, n = dn1, and b = db1.
Then this congruence is equivalent to the congruence

a1 x ≡ b1 (mod n1),

which is equivalent to the congruence

x ≡ c1 (mod n1)

for some value of c1.
Furthermore, c1 ≡ a1′ b1 (mod n1) where a1′ is such that a1 a1′ ≡ 1 (mod n1).

Proof: The proof of Theorem B.14 shows that ax ≡ b (mod n) is equivalent
to a1 x ≡ b1 (mod n1), and then the rest of the corollary is a restatement
of Corollary B.12 in this case. □

Remark B.16. Note that (in the notation of Corollary B.12 and
Corollary B.15) if aa′ + nn′ = d, then, dividing by d, we have a1 a′ + n1 n′ = 1,
so a1 a′ ≡ 1 (mod n1) and so we may choose a1′ = a′.

Let us see how to apply this theorem and corollary. For example, consider
the congruence
360x ≡ 324 (mod 2268).
In Example 2.25(1), we found that gcd(2268, 360) = 36. Since 324 is divisible
by 36 (as 324 = 36 · 9), this congruence has a solution and it is unique
(mod 63) (as 63 = 2268/36). Indeed, this congruence is equivalent to the
congruence
10x ≡ 9 (mod 63).
We also found there that 36 = 360(19) + 2268(−3), from which we see that
1 = 10(19) + 63(−3).
Thus, we find that our original congruence has the solution

x ≡ 19 · 9 = 171 ≡ 45 (mod 63).

If we want to solve for x (mod 2268), we then find that x can be congruent
to any one of the following 36 values (mod 2268):

x = 45 + 63 · 0 ≡ 45 (mod 2268),
x = 45 + 63 · 1 ≡ 108 (mod 2268),
x = 45 + 63 · 2 ≡ 171 (mod 2268),
⋮
x = 45 + 63 · 35 ≡ 2250 (mod 2268).
Our final topic is the Chinese Remainder Theorem, which deals with the
solution of simultaneous linear congruences. The case of two simultaneous
congruences is the crucial one, and we shall do that one separately first to
pave the way for the general situation.

Lemma B.17. Let n1 and n2 be relatively prime, and let b1 and b2 be
arbitrary. Then the system of simultaneous congruences

x ≡ b1 (mod n1 ),
x ≡ b2 (mod n2 )

has a solution, and this solution is unique (mod n1 n2 ).

Proof: Observe that for any value of y, x = b1 + n1 y is a solution of the
first congruence x ≡ b1 (mod n1). We wish to show that we can choose y so
that x is a solution of the second congruence as well. Now this congruence
becomes
b1 + n1 y ≡ b2 (mod n2)
or, equivalently,
n1 y ≡ b2 − b1 (mod n2).
But n1 and n2 are relatively prime, by hypothesis, so we can apply
Theorem B.11 to conclude that this congruence indeed has a solution y0. Thus,
x0 = b1 + n1 y0 is a solution of both congruences.
Our pair of simultaneous congruences has a solution x0, and we claim
that any x0′ ≡ x0 (mod n1 n2) is also a solution. To see this, note that
x0′ ≡ x0 (mod n1 n2) means x0′ = x0 + n1 n2 k for some k, so x0′ − x0 = n1 n2 k.
Thus we see that x0′ − x0 is divisible by n1, i.e., that x0′ ≡ x0 ≡ b1 (mod n1),
and also that x0′ − x0 is divisible by n2, i.e., that x0′ ≡ x0 ≡ b2 (mod n2).
Now we must show that our solution x0 is unique (mod n1 n2). That is,
we must show that if x0′ is any solution of these simultaneous congruences,
then in fact we must have x0′ ≡ x0 (mod n1 n2). To see this, observe that,
since x0 and x0′ are both solutions of the system of congruences, they are
both solutions of each of the individual congruences in the system. Looking
at the first congruence, we see that x0 ≡ b1 (mod n1) and x0′ ≡ b1 (mod n1),
and so we conclude that x0′ ≡ x0 (mod n1). In other words, x0′ − x0 is
divisible by n1. Looking at the second congruence, and applying exactly
the same logic, we conclude that x0′ − x0 is divisible by n2. Thus x0′ − x0 is
divisible by both n1 and n2. But, by hypothesis, n1 and n2 are relatively
prime. Then we may apply Corollary 2.42 to conclude that x0′ − x0 is
divisible by n1 n2, i.e., that x0′ ≡ x0 (mod n1 n2). □

Recall now that integers n1, n2, . . . , nk are pairwise relatively prime if
every pair of them is relatively prime. With this definition in hand we can
state our theorem.

Theorem B.18 (Chinese Remainder Theorem). Let n1, n2, . . . , nk be pairwise
relatively prime and let b1, b2, . . . , bk be arbitrary. Then the system of
simultaneous congruences

x ≡ b1 (mod n1),
x ≡ b2 (mod n2),
⋮
x ≡ bk (mod nk)

has a solution, and this solution is unique (mod n1 n2 · · · nk).

Proof: We prove this by induction on k. For k = 1 there is nothing to
prove, and for k = 2 we have proved this in Lemma B.17.
Now suppose the theorem is true for k − 1 simultaneous congruences
and consider a set of k simultaneous congruences as above.
Ignore the first k − 2 congruences for the moment and focus on the last
two:
x ≡ b_{k−1} (mod n_{k−1}),
x ≡ bk (mod nk).
We can apply Lemma B.17 to conclude that this pair of congruences is
equivalent to the single congruence

x ≡ c (mod n_{k−1} nk)

for some c. Then, replacing these two original congruences by this new one,
we obtain the equivalent system

x ≡ b1 (mod n1),
x ≡ b2 (mod n2),
⋮
x ≡ b_{k−2} (mod n_{k−2}),
x ≡ c (mod n_{k−1} nk).

But this is a system of k − 1 simultaneous congruences, so by the
inductive hypothesis we know it has a solution, and this solution is unique
(mod n1 n2 · · · n_{k−2} (n_{k−1} nk)), i.e., (mod n1 n2 · · · nk), and by induction we
are done. □

Let us see how to apply this theorem. If the numbers are relatively small,
trial and error works again. For example, consider the pair of simultaneous
congruences
x ≡ 8 (mod 9),
x ≡ 5 (mod 7).
Then we set x = 8 + 9y and we try y = 0, 1, . . . , 6 until we find a value
that works: y = 0, x = 8 ≡ 1 (mod 7); y = 1, x = 17 ≡ 3 (mod 7);
y = 2, x = 26 ≡ 5 (mod 7). Thus this system has the solution

x ≡ 26 (mod 63).

Here is another example. Consider the triple of simultaneous congruences:

x ≡ 2 (mod 5),
x ≡ 8 (mod 9),
x ≡ 5 (mod 7).

We solve the last two congruences first. (Actually, it does not matter what
order we do it in.) We just did that, so we use our answer and note that
the original system is equivalent to

x ≡ 2 (mod 5),
x ≡ 26 (mod 63).

We set x = 26 + 63y and try y = 0, . . . , 4. (Here we are reversing the
order of the search to keep the number of possibilities small.) We see that
y = 0, x = 26 ≡ 1 (mod 5); y = 1, x = 89 ≡ 4 (mod 5); y = 2, x = 152 ≡
2 (mod 5). Thus our original system has the solution

x ≡ 152 (mod 315).
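
The search used in these two examples can be sketched in Python as follows. The name crt_by_search and the choice to pass the system as a list of pairs (b, n) are our own; we assume the moduli are positive and pairwise relatively prime. At each stage the current solution x is known modulo the product of the moduli handled so far, and we step through x, x + modulus, x + 2 · modulus, . . . until the next congruence is satisfied.

```python
def crt_by_search(congruences):
    """congruences: list of (b, n) with pairwise relatively prime moduli n > 0.

    Return (x, N) where x satisfies every congruence and 0 <= x < N,
    N being the product of the moduli.
    """
    x, modulus = 0, 1
    for b, n in congruences:
        # Try x, x + modulus, x + 2*modulus, ...; at most n candidates are needed.
        for _ in range(n):
            if x % n == b % n:
                break
            x += modulus
        else:
            raise ValueError("moduli are not pairwise relatively prime")
        modulus *= n
    return x, modulus


print(crt_by_search([(8, 9), (5, 7)]))
print(crt_by_search([(8, 9), (5, 7), (2, 5)]))
```

As in the text, this is sensible only when each modulus is small, since the inner search may take up to n steps.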

If the numbers get larger, we once again have to use Euclid’s algorithm. We
will first give a formula for the solution of two simultaneous congruences,
and then deal with the general case.
Recall that, for n1 and n2 relatively prime, we used Euclid’s algorithm
to find n1′ and n2′ with n1 n1′ + n2 n2′ = 1. Then n1 n1′ ≡ 1 (mod n2) and
n2 n2′ ≡ 1 (mod n1).

Lemma B.19. Let n1 and n2 be relatively prime and let b1 and b2 be
arbitrary. Then the pair of simultaneous congruences

x ≡ b1 (mod n1),
x ≡ b2 (mod n2)

has the solution

x ≡ n2 n2′ b1 + n1 n1′ b2 (mod n1 n2),

where n1 n1′ ≡ 1 (mod n2) and n2 n2′ ≡ 1 (mod n1).

Proof: We simply have to check that the given value of x satisfies both
congruences. We see that

n2 n2′ b1 + n1 n1′ b2 ≡ (n2 n2′)b1 ≡ 1(b1) ≡ b1 (mod n1)

and
n2 n2′ b1 + n1 n1′ b2 ≡ (n1 n1′)b2 ≡ 1(b2) ≡ b2 (mod n2).
Thus this value of x (mod n1 n2) is a solution, and any x′ ≡ x (mod n1 n2)
is also a solution. Furthermore, this is the only solution (mod n1 n2) as we
have already shown that the solution is unique (mod n1 n2). □

Let us revisit one of our examples using this method.
Consider the pair of simultaneous congruences

x ≡ 8 (mod 9),
x ≡ 5 (mod 7).

We apply Euclid’s algorithm to 9 and 7,

9 = 7 · 1 + 2
7 = 2 · 3 + 1
2 = 1 · 2,
and then solve for the gcd 1:
1 = 7 + 2(−3)
  = 7 + (9 + 7(−1))(−3)
  = 9 · (−3) + 7 · 4.
Then we have n1 = 9, n1′ = −3, n2 = 7, n2′ = 4, b1 = 8, and b2 = 5, so

x ≡ 7 · 4 · 8 + 9 · (−3) · 5 = 89 ≡ 26 (mod 63)

as before.
Now let us try a larger example where the use of Euclid’s algorithm is
a necessity. Consider the pair of simultaneous congruences
x ≡ 19 (mod 37),
x ≡ 91 (mod 143).
In Example 2.25(3) we applied Euclid’s algorithm to obtain

1 = 37(58) + 143(−15).
Thus, we have n1 = 37, n1′ = 58, n2 = 143, n2′ = −15, b1 = 19, and
b2 = 91, so

x ≡ 143(−15)(19) + 37(58)(91) = 154531 ≡ 1092 (mod 5291)

is the solution.
Here is the formula in the general case. Note that we will have to use
Euclid’s algorithm repeatedly to find the numbers m1′, . . . , mk′. Also note
that if k = 2, comparing Theorem B.20 to Lemma B.19, we have m1 = n2
and m1′ = n2′, and also m2 = n1 and m2′ = n1′ (so the subscripts are
reversed).
Theorem B.20 (Chinese Remainder Formula). Let n1, n2, . . ., nk be
pairwise relatively prime and let b1, b2, . . ., bk be arbitrary. Consider the
system of simultaneous congruences
x ≡ b1 (mod n1),
x ≡ b2 (mod n2),
⋮
x ≡ bk (mod nk).
Set N = n1 n2 · · · nk. For each i, let mi = N/ni and let mi′ be such that
mi mi′ ≡ 1 (mod ni). Then this system has the solution

x ≡ m1 m1′ b1 + m2 m2′ b2 + · · · + mk mk′ bk (mod N).

Proof: We must check that this value of x satisfies all of the congruences.
We will simply check that it satisfies the first one. The others are the same
except for the subscripts.
The key thing to note is that n1 divides m2 = N/n2 = n1 n3 · · · nk and
similarly that n1 divides m3, . . ., mk. Thus each term m2 m2′ b2, m3 m3′ b3,
. . ., mk mk′ bk is divisible by n1, i.e., is congruent to 0 (mod n1). Then we
have
m1 m1′ b1 + m2 m2′ b2 + · · · + mk mk′ bk ≡ m1 m1′ b1 + 0 + · · · + 0 (mod n1)
                                        ≡ m1 m1′ b1 (mod n1)
                                        ≡ (m1 m1′)b1 (mod n1)
                                        ≡ 1(b1) (mod n1)
                                        ≡ b1 (mod n1)
as m1 m1′ ≡ 1 (mod n1) by the definition of m1′.
Thus, x is indeed a solution and, just as before, any x′ ≡ x (mod N) is
also a solution, while by uniqueness (mod N) this is the only solution. □

Let us return to our system

x ≡ 2 (mod 5),
x ≡ 8 (mod 9),
x ≡ 5 (mod 7)
with this new method. Then n1 = 5, n2 = 9, and n3 = 7, and m1 = 9 · 7 =
63, m2 = 5 · 7 = 35, and m3 = 5 · 9 = 45. Also, b1 = 2, b2 = 8, and b3 = 5.
Applying Euclid’s algorithm to m1 and n1 we find (skipping the details)
1 = 63(2) + 5(−25), so m1′ = 2. Applying Euclid’s algorithm to m2 and n2
we find 1 = 35(−1) + 9(4), so m2′ = −1. Applying Euclid’s algorithm to
m3 and n3 we find 1 = 45(−2) + 7(13), so m3′ = −2. Then

x ≡ 63(2)(2) + 35(−1)(8) + 45(−2)(5) = −478 ≡ 152 (mod 315)

agreeing with our previous answer.
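
The Chinese Remainder Formula can be sketched in Python as follows. The function name crt_formula is ours, the moduli are assumed positive and pairwise relatively prime, and the inverses mi′ are obtained from Python’s built-in pow with exponent −1 (Python 3.8 or later) in place of a hand computation with Euclid’s algorithm. The inverses it finds may differ from the ones chosen above by multiples of ni, but the final answer (mod N) is the same.

```python
from math import prod


def crt_formula(residues, moduli):
    """Return the x with 0 <= x < N solving x ≡ b_i (mod n_i) for all i."""
    N = prod(moduli)
    total = 0
    for b_i, n_i in zip(residues, moduli):
        m_i = N // n_i
        m_i_inv = pow(m_i, -1, n_i)  # m_i * m_i_inv ≡ 1 (mod n_i)
        total += m_i * m_i_inv * b_i
    return total % N


print(crt_formula([2, 8, 5], [5, 9, 7]))
print(crt_formula([19, 91], [37, 143]))
```

With k = 2 this reduces to the formula of Lemma B.19.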

B.3 Quadratic Congruences


In this section we wish to study quadratic congruences modulo a prime.
We begin by counting the number of solutions.

Lemma B.21. Let p be an odd prime. For any a ≢ 0 (mod p), the
congruence x^2 ≡ a (mod p) either has no solutions or two solutions.

Proof: If this congruence has no solutions, we are done.
Thus suppose it has a solution x0, i.e., x0^2 ≡ a (mod p). Then also
(−x0)^2 = x0^2 ≡ a (mod p). Furthermore, we claim that −x0 ≢ x0 (mod p),
so we have two solutions. To see this claim, note that if −x0 ≡ x0 (mod p)
then 0 ≡ 2x0 (mod p), i.e., p divides 2x0. Since p is an odd prime, p does
not divide 2, so by Euclid’s Lemma p must divide x0, i.e., x0 ≡ 0 (mod p).
But then, on the one hand, x0^2 ≡ a (mod p) by our choice of x0, and on
the other hand, x0^2 ≡ 0^2 = 0 (mod p), so a ≡ 0 (mod p), contradicting our
choice of a.
Now we have found two solutions, so to complete the proof we must
show that there are no other solutions. A moment’s reflection reveals we
can prove this by showing that if y0 is any solution (i.e., if y0^2 ≡ a (mod p))
then either y0 ≡ x0 (mod p) or y0 ≡ −x0 (mod p). To see this, note we
have the congruences

x0^2 ≡ a (mod p),
y0^2 ≡ a (mod p).

Subtracting, we find the congruence

x0^2 − y0^2 ≡ 0 (mod p),

i.e., x0^2 − y0^2 is divisible by p. But x0^2 − y0^2 = (x0 − y0)(x0 + y0), and now
we can apply Euclid’s Lemma: since p divides this product, it must divide
one of the factors, so p divides x0 − y0, in which case y0 ≡ x0 (mod p), or
p divides x0 + y0, in which case y0 ≡ −x0 (mod p). □

We now make an important definition.

Definition B.22. Let p be a prime and let a ≢ 0 (mod p). Then a is a
quadratic residue (mod p) if x^2 ≡ a (mod p) has a solution, and a is a
quadratic nonresidue (mod p) if x^2 ≡ a (mod p) does not have a solution.

Remark B.23. We have excluded a ≡ 0 (mod p) from our definition. But
observe that if p is prime, the congruence x^2 ≡ 0 (mod p) has the unique
solution x ≡ 0 (mod p). To see this, we observe that x ≡ 0 (mod p) certainly
is a solution, and it is the only solution, again by Euclid’s Lemma: if
x^2 ≡ 0 (mod p), i.e., if p divides x^2 = (x)(x), then p divides one of the
factors, so p divides x, i.e., x ≡ 0 (mod p).

p    Quadratic residues (mod p)       Quadratic nonresidues (mod p)
3    1                                2
5    1, 4                             2, 3
7    1, 2, 4                          3, 5, 6
11   1, 3, 4, 5, 9                    2, 6, 7, 8, 10
13   1, 3, 4, 9, 10, 12               2, 5, 6, 7, 8, 11
17   1, 2, 4, 8, 9, 13, 15, 16        3, 5, 6, 7, 10, 11, 12, 14
19   1, 4, 5, 6, 7, 9, 11, 16, 17     2, 3, 8, 10, 12, 13, 14, 15, 18

Table B.1. Quadratic residues and nonresidues for some small odd primes.

Remark B.24. If p = 2 and a ≡ 1 (mod p), then x^2 ≡ 1 (mod 2) certainly
has a solution, namely x = 1. Thus 1 is a quadratic residue (mod 2).

Table B.1 is a table of the first few odd primes and their quadratic
residues and nonresidues. Note in Definition B.22 that whether a is a
quadratic residue (mod p) only depends on the congruence class of a (mod p),
so we only list one representative of each congruence class, and in fact we
list the representative a with 1 ≤ a ≤ p − 1.
Inspection of this table shows that for these values of p there are the
same number of quadratic residues as there are nonresidues, namely (p −
1)/2 of each. Let us prove that this is true in general.

Lemma B.25. Let p be an odd prime. Then among the p − 1 integers
between 1 and p − 1 there are (p − 1)/2 quadratic residues (mod p), and
there are (p − 1)/2 quadratic nonresidues (mod p).

Proof: For i = 1, 2, 3, . . ., (p − 1)/2, define ai by ai ≡ i^2 (mod p), 1 ≤
ai ≤ p − 1. (In other words, ai is the remainder when i^2 is divided by p.)
By definition, each ai is a quadratic residue (as x^2 ≡ ai (mod p) has the
solution x = i). We claim first that {a1, . . . , a_{(p−1)/2}} are distinct, showing
that there are at least (p − 1)/2 quadratic residues. We claim next that any
quadratic residue must be one of the ai’s, showing that these are all the
quadratic residues. Putting these two claims together we see that (p − 1)/2
of the p − 1 possible values of a between 1 and p − 1 are quadratic residues,
so the remaining p − 1 − (p − 1)/2 = (p − 1)/2 values of a must be quadratic
nonresidues.
Let us first prove the first claim: Suppose a_{i1} = a_{i2} for some i1 ≠ i2.
Then, by definition, i1^2 ≡ a_{i1} = a_{i2} ≡ i2^2 (mod p), i.e., i1^2 ≡ i2^2 (mod p). Now
we argue as in the proof of Lemma B.21. This congruence gives i1^2 − i2^2 ≡
0 (mod p), so p divides i1^2 − i2^2 = (i1 − i2)(i1 + i2), and then by Euclid’s
Lemma p divides i1 − i2 or p divides i1 + i2. Now i1 and i2 are both between
1 and (p − 1)/2, so the smallest i1 − i2 can be is 1 − (p − 1)/2 = −(p − 3)/2 and
the largest i1 − i2 can be is (p − 1)/2 − 1 = (p − 3)/2. Since (p − 3)/2 < p, the
only way p can divide i1 − i2 is if i1 − i2 = 0, i.e., if i1 = i2, which we have
ruled out by assumption. On the other hand, the smallest i1 + i2 can be is
1 + 1 = 2, and the largest i1 + i2 can be is (p − 1)/2 + (p − 1)/2 = p − 1, and
p cannot divide any number in this range. Thus we cannot have a_{i1} = a_{i2}
for i1 ≠ i2, so {a1, . . . , a_{(p−1)/2}} are distinct.
Let us now prove the second claim: Suppose a is a quadratic residue
(mod p). Then by definition a ≡ k^2 (mod p) for some k. Let j ≡ k (mod p),
1 ≤ j ≤ p − 1. There are now two possibilities for j: either 1 ≤ j ≤ (p − 1)/2
or (p + 1)/2 ≤ j ≤ p − 1 (as we can divide the interval between 1 and p − 1
into these two halves).
Now if 1 ≤ j ≤ (p − 1)/2, then j = i for some value of i in this range,
so a ≡ j^2 = i^2 ≡ ai (mod p) by the definition of ai. On the other hand, if
(p + 1)/2 ≤ j ≤ p − 1, then by simple algebra, 1 ≤ p − j ≤ (p − 1)/2 so
i = p − j is in range. Now if i = p − j, then j = p − i, so a ≡ j^2 = (p − i)^2 ≡
(−i)^2 = i^2 ≡ ai (mod p). Thus in either case a is one of the ai’s. □

Remark B.26. We should observe that the proof of Lemma B.25 shows us
how to find the quadratic residues (mod p), and hence the entries in the left
column of Table B.1: Simply square the integers between 1 and (p − 1)/2,
and find integers between 1 and p − 1 congruent to these squares. (Those
integers will simply be the remainders when these squares are divided by
p.) Then the entries in the right column of this table will be the remaining
integers between 1 and p − 1. For example,
p = 5:  1^2 ≡ 1 (mod 5), 2^2 ≡ 4 (mod 5);
p = 7:  1^2 ≡ 1 (mod 7), 2^2 ≡ 4 (mod 7), 3^2 ≡ 2 (mod 7);
p = 11: 1^2 ≡ 1 (mod 11), 2^2 ≡ 4 (mod 11), 3^2 ≡ 9 (mod 11),
        4^2 ≡ 5 (mod 11), 5^2 ≡ 3 (mod 11).
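
The recipe of Remark B.26 is easy to carry out by machine. The following Python sketch (the function names are our own, and p is assumed to be an odd prime) squares the integers between 1 and (p − 1)/2, reduces them (mod p), and takes the remaining integers between 1 and p − 1 as the nonresidues.

```python
def quadratic_residues(p):
    """Quadratic residues (mod p) between 1 and p-1, for an odd prime p."""
    return sorted({(i * i) % p for i in range(1, (p - 1) // 2 + 1)})


def quadratic_nonresidues(p):
    """The integers between 1 and p-1 that are not quadratic residues (mod p)."""
    residues = set(quadratic_residues(p))
    return [a for a in range(1, p) if a not in residues]


for p in (3, 5, 7, 11, 13, 17, 19):
    print(p, quadratic_residues(p), quadratic_nonresidues(p))
```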
There is a second, less obvious pattern that can be found in the table.
Lemma B.27. Let p be an odd prime and let a and b be integers with a ≢
0 (mod p) and b ≢ 0 (mod p). Let c be an integer with c ≡ ab (mod p).
(1) If a and b are both quadratic residues (mod p), then c is a quadratic
residue (mod p).

(2) If a is a quadratic residue (mod p), and b is a quadratic nonresidue
(mod p), then c is a quadratic nonresidue (mod p). Similarly, if a is
a quadratic nonresidue (mod p), and b is a quadratic residue (mod p),
then c is a quadratic nonresidue (mod p).

(3) If a and b are both quadratic nonresidues (mod p), then c is a quadratic
residue (mod p).
Proof:
(1) Since a is a quadratic residue (mod p), by definition there is an integer
x with x^2 ≡ a (mod p). Similarly, there is an integer y with y^2 ≡
b (mod p). Set z = xy. Then
z^2 = (xy)^2 = x^2 y^2 ≡ ab ≡ c (mod p),
so by definition c is a quadratic residue (mod p).
(2) Suppose a is a quadratic residue (mod p), and let x be an integer with
x^2 ≡ a (mod p). We will prove the lemma in this case by contradiction.
So suppose c is a quadratic residue (mod p), and let z be an integer
with z^2 ≡ c (mod p). Since a ≢ 0 (mod p), we see that x ≢ 0 (mod p)
(as p is a prime), and then we know that there is an integer w with
wx ≡ 1 (mod p). Then (wx)^2 ≡ 1^2 ≡ 1 (mod p). But (wx)^2 = w^2 x^2 ≡
w^2 a (mod p), so we see that w^2 a ≡ 1 (mod p). Now let y = wz. Then
y^2 = (wz)^2 = w^2 z^2 ≡ w^2 c ≡ w^2 (ab) ≡ (w^2 a)b ≡ (1)(b) ≡ b (mod p),
so we see that b is a quadratic residue (mod p), contradicting our
hypothesis. (The case in which a is a quadratic nonresidue and b is a
quadratic residue follows by interchanging the roles of a and b.)
(3) As we have seen in Lemma B.25, there are (p − 1)/2 quadratic residues
(mod p) between 1 and p − 1. Call them d1, d2, . . . , d_{(p−1)/2}. For each
i between 1 and (p − 1)/2, let ei be an integer between 1 and p − 1
with ei ≡ a·di (mod p). Then no two ei’s are equal, as if e_{i1} = e_{i2} for
some i1 ≠ i2, then a·d_{i1} ≡ a·d_{i2} (mod p), and since p is a prime and
a ≢ 0 (mod p), by Lemma B.10 this implies that d_{i1} ≡ d_{i2} (mod p), and
hence that d_{i1} = d_{i2} (as they are both between 1 and p − 1), contradicting
our choice of the di’s.
Now by part (2), each ei is a quadratic nonresidue (mod p) (as it is
congruent to the product of a, a quadratic nonresidue (mod p), and
di, a quadratic residue (mod p)). But also, by Lemma B.25, there
are (p − 1)/2 quadratic nonresidues (mod p) between 1 and p − 1,
so e1, . . . , e_{(p−1)/2} must be all of them. Since b is assumed to be a
quadratic nonresidue (mod p), it must be congruent to one of the ei’s.
So let b ≡ e_{i0} ≡ a·d_{i0} (mod p). Since d_{i0} is a quadratic residue
(mod p), there is an integer f with f^2 ≡ d_{i0} (mod p). Then, setting
z = af, we see that
z^2 = (af)^2 = a^2 f^2 ≡ a(a·d_{i0}) ≡ a·e_{i0} ≡ ab ≡ c (mod p),
so by definition c is a quadratic residue (mod p). □
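
Lemma B.27 says that the product of two integers of the same kind (both residues or both nonresidues) is a residue, while the product of two integers of different kinds is a nonresidue. A brute-force check of this for small primes can be sketched as follows; the function names are our own, and the residue test simply tries every x between 1 and p − 1.

```python
def is_quadratic_residue(a, p):
    """True if x^2 ≡ a (mod p) has a solution; a must not be divisible by the prime p."""
    return any((x * x - a) % p == 0 for x in range(1, p))


def check_product_rule(p):
    """Check Lemma B.27 for every pair a, b between 1 and p-1."""
    for a in range(1, p):
        for b in range(1, p):
            same_kind = is_quadratic_residue(a, p) == is_quadratic_residue(b, p)
            if is_quadratic_residue(a * b, p) != same_kind:
                return False
    return True


print([check_product_rule(p) for p in (3, 5, 7, 11, 13, 17, 19)])
```

Such a check is of course no substitute for the proof; it only confirms the pattern for the primes in Table B.1.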

Corollary B.28. Let p be an odd prime and let a be an integer with a ≢
0 (mod p). Let a′ be an integer with aa′ ≡ 1 (mod p). If a is a quadratic
residue (mod p), then a′ is also a quadratic residue (mod p); while if a
is a quadratic nonresidue (mod p), then a′ is also a quadratic nonresidue
(mod p).

Proof: This follows directly from Lemma B.27, setting b = a′ and c = 1
(which is certainly a quadratic residue (mod p)). □

There is a third, far less obvious, pattern that can be found in the table.
We ask when p − 1, or, equivalently, when −1 (as −1 ≡ p − 1 (mod p)), is
a quadratic residue (mod p). Looking at the table, we see that this is true
for p = 5, 13, 17, and 29 and false for p = 3, 7, 11, 19, and 23. We will
be able to generalize this and to prove that it is true for p ≡ 1 (mod 4)
and false for p ≡ 3 (mod 4). This will take considerable work. Along the
way, we will prove two famous theorems, Fermat’s Little Theorem (of great
importance in itself) and Wilson’s theorem (whose main importance is in
precisely the use we shall put it to).

Theorem B.29 (Fermat’s Little Theorem). Let p be an odd prime. For any
a ≢ 0 (mod p),

a^(p−1) ≡ 1 (mod p).

Proof: Let us first consider F = (p − 1)! = 1 · 2 · 3 · · · (p − 1). As p is
a prime, and p does not divide any of the factors of F, we conclude (by
the contrapositive of Euclid’s lemma) that p does not divide F. We then
conclude that the gcd of p and F is 1, i.e., that p and F are relatively
prime. (The gcd g of p and F divides each of these. Since p is a prime,
and g divides p, we see that g = 1 or g = p. We cannot have g = p, as g
divides F, and we know that p does not.)
Now consider the set {a, 2a, 3a, . . . , (p − 1)a}. We claim that no two
distinct elements of this set can be congruent (mod p). For suppose i1 a ≡
i2 a (mod p) with i1 ≠ i2. Then, by Lemma B.10, i1 ≡ i2 (mod p) (as a is
relatively prime to p). Since i1 and i2 are both between 1 and p − 1, this
can only happen if i1 = i2, a contradiction.
Clearly no two distinct elements of {1, 2, . . . , (p − 1)} are congruent
(mod p), and we have just shown that no two distinct elements of {a, 2a, . . . ,
(p − 1)a} are congruent (mod p). But each element in the second set is
congruent to some element in the first set (in particular, to the remainder
when it is divided by p) and these sets both have the same number p − 1
of elements. Thus, we see that the congruence classes of the elements in
the second set are simply a rearrangement of the elements in the first set.
Hence the product of the elements in the second set is congruent to the
product of the elements in the first set (mod p). Now the product of the
elements in the first set is simply 1 · 2 · 3 · · · (p − 1) = F, while the product
of the elements in the second set is

a · 2a · 3a · · · (p − 1)a = 1 · 2 · 3 · · · (p − 1) · a · a · a · · · a = F a^(p−1).

We thus obtain the congruence

F a^(p−1) ≡ F = F · 1 (mod p).

We began the proof by showing that F and p are relatively prime, so
by Lemma B.10, we conclude that

a^(p−1) ≡ 1 (mod p)

as claimed. □
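
Both the key step of this proof and its conclusion can be checked numerically for small primes. In the Python sketch below (the function names are ours), the first function verifies that a, 2a, . . ., (p − 1)a reduce (mod p) to a rearrangement of 1, 2, . . ., p − 1, and the second verifies the congruence of the theorem using Python’s three-argument pow, which computes powers modulo p without forming the full power.

```python
def multiples_are_rearrangement(a, p):
    """Check that a, 2a, ..., (p-1)a reduce (mod p) to a rearrangement of 1, ..., p-1."""
    return sorted((i * a) % p for i in range(1, p)) == list(range(1, p))


def fermat_holds(a, p):
    """Check a^(p-1) ≡ 1 (mod p)."""
    return pow(a, p - 1, p) == 1


print(multiples_are_rearrangement(3, 7), fermat_holds(3, 7))
```

For a composite modulus the congruence can fail: for instance, fermat_holds(2, 15) is False.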

Using this theorem, we are halfway to our goal.


Corollary B.30. If p is a prime, p ≡ 3 (mod 4), then −1 is a quadratic
nonresidue (mod p).

Proof: First we observe that p ≡ 3 (mod 4) means p = 4k + 3 for some
integer k, so (p − 1)/2 = 2k + 1 is odd.
Now suppose −1 is a quadratic residue (mod p), i.e., that there is an x
with x^2 ≡ −1 (mod p). Raising each side to the (p − 1)/2 power, we have

(x^2)^((p−1)/2) ≡ (−1)^((p−1)/2) (mod p),

i.e.,
x^(p−1) ≡ −1 (mod p),
since, as we have just observed, (p − 1)/2 is odd, and −1 raised to an odd
power is −1. But by Fermat’s Little Theorem, regardless of the value of x
(which is certainly ≢ 0 (mod p)),

x^(p−1) ≡ 1 (mod p).

Putting these two congruences together, we see that

1 ≡ −1 (mod p), i.e., 2 ≡ 0 (mod p),

i.e., p divides 2, which is impossible as p is an odd prime. □
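
The pattern for −1 can be checked by brute force for the small primes mentioned earlier. The Python sketch below (with a function name of our own choosing) simply tries every x between 1 and p − 1.

```python
def minus_one_is_residue(p):
    """True if x^2 ≡ -1 (mod p) has a solution, found by trying x = 1, ..., p-1."""
    return any((x * x + 1) % p == 0 for x in range(1, p))


for p in (3, 5, 7, 11, 13, 17, 19, 23, 29):
    print(p, p % 4, minus_one_is_residue(p))
```

For the primes p ≡ 3 (mod 4) in this list the answer is False, as Corollary B.30 guarantees.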

Now we turn to the second half of our goal. In the proof of Theo-
rem B.29, we simply needed to know F ≡
 0 (mod p). Now we will need to
find the exact value of F (mod p).
Theorem B.31 (Wilson’s Theorem). Let p be a prime. Then
(p − 1)! ≡ −1 (mod p).
Proof: First, we note that this is true for p = 2 as 1! = 1 ≡ −1 (mod 2).
Henceforth we assume p is an odd prime. Then p − 1 is even, so there
are an even number p − 1 of integers between 1 and p − 1, i.e., in the set
T = {1, . . . , p − 1}. Let us exclude the two integers 1 and p − 1, to obtain
the set
S = {2, 3, . . . , p − 2},
containing an even number p − 3 of integers.
We claim that for every element z of S there is an element y of S with
y = z, and zy ≡ 1 (mod p).
We already know that for any z in T there is a y in T with zy ≡
1 (mod p) (Theorem B.11). In particular, since S is a subset of T , we know
that for any z in S, there is a y in T with zy ≡ 1 (mod p). We cannot have
y = 1, as then zy ≡ z ≡ 1 (mod p), and we cannot have y = p − 1, as then
zy ≡ −z ≡ 1 (mod p). Thus y is in S. Finally, we cannot have y = z as
then z 2 ≡ 1 (mod p). But we know that x2 ≡ 1 (mod p) can have at most
two solutions (Lemma B.21), and it certainly has the solutions x = ±1, so
it cannot have the solution x = z as well.
Now let us consider the product $F_0 = 2 \cdot 3 \cdots (p-2)$ of the elements
of S. Each element of S pairs up with another element of S where the
product of the two is congruent to 1 (mod p), and there are (p − 3)/2 such
pairs of elements, so
$$F_0 \equiv 1^{(p-3)/2} = 1 \pmod{p}.$$
But now
$$\begin{aligned}
F &= (p-1)! \\
  &= 1 \cdot 2 \cdot 3 \cdots (p-1) \\
  &= 1 \cdot (2 \cdot 3 \cdots (p-2)) \cdot (p-1) \\
  &= 1 \cdot F_0 \cdot (p-1) \\
  &\equiv 1 \cdot 1 \cdot (p-1) \pmod{p} \\
  &\equiv -1 \pmod{p},
\end{aligned}$$
as claimed. □
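The pairing argument in the proof can be checked numerically for small primes. The following is a minimal Python sketch, not part of the original text; the function names `inverse_pairs` and `wilson_residue` are hypothetical, and the modular inverse relies on the three-argument `pow` with exponent −1 (available in Python 3.8 and later).

```python
from math import factorial


def inverse_pairs(p):
    """Pair each z in S = {2, ..., p-2} with the y in S satisfying zy ≡ 1 (mod p)."""
    pairs = []
    seen = set()
    for z in range(2, p - 1):
        if z in seen:
            continue
        y = pow(z, -1, p)  # modular inverse of z (mod p)
        pairs.append((z, y))
        seen.update((z, y))
    return pairs


def wilson_residue(p):
    """Return (p - 1)! reduced (mod p)."""
    return factorial(p - 1) % p


if __name__ == "__main__":
    print(inverse_pairs(11))   # the (p - 3)/2 = 4 pairs for p = 11
    print(wilson_residue(11))  # p - 1, i.e. -1 (mod p)
```

For p = 11 the pairs are (2, 6), (3, 4), (5, 9), (7, 8), so F_0 is a product of four factors each congruent to 1 (mod 11).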


B.3. Quadratic Congruences 231

Remark B.32. Suppose that n is not a prime. Then n is divisible by some
integer a with 1 < a ≤ n − 1. Suppose (n − 1)! ≡ −1 (mod n). Then, by
Proposition B.7, (n − 1)! ≡ −1 (mod a). Thus (n − 1)! is not divisible by a.
On the other hand, by the very definition of (n − 1)! = 1 · 2 · · · (n − 1), it has a
as a factor (as a is one of these integers), a contradiction. We thus conclude
that if n is not a prime, then (n − 1)! ≢ −1 (mod n). Putting this together
with Theorem B.31, we obtain the statement (often referred to as Wilson’s
theorem): an integer n > 1 is a prime if and only if (n − 1)! ≡ −1 (mod n).
(On the face of it, this is a test for primes: to see if n is prime, compute
(n − 1)!, divide it by n, and see what the remainder is. However, because
(n − 1)! grows so quickly, this is in practice a perfectly useless test.)
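To make the test in the remark concrete, here is a minimal Python sketch; the function name `is_prime_wilson` is hypothetical. As an assumption that goes slightly beyond the remark, the product is reduced (mod n) after every multiplication, which avoids the enormous intermediate value of (n − 1)!. The loop still performs n − 2 multiplications, so the test remains impractical for large n.

```python
def is_prime_wilson(n):
    """Wilson's criterion: n > 1 is prime iff (n - 1)! ≡ -1 (mod n)."""
    if n < 2:
        return False
    fact = 1
    for k in range(2, n):
        fact = (fact * k) % n  # keep the running product reduced (mod n)
    return fact == n - 1       # n - 1 represents -1 (mod n)


if __name__ == "__main__":
    print([n for n in range(2, 30) if is_prime_wilson(n)])
```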
Finally, we arrive at the other half of our goal.
Corollary B.33. If p is a prime, p ≡ 1 (mod 4), set E = ((p − 1)/2)!. Then
$E^2 \equiv -1 \pmod{p}$.
In particular, −1 is a quadratic residue (mod p) in this case.
Proof: First we observe that p ≡ 1 (mod 4) means p = 4k + 1 for some
integer k, so (p − 1)/2 = 2k is even. As we have just seen, (p − 1)! ≡
−1 (mod p). Let us rewrite this:
$$(p-1)! \equiv -1 \pmod{p}$$
$$1 \cdot 2 \cdots (p-1) \equiv -1 \pmod{p}.$$
Let us pair off the integer i with the integer p − i on the left-hand side of
this congruence, for i = 1, 2, . . ., (p − 1)/2, to obtain
$$\begin{aligned}
(1 \cdot (p-1))(2 \cdot (p-2))(3 \cdot (p-3)) \cdots \left(\tfrac{p-1}{2}\left(p - \tfrac{p-1}{2}\right)\right) &\equiv -1 \pmod{p} \\
(1 \cdot (-1))(2 \cdot (-2))(3 \cdot (-3)) \cdots \left(\tfrac{p-1}{2}\left(-\tfrac{p-1}{2}\right)\right) &\equiv -1 \pmod{p} \\
(-1^2)(-2^2)(-3^2) \cdots \left(-\left(\tfrac{p-1}{2}\right)^2\right) &\equiv -1 \pmod{p}.
\end{aligned}$$
Now we observe that there are (p − 1)/2 terms on the left-hand side, so
pulling out the factors of −1 we see that there are (p − 1)/2 of them, and
$(-1)^{(p-1)/2} = 1$ since (p − 1)/2 is even, so we see
$$\begin{aligned}
(-1)^{(p-1)/2}(1^2)(2^2)(3^2) \cdots \left(\tfrac{p-1}{2}\right)^2 &\equiv -1 \pmod{p} \\
(1^2)(2^2)(3^2) \cdots \left(\tfrac{p-1}{2}\right)^2 &\equiv -1 \pmod{p} \\
\left(1 \cdot 2 \cdot 3 \cdots \tfrac{p-1}{2}\right)^2 &\equiv -1 \pmod{p},
\end{aligned}$$
as claimed. □
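A quick numerical check of this corollary, as a minimal Python sketch; the function name `half_factorial_mod` is hypothetical. It computes E = ((p − 1)/2)! reduced (mod p), so that squaring E exhibits an explicit square root of −1 when p ≡ 1 (mod 4).

```python
def half_factorial_mod(p):
    """Return E = ((p - 1)/2)! reduced (mod p)."""
    e = 1
    for k in range(1, (p - 1) // 2 + 1):
        e = (e * k) % p
    return e


if __name__ == "__main__":
    for p in (5, 13, 17, 29, 37):          # primes congruent to 1 (mod 4)
        e = half_factorial_mod(p)
        print(p, e, (e * e) % p == p - 1)  # E^2 ≡ -1 (mod p)
```

For p = 13, E = 6! = 720 ≡ 5 (mod 13), and 5² = 25 ≡ −1 (mod 13).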


Corollary B.34. Let p be an odd prime and let a ≢ 0 (mod p).

(1) If p ≡ 1 (mod 4), then either a and −a are both quadratic residues
(mod p) or they are both quadratic nonresidues (mod p).

(2) If p ≡ 3 (mod 4), then exactly one of a and −a is a quadratic residue
(mod p) and the other is a quadratic nonresidue (mod p).

Proof: This follows immediately from Lemma B.27, Corollary B.30, and
Corollary B.33. □

We have just determined when −1 is a quadratic residue (mod p) (and of
course 1 always is). We also need to know when 2 and −2 are quadratic
residues (mod p). We answer that question now.

Lemma B.35. Let x be any odd integer. Then $x^2 \equiv 1 \pmod{8}$.

Proof: We simply compute: If x ≡ 1 (mod 8), then certainly $x^2 \equiv 1 \pmod{8}$.
If x ≡ 3 (mod 8), $x^2 \equiv 9 \equiv 1 \pmod{8}$. If x ≡ 5 (mod 8), $x^2 \equiv 25 \equiv 1 \pmod{8}$.
Finally, if x ≡ 7 (mod 8), $x^2 \equiv 49 \equiv 1 \pmod{8}$. □

Lemma B.36.

(1) If p is a prime, p ≡ 3 (mod 8) or p ≡ 5 (mod 8), then 2 is a quadratic
nonresidue (mod p).

(2) If p is a prime, p ≡ 1 (mod 8) or p ≡ 7 (mod 8), then 2 is a quadratic
residue (mod p).

Proof:

(1) We prove this by induction on p. If p = 3, then this is certainly true.
Now suppose p is a prime, p ≡ 3 (mod 8) or p ≡ 5 (mod 8), and suppose
(1) is true for all primes p′ < p with p′ ≡ 3 (mod 8) or p′ ≡ 5 (mod 8).
We proceed by contradiction. Suppose 2 is a quadratic residue (mod p),
and let x be an integer with $x^2 \equiv 2 \pmod{p}$. We can assume that
1 ≤ x ≤ p − 1, and, replacing x by p − x if necessary, that x is odd.
Hence $x^2 = pq + 2$ for some odd integer q. By Lemma B.35, for any odd
integer x, $x^2 \equiv 1 \pmod{8}$, so $pq = x^2 - 2 \equiv 1 - 2 \equiv -1 \equiv 7 \pmod{8}$.
If p ≡ 3 (mod 8) this forces q ≡ 5 (mod 8), and if p ≡ 5 (mod 8) this
forces q ≡ 3 (mod 8).
So in neither case can we have q ≡ 1 (mod 8) or q ≡ 7 (mod 8).
Now q has a prime factorization, of course, and if all of the prime factors
of q were congruent to 1 or 7 (mod 8), then q itself would be congruent
to 1 or 7 (mod 8), which is not the case. Hence q must be divisible by
some prime p′ with p′ ≡ 3 or 5 (mod 8). Since 1 ≤ x ≤ p − 1, $x^2 < p^2$,
so q < p, and in particular p′ < p. But
$$x^2 = pq + 2 = p'(pq/p') + 2 \equiv 2 \pmod{p'}$$
and so 2 is a quadratic residue (mod p′), contradicting our inductive
hypothesis.
(2a) First we handle the case p ≡ 7 (mod 8). We proceed as in the proof
of part (1) to show the following claim: if p ≡ 5 or 7 (mod 8), then
−2 is a quadratic nonresidue (mod p).
For p = 5 this claim is certainly true. Now suppose p is prime,
p ≡ 5 (mod 8) or p ≡ 7 (mod 8), and suppose the claim is true for all
primes p′ < p with p′ ≡ 5 (mod 8) or p′ ≡ 7 (mod 8).
Again we proceed by contradiction. Suppose −2 is a quadratic residue
(mod p), and let x be an integer with $x^2 \equiv -2 \pmod{p}$. Again we may
assume 1 ≤ x ≤ p − 1 and x is odd, so $x^2 = pq - 2$ with q odd and
(again using Lemma B.35) $pq = x^2 + 2 \equiv 1 + 2 = 3 \pmod{8}$. If
p ≡ 5 (mod 8) this forces q ≡ 7 (mod 8), and if p ≡ 7 (mod 8) this
forces q ≡ 5 (mod 8). Then the prime factors of q are all less than p,
but not all of them can be congruent to 1 or 3 (mod 8), as if they were
we would have q ≡ 1 or 3 (mod 8). Hence q has a prime factor p′ < p with
p′ ≡ 5 or 7 (mod 8), and $x^2 \equiv -2 \pmod{p'}$, and so −2 is a quadratic
residue (mod p′), contradicting our inductive hypothesis.
Now suppose p ≡ 7 (mod 8). Then p ≡ 3 (mod 4), so by Corollary B.30,
−1 is a quadratic nonresidue (mod p). By the claim, −2 is also a quadratic
nonresidue (mod p), so we can apply Lemma B.27 to conclude that
2 = (−2)(−1) is a quadratic residue (mod p).
(2b) Now we handle the case p ≡ 1 (mod 8). So as not to interrupt the
flow of the argument, we use a result that we shall prove below.
We claim that the congruence $x^4 + 1 \equiv 0 \pmod{p}$ has a solution.
Assuming that claim, let $x = a_0$ be a solution. Now consider $y = a_0^2 + 1$. Then
$$\begin{aligned}
y^2 &= (a_0^2 + 1)^2 \\
    &= a_0^4 + 2a_0^2 + 1 \\
    &\equiv 2a_0^2 \pmod{p},
\end{aligned}$$


so $2a_0^2$ is a quadratic residue (mod p). Now $a_0^2$ is certainly a quadratic
residue (mod p), so, by Lemma B.27, 2 is a quadratic residue (mod p)
as well. (Letting $y = a_0^2 - 1$, we similarly see that −2 is a quadratic
residue (mod p).)
Now we must prove the claim. Let p = 8k + 1. Then by elementary
algebra $x^8 - 1$ divides $x^{8k} - 1 = x^{p-1} - 1$, and $x^4 + 1$ divides $x^8 - 1$,
so $x^4 + 1$ divides $x^{p-1} - 1$. Write $x^{p-1} - 1 = (x^4 + 1)f(x)$. Note that
f(x) is a polynomial of degree p − 1 − 4 = p − 5.
Now for any a ≢ 0 (mod p), $a^{p-1} \equiv 1 \pmod{p}$ by Fermat’s Little
Theorem, so
$$0 \equiv a^{p-1} - 1 = (a^4 + 1)f(a) \pmod{p},$$
i.e., p divides $(a^4 + 1)f(a)$. Since p is a prime, p divides $a^4 + 1$ or p
divides f(a).

We now proceed by contradiction. Suppose $x^4 + 1 \equiv 0 \pmod{p}$ does not
have a solution. Then p never divides $a^4 + 1$, so for every value of a with
1 ≤ a ≤ p − 1, p divides f(a). But there are p − 1 values of a, and f(x) is a
polynomial of degree p − 5, and this is impossible by Lemma B.39 below.

Corollary B.37. Let p be an odd prime.

(1) If p ≡ 1 (mod 8), then 2 and −2 are both quadratic residues (mod p).

(2) If p ≡ 3 (mod 8), then 2 is a quadratic nonresidue (mod p) and −2 is a
quadratic residue (mod p).

(3) If p ≡ 5 (mod 8), then 2 and −2 are both quadratic nonresidues (mod p).

(4) If p ≡ 7 (mod 8), then 2 is a quadratic residue (mod p) and −2 is
a quadratic nonresidue (mod p).

Proof: This follows directly from Lemma B.36, Corollary B.33,
Corollary B.30, and Lemma B.27. □

Up to now we have considered quadratic residues and nonresidues (mod p)
when p is a prime. But for our applications in Chapter 2, we also have to
consider the case when the modulus is composite. The basic definition is
the same: Fix an integer n. Let a be an integer relatively prime to n. Then
a is a quadratic residue (mod n) if the congruence $x^2 \equiv a \pmod{n}$ has a
solution and a is a quadratic nonresidue (mod n) if it does not.

Corollary B.38. Let n be divisible by a prime $p_1 \equiv 3 \pmod{8}$ and by a prime
$p_2 \equiv 7 \pmod{8}$. Then 2 and −2 are both quadratic nonresidues (mod n).

Proof: On the one hand, suppose 2 is a quadratic residue (mod n). Then
$x^2 \equiv 2 \pmod{n}$ for some x. But then $x^2 \equiv 2 \pmod{p_1}$, i.e., 2 is a quadratic
residue (mod $p_1$), which is impossible by Corollary B.37(2).
On the other hand, suppose −2 is a quadratic residue (mod n). Then
$x^2 \equiv -2 \pmod{n}$ for some x. But then $x^2 \equiv -2 \pmod{p_2}$, i.e., −2 is a
quadratic residue (mod $p_2$), which is impossible by Corollary B.37(4). □
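The behavior of 2 and −2 modulo small primes, and modulo a composite n as in Corollary B.38, can be tabulated by brute force. This is a minimal Python sketch, not an efficient method; the names `is_qr` and `two_table` are hypothetical.

```python
def is_qr(a, n):
    """True if x^2 ≡ a (mod n) has a solution (brute-force search over all x)."""
    a %= n
    return any((x * x) % n == a for x in range(n))


def two_table(primes):
    """Map each prime p to (p mod 8, is 2 a residue?, is -2 a residue?)."""
    return {p: (p % 8, is_qr(2, p), is_qr(-2, p)) for p in primes}


if __name__ == "__main__":
    for p, row in two_table([3, 5, 7, 11, 13, 17, 23]).items():
        print(p, row)
    # n = 21 = 3 * 7 has a prime factor ≡ 3 (mod 8) and one ≡ 7 (mod 8)
    print(is_qr(2, 21), is_qr(-2, 21))
```

For n = 21 both calls return False: 2 fails already (mod 3), and −2 fails (mod 7).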

We now prove the deferred result that we need.

Lemma B.39 (Lagrange). Let $f(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0$ be a
polynomial with integer coefficients and let p be a prime. Then there are at
most n values of a with 0 ≤ a ≤ p − 1 such that f(a) ≡ 0 (mod p).

Proof: We proceed by induction on n. In case n = 1, $f(x) = x + c_0$ and
f(a) ≡ 0 (mod p) only for the single value of a with $a \equiv -c_0 \pmod{p}$,
0 ≤ a ≤ p − 1.
Now assume the lemma is true for every such polynomial of degree n − 1
and let f(x) have degree n. If f(x) ≡ 0 (mod p) has no solutions, we are
done. So suppose $f(a_1) \equiv 0 \pmod{p}$. By the usual division algorithm
for polynomials, we can write $f(x) = (x - a_1)g(x) + f(a_1)$ with g(x) a
polynomial of degree n − 1. Recall that $f(a_1) \equiv 0 \pmod{p}$.
Now g(x) has degree n − 1, so by the inductive hypothesis there are at
most n − 1 values of x with 0 ≤ x ≤ p − 1 and g(x) ≡ 0 (mod p); call these
$a_2, \ldots, a_k$ with k ≤ n. We need to show that if 0 ≤ b ≤ p − 1 and
f(b) ≡ 0 (mod p), then $b = a_1, a_2, \ldots,$ or $a_k$. So assume f(b) ≡ 0 (mod p).
Then

$$0 \equiv f(b) = (b - a_1)g(b) + f(a_1) \equiv (b - a_1)g(b) \pmod{p},$$

so p divides $(b - a_1)g(b)$. But p is a prime, so p divides one of the factors.
Thus p divides $b - a_1$, in which case $b = a_1$ as each is between 0 and p − 1,
or p divides g(b), in which case $b = a_2, a_3, \ldots,$ or $a_k$, and we are done. □
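Lagrange's bound can be observed directly by listing the roots of a polynomial modulo a small prime. The following is a minimal Python sketch; the function name `roots_mod` and the low-to-high coefficient convention are assumptions made for illustration. The last call uses a composite modulus to show that the hypothesis that p is prime cannot simply be dropped (compare Lemma B.35).

```python
def roots_mod(coeffs, m):
    """Roots a in {0, ..., m-1} of f(a) ≡ 0 (mod m).

    coeffs lists c_0, c_1, ..., c_n from the constant term upward.
    """
    def f(a):
        val = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            val = (val * a + c) % m
        return val

    return [a for a in range(m) if f(a) == 0]


if __name__ == "__main__":
    print(roots_mod([-1, 0, 1], 7))        # x^2 - 1 (mod 7): at most 2 roots
    print(roots_mod([1, 0, 0, 0, 1], 17))  # x^4 + 1 (mod 17): at most 4 roots
    print(roots_mod([-1, 0, 1], 8))        # x^2 - 1 (mod 8): 4 roots, 8 is not prime
```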

Our penultimate result is a famous theorem, the Law of Quadratic
Reciprocity, first proved by Gauss. We shall prove this theorem in the next
section.

Theorem B.40 (Law of Quadratic Reciprocity (Gauss)). Let p and q be
distinct odd primes.


(1) If at least one of p and q is congruent to 1 (mod 4), then one of the
following is true:

(a) p is a quadratic residue (mod q) and q is a quadratic residue (mod p);
or

(b) p is a quadratic nonresidue (mod q) and q is a quadratic nonresidue
(mod p).

(2) If both p and q are congruent to 3 (mod 4), then one of the following
is true:

(a) p is a quadratic residue (mod q) and q is a quadratic nonresidue (mod p);
or

(b) p is a quadratic nonresidue (mod q) and q is a quadratic residue (mod p).

We need the following corollary of this result.

Corollary B.41. Let p be a prime congruent to 3 (mod 4) and let q ≠ p be
an odd prime. Then q is a quadratic residue (mod p) if and only if −p is a
quadratic residue (mod q).

Proof: First suppose q ≡ 1 (mod 4). Then by Theorem B.40 (The Law of
Quadratic Reciprocity), q is a quadratic residue (mod p) if and only if p is a
quadratic residue (mod q). But, since q ≡ 1 (mod 4), p is a quadratic residue
(mod q) if and only if −p is a quadratic residue (mod q), by Corollary B.34,
yielding the result.
Next suppose q ≡ 3 (mod 4). Then by Theorem B.40 (The Law of
Quadratic Reciprocity), q is a quadratic residue (mod p) if and only if p is
a quadratic nonresidue (mod q). But, since q ≡ 3 (mod 4), p is a quadratic
nonresidue (mod q) if and only if −p is a quadratic residue (mod q), by
Corollary B.34, again yielding the result. □

B.4 Proof of the Law of Quadratic Reciprocity


In this section we investigate quadratic residues and nonresidues, leading
to the proof of the Law of Quadratic Reciprocity (Theorem B.40). It is
convenient to introduce the following standard notation.


Definition B.42. Let p be a prime and let a be relatively prime to p. The
Legendre symbol, or quadratic residue symbol, (a/p) is defined by

(a/p) = 1 if a is a quadratic residue (mod p),
(a/p) = −1 if a is a quadratic nonresidue (mod p).

Lemma B.43. Let p be a prime and let a and b be relatively prime to p.
Then (ab/p) = (a/p)(b/p).

Proof: This is merely a restatement of Lemma B.27. □

We have the following criterion, due to Euler, for a to be a quadratic residue
(mod p).

Proposition B.44 (Euler). Let p be an odd prime and let a be relatively prime
to p. Then $a^{(p-1)/2} \equiv \pm 1 \pmod{p}$. In fact, $a^{(p-1)/2} \equiv (a/p) \pmod{p}$, i.e.,

$a^{(p-1)/2} \equiv 1 \pmod{p}$ if a is a quadratic residue (mod p),
$a^{(p-1)/2} \equiv -1 \pmod{p}$ if a is a quadratic nonresidue (mod p).

Proof: Let $b = a^{(p-1)/2}$. Then $b^2 = a^{p-1} \equiv 1 \pmod{p}$ by Theorem B.29.
Thus b is a solution of the congruence $x^2 - 1 \equiv 0 \pmod{p}$. Clearly this
congruence has ±1 as solutions. By Lemma B.39, it cannot have any other
solutions, so we must have b ≡ ±1 (mod p).
Suppose a is a quadratic residue (mod p). Then $a \equiv c^2 \pmod{p}$ for
some c, and so $a^{(p-1)/2} \equiv (c^2)^{(p-1)/2} = c^{p-1} \equiv 1 \pmod{p}$. This now shows
that the congruence $x^{(p-1)/2} - 1 \equiv 0 \pmod{p}$ has every quadratic residue
as a solution, and there are (p − 1)/2 of these, so by Lemma B.39 there cannot
be any other solutions. Thus if a is a quadratic nonresidue (mod p),
$a^{(p-1)/2} \equiv -1 \pmod{p}$. □
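Euler's criterion gives a practical way to evaluate the Legendre symbol, since the power can be computed by fast modular exponentiation. The following is a minimal Python sketch; the names `legendre_euler` and `legendre_brute` are hypothetical, and the brute-force version is included only to compare against the definition.

```python
def legendre_euler(a, p):
    """(a/p) for an odd prime p and a relatively prime to p, via a^((p-1)/2) (mod p)."""
    t = pow(a, (p - 1) // 2, p)  # built-in modular exponentiation
    return -1 if t == p - 1 else t


def legendre_brute(a, p):
    """(a/p) straight from the definition: search for x with x^2 ≡ a (mod p)."""
    a %= p
    return 1 if any((x * x) % p == a for x in range(1, p)) else -1


if __name__ == "__main__":
    p = 11
    print([legendre_euler(a, p) for a in range(1, p)])
    print([legendre_brute(a, p) for a in range(1, p)])
```

The two lists agree, and exactly (p − 1)/2 of the entries equal 1.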

Corollary B.45. Let p be an odd prime. Then (−1/p) = 1 if p ≡ 1 (mod 4)
and (−1/p) = −1 if p ≡ 3 (mod 4).

Proof: If p ≡ 1 (mod 4) then (p − 1)/2 is even, so $(-1)^{(p-1)/2} = 1$. If
p ≡ 3 (mod 4) then (p − 1)/2 is odd, so $(-1)^{(p-1)/2} = -1$. Now apply
Proposition B.44. □

Although the Law of Quadratic Reciprocity had been conjectured earlier,
the first person to prove it was Gauss. This was one of Gauss’s great
achievements, and he returned to this theorem repeatedly, producing
several proofs. The proof we present here is a variant, due to Eisenstein,
of Gauss’s third proof. We begin with a lemma due to Gauss, but it is
convenient to establish some (nonstandard) notation first. For a prime p
and an integer ℓ relatively prime to p, we let $\tilde{\ell}$ be the remainder when ℓ
is divided by p. That is, $\tilde{\ell}$ is the unique integer between 1 and p − 1 with
$\tilde{\ell} \equiv \ell \pmod{p}$.

Lemma B.46 (Gauss’s Lemma). Let p be an odd prime and let a be relatively
prime to p. Let S = {a, 2a, . . . , ((p − 1)/2)a}. Let
$$n = \#(\{\ell \text{ in } S \mid \tilde{\ell} > p/2\}).$$
Then $(a/p) = (-1)^n$.
Proof: Observe that all of the elements of S are relatively prime to p and
that no two of them are congruent (mod p). Let $m = \#(\{\ell \text{ in } S \mid \tilde{\ell} < p/2\})$.
Since, for any ℓ relatively prime to p, either $\tilde{\ell} < p/2$ or $\tilde{\ell} > p/2$, we have
m + n = (p − 1)/2.
Denote by $q_1, \ldots, q_m$ those remainders less than p/2 that appear when
elements of S are divided by p, and by $r_1, \ldots, r_n$ those remainders greater
than p/2 that appear when elements of S are divided by p. Clearly the
elements of $T = \{q_1, \ldots, q_m, r_1, \ldots, r_n\}$ are all distinct. We claim that in
fact the elements of $U = \{q_1, \ldots, q_m, p - r_1, \ldots, p - r_n\}$ are all distinct.
To see this, suppose that $q_i = p - r_j$ for some i and j. By definition,
$q_i \equiv va \pmod{p}$ and $r_j \equiv wa \pmod{p}$ for some integers v and w between 1
and (p − 1)/2. Then
$$0 \equiv p = q_i + r_j \equiv va + wa = (v + w)a \pmod{p},$$
which is impossible as 2 ≤ v + w ≤ p − 1, and so, in particular,
v + w ≢ 0 (mod p).
Observe that U is a set of (p − 1)/2 distinct integers, all between 1
and (p − 1)/2, so in fact we must have U = {1, . . . , (p − 1)/2} (where the
elements of U appear in some unpredictable but irrelevant order).
Let $\Pi_S$ be the product of the elements of S, $\Pi_T$ the product of the
elements of T, and $\Pi_U$ the product of the elements of U. Let us calculate
these numbers (mod p). First, from the definition of S, we see that
$$\Pi_S = a \cdot 2a \cdots ((p-1)/2)a = (a \cdots a)(1 \cdot 2 \cdots ((p-1)/2)) = a^{(p-1)/2}((p-1)/2)!.$$
Next, since T simply consists of the remainders when the elements of S are
divided by p, we certainly have
$$\Pi_T = q_1 \cdots q_m r_1 \cdots r_n \equiv \Pi_S \pmod{p}.$$


Finally, the punch line: We calculate $\Pi_U$ two ways. On the one hand, from
the definition of U,

$$\Pi_U = q_1 \cdots q_m (p - r_1) \cdots (p - r_n) \equiv q_1 \cdots q_m (-r_1) \cdots (-r_n) = (-1)^n q_1 \cdots q_m r_1 \cdots r_n \equiv (-1)^n \Pi_T \pmod{p}.$$

On the other hand, we have observed that U = {1, . . . , (p − 1)/2}, so

$$\Pi_U = 1 \cdot 2 \cdots ((p-1)/2) = ((p-1)/2)!.$$

Combining these calculations, we see that

$$((p-1)/2)! = \Pi_U \equiv (-1)^n \Pi_T \equiv (-1)^n \Pi_S \equiv (-1)^n a^{(p-1)/2} ((p-1)/2)! \pmod{p}.$$

Euler’s Criterion (Proposition B.44) tells us that $a^{(p-1)/2} \equiv (a/p) \pmod{p}$.
Since ((p − 1)/2)! is relatively prime to p, we may cancel it, and we then obtain

$$1 \equiv (-1)^n a^{(p-1)/2} \equiv (-1)^n (a/p) \pmod{p},$$

and so $(a/p) = (-1)^n$, as claimed. □
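The count n in Gauss's Lemma is easy to compute, and the set U from the proof can be formed explicitly. This is a minimal Python sketch; the names `gauss_n`, `legendre_gauss`, and `gauss_u` are hypothetical.

```python
def gauss_n(a, p):
    """Number of k in 1..(p-1)/2 whose remainder (ka mod p) exceeds p/2."""
    return sum(1 for k in range(1, (p - 1) // 2 + 1) if 2 * ((k * a) % p) > p)


def legendre_gauss(a, p):
    """(a/p) = (-1)^n for an odd prime p and a relatively prime to p."""
    return (-1) ** gauss_n(a, p)


def gauss_u(a, p):
    """The set U of the proof: remainders below p/2 kept, remainders r above p/2 replaced by p - r."""
    rems = [(k * a) % p for k in range(1, (p - 1) // 2 + 1)]
    return sorted(r if 2 * r < p else p - r for r in rems)


if __name__ == "__main__":
    print(gauss_n(3, 7), legendre_gauss(3, 7))  # remainders 3, 6, 2: one exceeds 7/2
    print(gauss_u(3, 7))                        # equals [1, 2, 3]
```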

In the following lemma, [·] denotes the greatest integer function, as usual.

Lemma B.47. Let p be an odd prime and let a be an odd integer. Let n be
as in the statement of Gauss’s Lemma. Then

$$n \equiv n' = \sum_{k=1}^{(p-1)/2} [ka/p] \pmod{2}$$

and thus $(a/p) = (-1)^{n'}$.

Proof: We keep the notation of the proof of Gauss’s Lemma. By definition,
for each value of k, with k between 1 and (p − 1)/2, $ka = [ka/p]p + \widetilde{ka}$.
Since a and p are both odd, this gives $\widetilde{ka} \equiv [ka/p] + k \pmod{2}$. Thus

$$\sum_{k=1}^{(p-1)/2} \widetilde{ka} \equiv \sum_{k=1}^{(p-1)/2} [ka/p] + \sum_{k=1}^{(p-1)/2} k \pmod{2}.$$


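Now each $\widetilde{ka}$ is either a $q_i$ or an $r_j$, so

$$\begin{aligned}
\sum_{k=1}^{(p-1)/2} \widetilde{ka} &= \sum_{i=1}^{m} q_i + \sum_{j=1}^{n} r_j \\
&\equiv \sum_{i=1}^{m} q_i - \sum_{j=1}^{n} r_j \pmod{2} \\
&= -np + \sum_{i=1}^{m} q_i + \sum_{j=1}^{n} (p - r_j) \\
&\equiv n + \sum_{i=1}^{m} q_i + \sum_{j=1}^{n} (p - r_j) \pmod{2}.
\end{aligned}$$

But, as we observed in the proof of Gauss’s Lemma,
$\{q_1, \ldots, q_m, p - r_1, \ldots, p - r_n\} = \{1, \ldots, (p-1)/2\}$. Thus

$$\sum_{k=1}^{(p-1)/2} \widetilde{ka} \equiv n + \sum_{k=1}^{(p-1)/2} k \pmod{2},$$

and comparing the two expressions for $\sum_{k=1}^{(p-1)/2} \widetilde{ka}$ yields the result. □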
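The parity statement of this lemma can be compared against the count from Gauss's Lemma numerically. The following is a minimal Python sketch; the names `floor_sum` and `gauss_n` are hypothetical. The final line uses an even value of a to illustrate that the oddness hypothesis matters.

```python
def gauss_n(a, p):
    """n of Gauss's Lemma: how many k in 1..(p-1)/2 have (ka mod p) > p/2."""
    return sum(1 for k in range(1, (p - 1) // 2 + 1) if 2 * ((k * a) % p) > p)


def floor_sum(a, p):
    """The sum of [ka/p] over k = 1, ..., (p-1)/2, using integer floor division."""
    return sum((k * a) // p for k in range(1, (p - 1) // 2 + 1))


if __name__ == "__main__":
    # odd a: the two quantities agree (mod 2)
    print(floor_sum(5, 7) % 2, gauss_n(5, 7) % 2)
    # even a: the parities can differ, e.g. a = 2, p = 3
    print(floor_sum(2, 3) % 2, gauss_n(2, 3) % 2)
```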

We now assemble these results to prove the Law of Quadratic Reciprocity,
which we restate using the Legendre symbol.

Theorem B.48 (Law of Quadratic Reciprocity (Gauss)). Let p and q be
distinct odd primes. Then
$$(p/q)(q/p) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$

Proof: Let R be the rectangle in the xy-plane whose vertices are (0, 0),
(p/2, 0), (0, q/2), and (p/2, q/2). Let D be the diagonal of R running from
(0, 0) to (p/2, q/2). D divides R into two triangles. Let $T^+$ be the triangle
lying above D and $T^-$ be the triangle lying below D. Consider the lattice
points (i.e., points with integer coordinates) strictly inside R (i.e., in R but
not on the boundary of R). Note that there are r = ((p − 1)/2)((q − 1)/2)
such lattice points. Let $n^+$ be the number of these lattice points that are in
$T^+$ and let $n^-$ be the number of these lattice points that are in $T^-$. Note
that the line D has the equation y = (q/p)x, and so none of these lattice
points lie on D. Hence we see that $r = n^+ + n^-$, and so
$$(-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}} = (-1)^r = (-1)^{n^+ + n^-} = (-1)^{n^+} (-1)^{n^-}.$$


We wish to count $n^-$ and $n^+$. Starting on the x-axis and moving vertically
up until we hit D, we see that

$$n^- = \sum_{k=1}^{(p-1)/2} n_k^-$$

where $n_k^- = \#(\{\text{lattice points in } T^- \text{ with } x\text{-coordinate } k\})$, i.e.,

$$n_k^- = \#(\{(k, y) \mid y \text{ is an integer with } 0 < y < (q/p)k\}) = [kq/p],$$

so $n^- = \sum_{k=1}^{(p-1)/2} [kq/p]$. Similarly, starting on the y-axis and moving
horizontally to the right until we hit D we obtain that $n^+ = \sum_{k=1}^{(q-1)/2} [kp/q]$.
But by Lemma B.47, $(q/p) = (-1)^{n^-}$ and $(p/q) = (-1)^{n^+}$, and we are
done. □
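The lattice-point count in this proof can be carried out literally for small primes. This is a minimal Python sketch; the names `lattice_counts` and `legendre` are hypothetical, and the Legendre symbol is evaluated through Euler's Criterion (Proposition B.44).

```python
def lattice_counts(p, q):
    """Return (n_minus, n_plus): interior lattice points of R below and above D."""
    below = above = 0
    for x in range(1, (p - 1) // 2 + 1):
        for y in range(1, (q - 1) // 2 + 1):
            if p * y < q * x:  # y < (q/p)x: the point lies below the diagonal
                below += 1
            else:              # p*y = q*x cannot occur for distinct primes p, q
                above += 1
    return below, above


def legendre(a, p):
    """(a/p) via Euler's Criterion, for an odd prime p and a relatively prime to p."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t


if __name__ == "__main__":
    p, q = 3, 5
    n_minus, n_plus = lattice_counts(p, q)
    print(n_minus, n_plus)                                         # 1 1
    print(legendre(q, p) == (-1) ** n_minus)                       # True
    print(legendre(p, q) == (-1) ** n_plus)                        # True
    print(legendre(p, q) * legendre(q, p) == (-1) ** (n_minus + n_plus))  # True
```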

Theorem B.49. Let p be an odd prime. Then

(1) $(-1/p) = (-1)^{(p-1)/2}$, and

(2) $(2/p) = (-1)^{(p^2-1)/8}$.

Proof: This is just a restatement of Corollary B.30, Corollary B.33, and
Lemma B.36. □

B.5 Primitive Roots


In this section we introduce, and investigate, the notion of a primitive root.
The theory of primitive roots is an important one, but for the most part is
tangential to our purposes. However, it provides a very illuminating view-
point, and an alternate proof of some of our previous results, on quadratic
residues, and we carry out our investigation far enough to arrive at these.
We begin with a useful observation. Let n be any integer and suppose
that a is relatively prime to n. Then, by Theorem B.11, there is an integer
a′ with aa′ ≡ 1 (mod n), and a′ is unique (mod n). In this situation we
write $a^{-1} \equiv a' \pmod{n}$. Observe that with this definition of $a^{-1}$ (mod n),
all the usual laws of exponents hold (mod n).

Definition B.50. Let p be a prime and let a be relatively prime to p. The
order $\mathrm{ord}_p\, a$ of a (mod p) is the smallest positive integer k such that
$a^k \equiv 1 \pmod{p}$.


Lemma B.51. Let p be a prime and let a be relatively prime to p. Let ℓ be
any integer. Then $a^{\ell} \equiv 1 \pmod{p}$ if and only if ℓ is divisible by $\mathrm{ord}_p\, a$.

Proof: For convenience, let $k = \mathrm{ord}_p\, a$.
First, suppose that ℓ is divisible by k. Then $a^{\ell} = (a^k)^{\ell/k} \equiv 1^{\ell/k} = 1 \pmod{p}$.
Conversely, suppose that $a^{\ell} \equiv 1 \pmod{p}$. By the division
algorithm, we may write ℓ = kq + r where r = 0 or r is an integer with
1 ≤ r < k. Then $1 \equiv a^{\ell} = a^{kq+r} = a^{kq} a^r = (a^k)^q a^r \equiv 1^q a^r = a^r \pmod{p}$.
But by the definition of $\mathrm{ord}_p\, a$, $a^r \not\equiv 1 \pmod{p}$ for $1 \le r < \mathrm{ord}_p\, a$, so we
must have r = 0. □

Corollary B.52. Let p be a prime and let a be relatively prime to p. Then
$\mathrm{ord}_p\, a$ divides p − 1.

Proof: By Fermat’s Little Theorem (Theorem B.29), $a^{p-1} \equiv 1 \pmod{p}$, so
this follows immediately from Lemma B.51. □
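The order of an element can be found by computing successive powers until 1 appears. The following is a minimal Python sketch of Definition B.50; the function name `order_mod` is hypothetical, and the loop is a direct search rather than an efficient algorithm.

```python
def order_mod(a, p):
    """Smallest positive k with a^k ≡ 1 (mod p), for a relatively prime to the prime p."""
    if a % p == 0:
        raise ValueError("a must be relatively prime to p")
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k


if __name__ == "__main__":
    print(order_mod(2, 7))  # 3, since 2^3 = 8 ≡ 1 (mod 7)
    print(order_mod(3, 7))  # 6
    # Lemma B.51: 2^l ≡ 1 (mod 7) exactly when 3 divides l
    print([l for l in range(1, 13) if pow(2, l, 7) == 1])
```

In both printed cases the order divides p − 1 = 6, as Corollary B.52 requires.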

Lemma B.53. Let p be a prime and let a be relatively prime to p. The
following are equivalent:

(1) $\mathrm{ord}_p\, a = p - 1$.

(2) $\{1 = a^0, a, \ldots, a^{p-2}\}$ are all distinct (mod p).

Proof: We will prove that (1) is false if and only if (2) is false.
Suppose that (1) is false. Then, by Corollary B.52, $a^k \equiv 1 \pmod{p}$
for some k with 1 ≤ k < p − 1. But then $a^0$ and $a^k$ are not distinct
(mod p), and (2) is false. On the other hand, suppose that (2) is false.
Then $a^i \equiv a^j \pmod{p}$ with i ≠ j and i and j both between 0 and p − 2.
We may assume that i < j. Set k = j − i. Then $a^k \equiv 1 \pmod{p}$ with
1 ≤ k < p − 1, and (1) is false. □

Definition B.54. Let p be a prime. Then r is a primitive root (mod p) if
$\mathrm{ord}_p\, r = p - 1$.

Our goal is to show that every prime has a primitive root. We build up
to this by proving two results that are useful in themselves.

Lemma B.55. Let p be a prime and let $\{a_1, \ldots, a_n\}$ be relatively prime to
p, and let $a \equiv a_1 \cdots a_n \pmod{p}$. Let $k_i = \mathrm{ord}_p\, a_i$ for each i, and let
$k = k_1 \cdots k_n$. Suppose that $\{k_1, \ldots, k_n\}$ are pairwise relatively prime. Then
$\mathrm{ord}_p\, a = k$.


Proof: We prove this by induction on n. The case n = 1 is trivial. The
case n = 2 is the crucial case. Let $j = \mathrm{ord}_p\, a_1 a_2$. Certainly

$$(a_1 a_2)^{k_1 k_2} = (a_1^{k_1})^{k_2} (a_2^{k_2})^{k_1} \equiv 1^{k_2} 1^{k_1} \equiv 1 \pmod{p},$$

so j divides $k_1 k_2$.
On the other hand, $a_1^j a_2^j = (a_1 a_2)^j \equiv 1 \pmod{p}$ gives
$a_1^j \equiv a_2^{-j} \equiv (a_2^j)^{-1} \pmod{p}$. Then

$$1 = 1^j \equiv (a_1^{k_1})^j = (a_1^j)^{k_1} \equiv (a_2^{-j})^{k_1} \equiv (a_2^{jk_1})^{-1} \pmod{p},$$

so $k_2$ divides $jk_1$. Since we are assuming that $k_1$ and $k_2$ are relatively
prime, this implies that $k_2$ divides j. Similarly, we see that $k_1$ divides $jk_2$,
and hence that $k_1$ divides j. But again, since $k_1$ and $k_2$ are relatively prime,
we obtain that $k_1 k_2$ divides j. Thus $j = k_1 k_2$, i.e., $\mathrm{ord}_p\, a_1 a_2 = k_1 k_2$.
Now for the general inductive step. Assume that the lemma is true for
some value of n ≥ 2 and consider $\{a_1, \ldots, a_{n+1}\}$. Set $a' = a_1 \cdots a_n$. By the
inductive hypothesis, $\mathrm{ord}_p\, a' = k'$, where $k' = k_1 \cdots k_n$. Since $k'$ and $k_{n+1}$
are relatively prime, the n = 2 case shows that $a = a' a_{n+1} = a_1 \cdots a_{n+1}$ has
order $k = k' k_{n+1} = k_1 \cdots k_{n+1}$, as claimed. □

Proposition B.56. Let p be a prime and let $b_1, \ldots, b_n$ be relatively prime to
p. Let $k_i = \mathrm{ord}_p\, b_i$ for each i, and let $k = \mathrm{lcm}(k_1, \ldots, k_n)$. Then there is
an a with $\mathrm{ord}_p\, a = k$.

Proof: Factor k as $k = p_1^{e_1} \cdots p_m^{e_m}$ with the $p_i$ distinct primes. Then, for each i,
there is an element $b_i$ with $k_i$ divisible by $p_i^{e_i}$, say $k_i = p_i^{e_i} q_i$ for some $q_i$.
Set $a_i = b_i^{q_i}$. Then $\mathrm{ord}_p\, a_i = p_i^{e_i}$. Let $a = a_1 \cdots a_m$. Then, by Lemma B.55,
$\mathrm{ord}_p\, a = k$. □

Theorem B.57. Let p be a prime. Then p has a primitive root.

Proof: Consider {1, 2, . . . , p − 1}. Let $\mathrm{ord}_p\, i = k_i$ for each i. Let
$k = \mathrm{lcm}(k_1, \ldots, k_{p-1})$. Then $i^k \equiv 1 \pmod{p}$ for each i. Suppose that k < p − 1.
Then the polynomial congruence $x^k - 1 \equiv 0 \pmod{p}$ has p − 1 > k solutions,
contradicting Lemma B.39. Hence k ≥ p − 1. Since each $k_i$ divides p − 1
(Corollary B.52), so does k, and we must have k = p − 1. But then, by
Proposition B.56, there is an a with $\mathrm{ord}_p\, a = p - 1$, and so, by definition,
a is a primitive root (mod p). □
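For small primes a primitive root can simply be found by computing orders. The following is a minimal Python sketch; the names `order_mod` and `primitive_roots` are hypothetical, and this exhaustive search is only an illustration, not the constructive argument used in the proof above.

```python
def order_mod(a, p):
    """Smallest positive k with a^k ≡ 1 (mod p), for a relatively prime to p."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k


def primitive_roots(p):
    """All r in {1, ..., p-1} whose order (mod p) equals p - 1."""
    return [r for r in range(1, p) if order_mod(r, p) == p - 1]


if __name__ == "__main__":
    print(primitive_roots(7))
    # Lemma B.55 in action (mod 7): orders 2 and 3 are relatively prime,
    # so the product 6 * 2 ≡ 5 has order 2 * 3 = 6
    print(order_mod(6, 7), order_mod(2, 7), order_mod((6 * 2) % 7, 7))
```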

Lemma B.58. Let p be a prime and let r be a primitive root (mod p). Let
a be relatively prime to p. Then $a \equiv r^k \pmod{p}$ for some k, and k is well
defined (mod (p − 1)).


Proof: The p − 1 powers of r, $\{1 = r^0, r, \ldots, r^{p-2}\}$, are all mutually
incongruent (mod p). But there are only p − 1 congruence classes of elements
a relatively prime to p (given by {1, 2, . . . , p − 1}), so any such a must be
congruent to $r^k$ (mod p) for some k.
If k′ ≡ k (mod (p − 1)) then k′ = k + m(p − 1) for some m, and then
$r^{k'} = r^{k+m(p-1)} = r^k (r^{p-1})^m \equiv r^k (1)^m = r^k \pmod{p}$. On the other hand,
if $r^{k'} \equiv r^k \pmod{p}$, then $r^{k'-k} \equiv 1 \pmod{p}$ so k′ − k is divisible by p − 1,
i.e., k′ ≡ k (mod (p − 1)). □

Corollary B.59. Let p be an odd prime and let a be relatively prime to p.
Then a is a quadratic residue (mod p) if and only if for some, and hence for
any, primitive root r (mod p), $a \equiv r^k \pmod{p}$ with k even.

Proof: First let us note that the statement of the corollary makes sense, as
if p is an odd prime then p − 1 is even. By Lemma B.58, k is well defined
(mod (p − 1)) and so is certainly well defined (mod 2).
Suppose that $a \equiv r^k \pmod{p}$ with k even. Let k = 2j. Then
$a \equiv r^k = r^{2j} = (r^j)^2 \pmod{p}$ and a is a quadratic residue (mod p). Conversely,
suppose that a is a quadratic residue (mod p), so $a \equiv b^2 \pmod{p}$ for some
b. Then $b \equiv r^j \pmod{p}$ for some j, so $a \equiv b^2 \equiv (r^j)^2 = r^{2j} = r^k \pmod{p}$
with k = 2j even. □

We now collect a number of the results we have earlier proved about
quadratic residues together, and prove them using primitive roots. Note
that the only prior results we have used in proving Theorem B.57 are
Fermat’s Little Theorem (Theorem B.29) and Lagrange’s result (Lemma B.39),
so our development here is independent of the other results in Section B.3.

Corollary B.60. Let p be an odd prime.

(1) Let r be a primitive root (mod p). For any a relatively prime to p, let
$a \equiv r^k \pmod{p}$. Then $(a/p) = (-1)^k$.

(2) There are (p − 1)/2 quadratic residues (mod p) and (p − 1)/2 quadratic
nonresidues (mod p).

(3) For any a relatively prime to p, $a^{(p-1)/2} \equiv (a/p) \pmod{p}$.

(4) $(-1/p) = (-1)^{(p-1)/2}$, which is 1 if p ≡ 1 (mod 4) and −1 if p ≡ 3 (mod 4).

(5) For any a and b relatively prime to p, (ab/p) = (a/p)(b/p).


Proof:

(1) This is just a restatement of Corollary B.59.

(2) By Corollary B.59, $\{1 = r^0, r^2, r^4, \ldots, r^{p-3}\}$ are quadratic residues
(mod p) and $\{r^1, r^3, r^5, \ldots, r^{p-2}\}$ are quadratic nonresidues (mod p).

(3) Let $b \equiv a^{(p-1)/2} \pmod{p}$. Then $b^2 \equiv a^{p-1} \equiv 1 \pmod{p}$. The
congruence $0 \equiv x^2 - 1 = (x - 1)(x + 1) \pmod{p}$ has only the two solutions
x ≡ ±1 (mod p), so we see b ≡ ±1 (mod p). Now $a \equiv r^k \pmod{p}$ for some k and
then $b \equiv r^{k(p-1)/2} \pmod{p}$. If a is a quadratic residue (mod p), then k
is even and k(p − 1)/2 is divisible by p − 1 and so b ≡ 1 (mod p). If a
is a quadratic nonresidue (mod p), then k is odd and k(p − 1)/2 is not
divisible by p − 1 and so b ≢ 1 (mod p), in which case we must have
b ≡ −1 (mod p).

(4) Let a = −1 in part (3). Note that (p − 1)/2 is even if p ≡ 1 (mod 4)
and that (p − 1)/2 is odd if p ≡ 3 (mod 4). Note furthermore that, for
p ≡ 1 (mod 4), $-1 \equiv (r^{(p-1)/4})^2 \pmod{p}$.

(5) Let $a \equiv r^k \pmod{p}$ and $b \equiv r^{\ell} \pmod{p}$. Then $ab \equiv r^{k+\ell} \pmod{p}$. By
part (1), $(a/p) = (-1)^k$, $(b/p) = (-1)^{\ell}$, and $(ab/p) = (-1)^{k+\ell}$.
But $(-1)^{k+\ell} = (-1)^k (-1)^{\ell}$. □
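Part (1) of the corollary can be tested by tabulating, for a given primitive root r, the exponent k with a ≡ r^k (mod p). This is a minimal Python sketch; the names `index_table` and `legendre_by_index` are hypothetical, and the caller is assumed to supply a genuine primitive root r.

```python
def index_table(r, p):
    """Map each a in {1, ..., p-1} to the k in {0, ..., p-2} with a ≡ r^k (mod p).

    Assumes r is a primitive root (mod p).
    """
    table = {}
    x = 1
    for k in range(p - 1):
        table[x] = k
        x = (x * r) % p
    return table


def legendre_by_index(a, r, p):
    """(a/p) = (-1)^k where a ≡ r^k (mod p)."""
    return (-1) ** index_table(r, p)[a % p]


if __name__ == "__main__":
    p, r = 7, 3  # 3 is a primitive root (mod 7)
    print(index_table(r, p))
    print([a for a in range(1, p) if legendre_by_index(a, r, p) == 1])  # residues 1, 2, 4
```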

B.6 Exercises
Exercise B.1. Fix a positive integer n. Let k be any integer and consider
the set S = {k, k + 1, k + 2, . . . , k + n − 1}. Show that, for any integer x, the
congruence x ≡ a (mod n) is valid for exactly one of the integers a in S.
(Compare Corollary B.6.) A set S with this property is called a complete
system of residues (mod n).

Exercise B.2. Fix a positive integer n. Let k be any integer relatively prime
to n and consider the set S = {0, k, 2k, . . . , (n − 1)k}. Show that S is a
complete system of residues (mod n).

Exercise B.3. Let n be any positive integer and let S be any complete sys-
tem of residues (mod n). Show that S has n elements.

Exercise B.4. Solve each of the following congruences:

(a) 2x ≡ 11 (mod 17);


(b) 7x ≡ −5 (mod 9);

(c) 19x ≡ 8 (mod 14);

(d) 5x + 3 ≡ 7 (mod 9);

(e) 13x + 4 ≡ 11 (mod 10);

(f) 12x + 8 ≡ 5 (mod 17).

Exercise B.5. Solve each of the following congruences, if possible:

(a) 3x ≡ 6 (mod 9);

(b) 12x ≡ 4 (mod 20);

(c) 5x ≡ 8 (mod 10);

(d) 4x ≡ 12 (mod 20);

(e) 10x ≡ 10 (mod 35);

(f) 6x ≡ 11 (mod 15).

Exercise B.6. Use Euclid’s Algorithm (from Chapter 2) to solve the follow-
ing congruences:

(a) 97x ≡ 125 (mod 127);

(b) 323x ≡ 725 (mod 1001);

(c) 12345x ≡ 54321 (mod 41981);

(d) 13579x ≡ 24680 (mod 99991).

Exercise B.7. Solve the following systems of simultaneous congruences:

(a) x ≡ 4 (mod 7), x ≡ 4 (mod 9);

(b) x ≡ 3 (mod 5), x ≡ 6 (mod 8);

(c) x ≡ −4 (mod 11), x ≡ 7 (mod 8);

(d) x ≡ 6 (mod 11), x ≡ 0 (mod 13).


Exercise B.8. Solve the following systems of simultaneous congruences:

(a) x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 4 (mod 7);

(b) x ≡ 3 (mod 4), x ≡ 4 (mod 7), x ≡ 9 (mod 11).

Exercise B.9. Fix a nonzero integer n. Suppose that b is relatively prime
to n. Then, by Theorem B.11, the congruence bx ≡ a (mod n) holds for a
unique value of x (mod n). In this case let us write x ≡ a/b. Show that
with this definition, the “usual” rules of fractions hold:

(a) b(a/b) ≡ a;

(b) (a/b)(b/a) ≡ 1;

(c) a(b/c) ≡ (ab)/c;

(d) (ab)/(ac) ≡ b/c;

(e) (a/b)(c/d) ≡ (ac)/(bd);

(f) (a/c) + (b/c) ≡ (a + b)/c;

(g) (a/b) + (c/d) ≡ (ad + bc)/(bd);

(h) (a/b) ≡ (c/d) ⇔ ad ≡ bc.

(In all cases we assume that the denominators are relatively prime to n.)

Exercise B.10. Fix a prime p.

(a) Show that for any c, the congruence

$$x^2 + y^2 \equiv c \pmod{p}$$

has a solution.

(b) More generally, suppose that a ≢ 0 (mod p) and b ≢ 0 (mod p). Show
that for any c, the congruence

$$ax^2 + by^2 \equiv c \pmod{p}$$

has a solution.

Exercise B.11. Use the properties of the Legendre symbol to find the value
of each of the following Legendre symbols with a minimum of hand
computation:


(a) (9767/9931);

(b) (9803/9967);

(c) (−210/991);

(d) (−210/983);

(e) (2747/2897);

(f) (2747/2837).

(All of the odd integers above are prime, except for 2747, which is
composite.)

Exercise B.12.

(a) Let p be an odd prime and let a be an integer relatively prime to p.
Let n ≥ 1 be an arbitrary integer. Show that the congruence
$x^2 \equiv a \pmod{p^n}$ has a solution if and only if the congruence $x^2 \equiv a \pmod{p}$
has a solution, i.e., if and only if a is a quadratic residue (mod p).

(b) Let p = 2 and let a be an odd integer. Observe that the congruence
$x^2 \equiv a \pmod{2}$ always has a solution, and that the congruence
$x^2 \equiv a \pmod{4}$ has a solution if and only if a ≡ 1 (mod 4). Let n ≥ 3 be
an arbitrary integer. Show that the congruence $x^2 \equiv a \pmod{2^n}$ has a
solution if and only if a ≡ 1 (mod 8).

Exercise B.13. Let m and n be relatively prime. Show that the congruence
$x^2 \equiv a \pmod{mn}$ has a solution if and only if the congruences
$x^2 \equiv a \pmod{m}$ and $x^2 \equiv a \pmod{n}$ have solutions.

Exercise B.14. Let $b = 2^{e_0} p_1^{e_1} \cdots p_k^{e_k}$ with $\{p_i\}$ distinct odd primes, $e_0 \ge 0$
and $e_i > 0$ for i > 0. Let a be relatively prime to b. Show that the
congruence $x^2 \equiv a \pmod{b}$ has a solution if and only if a is a quadratic
residue (mod $p_i$) for each i and

(a) if e0 = 0 or 1, no further condition,

(b) if e0 = 2, a ≡ 1 (mod 4), and

(c) if e0 ≥ 3, a ≡ 1 (mod 8).


Exercise B.15. Use Gauss’s Lemma (Lemma B.46) to prove Theorem B.49,
i.e., that for an odd prime p

(a) $(-1/p) = (-1)^{(p-1)/2}$, and

(b) $(2/p) = (-1)^{(p^2-1)/8}$.

Exercise B.16.

(a) Use Gauss’s Lemma (Lemma B.46) directly to prove the Law of Quadratic
Reciprocity for p = 3. (Hint: consider q (mod 12). There are four
cases.)

(b) Use Gauss’s Lemma (Lemma B.46) directly to prove the Law of Quadratic
Reciprocity for p = 5. (Hint: consider q (mod 20). There are
eight cases.)
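
The counting in Gauss's Lemma is easy to carry out by machine, which gives a way to test the cases in Exercises B.15 and B.16 before proving them. The Python sketch below assumes that Lemma B.46 is the standard statement: for an odd prime p and a prime to p, $(a/p) = (-1)^{\mu}$, where $\mu$ is the number of k with 1 ≤ k ≤ (p − 1)/2 for which the least positive residue of ka (mod p) exceeds p/2. The function name `gauss_symbol` is an illustrative assumption.

```python
def gauss_symbol(a, p):
    """(a/p) computed as (-1)^mu, with mu counted as in Gauss's Lemma."""
    half = (p - 1) // 2
    mu = sum(1 for k in range(1, half + 1) if (k * a) % p > half)
    return (-1) ** mu

# Quadratic reciprocity for p = 3: (3/q)(q/3) = (-1)^((q-1)/2).
for q in (5, 7, 11, 13, 17, 19, 23):
    assert gauss_symbol(3, q) * gauss_symbol(q, 3) == (-1) ** ((q - 1) // 2)

# Quadratic reciprocity for p = 5: (5/q)(q/5) = 1, since (5 - 1)/2 is even.
for q in (3, 7, 11, 13, 17, 19, 23):
    assert gauss_symbol(5, q) * gauss_symbol(q, 5) == 1
```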

Exercise B.17. Let b > 1 be an odd integer and let a be relatively prime to
b. Let $b = p_1^{e_1} \cdots p_k^{e_k}$ be the prime factorization of b. The
Jacobi symbol (a/b) is defined by
$(a/b) = (a/p_1)^{e_1} \cdots (a/p_k)^{e_k}$, where the symbols on the
right are Legendre symbols.

(a) Show that if (a/b) = −1, then the congruence $x^2 \equiv a \pmod{b}$ does
not have a solution.

(b) Show that if (a/b) = 1, then the congruence $x^2 \equiv a \pmod{b}$ may or
may not have a solution. (That is, give an example of each possibility.)

(c) Show that $(a_1 a_2/b) = (a_1/b)(a_2/b)$.

(d) Show that $(a/b_1 b_2) = (a/b_1)(a/b_2)$.

(e) Show that $(-1/b) = (-1)^{(b-1)/2}$.

(f) Show that $(2/b) = (-1)^{(b^2-1)/8}$.

(g) Suppose that a > 1 and b > 1 are relatively prime odd integers. Show
that $(a/b)(b/a) = (-1)^{((a-1)/2)((b-1)/2)}$.
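
The definition above translates directly into a short program, which can be used to experiment with parts (a) through (g). The Python sketch below factors b by trial division and multiplies Legendre symbols, exactly as in the definition; it is not an efficient algorithm, and the function names `legendre` and `jacobi` are illustrative assumptions.

```python
from math import gcd

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def jacobi(a, b):
    """Jacobi symbol (a/b) for odd b > 1, computed from the definition."""
    result = 1
    p = 3
    while b > 1:
        while b % p == 0:          # any p dividing b here is an odd prime
            result *= legendre(a, p)
            b //= p
        p += 2
    return result

# Part (g), checked for small relatively prime odd a and b.
for a in range(3, 40, 2):
    for b in range(3, 40, 2):
        if gcd(a, b) == 1:
            sign = (-1) ** (((a - 1) // 2) * ((b - 1) // 2))
            assert jacobi(a, b) * jacobi(b, a) == sign
```

For instance, `jacobi(2, 15)` is 1 although 2 is not a square modulo 15, while `jacobi(4, 15)` is 1 and 4 is a square; this is the behavior described in part (b).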

Exercise B.18.

(a) For each prime number p between 2 and 19, find all primitive roots
(mod p).

(b) For each prime number p between 23 and 47, find at least one primitive
root (mod p).
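
Answers to this exercise can be checked with a direct search. The Python sketch below computes the multiplicative order of each residue by repeated multiplication; the function names `order` and `primitive_roots` are illustrative assumptions, and the method is practical only for small p.

```python
def order(a, p):
    """Multiplicative order of a modulo a prime p (a not divisible by p)."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def primitive_roots(p):
    """All primitive roots modulo the prime p, as residues in 1..p-1."""
    return [a for a in range(1, p) if order(a, p) == p - 1]

print(primitive_roots(7))   # [3, 5]
```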


Exercise B.19. Let p be a prime and let r be a primitive root (mod p). Let
d be a positive integer.

(a) Suppose that d is relatively prime to p − 1. Show that
$a^d \equiv 1 \pmod{p}$ if and only if $a \equiv 1 \pmod{p}$.

(b) Show that $r^d$ is a primitive root (mod p) if and only if d is relatively
prime to p − 1.

(c) Suppose that d divides p − 1. Show that $a^d \equiv 1 \pmod{p}$ has exactly
d solutions, and that these are given by $a \equiv r^{k(p-1)/d} \pmod{p}$ for
k = 0, 1, ..., d − 1.

(d) In general, let g = gcd(d, p − 1). Show that $a^d \equiv 1 \pmod{p}$ has
exactly g solutions, and that these are given by
$a \equiv r^{k(p-1)/g} \pmod{p}$ for k = 0, 1, ..., g − 1.
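
Part (d), which contains parts (a) and (c) as special cases, can be tested numerically. The Python sketch below uses p = 13 with the primitive root r = 2 as an illustrative choice; the function names `solutions` and `predicted` are assumptions that do not come from the text.

```python
from math import gcd

def solutions(d, p):
    """All a in 1..p-1 with a^d ≡ 1 (mod p), found by brute force."""
    return sorted(a for a in range(1, p) if pow(a, d, p) == 1)

def predicted(d, p, r):
    """The residues r^(k(p-1)/g), k = 0..g-1, where g = gcd(d, p-1)."""
    g = gcd(d, p - 1)
    return sorted(pow(r, k * (p - 1) // g, p) for k in range(g))

p, r = 13, 2                      # 2 is a primitive root modulo 13
for d in range(1, 25):
    assert solutions(d, p) == predicted(d, p, r)
```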

Appendix C

Continuations from Chapter 2

In this appendix we continue some of the items from Chapter 2. In each
case, we simply pick up where we left off, with no further ado.

C.1 Continuation of the Proof of Theorem 2.8


We now deal with the cases D = −11, −7, 5, 6, 7, 11, 13, 17, 21, and 29.
First, we need to see that in these cases, our previous proof does not
work. For example, let us consider D = −7 and let us take
$\gamma_0 = (2/5)\sqrt{-7}$, corresponding to the point (0, 2/5) of the plane.
Now this point is apparently nearest the origin, but
$\|(0, 2/5)\|_{-7} = |0^2 + 7(2/5)^2| = 1.12 > 1$, so its actual distance
from the origin is greater than 1. As a second example, let us consider
D = 6 and let us take $\gamma_0 = (4/9)\sqrt{6}$, corresponding to the point
(0, 4/9) of the plane. Now this point is apparently nearest the origin,
but $\|(0, 4/9)\|_6 = 1.185\ldots > 1$, so its actual distance from the
origin is greater than 1.
But we can still find an appropriate point γ in each of these cases! This
time γ will not be the apparently nearest point. Instead, it will be
apparently further away. For example, when D = −7 and
$\gamma_0 = (2/5)\sqrt{-7}$, we may take $\gamma = (1 + \sqrt{-7})/2$,
corresponding to the point (1/2, 1/2) of the plane, and then
$\|\gamma_0 - \gamma\|_{-7} = \|(-1/2, -1/10)\|_{-7} = |(-1/2)^2 + 7(-1/10)^2| = 0.32 < 1$,
and when D = 6 and $\gamma_0 = (4/9)\sqrt{6}$ we may take γ = 1,
corresponding to the point (1, 0), and then
$\|\gamma_0 - \gamma\|_6 = \|(-1, 4/9)\|_6 = |(-1)^2 - 6(4/9)^2| = 0.185\ldots < 1$.
(In fact, we may take a point apparently even further away from $\gamma_0$
that is actually closer to $\gamma_0$. For example, if we take
$\gamma = 6 - 2\sqrt{6}$, corresponding to the point (6, −2), we see that
$\|\gamma_0 - \gamma\|_6 = |(-6)^2 - 6(-22/9)^2| = 0.148\ldots$. But all we need
is a point γ with $\|\gamma_0 - \gamma\|_6 < 1$, so our first choice γ = 1
will do, and we do not have to go out apparently this far.)

Figure C.1. The case D = −7.
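
The numerical values in these examples can be reproduced with exact rational arithmetic. The Python sketch below stores the element $x + y\sqrt{D}$ as the point (x, y) and evaluates $|x^2 - Dy^2|$; the function name `norm` is an illustrative assumption.

```python
from fractions import Fraction as F

def norm(x, y, D):
    """The quantity |x^2 - D*y^2| attached to the point (x, y)."""
    return abs(x * x - D * y * y)

# D = -7, gamma_0 = (2/5)sqrt(-7): the origin fails, but (1/2, 1/2) works.
print(norm(F(0), F(2, 5), -7))                       # 28/25 = 1.12
print(norm(F(0) - F(1, 2), F(2, 5) - F(1, 2), -7))   # 8/25 = 0.32

# D = 6, gamma_0 = (4/9)sqrt(6): the origin fails; (1, 0) and (6, -2) work.
print(norm(F(0), F(4, 9), 6))                        # 32/27 = 1.185...
print(norm(F(0) - 1, F(4, 9), 6))                    # 5/27 = 0.185...
print(norm(F(0) - 6, F(4, 9) + 2, 6))                # 4/27 = 0.148...
```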
We have previously noted that we may restrict our attention to the
region consisting of points that are apparently closest to the origin. In
fact, we will begin by considering the portion of this region in the first
quadrant. (In case D ≡ 2 or 3 (mod 4), this is the square with vertices
(0, 0), (1/2, 0), (1/2, 1/2), and (0, 1/2), and in case D ≡ 1 (mod 4), this is
the right triangle with vertices (0, 0), (1/2, 0), and (0, 1/2).)
Again we begin with D < 0, where we can consider ellipses.
First consider D = −7. Then the points (x, y) with
$\|(x, y) - (0, 0)\|_{-7} < 1$ are the interior of an ellipse centered at
(0, 0), and the points (x, y) with $\|(x, y) - (1/2, 1/2)\|_{-7} < 1$ are the
interior of an ellipse centered at (1/2, 1/2), and these two ellipses
completely cover the first-quadrant region. See Figure C.1.
(You should check this. Verify that the ellipse centered at (0, 0) crosses
the y-axis at the point $(0, \sqrt{1/7})$ and the line x + y = 1/2 at the
point (1/8, 3/8), and the ellipse centered at (1/2, 1/2) crosses the y-axis
at the point $(0, 1/2 - (1/2)\sqrt{3/7})$ and the line x + y = 1/2 at the
point (3/8, 1/8), so the “top” of the first ellipse lies above the “bottom”
of the second ellipse in the first-quadrant region.)
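
A numerical spot check of this covering is easy to run, although it does not replace the verification asked for above. The Python sketch below samples a grid of rational points in the triangle with vertices (0, 0), (1/2, 0), and (0, 1/2) and confirms that each lies inside at least one of the two ellipses; the grid size N and the function names are illustrative assumptions.

```python
from fractions import Fraction as F

def norm(x, y, D):
    """The quantity |x^2 - D*y^2| attached to the point (x, y)."""
    return abs(x * x - D * y * y)

def covered(x, y, D, centers):
    """True if (x, y) is at norm-distance less than 1 from some center."""
    return any(norm(x - cx, y - cy, D) < 1 for cx, cy in centers)

N = 40
centers = [(F(0), F(0)), (F(1, 2), F(1, 2))]
# Grid points (i/2N, j/2N) with i + j <= N fill the closed triangle.
points = [(F(i, 2 * N), F(j, 2 * N))
          for i in range(N + 1) for j in range(N + 1 - i)]
assert all(covered(x, y, -7, centers) for x, y in points)
```

Neither ellipse suffices on its own: the point (0, 2/5) is missed by the ellipse centered at (0, 0), and the origin is missed by the ellipse centered at (1/2, 1/2).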

Figure C.2. The case D = −11.

Thus all the points in the first-quadrant region are taken care of.
Depending on $\gamma_0$ in this region, we may choose γ = 0 or
$\gamma = 1/2 + (1/2)\sqrt{-7}$ with $\|\gamma_0 - \gamma\|_{-7} < 1$, as
required. Then we may take care of all of the points in the whole region by
using the symmetry of the situation. The points in the second quadrant are
taken care of by choosing γ = 0 or $\gamma = (-1/2) + (1/2)\sqrt{-7}$; the
points in the third quadrant are taken care of by choosing γ = 0 or
$\gamma = (-1/2) + (-1/2)\sqrt{-7}$; and the points in the fourth quadrant are
taken care of by choosing γ = 0 or $\gamma = (1/2) + (-1/2)\sqrt{-7}$.
The situation for D = −11 is very similar. Again the ellipses centered
at the points (0, 0) and (1/2, 1/2), corresponding to γ = 0 and
$\gamma = (1/2) + (1/2)\sqrt{-11}$, cover the first-quadrant region. See
Figure C.2. (You should again verify this, but this time we will leave it to
you to find the coordinates of the various intersections.) And again, once
we have covered the first-quadrant region, we may use the symmetry of the
situation to reflect the centers of the ellipses and cover the whole region,
as required.

Now we turn to D > 0. Our strategy is the same, but it is more
complicated to carry out, as we must consider hyperbolas rather than
ellipses. The first case to consider is D = 5. Then the points (x, y) with
$\|(x, y) - (0, 0)\|_5 < 1$ are the interior of a hyperbolic region centered
at (0, 0), and the points (x, y) with $\|(x, y) - (1/2, 1/2)\|_5 < 1$ are the
interior of a hyperbolic region centered at (1/2, 1/2), and these two regions
together cover the first-quadrant region. See Figure C.3. (Once again you
should verify this. The top curve $x^2 - 5y^2 = -1$ intersects the y-axis at
$(0, \sqrt{1/5})$ and the line x + y = 1/2 at
$((5 - \sqrt{21})/8, (\sqrt{21} - 1)/8)$, and the bottom curve
$(x - 1/2)^2 - 5(y - 1/2)^2 = -1$ intersects the y-axis at the origin (0, 0)
and the line x + y = 1/2 at $((\sqrt{21} - 1)/8, (5 - \sqrt{21})/8)$, so the
picture is as shown.) Again, by symmetry, once we have covered the
first-quadrant region we can cover the whole region.

Figure C.3. The case D = 5.

The situation for D = 13 is very similar. Again the hyperbolic regions
centered at (0, 0) and (1/2, 1/2) cover the first-quadrant region. See
Figure C.4. (Again you should verify this, figuring out the coordinates of
the intersection points for yourself.) Again, by symmetry, once we have
covered the first-quadrant region we can cover the whole region.

Figure C.4. The case D = 13.
Now we turn to D = 6. Here the interiors of the hyperbolic regions
centered at (0, 0) and (−1, 0) cover all of the first-quadrant region. (See
Figure C.5.) There is one subtlety here, however, that we need to remark on.
The top curve $x^2 - 6y^2 = -1$ and the right-hand curve
$(x + 1)^2 - 6y^2 = 1$ intersect at the point $(1/2, \sqrt{5/24})$ on the
right-hand border of the first-quadrant region. But remember that
we are only concerned with Q-points (e, f), i.e., with points (e, f) with
both coordinates rational numbers (as these are the points that correspond
to elements $e + f\sqrt{D}$ of $\mathbb{Q}(\sqrt{D})$), and the y-coordinate
$\sqrt{5/24}$ of this point is not a rational number, so this point is not a
Q-point and we have indeed covered the first-quadrant region, and again by
symmetry we cover the whole region.
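
Since only Q-points matter, this covering can be spot-checked with exact rational arithmetic, where the exceptional irrational point never arises. The Python sketch below samples a grid of rational points in the square with vertices (0, 0), (1/2, 0), (1/2, 1/2), and (0, 1/2), including its right-hand border; the grid size N and the function names are illustrative assumptions, and a finite sample is of course not a proof.

```python
from fractions import Fraction as F

def norm(x, y, D):
    """The quantity |x^2 - D*y^2| attached to the point (x, y)."""
    return abs(x * x - D * y * y)

def covered(x, y, D, centers):
    """True if (x, y) is at norm-distance less than 1 from some center."""
    return any(norm(x - cx, y - cy, D) < 1 for cx, cy in centers)

N = 60
centers = [(F(0), F(0)), (F(-1), F(0))]
# Grid points (i/2N, j/2N) with 0 <= i, j <= N fill the closed square.
points = [(F(i, 2 * N), F(j, 2 * N))
          for i in range(N + 1) for j in range(N + 1)]
assert all(covered(x, y, 6, centers) for x, y in points)
```

The second center is needed: the point (0, 4/9) from the beginning of this section is not covered by the region centered at (0, 0) alone.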

Figure C.5. The case D = 6.

Figure C.6. The case D = 7.

The situation for D = 7 is very similar. See Figure C.6. Again, using
hyperbolas centered at (0, 0) and (−1, 0) we cover all of the first-quadrant
region. We have the same subtlety. The two hyperbolas intersect at the
point $(1/2, \sqrt{5/28})$, but this is not a Q-point. Again, by symmetry,
once we have covered the first-quadrant region we can cover the whole
region, so we are done.
To recapitulate, in the cases D = −1, −2, −3, 2, and 3 we could find a
single value of γ so that the associated region covered the first-quadrant
region, and in the cases D = −7, −11, 5, 13, 6, and 7 we could find two
values of γ so that the two associated regions together covered the
first-quadrant region. We can handle other values of D if we use more
values of γ for each D. For D = 17 we can choose γ = 0,
$(1/2) + (1/2)\sqrt{17}$, or −1. For D = 21 we can choose γ = 0,
$(1/2) + (1/2)\sqrt{21}$, or −1. For D = 29 we can choose γ = 0,
$(1/2) + (1/2)\sqrt{29}$, −1, or $4 + \sqrt{29}$. For D = 11 we can choose
γ = 0, −1, $2 + \sqrt{11}$, or $-5 - \sqrt{11}$.
We leave the details for the exercises.

C.2 Continuation of Example 2.26



(2) Now we do an example with $R = O(\sqrt{-7})$. This example is in principle
the same as the previous example, where we had $R = O(\sqrt{-1})$, but
actually it is more involved, as it uses, and illustrates, the subtlety in
the proof of Theorem 2.8 in the case D = −7. To simplify notation,
and to bring out the parallel with the previous example, we will set
$j = \sqrt{-7}$.


Let $\alpha_1 = 20 + 13j$ and $\alpha_2 = 5 + j$. Then

\begin{align*}
20 + 13j &= (5 + j)((11 + 3j)/2) + 3 \\
5 + j &= 3((3 + j)/2) + ((1 - j)/2) \\
3 &= ((1 - j)/2)(1 + j) + (-1) \\
1 + j &= (-1)(-1 - j),
\end{align*}

so the gcd is −1, and then

\begin{align*}
-1 &= 3 + ((1 - j)/2)(-1 - j) \\
   &= 3 + ((5 + j) + 3((-3 - j)/2))(-1 - j) \\
   &= (5 + j)(-1 - j) + 3(-1 + 2j) \\
   &= (5 + j)(-1 - j) + ((20 + 13j) + (5 + j)((-11 - 3j)/2))(-1 + 2j) \\
   &= (20 + 13j)(-1 + 2j) + (5 + j)((51 - 21j)/2).
\end{align*}
This certainly needs a lot of explanation. We use the language and the
notation of the proof of Theorem 2.8 here.
In the first step, (20 + 13j)/(5 + j) = (191 + 45j)/32 = 5.96875 + 1.40625j.
This corresponds to the point (5.96875, 1.40625) in the plane, and this point
is apparently nearest the lattice point (6, 1) representing 6 + j. In the
usual metric on the plane, its distance from this point is
$\sqrt{(-1/32)^2 + (13/32)^2} = \sqrt{170}/32 = 0.407\ldots$. But it is not
actually nearest this point in the metric $\|\cdot\|_{-7}$, and in fact in
this metric the distance between these two points is
$\|(6, 1) - (191/32, 45/32)\|_{-7} = \|(1/32, -13/32)\|_{-7} = |(1/32)^2 + 7(-13/32)^2| = 37/32 > 1$,
so this point will not do. Instead, we choose
the lattice point (11/2, 3/2) representing (11 + 3j)/2. This point is
apparently further away, as in the usual metric on the plane its distance
from the original point is
$\sqrt{(15/32)^2 + (3/32)^2} = \sqrt{234}/32 = 0.478\ldots$. But in
the metric $\|\cdot\|_{-7}$ this new point is actually nearest to the
original point. (We check that in this metric the distance between this
point and the original point is
$\|(11/2, 3/2) - (191/32, 45/32)\|_{-7} = \|(-15/32, 3/32)\|_{-7} = |(-15/32)^2 + 7(3/32)^2| = 288/1024 < 1$,
as we expect.)
In the second step, (5 + j)/3 = (5/3) + (1/3)j = 1.66... + 0.33...j. This
corresponds to the point (1.66..., 0.33...) in the plane, and this point is
apparently nearest the lattice point (3/2, 1/2) representing (3 + j)/2. But
again what matters is distance in the metric $\|\cdot\|_{-7}$. As it happens,
in this case this same point is actually nearest to the original point, and
we choose it. (Again we check that the actual distance between this point
and the original point is
$\|(3/2, 1/2) - (5/3, 1/3)\|_{-7} = \|(-1/6, 1/6)\|_{-7} = |(-1/6)^2 + 7(1/6)^2| = 8/36 < 1$,
as we expect.)
In the third step, 3/((1 − j)/2) = (3/4) + (3/4)j = 0.75 + 0.75j. This
corresponds to the point (0.75, 0.75) in the plane, and this point is
apparently, and also actually, equidistant from the points (0.5, 0.5) and
(1, 1), which are the apparently, and also the actually, nearest lattice
points. We may choose either one of them, and we choose the point (1, 1)
representing 1 + j. (Once again we check that the actual distance between
this point and the original point is
$\|(1, 1) - (3/4, 3/4)\|_{-7} = \|(1/4, 1/4)\|_{-7} = |(1/4)^2 + 7(1/4)^2| = 8/16 < 1$,
as we expect.)
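
The procedure just described can be sketched in Python with exact rational arithmetic. In the sketch below an element x + yj is stored as the pair (x, y), the lattice points are the pairs (a/2, b/2) with a ≡ b (mod 2), and each quotient is the lattice point nearest to the exact quotient in the metric $\|\cdot\|_{-7}$. All function names are illustrative assumptions. When two lattice points are equally near, as in the third step, the code keeps whichever it meets first, so its gcd may differ from the one found above by a unit.

```python
from fractions import Fraction as F
from math import floor

D = -7  # elements x + y*j with j = sqrt(-7) are stored as pairs (x, y)

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

def mul(u, v):
    return (u[0] * v[0] + D * u[1] * v[1], u[0] * v[1] + u[1] * v[0])

def norm(u):
    return abs(u[0] * u[0] - D * u[1] * u[1])

def div(u, v):
    """Exact quotient u / v in Q(sqrt(-7)); v must be nonzero."""
    n = v[0] * v[0] - D * v[1] * v[1]
    p = mul(u, (v[0], -v[1]))          # multiply by the conjugate of v
    return (p[0] / n, p[1] / n)

def nearest(q):
    """A lattice point (a/2, b/2), a ≡ b (mod 2), minimizing norm(q - point)."""
    a0, b0 = floor(2 * q[0]), floor(2 * q[1])
    candidates = [(F(a, 2), F(b, 2))
                  for a in range(a0 - 2, a0 + 4)
                  for b in range(b0 - 2, b0 + 4)
                  if (a - b) % 2 == 0]
    return min(candidates, key=lambda c: norm(sub(q, c)))

def euclid(a, b):
    """Euclidean algorithm; returns a gcd of a and b (defined up to a unit)."""
    while b != (0, 0):
        q = nearest(div(a, b))
        a, b = b, sub(a, mul(b, q))
    return a

alpha1, alpha2 = (F(20), F(13)), (F(5), F(1))
first_q = nearest(div(alpha1, alpha2))   # the first quotient, (11 + 3j)/2
g = euclid(alpha1, alpha2)               # a unit, so alpha1 and alpha2 are coprime
# The linear combination found in the text: it should equal -1.
bezout = add(mul(alpha1, (F(-1), F(2))), mul(alpha2, (F(51, 2), F(-21, 2))))
```

Here `first_q` is the lattice point (11/2, 3/2) chosen in the first step, and `bezout` confirms the final line of the computation above.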

C.3 Exercises
Exercise C.1. Fill in the details of the proof of Theorem 2.8:

(a) in the case D = −7;

(b) in the case D = −11;

(c) in the case D = 5;

(d) in the case D = 13;

(e) in the case D = 17;

(f) in the case D = 21;

(g) in the case D = 29;

(h) in the case D = 11.


Exercise C.2. Let $R = O(\sqrt{-7})$. Set $j = \sqrt{-7}$. In each case, find a
gcd of the following sets of elements of R, and express that gcd as a linear
combination of those elements:

(a) {2 + j, 13 + j};

(b) {5 + 2j, 4 + j};

(c) {14 + j, 3 + j}.
