Notes 01
to
Computer Algebra
K. Kalorkoti
School of Informatics
University of Edinburgh
Informatics Forum
10 Crichton Street
Edinburgh EH8 9AB
Scotland
E-mail: [email protected]
January 2019
Preface
These notes are for the fourth year and MSc course in Computer Algebra. They contain more
material than can fit into the time available. The intention is to allow you to browse through extra
material with relative ease. If you have time, you are strongly encouraged to follow in detail at least
some of the extra material. However it will not be assumed that you have done this and success in
the course does not depend on it. The course itself will cover a selection of the topics mentioned;
even so, some details will be omitted. For example, Gröbner bases are given quite a full treatment. In
the course itself more emphasis will be placed on understanding their use, the basic algorithm and
the intuitive explanation of how and why they work. For other topics we will look at the details quite
closely. Broadly speaking, the examinable parts are those covered in the lectures or assigned for
home reading (the lecture log at the course web page, http://www.inf.ed.ac.uk/teaching/courses/ca,
will state the parts of the notes covered). A guide to revision will be handed out at the end of the
course (and placed on the course web site) stating exactly which topics are examinable.
You will find a large number of exercises in the notes. While they are not part of the formal
coursework assessment (separate exercise sheets will be issued for this) you are strongly encouraged
to try some of them. At the very least you should read and understand each exercise. You are
welcome to hand me any of your attempts for feedback; we will in any case discuss some of the
exercises in lectures. As a general rule the less familiar you are with a topic the more exercises
you should attempt from the relevant section. One or more of these exercises will be suggested at
the end of some lectures for you to try. There will be a follow-up discussion at the next lecture
to provide you with feedback on your attempts. If you find any errors or obscure passages in the
notes please let me know.
Parts of these notes are based on lectures given by Dr. Franz Winkler at the Summer School in
Computer Algebra which was held at RISC-LINZ, Johannes Kepler University, Austria, during the
first two weeks of July, 1990. I am grateful for his permission to use the material.
Note: These notes and all other handouts in the course are for your private study and must not be
communicated to others in any way; breaking this requirement constitutes academic misconduct.
He liked sums,
but not the way they were taught.
— BECKETT, Malone Dies
Contents
1 Introduction 1
1.1 General introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Features of Computer Algebra Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Syntax of Associated Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
6 Gröbner Bases 59
6.1 Basics of Algebraic Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.2 Gröbner Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.3 Definition and Characterization of Gröbner Bases . . . . . . . . . . . . . . . . . . . . 68
6.4 Computation of Gröbner Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.5 Applications of Gröbner Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.6 Improvements of the Basic Gröbner Basis Algorithm . . . . . . . . . . . . . . . . . . 80
6.7 Complexity of Computing Gröbner Bases . . . . . . . . . . . . . . . . . . . . . . . . 81
6.7.1 Algebraic Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.7.2 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.8 The Case of Two Indeterminates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8 Bibliography 119
1 Introduction
1.1 General introduction
Consider the expression
It might be better to simplify f as much as possible before operating on it. Here the word ‘sim-
plify’ denotes the removal of common factors from numerator and denominator. For example the
expression
(x^4 − 1)/(x^2 + 1)
simplifies to
x^2 − 1
since x^4 − 1 = (x^2 − 1)(x^2 + 1) (we discuss this point in greater detail in §4.7.8). This process is
rather like that of reducing a fraction a/b to its lowest terms by cancelling out the greatest common
divisor of a and b. In fact it can be seen that the numerator in the right hand side of (1) is equal to
1
As another example suppose that we want to integrate
g = (x^2 − 5)/(x(x − 1)^4).
A little experience shows that such problems can often be solved by decomposing the expression
into partial fractions. (We say that a fraction c/pq is decomposed into partial fractions if we can
write it as a/p + b/q. The summands might themselves be split further into partial fractions. Note
that such a decomposition is not always possible, e.g., 1/x^2 cannot be so decomposed.) In our case
we have
g = −5/x + 5/(x − 1) − 5/(x − 1)^2 + 6/(x − 1)^3 − 4/(x − 1)^4.
We may now proceed to integrate each of the summands and add the results to obtain the integral
of g. Once again the decomposition process is rather tedious and we could easily make a mistake—
the same applies to the final process of integration! It is therefore highly desirable to have machine
assistance for such tasks. Indeed it would be very useful to be able to deal with more general
expressions such as
(x + a)/(x(x − b)(x^2 + c))
where a, b, c are arbitrary unspecified constants. More ambitiously we might ask about the possi-
bility of integrating expressions drawn from a large natural class. One of the major achievements
of Computer Algebra has been the development of an algorithm that will take an expression from
such a class and either return its integral or inform the user that no integral exists within the class.
There are many more applications of Computer Algebra some of which will be discussed as the
course progresses. The overall aim is the development of algorithms (and systems implementing
them) for Mathematical problem solving. As the examples given above illustrate, we do not restrict
attention to numerical problems.
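Decompositions like the one given above for g can at least be checked mechanically. Below is a short sketch in Python (purely illustrative; the notes themselves assume no particular system here) using exact rational arithmetic: g and its claimed decomposition, being equal as rational functions, must agree at every admissible sample point.

```python
from fractions import Fraction as F

# g = (x^2 - 5)/(x(x - 1)^4) and its partial fraction decomposition;
# exact rational arithmetic avoids any rounding in the comparison
def g(x):
    return (x * x - 5) / (x * (x - 1) ** 4)

def decomposition(x):
    return (-5 / x + 5 / (x - 1) - 5 / (x - 1) ** 2
            + 6 / (x - 1) ** 3 - 4 / (x - 1) ** 4)

# the two agree at every sample point other than the poles x = 0, 1
samples = [F(2), F(3), F(-7), F(1, 2)]
agreement = all(g(x) == decomposition(x) for x in samples)
```

Passing Fraction arguments keeps every intermediate value an exact rational, so agreement here really is equality, not approximate closeness.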
Interactive use
The user can use the system directly from the terminal and see results displayed on the screen. On
old-fashioned character terminals the results were shown in a basic 2-dimensional format; this can
still be useful if a system is being used remotely over a slow connection. Systems such as Maple
will produce a much more sophisticated display on modern terminals; Axiom does not yet have a
sophisticated interface.
File handling
Expressions and commands can be read from files. This is clearly essential for the development of
programs in the system’s language. Output can also be sent to a file. The output can be in various
formats or in a form which is suitable for input back to the system at some later stage.
Polynomial manipulation
Without this ability a system cannot be deemed to be a Computer Algebra system. Polynomials
are expressions such as
4x^2 + 2x − 3
or
x^4 − x^3 y + 5xy^3 − 20y.
Systems can also handle operations on rational expressions (i.e., expressions of the form p/q where
p, q are both polynomials).
Arithmetic
Integer and rational arithmetic can be carried out to any degree of accuracy. (Some systems
impose an upper limit but this is extremely large. Naturally the available physical storage will
always impose a limit.) In floating point arithmetic the user can specify very high levels of accuracy.
Normally systems will use exact arithmetic rather than resort to floating point approximations. For
example 1/3 will not be converted to an approximation such as 0.33333 unless the user demands
it. Similar remarks apply to such expressions as √(9 + 4√2). If approximations are used then not
only do we lose precision but might also obtain false results. For example consider the polynomials
(taken from [21])

p = x^3 − 8
q = (1/3)x^2 − (4/3)
Now it can be seen that the polynomial

d = (1/3)x − (2/3)

divides both p and q, and moreover no polynomial of degree higher than that of d has this property, i.e., d is a gcd of
p, q. Suppose however that we consider the polynomial
q̂ = 0.333333x^2 − 1.33333
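The contrast between q and q̂ can be seen in a small experiment. The sketch below (Python, illustrative only; the names polydiv and polygcd are ours) runs the Euclidean algorithm on dense coefficient lists. With exact rational coefficients the computed gcd has degree 1, but with the rounded coefficients of q̂ the remainders never vanish exactly and the "gcd" collapses to a tiny non-zero constant, falsely suggesting the polynomials are coprime.

```python
from fractions import Fraction as F

def polydiv(a, b):
    """Remainder of a modulo b; dense coefficient lists, highest degree first."""
    a = a[:]
    while len(a) >= len(b):
        c = a[0] / b[0]                 # eliminate the leading term of a
        for i in range(len(b)):
            a[i] -= c * b[i]
        a.pop(0)
    while a and a[0] == 0:              # strip exact zero leading coefficients
        a.pop(0)
    return a

def polygcd(a, b):
    """Euclidean algorithm on coefficient lists."""
    while b:
        a, b = b, polydiv(a, b)
    return a

p = [F(1), F(0), F(0), F(-8)]           # p = x^3 - 8
q = [F(1, 3), F(0), F(-4, 3)]           # q = (1/3)x^2 - 4/3
g_exact = polygcd(p, q)                 # degree 1: a rational multiple of x - 2

pf = [1.0, 0.0, 0.0, -8.0]
qf = [0.333333, 0.0, -1.33333]          # the rounded coefficients of q-hat
g_float = polygcd(pf, qf)               # a non-zero constant of degree 0
```

This is exactly the kind of false result the preceding paragraph warns about: the floating point run reports two polynomials with a genuine common root as coprime.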
Differentiation
This can be applied to expressions involving polynomials and special functions which are recognized
by the system. Partial derivatives can also be computed.
Integration
Expressions involving polynomials and special functions can usually be integrated. The expressions
can be quite complicated—the system does much more than just look the result up in a table of
standard integrals. Integration is one of the most difficult tasks carried out by a Computer Algebra
system. All systems have some problems with definite integration and occasionally produce wrong
answers (indefinite integration is easier). It is important to understand that Computer Algebra
systems treat integration from an algebraic rather than an analytic viewpoint. For most of the time
there is no difference but for certain situations the answer returned is correct only in the algebraic
sense (e.g., the integral of a continuous function might be returned as a discontinuous one). This
fact is not appreciated as widely as it ought to be.
Solving equations
Systems of simultaneous linear equations with numeric or symbolic coefficients can be solved exactly.
Arbitrary polynomial equations up to degree 4 can also be solved exactly. For higher degree
polynomials no general method is possible (this is a mathematical theorem); however, certain types
of equations can still be handled.
First and second order differential equations can usually be solved as well.
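For linear systems, "solved exactly" is the key phrase: floating point elimination would only approximate the solution. Here is a toy sketch (Python, illustrative only, not any system's actual method) solving a 2 × 2 system over the rationals by Cramer's rule:

```python
from fractions import Fraction as F

def solve2(a, b, c, d, e, f):
    """Solve a*x + b*y = e, c*x + d*y = f exactly over Q by Cramer's rule."""
    det = F(a) * d - F(b) * c
    if det == 0:
        raise ValueError("singular system")
    x = (F(e) * d - F(b) * f) / det
    y = (F(a) * f - F(e) * c) / det
    return x, y

# x + y = 3, x - y = 1 has the exact solution x = 2, y = 1
solution = solve2(1, 1, 1, -1, 3, 1)
```

Because every intermediate value is a Fraction, the answer is exact however awkward the coefficients; a real system does the same with full Gaussian elimination and symbolic entries.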
Substitution
It is very useful to be able to substitute an expression in place of a given variable. This allows the
user to build up large expressions or alter existing ones in a controlled manner. All systems have
this facility.
Matrices
All systems provide basic operations on matrices. The entries of the matrices do not have to be
numerical or even specified. Systems provide methods for initializing large matrices or for building
special types such as symmetric matrices.
Mathematical structures
Sophisticated systems such as Axiom enable the user to work with (and indeed define for themselves)
mathematical structures of many kinds. Thus a user can work with groups, rings, fields, algebras
etc. The underlying system supplies the appropriate operations on elements and many algorithms
for examining the structure of particular examples thus making it easy to test conjectures of many
kinds.
Graphics
Most systems can produce simple plots. With increasing computational power and better displays,
most systems have introduced highly developed capabilities for plotting mathematical functions
in 2 or 3 dimensions (in black and white or colour and specified illumination for 3D).
source being available at https://github.com/daly/axiom. For more details see http://www.axiom-developer.org
and the introduction to the free book at http://www.axiom-developer.org/axiom-website/bookvol1.pdf
(which you should download). In fact there are now at least two open source
versions of Axiom; this course will follow the one given at the preceding links.
declares the variable D to take on values that are lists of records with two entries dereferenced by
key (a string) and val (a polynomial in x, y, z with rational coefficients). So if we set
D:=[["parabola", (x/a)^2+(y/b)^2-1],["sphere",x^2+y^2+z^2-1]]
then D.2.key returns "sphere" while D.2.val returns x^2 + y^2 + z^2 − 1. Compare, by contrast, the
session
(1) -> P:UP(x,Table(Integer,String))
P:UP(x,Table(Integer,String))
1) ->
UnivariatePolynomial(x,Table(Integer,String)) is not a valid type.
(1) -> P:UP(x,Integer)
P:UP(x,Integer)
Type: Void
Axiom detects that the first type declaration does not make sense and rejects it. The second one
is meaningful so it is accepted.
For a more extended discussion see the chapter A Technical Introduction to AXIOM in the
free book http://www.axiom-developer.org/axiom-website/bookvol1.pdf. One point to note is that
the example under 1.4 Operations Can Refer To Abstract Types should not be typed into Axiom
directly; it seems to be in Axiom’s language for developers but this is not made clear.
1 These should not be confused with the special types of rings also called domains, that we will meet later on.
2 These should not be confused with the objects of study in Category Theory, a branch of mathematics.
2.4 Basic Language Features
Like other computer algebra systems Axiom has its own programming language. Most of its con-
structs will be familiar from other imperative languages, e.g., loops, conditional statements, recur-
sion. The format of loops is quite flexible allowing for convenient expression of ideas. Here are
some examples:
(1) -> for i in 1..5 repeat print(i^i)
1
4
27
256
3125
Type: Void
(2) -> L:=[1,2,3,4,5]
L:=[1,2,3,4,5]
(2) ->
(2) [1,2,3,4,5]
Type: List(PositiveInteger)
(3) -> for i in L repeat print(i^i)
for i in L repeat print(i^i)
1
4
27
256
3125
Type: Void
(4) -> L:=[i for i in 1..10]
L:=[i for i in 1..10]
(4) ->
(4) [1,2,3,4,5,6,7,8,9,10]
Type: List(PositiveInteger)
(5) -> for i in L repeat print(i^i)
for i in L repeat print(i^i)
1
4
27
256
3125
46656
823543
16777216
387420489
10000000000
Type: Void
In the second and third examples L is a list which we create in two different ways (lines (2) and (4)).
There are many data structures built in with supporting operations.
As can be seen from the session above Axiom is a typed language. Indeed it was an early adopter
of the notion of inheritance so that operations common to various types can become more specialised
as the data type does so. However the user is not forced to declare the type of everything; the system
will try to deduce the information, thus freeing users to work fluently. At times the system is unable
to deduce the type (either because there is insufficient information or because the type is too complicated); if so the user
is informed and has the opportunity to supply the information. Here is an example of a function
definition from Axiom:
(1) -> f(0)==0
f(0)==0
Type: Void
(2) -> f(1)==1
f(1)==1
Type: Void
(3) -> f(n)==f(n-1)+f(n-2)
f(n)==f(n-1)+f(n-2)
Type: Void
(4) -> f(10)
f(10)
Compiling function f with type Integer -> NonNegativeInteger
Compiling function f as a recurrence relation.
^
... (a lot of messages deleted [KK])
(4) 55
Type: PositiveInteger
(5) -> f(1000)
f(1000)
(5) ->
(5)
4346655768693745643568852767504062580256466051737178040248172908953655541794_
905189040387984007925516929592259308032263477520968962323987332247116164299_
6440906533187938298969649928516003704476137795166849228875
Type: PositiveInteger
(6) ->
Clearly f is the Fibonacci function. We define it piecewise just as would be done in a maths book.
The sign == is delayed assignment; in effect it tells Axiom not to evaluate anything at this time but
to use the right hand side when required (it does carry out various checks, e.g., that types make
sense). On the first call Axiom gathers together the piecewise definition of f and tries to compile it,
in this case successfully. During compilation it carries out some optimisations so that in this case
computing f is done efficiently (a naive version would involve exponentially many recursive calls).
If the function cannot be compiled (e.g., the system cannot determine the types of the arguments)
it is interpreted instead.
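To see why the recurrence optimisation matters, compare a direct transcription of the piecewise definition with a linear-time form. This Python sketch is only an illustration of the idea, not Axiom's actual compilation strategy:

```python
def fib_naive(n):
    # direct transcription of the piecewise definition:
    # exponentially many recursive calls
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_linear(n):
    # the kind of form a recurrence-relation optimisation produces: O(n) additions
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

fib_naive(1000) would never finish, while fib_linear(1000) reproduces Axiom's 209-digit answer essentially instantly.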
Finally, Axiom makes polymorphic functions available so that the same code will be compiled
as appropriate for different data types. Here is a simple example:
(1) -> first(L)==L.1
first(L)==L.1
Type: Void
(2) -> first([1,2,3])
first([1,2,3])
Compiling function first with type List(PositiveInteger) ->
PositiveInteger
is concerned with the core algorithms of the area. Unfortunately it is very expensive; however,
there is a copy in the library.
4. R. Zippel, Effective Polynomial Computation, Kluwer Academic Publishers (1993).
This book focuses on the core polynomial operations in Computer Algebra and studies them
in depth. It is a good place to go for extra detail (both theoretical and practical). Unfortunately
it has a fairly large number of misprints, most of which are minor (a list is available
from K. Kalorkoti).
5. D. E. Knuth, Seminumerical Algorithms, (Second Edition), Addison-Wesley (1981).
Chapter 4 gives a comprehensive treatment of topics in arithmetic and some aspects of poly-
nomial arithmetic. Although the course will not go into such staggering detail every serious
student of computing should consult this book. Several copies are available from the library.
6. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms.
McGraw-Hill, 2002 (now in its third edition, published September 2009).
This book is not specifically related to Computer Algebra, however it is an excellent source
for many algorithms and general background. There are also several chapters on basic Math-
ematics.
7. D. R. Stoutemyer, Crimes and misdemeanors in the computer algebra trade, Notices of the
AMS, Sept. 1991, 701-705.
This paper describes some of the ways in which computer algebra systems can produce wrong
results. Ideally it (or something like it) should be read by every user of such systems. A
crude but effective summing up of some of the paper is that having a computer does not give
you the right to throw your brain away—quite the opposite is true. The paper concentrates
on analysis rather than algebra; surprisingly, the author does not discuss newer systems such
as Scratchpad II (now called AXIOM) which try to be more sensible about the domain of
computation. (In this course we will be concerned almost exclusively with the algebraic
viewpoint.)
8. F. Winkler, Polynomial Algorithms in Computer Algebra, Springer (1996).
This book is devoted to algebraic aspects of the subject, ranging from basic topics to algebraic
curves.
9. J. von zur Gathen and J. Gerhard, Modern Computer Algebra, Cambridge University Press
(1999).
This book gives comprehensive coverage to the basics of the subject, aiming at a complete pre-
sentation of mathematical underpinnings, analysis of algorithms and development of asymp-
totically fast methods.
• Macsyma: started at MIT under the direction of J. Moses. A very large system. A major
drawback is that it has had too many implementors with no common style and there is no
source code documentation. It is therefore difficult to build on the system using the source
code. Macsyma was marketed by Symbolics Inc. for many years. It is now available as
Maxima under the GPL license.
• REDUCE: started at the University of Utah under the direction of A. Hearn later of the Rand
Corporation, Santa Monica. It has many special functions for Physics and is now available
under a BSD license.
• ALDES/SAC-II (then SACLIB): under the direction of G. E. Collins, University of Wisconsin,
Madison (later at RISC, University of Linz). Originally implemented in Fortran and now in
C; the user language is called ALDES. The system is not very user friendly: it is intended
for research in Computer Algebra, and many basic algorithms were implemented in it for the
first time. It is still being developed.
• muMath/Derive: developed specifically for micros with limited memory by Soft Warehouse,
later owned by Texas Instruments. Derive was menu driven and replaced muMath. It was
discontinued in June 2007.
• Maple: started at the University of Waterloo, Canada, in the early 1980s. It is a powerful
and compact system which is ideal for multi-user environments and is still under very active
development.
• Mathematica: this is a product of Wolfram Research Inc., based in Illinois, and is the successor
to SMP. It was announced with a great deal of publicity in Summer 1988. One very striking
feature at the time was its ability to produce very high quality graphics. Its programming
language attempts to mix just about all possible paradigms; this does not seem like a good
idea to me. It is still being developed.
• Sage: this is not strictly speaking a computer algebra system but a wrapper for various free
systems. The front page of the web site (http://www.sagemath.org) states ‘SageMath is a free
open-source mathematics software system licensed under the GPL. It builds on top of many
existing open-source packages: NumPy, SciPy, matplotlib, Sympy, Maxima, GAP, FLINT, R
and many more. Access their combined power through a common, Python-based language or
directly via interfaces or wrappers.’
There are various other more specialised systems, e.g., GAP and Magma for group theory (GAP is
available through Sage).
4 Basic Structures and Algorithms
4.1 Algebraic Structures
We describe some abstract algebraic structures which will be useful in defining various concepts.
At first they appear rather dry and daunting but they have their origins in concrete applications
(for example groups and fields were shown by Galois to have a deep connection with the problem
of solving polynomial equations in radicals when the concepts were still just being formed—see
Stewart [57]). We shall use the following standard notation:
1. Z, the integers,
2. Q, the rationals,
3. R, the reals,
4. C, the complex numbers,
x ◦ y = y ◦ x, for all x, y ∈ R.
Thus the addition of numbers is both commutative and associative, while subtraction is neither.
On the other hand matrix multiplication is associative but not commutative (except in the case of
1 × 1 matrices with entries whose multiplication is commutative).
Associative operations make notation very easy because it can be shown that any valid way of
bracketing the expression
x1 ◦ x2 ◦ · · · ◦ xn
leads to the same result. (The proof of this makes a useful exercise in induction.)
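The claim is easy to experiment with. A small sketch (Python, purely illustrative) compares all five bracketings of a four-term expression: for an associative operation such as addition they coincide, while for subtraction they do not.

```python
def same_under_all_bracketings(op, w, x, y, z):
    """Compare the five ways of bracketing w op x op y op z."""
    results = {
        op(op(op(w, x), y), z),
        op(op(w, op(x, y)), z),
        op(w, op(op(x, y), z)),
        op(w, op(x, op(y, z))),
        op(op(w, x), op(y, z)),
    }
    return len(results) == 1   # all bracketings gave one value

addition_ok = same_under_all_bracketings(lambda a, b: a + b, 1, 2, 3, 4)     # associative
subtraction_ok = same_under_all_bracketings(lambda a, b: a - b, 1, 2, 3, 4)  # not associative
```

Of course a finite check is not the inductive proof asked for, but it shows concretely what the proposition asserts.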
A ring is a set R equipped with two binary operations +, ∗ called addition and multiplication
with the following properties:
1. + is associative,
2. + is commutative,
3. there is an element 0 ∈ R such that x + 0 = x for all x ∈ R,
4. every x ∈ R has an additive inverse, i.e., an element −x such that x + (−x) = 0,
5. ∗ is associative,
6. for all x, y, z ∈ R we have
x ∗ (y + z) = x ∗ y + x ∗ z,
(x + y) ∗ z = x ∗ z + y ∗ z,
i.e., ∗ is both left and right distributive over +.
It is important to bear in mind that R need not be a set of numbers and, even if it is, the two binary
operations might be different from the usual addition and multiplication of numbers (in which case
we use symbols which are different from + and ∗ in order to avoid confusion). The element 0 is
easily seen to be unique and is called the zero of the ring. We prove this claim by the usual method:
suppose there are two elements 0, 0′ that satisfy the axioms for zero. Then

0′ = 0′ + 0, by axiom 3
   = 0 + 0′, by axiom 2
   = 0, since 0′ satisfies axiom 3 by assumption.

Thus 0′ = 0 as claimed. Moreover every x has exactly one additive inverse which is denoted by −x.
We normally write x − y instead of the more cumbersome x + (−y). The axioms immediately imply
certain standard identities such as x ∗ 0 = 0 for all x. To see this note that x ∗ 0 = x ∗ (0 + 0) =
x ∗ 0 + x ∗ 0. Now adding −(x ∗ 0) to both sides we obtain 0 = x ∗ 0 as required. Similarly 0 ∗ x = 0
for all x (remember that ∗ is not assumed to be commutative so this second identity does not follow
immediately from the first).
It is worth noting that multiplication in rings is frequently denoted by juxtaposition. Thus we
write xy rather than x ∗ y.
The archetypal ring is Z with the usual addition and multiplication. Other examples include:
1. Q, R, C with the usual addition and multiplication.
2. 2Z, the set of all even integers with the usual addition and multiplication.
3. Zn the integers modulo n where n > 1 is a natural number. Here addition and multiplication
are carried out as normal but we then take as result the remainder after division by n.
Alternatively we view the elements of Zn to be the integers and use ordinary addition and
multiplication but interpret equality to mean that the two numbers involved have the same
remainder after division by n. More accurately the elements are equivalence classes where two
numbers are said to be equivalent if they have the same remainder when divided by n (this
is the same as saying that their difference is divisible by n). The classes are {kn + r | k ∈ Z},
for 0 ≤ r ≤ n − 1. For simplicity we denote such a class by r. We could use any other
element of each class to represent it. A simple exercise shows that doing operations based on
representatives is unambiguous, that is we always get the same underlying equivalence class.
We will see later on that this is a particular case of a more general construction that gives us
a new ring from the ingredients of an existing one. Just to give a hint of this, note that the
set n Z = {nm | m ∈ Z} is itself a subring of Z (actually all we need is that it is a two sided
ideal but this will have to wait). Note that n can be arbitrary for this construction. Now if
we define the relation ∼ on Z by
a ∼ b ⇔ a − b ∈ nZ
we obtain an equivalence relation and use Z/nZ to denote the equivalence classes. Let us
denote the equivalence class of an integer r by [r]; note that by definition [r] = {mn + r | m ∈ Z}.
We can turn Z/nZ into a ring by defining + and ∗ on equivalence classes by

[r] + [s] = [r + s],        [r] ∗ [s] = [r ∗ s].
There is a subtlety here: we must show that the operations are well defined, i.e., if [r1] = [r2]
and [s1] = [s2] then [r1] + [s1] = [r2] + [s2] and [r1] ∗ [s1] = [r2] ∗ [s2]. This is quite easy to do
but we will leave it for the general case (this is where we need the substructure nZ to be a
subring, in fact just a two sided ideal).
The use of square brackets to denote equivalence classes is helpful at first but soon becomes
tedious. In practice we drop the brackets and understand from the context that an equivalence
class is being denoted. This is an important convention to get used to; the meaning of a piece
of notation depends on the structure in which we are working. Thus if we are working over
the integers then 3 denotes the familiar integer (“three”) but if we are working in the integers
modulo 6 then it denotes the set of integers {6m + 3 | m ∈ Z}. Finally, we have used Zn as an
abbreviation for Z /n Z. This is quite standard but in more advanced algebra the notation is
not so good because it conflicts with the notation of another very useful concept (localization,
since you ask).
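The representative-based arithmetic can be sketched directly. The class below (Python, purely illustrative, not from the notes) always stores the canonical representative 0, . . . , n − 1, so the operations are automatically well defined:

```python
class Zn:
    """A minimal model of the ring Z_n via canonical representatives."""
    def __init__(self, r, n):
        self.n = n
        self.r = r % n                  # reduce to the representative 0..n-1

    def __add__(self, other):
        return Zn(self.r + other.r, self.n)

    def __mul__(self, other):
        return Zn(self.r * other.r, self.n)

    def __eq__(self, other):
        # two classes are equal iff they have the same remainder mod n
        return self.n == other.n and self.r == other.r

# 8 and 2 name the same element of Z_6, and any representatives give the
# same answers, which is the well-definedness property in action
```

For instance Zn(8, 6) * Zn(9, 6) and Zn(2, 6) * Zn(3, 6) are the same element, namely the zero class.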
4. Square matrices of a fixed size with integer entries. Here we use the normal operations of
matrix addition and matrix multiplication.
Again this is a particular case of a general construction: we can form a ring by considering
square matrices with entries from any given ring.
5. Let S be any set and P = P(S) the power set of S (i.e., the set of all subsets of S). This
forms a ring where
• Addition is symmetric difference, i.e., A + B is (A ∪ B) − (A ∩ B) (this consists of all elements
that are in one of the sets but not both).
• Multiplication is intersection, i.e., A ∗ B is A ∩ B.
The empty set plays the role of 0. This is an example of a Boolean ring, i.e., x ∗ x = x for
all x.
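Python's set operators make this ring easy to try out (an illustrative sketch; ^ is symmetric difference and & is intersection):

```python
S_A = {1, 2, 3}
S_B = {2, 3, 4}

ring_sum = S_A ^ S_B        # symmetric difference plays the role of +
ring_product = S_A & S_B    # intersection plays the role of *

# the Boolean ring law x * x = x holds, and every element is its own
# additive inverse since A + A is the empty set (the zero of the ring)
idempotent = (S_A & S_A) == S_A
self_inverse = (S_A ^ S_A) == set()
```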
Note that in all our examples, except for matrices, multiplication is also commutative. A ring whose
multiplication is commutative is called commutative. Notice further that in all our examples, except
for the ring 2 Z (generally n Z for n > 1), we have a special element 1 with the property that
1x = x1 = x, for all x ∈ R.
An element with this property is called a (multiplicative) identity of the ring. In fact it is easily
seen that if such an identity exists then it is unique. (There is a clear analogy with 0 here which is
an additive identity for the ring—however the existence of 0 in a ring is required by the axioms).
While the definition of rings is clearly motivated by the integers, certain ‘strange’ things can
happen. For example in Z6 we have 2 × 3 = 0 even though 2 ≠ 0 and 3 ≠ 0 in Z6 (i.e., 6 does not
divide 2 or 3). Matrix rings also exhibit such behaviour.
Pursuing the numerical analogy one level up we see that in the ring Q every non-zero element
has a multiplicative inverse, i.e., for each x ≠ 0 there is a y ∈ Q such that xy = yx = 1. Of
course in general the notion of a multiplicative inverse only makes sense if the ring has an identity.
However even if the ring does have an identity there is no guarantee that particular elements will
have inverses. For example, in the ring of 2 × 2 matrices with integer entries we observe that the matrix

( 1 0 )
( 0 0 )

has no multiplicative inverse.
Observations such as the above lead to the notion of fields. A field is a ring with the extra properties
1. there is a multiplicative identity which is different from 0,
2. multiplication is commutative,
3. every non-zero element x has a multiplicative inverse, i.e., there is a y such that xy = 1.
Exercise 4.2 Show that Zn is not a field whenever n is not a prime number. Treat n = 1 as a
special case and then consider composite n for n > 1. (Hint: in a field there cannot be non-zero
elements x, y such that xy = 0, as shown above.)
Exercise 4.3 We have seen that in certain fields it is possible to obtain 0 by adding 1 to itself
sufficiently many times. The characteristic of a field is defined to be 0 if we can never obtain 0 in
the way described and otherwise it is the least number of times we have to add 1 to itself in order
to obtain 0. Show that every non-zero characteristic is a prime number.
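As a quick sanity check on the definition (not a solution to the exercise): in the ring Zn the characteristic is n itself, which the following sketch computes by repeated addition of 1:

```python
def characteristic(n):
    """Add 1 to itself in Z_n until the total is 0; this happens after n steps."""
    total, count = 0, 0
    while True:
        total = (total + 1) % n
        count += 1
        if total == 0:
            return count
```

For prime n this gives a field of prime characteristic, consistent with the exercise; for composite n the ring Zn has composite characteristic but then it is not a field.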
Fields bring us closer to number systems such as Q and R rather than Z. One reason for this is
because in a field every non-zero element must have an inverse. We can define other types of rings
which don’t go this far. An integral domain (or ID) is a commutative ring with identity (different
from zero) with the property that if xy = 0 then x = 0 or y = 0. (Of course every field is an ID but
not conversely.) A still more specialized class of rings consists of the unique factorization domains
(or UFD’s). We shall not give a formal definition of these but explain the idea behind them. It is
well known that every non-zero integer n can be factorized as a product of prime numbers:
n = p1 p2 · · · ps .
(Here we allow the possibility of negative ‘primes’ such as −3.) Moreover if we have another
factorization of n into primes
n = q1 q2 · · · qt ,
then
1. s = t, and
2. there is a permutation π of 1, 2, 3, . . . , s such that pi = εi qπ(i) for 1 ≤ i ≤ s, where εi = ±1.
Note that ±1 are the only invertible elements of Z. The notion of a prime or rather irreducible
element can be defined for ID’s: a non-zero element a of an ID is irreducible if it does not have an
inverse and whenever a = bc then either b or c has a multiplicative inverse.
Note that if u is an invertible element of a ring and a is any element of the ring then we have
a = u(u−1 a) which is a trivial ‘factorization.’ Such a ‘factorization’ tells us nothing about a. For
example writing 6 = −1 × −6 contains no information about 6. Writing 6 = 2 × 3 gives us genuine
information about 6 since neither factor is invertible in Z and so the factorization is not automatic.
This discussion also shows that factorization is not an interesting concept for fields; every non-zero
element is invertible. You have known this for a long time though perhaps not in this abstract
setting; you would never consider if a rational or a real number has a factorization.
A UFD is an ID in which every non-zero element has a factorization into finitely many irreducible
elements and this factorization is unique in a sense similar to that for integer factorization. Ob-
viously Z is a UFD. Also every field is a UFD for the trivial reason that in a field there are no
primes (every non-zero element has an inverse) so that each non-zero element factorizes uniquely
as its (invertible) self times the empty product of irreducible elements! In a UFD we have a very
important property. Let us say that b divides a, written as b | a, if a = bc for some c. It can be
shown that if p is irreducible and p | ab then either p | a or p | b (in fact this property is usually taken
as the definition of a prime element in a ring—the observation then says that in a UFD irreducible
elements are primes, see Exercise 4.4 for an example of a ring in which irreducible elements need
not be primes).
Exercise 4.4 Let Z[√−5] = { a + b√−5 | a, b ∈ Z }.

1. Show that 3, 2 + √−5 and 2 − √−5 are all irreducible elements of Z[√−5].
For this part you need to show that, e.g., if 3 = (a1 + b1√−5)(a2 + b2√−5) then one of the factors is invertible. You can make life easier by noting two things: firstly we have (a + b√−5)(a − b√−5) = a² + 5b². Secondly a1 + b1√−5 = a2 + b2√−5 if and only if a1 = a2 and b1 = b2; this follows from the fact that √−5 is not a real number (all we need is that it is not rational).

2. Observe that 3 × 3 = (2 + √−5)(2 − √−5) so that 3 | (2 + √−5)(2 − √−5). Show that 3 does not divide either of 2 + √−5 or 2 − √−5 (remember that everything takes place inside Z[√−5]).

3. Deduce that the ID Z[√−5] is not a UFD.
If a gcd of two elements is an invertible element of D then we will normalize it to be 1, the multiplicative identity
(of course this is fine since if u is a gcd of a, b and u is invertible then u·u⁻¹ = 1 is also a gcd of a, b).
In general we say that a, b are coprime (or relatively prime) if gcd(a, b) = 1. It is easy to show
that if a, b are coprime and b | ac then b | c. Under the same assumption, if a | c and b | c then ab | c.
We shall discuss the notion of gcd’s in more detail for those cases of particular interest to us. In
each of these cases it is possible to give an alternative equivalent definition but it is useful to know
that a unified treatment is possible.
4.5 Integers
In representing non-negative integers we fix a base and then hold the digits either in a linked list
or an array (for the latter option we need a language such as C that enables us to grow arrays
at runtime). Leading zeros are suppressed (i.e., 011 loses the 0) in order to have a canonical
representation (we need to have a special case for 0 itself). The base chosen is usually as large as
possible subject to being able to fit the digits into a word of memory and leave one bit to indicate
a carry in basic operations (otherwise the carry has to be retrieved in ways that are too machine
dependent). Usually the base is a power of 2 or of 10. Powers of 2 make some operations more
efficient while powers of 10 make the input and output of integers more efficient.
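To make the representation concrete, here is a minimal Python sketch of base-B digit lists (Python's own integers are already arbitrary precision, so this is purely illustrative; the choice B = 10⁴ and the function names are ours):

```python
# Sketch: a non-negative integer as a list of base-B digits,
# least significant digit first, with leading zeros suppressed.
# B is a power of 10 here to make printing easy; a power of 2 is also common.
B = 10 ** 4

def to_digits(n):
    """Convert a non-negative int to its base-B digit list."""
    if n == 0:
        return [0]          # special case: zero keeps a single digit
    digits = []
    while n > 0:
        digits.append(n % B)
        n //= B
    return digits

def from_digits(digits):
    """Inverse conversion, for checking."""
    n = 0
    for d in reversed(digits):
        n = n * B + d
    return n

print(to_digits(123456789))   # [6789, 2345, 1]
```

Note how the canonical-form conditions from the text appear directly: leading zeros are never produced, and 0 is handled as a special case.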
Knuth [39] gives a great deal of detail on algorithms for the basic arithmetic operations on large
integers. Here we look only at multiplication. The classical method uses around n2 multiplications
of digits and n additions so its cost is proportional to n2 . We can do better than this by using
Karatsuba’s algorithm [36]. For simplicity we consider two integers x, y represented in base B
both of length n and put

    x = a B^(n/2) + b,
    y = c B^(n/2) + d,

(with appropriate adjustment if n is odd). Then

    xy = ac B^n + (ad + bc) B^(n/2) + bd.

Instead of doing one multiplication of integers of length n (costing ∼ n² time) we do four multiplications of integers of length n/2 (which cost ∼ 4(n/2)² = n² in total). But we don’t have to compute all four products since
bc + ad = (a + b)(c + d) − ac − bd
and so we only need three products of integers of length n/2 (or possibly n/2 + 1) and then some
shifts and additions (costing ∼ n time). Using the algorithm recursively we find that the time T (n)
taken to multiply two integers of length n is given by the recurrence
    T(n) = Θ(1),                 if n = 1,
    T(n) = 3T(n/2) + Θ(n),       if n > 1.

Solving this recurrence gives T(n) = Θ(n^(log2 3)), roughly Θ(n^1.585), an asymptotic improvement on the classical n² method.
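A toy Python version of Karatsuba's algorithm, splitting on decimal digits for readability (real implementations split on word-sized digits and fall back to classical multiplication below some threshold):

```python
def karatsuba(x, y):
    """Multiply non-negative integers with three recursive half-size products."""
    if x < 10 or y < 10:                 # base case: single-digit operand
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    p = 10 ** m
    a, b = divmod(x, p)                  # x = a*p + b
    c, d = divmod(y, p)                  # y = c*p + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    mid = karatsuba(a + b, c + d) - ac - bd   # = ad + bc, one product instead of two
    return ac * p * p + mid * p + bd

print(karatsuba(1234, 5678))   # 7006652
```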
4.6 Fractions
These can be represented in any structure that can hold two integer structures (or pointers to
them). Thus a fraction a/b could be denoted generically as ⟨a, b⟩. We normally insist on the
following conditions:
1. The second integer is positive so the sign of the fraction is determined by that of a.
2. The gcd of the two integers is 1.
The gcd of a, b is undefined if both a, b are 0 and is the largest positive integer d that divides
both a and b otherwise (d always exists—why?). This method gives a canonical form and has the
advantage of keeping the size of the two integers as small as possible. We discuss the computation
of gcd’s in §4.6.1.
Carrying out arithmetic on rational numbers is straightforward but even here a little thought
can improve efficiency. Let a/b and c/d be fractions in their lowest terms with b, d > 0. Then

    (a/b)(c/d) = ac/(bd) = (ac/e) / (bd/e)    where e = gcd(ac, bd),

and the last expression is in canonical form. However the fact that gcd(a, b) = gcd(c, d) = 1 means
that
gcd(ac, bd) = gcd(a, d) gcd(b, c).
If we put
d1 = gcd(a, d)
d2 = gcd(b, c)
then
    ((a/d1)(c/d2)) / ((b/d2)(d/d1))

gives us the canonical form of the product (a/b)(c/d). This method requires two gcd computations. However this is
not slower than the previous method. We will see in §4.6.1 that the number of ‘basic steps’ of the
usual algorithm for computing gcd’s is essentially proportional to the logarithm of the largest of
its inputs. Thus the number of basic steps needed to compute d1 and d2 is roughly the same as is
required to compute gcd(ac, bd). Each basic step operates on numbers whose size decreases with
each step. It follows that the second method is potentially faster because it works with smaller
numbers than the first.
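The two-gcd method might be sketched as follows in Python (the function name mul_frac is ours; inputs are assumed to be in lowest terms with positive denominators):

```python
from math import gcd

def mul_frac(a, b, c, d):
    """Multiply a/b by c/d, both already in lowest terms with b, d > 0.
    Takes gcds of the small cross pairs (a, d) and (b, c) instead of
    the large pair (ac, bd); the result is again in lowest terms."""
    d1 = gcd(a, d)          # math.gcd ignores the sign of a
    d2 = gcd(b, c)
    return (a // d1) * (c // d2), (b // d2) * (d // d1)

print(mul_frac(3, 4, 8, 9))   # (2, 3), since 3/4 * 8/9 = 2/3
```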
Division is essentially identical to multiplication.
For addition and subtraction let
    a/b ± c/d = p/q

where p/q is in canonical form. Generally it is best to compute p′, q′ by

    p′ = a (d/gcd(b, d)) ± c (b/gcd(b, d)),
    q′ = b d/gcd(b, d),

and then

    p = p′/gcd(p′, q′),    q = q′/gcd(p′, q′).
Exercise 4.5 Computing q′ by u := bd, v := gcd(b, d), q′ := u/v is a bad idea. Suggest a better
method (we assume that gcd(b, d) is already known).
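The p′, q′ recipe might be coded as follows (a Python sketch; add_frac is our name and the inputs are assumed to be in lowest terms with positive denominators):

```python
from math import gcd

def add_frac(a, b, c, d):
    """Add a/b and c/d (lowest terms, b, d > 0), following the p', q' recipe."""
    g = gcd(b, d)
    p1 = a * (d // g) + c * (b // g)    # p'
    q1 = b * (d // g)                   # q' = bd/g, computed as b*(d//g); cf. Exercise 4.5
    h = gcd(p1, q1)                     # one final reduction gives canonical form
    return p1 // h, q1 // h

print(add_frac(1, 6, 3, 10))   # (7, 15), since 1/6 + 3/10 = 7/15
```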
4.6.1 Euclid’s Algorithm for the Integers
Recall that the gcd of a, b is 0 if both a, b are 0 and is the largest positive integer d that divides
both a and b otherwise. We will now assume that at least one of a, b is non-zero. We note the
following obvious properties of the gcd function.
1. gcd(a, b) = gcd(b, a).
2. gcd(a, b) = gcd(|a|, |b|).
3. gcd(0, b) = |b|.
4. gcd(a, b) = gcd(a − b, b).
Exercise 4.6 Prove each of the preceding properties of gcd’s.
From now on we shall assume that a, b are non-negative. The various equations suggest the following
(impractical) method of finding gcd(a, b).
1. If a = 0 then return b.
2. If a < b then return gcd(b, a).
3. Otherwise return gcd(a − b, b).
Note that this process always halts with a = 0 (why?). There is one obvious improvement that is
suggested by the fact that
gcd(a, b) = gcd(a − b, b) = gcd(a − 2b, b) = . . . = gcd(a − qb, b)
for any integer q. In particular if b > 0 and we put
a = qb + r
where 0 ≤ r < b and q is an integer then
gcd(a, b) = gcd(b, r).
We usually call q the quotient of a, b and r the remainder or residue. This frequently reduces the
number of subtractions we carry out at the expense of introducing an integer division. Thus we
may find gcd(9, 24) = 3 by
24 = 2 × 9 + 6
9=1×6+3
6=2×3+0
In general we put r0 = a, r1 = b and write
r0 = q1 r1 + r2
r1 = q2 r2 + r3
r2 = q3 r3 + r4
..
.
rs−2 = qs−1 rs−1 + rs
rs−1 = qs rs + rs+1
where rs+1 = 0 and 0 ≤ ri < ri−1 for 1 ≤ i ≤ s + 1. Note that we must eventually have ri = 0 for
some i since r0 > r1 > . . . > rs > rs+1 ≥ 0. Furthermore since
gcd(a, b) = gcd(r0 , r1 )
= gcd(r1 , r2 )
..
.
= gcd(rs−2 , rs−1 )
= gcd(rs−1 , rs )
= gcd(rs , rs+1 )
= gcd(rs , 0)
= rs
it follows that gcd(a, b) is just the last non-zero remainder. This is the most common form of
Euclid’s algorithm as this process is known.³ (If b = 0 then the algorithm stops straight away and
gcd(a, b) = r0 = a.)
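In code the division-based form of Euclid's algorithm is only a few lines; a Python sketch:

```python
def euclid_gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b).
    The gcd is the last non-zero remainder."""
    a, b = abs(a), abs(b)      # uses gcd(a, b) = gcd(|a|, |b|)
    while b != 0:
        a, b = b, a % b
    return a

print(euclid_gcd(9, 24))   # 3, matching the worked example above
```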
Exercise 4.7 What happens if we use Euclid’s algorithm with a < b? Try an example.
Exercise 4.8 Assume that b > 0. Prove that if a = qb + r where 0 ≤ r < b and q is an integer
then gcd(a, b) = gcd(b, r). (Hint: Show that the set of divisors of a and b is exactly the same as the
set of divisors of b and r.)
Suppose we apply Euclid’s algorithm and arrive at the last non-zero remainder rs . We may then
rewrite the last step as
rs = rs−2 − qs−1 rs−1 .
The remainder rs−1 can be written as

    rs−1 = rs−3 − qs−2 rs−2

so that

    rs = −qs−1 rs−3 + (1 + qs−1 qs−2) rs−2.
This process can be continued until we obtain
rs = ur0 + vr1
where u, v are integers. What we have shown is that if d = gcd(a, b) then there are integers u, v
such that
d = ua + vb.
Moreover Euclid’s algorithm enables us to compute u, v. The process of back-substitution is
unattractive from a computational point of view, but it is easy to compute u, v in a ‘forwards’
direction (i.e., along with the computation of the gcd). This version of Euclid’s Algorithm is usually
called the Extended Euclidean Algorithm. Note that u, v are not unique, e.g., (−1) · 2 + 1 · 3 =
(−4) · 2 + 3 · 3 = 1. The Extended Euclidean Algorithm simply produces one possible pair of values.
Finally note that if a = b = 0 then the equation holds trivially since the gcd is 0.
3 Given by Euclid in his Elements, Book 7, Propositions 1 and 2. This dates from about 300 B.C.
Exercise 4.9 Show how to compute u, v along with the gcd by updating appropriate variables.
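One possible solution sketch (do try the exercise before reading it): keep, alongside each remainder, the pair of coefficients expressing it in terms of the original inputs:

```python
def ext_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) = u*a + v*b, maintaining the
    invariant r_i = u_i*a + v_i*b for each remainder as the algorithm runs."""
    u0, v0 = 1, 0     # coefficients for r0 = a
    u1, v1 = 0, 1     # coefficients for r1 = b
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        u0, u1 = u1, u0 - q * u1
        v0, v1 = v1, v0 - q * v1
    return a, u0, v0

d, u, v = ext_gcd(240, 46)
print(d, u * 240 + v * 46)   # 2 2: the identity d = u*a + v*b holds
```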
Theorem 4.2 Zn is a field if and only if n is a prime.
Proof First we show that if n is not a prime then Zn is not a field. If n is not a prime then it is
either 1 or a composite number. If n = 1 then in Zn we have 0 = 1 which is not allowed in a field.
If n is composite then we have n = rs for some integers r, s with 1 < r, s < n. Thus r ≠ 0 and
s ≠ 0 in Zn but rs = 0 in Zn and this cannot happen in a field.
Now we show that if n is a prime then Zn is a field. We know that Zn is a commutative ring
with identity for all n. Moreover when n is a prime then 0 ≠ 1 in Zn (since n ≥ 2 so that 0 and 1
are distinct remainders modulo n). Thus we need only show that every nonzero element of Zn has
a multiplicative inverse. So suppose that a is a non-zero member of Zn . It follows that n does not
divide a and so gcd(a, n) = 1 (since n is a prime the only possible divisors are 1 and n). Thus by
the Extended Euclidean Algorithm there are integers u, v such that ua + vn = 1. Thus ua = 1 in
Zn and so u is the required multiplicative inverse. □
Exercise 4.11 Prove that the invertible elements of Zn are precisely those a such that gcd(a, n) =
1.
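The proof of the theorem is effectively an algorithm: to invert a non-zero a modulo a prime n, run the Extended Euclidean Algorithm on a and n. A self-contained Python sketch (inverse_mod is our name):

```python
def inverse_mod(a, n):
    """Inverse of a modulo n via the Extended Euclidean Algorithm;
    it exists exactly when gcd(a, n) = 1 (Exercise 4.11)."""
    r0, r1 = a % n, n
    u0, u1 = 1, 0                  # invariant: r_i = u_i * a (mod n)
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        u0, u1 = u1, u0 - q * u1
    if r0 != 1:
        raise ValueError("a is not invertible modulo n")
    return u0 % n

print(inverse_mod(3, 7))   # 5, since 3 * 5 = 15 = 1 (mod 7)
```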
Qn (x1 , x2 , . . . , xn ) = Qn (xn , . . . , x2 , x1 )
so that for n > 1
for all n ≥ 2. (Here the qi are the quotients of Euclid’s algorithm applied to r0 , r1 and the ri
are the remainders.)
3. Express Qn (q1 , . . . , qn ) and Qn−1 (q2 , . . . , qn ) in terms of a, b and gcd(a, b)
See Knuth [39] for more material related to the last two exercises (and Euclid’s algorithm in
general).
then we shall have a good idea of the worst-case runtime of Euclid’s algorithm. To be precise the ith
step involves us in computing qi+1 and ri+2 which consists of an integer division and a remainder.
We shall give an outline of the analysis, which turns out to be a little surprising.
Let F0 , F1 , F2 , . . . denote the Fibonacci sequence defined by
    Fn = 0,                if n = 0,
    Fn = 1,                if n = 1,
    Fn = Fn−1 + Fn−2,      otherwise,
i.e., the sequence 0, 1, 1, 2, 3, 5, . . .. These numbers have many interesting properties, e.g.,
gcd(Fm , Fn ) = Fgcd(m,n) .
Now suppose we apply Euclid’s algorithm to Fn+2 , Fn+1 . We observe the following behaviour
Fn+2 = 1 × Fn+1 + Fn
Fn+1 = 1 × Fn + Fn−1
..
.
5=1×3+2
3=1×2+1
2=2×1
and there are exactly n steps. Note how the quotients are all 1 except for the last one which is 2
(the last quotient can never be 1 in any application of Euclid’s algorithm with r0 ≠ r1 , can you see
why?). In other words we are making as little progress as is possible. The following result confirms
an obvious conjecture.
Theorem 4.3 (G. Lamé, 1845) For n ≥ 1, let a, b be integers with a > b > 0 such that Euclid’s
algorithm applied to a, b requires exactly n steps and such that a is as small as possible satisfying
these conditions. Then
a = Fn+2 , b = Fn+1 .
Corollary 4.1 If 0 ≤ a, b < N , where N is an integer, then the number of steps required by Euclid’s
algorithm when it is applied to a, b is at most ⌈logφ(√5 N)⌉ − 2 where φ = (1 + √5)/2, the so-called
golden ratio.
Proof Let n be the number of steps taken by Euclid’s algorithm applied to a, b. Note that the
worst possibility is to have a < b since then the first step is ‘wasted’ in swapping a and b. The
algorithm then proceeds as though it had been applied to b, a where b > a and takes n − 1 steps.
Now according to the theorem we have a = Fn , b = Fn+1 . So we want to find the largest n that
satisfies
Fn+1 < N (2)
Now it can be shown that

    Fn = (φ^n − φ̂^n)/√5    (3)

where φ̂ = 1 − φ = (1 − √5)/2 (see Exercise 4.15 or §4.7.2). Note that φ̂ = −0.61803 . . . and φ̂^n
approaches 0 very rapidly. In fact φ̂^n/√5 is always small enough so that

    Fn+1 = φ^(n+1)/√5 rounded to the nearest integer.

Combining this with equation (2) we obtain n + 1 < logφ(√5 N) which completes the proof. □
It is worth noting that logφ(√5 N) is approximately 4.785 log10 N + 1.672. For more details see
Knuth [39] or Cormen, Leiserson and Rivest [18].
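The corollary is easy to check empirically; a Python sketch counting division steps for the largest Fibonacci pair below N = 10⁶ (an arbitrary choice) and comparing with the bound:

```python
import math

def euclid_steps(a, b):
    """Count the division steps taken by Euclid's algorithm on a, b."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return steps

phi = (1 + math.sqrt(5)) / 2
N = 10 ** 6
bound = math.ceil(math.log(math.sqrt(5) * N, phi)) - 2
# F_30 = 832040 and F_29 = 514229 form the worst-case pair below N
print(euclid_steps(832040, 514229), bound)   # 28 29
```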
Exercise 4.15 Let

    F = ( 1  1 )
        ( 1  0 ).

1. Show, by induction on n, that

    F^n = ( Fn+1  Fn   )
          ( Fn    Fn−1 )

for all n > 0.

2. Find a matrix S such that

    S F S⁻¹ = ( λ1  0  )
              ( 0   λ2 )

for some real numbers λ1, λ2. Note that

    (S F S⁻¹)^n = S F^n S⁻¹    and    ( λ1  0  )^n  =  ( λ1^n  0    )
                                      ( 0   λ2 )       ( 0     λ2^n ).
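Part 1 of the exercise also gives a practical way to compute Fibonacci numbers: raise F to the n-th power by repeated squaring. A Python sketch (the diagonalization of part 2 leads instead to the closed form of equation (3)):

```python
def mat_mul(A, B):
    """2x2 matrix product."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def mat_pow(A, n):
    """A^n by repeated squaring."""
    result = [[1, 0], [0, 1]]          # identity matrix
    while n > 0:
        if n % 2 == 1:
            result = mat_mul(result, A)
        A = mat_mul(A, A)
        n //= 2
    return result

def fib(n):
    """F_n read off from F^n = [[F_{n+1}, F_n], [F_n, F_{n-1}]]."""
    F = [[1, 1], [1, 0]]
    return mat_pow(F, n)[0][1]

print([fib(n) for n in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]
```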
4.7 Polynomials
We start by giving a fairly formal definition of polynomials. This will be helpful in understanding
the symbolic nature of Computer Algebra systems. (We have already used polynomials informally
and it is very likely that you are familiar with them. Rest assured that this section simply gives
one way of defining them. However there is a widespread misconception about polynomials—see
§4.7.1—so take care!)
Let R be a commutative ring with identity, for us it will usually be one of Z, Q, R, C or Zn .
Let x be a new symbol (i.e., x does not appear in R). Then a polynomial in the indeterminate x
with coefficients from R is an object of form
a0 + a1 x + a2 x2 + · · · + an xn + · · ·
where each ai ∈ R and all but finitely many of the ai are 0. Note that the + which appears above
does not denote addition in R (after all x is not an element of R). Indeed we could just regard
polynomials as infinite tuples
(a0 , a1 , a2 , . . .)
but the first notation is much more convenient as we shall see4 . We call a0 the constant term of
the polynomial and ai the coefficient of xi (we regard a0 as the coefficient of x0 ). The set of all
such polynomials is denoted by R[x].
4 If you are curious here is a bit more detail about the connection between the two approaches. We define
operations on the sequence notation that give us a ring. We then denote the sequence (0, 1, 0, 0, . . .) by x. From the
definition of multiplication it follows that xi = (0, . . . , 0, 1, 0, . . .) where the 1 is in position i. From the definitions
we also have that (0, . . . , 0, a, 0, . . .) = a(0, . . . , 0, 1, 0, . . .) = axi . Addition is component-wise and so it follows that
(a0 , a1 , a2 , . . .) = a0 + a1 x + a2 x2 + · · · , so we have now got back to our familiar notation. Strictly speaking this
approach is much more sound as it uses standard notions and then justifies the new notation.
A convenient abbreviation is the familiar sigma notation

    Σ_{i=0}^{∞} ai x^i = a0 + a1 x + a2 x² + · · · + an x^n + · · ·

Two polynomials are equal if and only if the coefficients of corresponding powers of x are equal as
elements of R. Thus

    Σ_{i=0}^{∞} ai x^i = Σ_{i=0}^{∞} bi x^i
if and only if a0 = b0 , a1 = b1 , a2 = b2 etc. When writing polynomials we normally leave out any
terms whose coefficient is 0. Thus we write
2 + 5x3 − 3x5
rather than
2 + 0x + 0x2 + 5x3 + 0x4 − 3x5 + 0x6 + · · ·
Moreover we can also denote the same polynomial by 5x3 − 3x5 + 2 or by −3x5 + 5x3 + 2. Thus if
ai = 0 for all i > 0 then we denote the polynomial by just a0 (such a polynomial is called constant).
The zero polynomial has 0 for all of its coefficients and is denoted by 0.
The highest power of x that has a non-zero coefficient in a polynomial p is called the degree
of p and is denoted by deg(p). The corresponding coefficient is called the leading coefficient of p
and is sometimes denoted by lc(p). (These notions do not make sense for the zero polynomial. We
leave deg(0) and lc(0) undefined.) For example deg(2 + 5x³ − 3x⁵) = 5 and lc(2 + 5x³ − 3x⁵) = −3.
We define the sum, difference and product of polynomials in the obvious way.
    Σ_{i=0}^{∞} ai x^i ± Σ_{i=0}^{∞} bi x^i = Σ_{i=0}^{∞} (ai ± bi) x^i,

    ( Σ_{i=0}^{∞} ai x^i ) ( Σ_{i=0}^{∞} bi x^i ) = Σ_{i=0}^{∞} ( Σ_{j=0}^{i} aj b_{i−j} ) x^i.
If m, n are the respective degrees of the two polynomials then we can write these definitions as
    Σ_{i=0}^{m} ai x^i ± Σ_{i=0}^{n} bi x^i = Σ_{i=0}^{max(m,n)} (ai ± bi) x^i,

    ( Σ_{i=0}^{m} ai x^i ) ( Σ_{i=0}^{n} bi x^i ) = Σ_{i=0}^{m+n} ( Σ_{j=0}^{i} aj b_{i−j} ) x^i
where it is understood that ai is 0 if i > m and similarly bj is 0 if j > n. Note that the definition
above yields ax = xa for all a ∈ R from which it follows that every element of R commutes with
every polynomial. It is quite easy to show that, with these definitions, the three operations on
polynomials obey various well known (and usually tacitly assumed) rules of arithmetic such as
p + q = q + p,
pq = qp,
(p + q) + r = p + (q + r),
(pq)r = p(qr),
p(q + r) = pq + pr.
Indeed R[x] is a commutative ring with identity under the operations given above. We have
    deg(p ± q) ≤ max(deg(p), deg(q)),
    deg(pq) ≤ deg(p) + deg(q),
    deg(pq) = deg(p) + deg(q),    if lc(p) lc(q) ≠ 0,
while
(2 + 3x − 4x3 ) × (−3x + 2x2 ) = (2 · 0)
+ (2 · (−3) + 3 · 0)x
+ (2 · 2 + 3 · (−3) + 0 · 0)x2
+ (2 · 0 + 3 · 2 + 0 · (−3) + (−4) · 0)x3
+ (2 · 0 + 3 · 0 + 0 · 2 + (−4) · (−3) + 0 · 0)x4
+ (2 · 0 + 3 · 0 + 0 · 0 + (−4) · 2 + 0 · (−3) + 0 · 0)x5
= −6x − 5x2 + 6x3 + 12x4 − 8x5 .
If the coefficients come from Z then the degree of the product is equal to the sum of the degrees.
On the other hand if the coefficients come from Z8 then the product polynomial is equal to 2x +
3x2 + 6x3 + 4x4 and the degree of the product is less than the sum of the degrees of the two factors.
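The coefficient formula for products translates directly into code; a Python sketch with polynomials as coefficient lists (entry i holding the coefficient of x^i) and an optional modulus to reproduce the Z8 behaviour:

```python
def poly_mul(p, q, ring_mod=None):
    """Multiply polynomials given as coefficient lists; optionally reduce
    the coefficients modulo ring_mod."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    if ring_mod is not None:
        r = [c % ring_mod for c in r]
    while len(r) > 1 and r[-1] == 0:   # suppress leading zeros
        r.pop()
    return r

p = [2, 3, 0, -4]       # 2 + 3x - 4x^3
q = [0, -3, 2]          # -3x + 2x^2
print(poly_mul(p, q))          # [0, -6, -5, 6, 12, -8]
print(poly_mul(p, q, 8))       # [0, 2, 3, 6, 4], the product over Z_8
```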
The preceding example makes rather heavy weather of the process of multiplying polynomials.
An alternative approach is to multiply each term of the first polynomial with each one of the second
and then collect coefficients of equal powers of x. We can write this out in tabular form as follows:

        2 + 3x        − 4x³
                −3x   + 2x²
      ─────────────────────────────
      −6x − 9x²        + 12x⁴
          + 4x² + 6x³         − 8x⁵
      ─────────────────────────────
      −6x − 5x² + 6x³ + 12x⁴ − 8x⁵
Of course this is just the same process as the familiar school method of multiplying integers, with
the difference that we do not need to concern ourselves with carries. Most computer algebra
systems multiply polynomials by an algorithm based on this process. (It is interesting that the
implementation of such a simple algorithm still presents a subtlety about the amount of memory
used—see [21], p. 66.)
4.7.1 Polynomials and Polynomial Functions
Every polynomial p = a0 + a1 x + · · · + an x^n gives rise to a function p̂ : R → R defined by

    p̂(α) = a0 + a1 α + · · · + an α^n
for each α ∈ R. The preceding expression looks so much like the polynomial p that it is common
practice to identify the function and the polynomial (indeed the indeterminate x is frequently
referred to as a variable). This is potentially dangerous since, e.g., the notions of equality between
polynomials and between functions are completely different. Recall that two functions f, g : R → R
are said to be equal if and only if f (α) = g(α) for all α ∈ R. Now let p be as above and
q = b0 + b1 x + · · · + bm xm .
We have that
    p = q ⇔ ai = bi for i = 0, 1, . . .
          ⇔ p = q = 0, or deg(p) = deg(q) and ai = bi for 0 ≤ i ≤ deg(p).
On the other hand if we think of p and q as functions (i.e., identify them with p̂ and q̂) then we
have
p = q ⇔ p(α) = q(α) for all α ∈ R.
Fortunately as long as R is an infinite integral domain (such as Z or any field) then it can be shown
that the two definitions coincide and so there is no harm in confusing the polynomial p with the
corresponding function p̂. (In fact there is a deeper reason, the ring of polynomial functions is
isomorphic to the ring of polynomials.)
Exercise 4.16 Let p be a polynomial with coefficients from an integral domain R. A number α is
said to be a root of p if p(α) = 0. It can be shown that if p 6= 0 then it has at most deg(p) roots
in R. Use this fact to show that p = q if and only if p̂ = q̂ for two polynomials p and q when R is
infinite.
When R is finite the situation is completely different. Let the elements of R be r1 , . . . , rn and put
Z(x) = (x − r1 ) · · · (x − rn ).
Assume that n > 1 so that 1 ≠ 0 in R. Then clearly Z(x) is not the zero polynomial (it has degree n
and leading coefficient 1). However the associated polynomial function Ẑ is the zero function.
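This is easy to check computationally; a Python sketch taking R = Z6 (an arbitrary finite choice):

```python
def Z_eval(x, elements, mod):
    """Evaluate Z(x) = (x - r1)(x - r2)...(x - rn) in Z_mod."""
    prod = 1
    for r in elements:
        prod = (prod * (x - r)) % mod
    return prod

# In R = Z_6 the polynomial Z(x) has degree 6 and leading coefficient 1,
# yet the associated function vanishes at every element of the ring.
values = [Z_eval(x, range(6), 6) for x in range(6)]
print(values)   # [0, 0, 0, 0, 0, 0]
```

Of course each element of R is one of the ri, so the corresponding factor of Z(x) vanishes; the point is that a non-zero polynomial can still be the zero function.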
Exercise 4.17 This exercise shows that if R is not an integral domain then the distinction between
polynomials and the corresponding functions is important even if R is infinite.
Let R1 , R2 be two commutative rings with identities 1A and 1B respectively. Show that the
obvious definitions turn
R1 × R2 = { (r1 , r2 ) | r1 ∈ R1 , r2 ∈ R2 }
into a ring with identity (1A , 1B ) and zero (0A , 0B ). Now let a = (1A , 0B ) and deduce that every
element of the form (0A , r2 ) is a root of the polynomial ax.
4.7.2 Power Series
If we allow infinitely many of the coefficients ai to be non-zero we obtain the ring R[[x]] of formal
power series over R (the operations are defined by the same formulas as for polynomials). For example

    (1 − x)(1 + x + x² + x³ + · · · + x^n + · · ·) = 1

so that

    1/(1 − x) = 1 + x + x² + x³ + · · · + x^n + · · ·
(It is common practice to use 1/a to denote the inverse of an invertible element of a ring R.) These
ideas are very useful in many areas; we give an illustration from enumerative combinatorics. Let
a0 , a1 , . . . be a sequence of numbers. We define the generating function of this sequence by
A(x) = a0 + a1 x + a2 x2 + · · ·
(although A(x) is called a generating function this is just a nod to tradition and it is really a formal
power series). To illustrate the usefulness of this notion let us consider our old friends the Fibonacci
numbers F0 , F1 , F2 , . . .. The generating function for these is
F (x) = F0 + F1 x + F2 x2 + · · ·
Now it is clear that
F (x)(1 − x − x2 ) = x
and this is true in R[[x]]. Moreover 1 − x − x² has a non-zero constant term and so it has an inverse
in R[[x]]. Thus

    F(x) = x/(1 − x − x²)
         = −x/((φ + x)(φ̂ + x))

where φ = (1 + √5)/2 and φ̂ = (1 − √5)/2 (of course −φ and −φ̂ are the two roots of 1 − x − x²).
Thus, from the partial fraction decomposition of the last expression, we have

    F(x) = (x/√5) ( 1/(φ + x) − 1/(φ̂ + x) ).
where rij ∈ R for all i, j. This helps to reveal the fact that R[x][y] and R[y][x] are essentially the
same ring (i.e. they are isomorphic). Because of this we use the notation R[x, y]. (Note that x,
y commute, i.e., xy = yx.) When taking this latter view each product x^i y^j in (4) is called a power
product in x, y of degree i + j. Thus in this view the degree of (4) is undefined if all coefficients rij
are 0 and otherwise it is the maximum degree of a power product which occurs in it with non-zero
coefficient. The properties of degrees given above carry through to the new situation. We use a
phrase such as ‘degree in x’ to indicate that we are viewing a polynomial of R[x, y] as being in the
indeterminate x with coefficients from R[y], this degree is denoted by degx (·). Similar comments
apply to the leading coefficient, which is defined when we are focusing on one indeterminate as the
‘main’ one. For example
    deg(1 + x + x²y + x³y + 2x³y²) = 5,     lc not defined,
    degx(1 + x + x²y + x³y + 2x³y²) = 3,    lcx(1 + x + x²y + x³y + 2x³y²) = y + 2y².
4.7.4 Differentiation
Let
p = a0 + a1 x + · · · + an xn
be a polynomial in x with coefficients from some commutative ring R. Then we can define its
derivative in a purely formal manner as
    p′ = a1 + 2a2 x + · · · + n an x^(n−1).

(We sometimes use the traditional notation dp/dx instead of p′.) Note that there is no reliance on
limits here. It is easy to see that all the algebraic properties of derivatives (i.e., ones that do not
depend on limits) still hold. Thus

    (p + q)′ = p′ + q′,
    (pq)′ = p′ q + p q′.
It follows from the definition that p′ = 0 if p ∈ R. However the converse is not necessarily true:
consider x2 + 1 ∈ Z2 [x]. Of course the converse is true when the coefficients come from the familiar
number systems Z, Q, R or C.
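The formal derivative is a one-liner on coefficient lists; a Python sketch illustrating the Z2 example just mentioned:

```python
def poly_derivative(p, ring_mod=None):
    """Formal derivative of a coefficient list p (entry i holds the
    coefficient of x^i); no limits involved."""
    d = [i * a for i, a in enumerate(p)][1:]    # drop the constant term
    if ring_mod is not None:
        d = [c % ring_mod for c in d]
    return d if d else [0]

print(poly_derivative([1, 0, 1]))      # [0, 2]: over Z, (x^2 + 1)' = 2x
print(poly_derivative([1, 0, 1], 2))   # [0, 0]: over Z_2 the derivative is 0
```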
Let k be any field and k(x) consist of all ratios p/q where p, q ∈ k[x] and q 6= 0. If we define
addition and multiplication in the obvious way then k(x) is a field (see §4.7.8). We can extend the
definition of differentiation to k(x) by putting
p0 q − pq 0
(p/q)0 = ,
q2
where p, q ∈ k[x]. We can also define partial derivatives (e.g., for elements of k(x, y)) in the obvious
way.
Exercise 4.18 Let f be an element of Zp[x] where p is a prime number. Show that f′ = 0 if and
only if f ∈ Zp[x^p], i.e., f has the form

    f = a0 + a1 x^p + a2 x^(2p) + · · · + an x^(np).
Suppose that R is an ID and that we write a polynomial p ∈ R[x] as

    p = p1 p2 · · · ps

where each pi cannot be expressed as a product of two polynomials of smaller degree. We should
like this factorization to be unique in the same sense that the factorization of numbers into primes
is unique. Consider however the factorizations

    6x² + 30x + 36 = (2x + 4)(3x + 9) = 2 · 3 · (x + 2)(x + 3)

where the polynomials are from Z[x]. None of the factors is invertible in Z[x] but the number of
factors in the two factorizations is different. The source of the problem is obvious: in the first
factorization the polynomials 2x + 4 and 3x + 9 are not irreducible elements of the ID Z[x]. In
other words the process which we described did not carry out a complete factorization. (This kind
of difficulty does not arise when R is a field—can you see why?) Now given
f = a0 + a1 x + · · · + an xn
define the content of f , denoted by cont(f ), to be the gcd of its coefficients. If the content of f is c
then we may write
f = cg
and call g the primitive part of f , denoted by pp(f ). (A polynomial is called primitive if its content
is 1 or any invertible element.) In order to obtain a unique factorization for p we write each pi as
ci p′i where ci is the content of pi. We may then collect all the contents together into one element c
of R and put

    p = c p′1 p′2 · · · p′s.
Suppose that we also have
p = dq1 q2 · · · qt ,
where each qi is primitive and cannot be written as the product of two polynomials of lower degree.
Then it is a remarkable fact that:
1. d = εc for some invertible element ε of R,

2. s = t, and

3. there is a permutation π of 1, 2, 3, . . . , s such that p′i = εi qπ(i) for 1 ≤ i ≤ s where each εi is
an invertible element of R.
In other words R[x] is a UFD. (Why do we not bother factorizing c and d?)
What we have is that if R is a UFD then so is R[x]. Thus if R is a UFD then so is R[x, y]
because we can regard this as R[x][y] and let R[x] play the rôle of R in the preceding discussion.
It is now clear that the same can be said of R[x1 , . . . , xn ].
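Over Z the content and primitive part are easily computed; a Python sketch (content and primitive_part are our names) applied to the product (2x + 4)(3x + 9) discussed above:

```python
from math import gcd
from functools import reduce

def content(p):
    """Content of a polynomial over Z: the gcd of its coefficients."""
    return reduce(gcd, (abs(c) for c in p))

def primitive_part(p):
    """pp(p), so that p = cont(p) * pp(p)."""
    c = content(p)
    return [a // c for a in p]

p = [36, 30, 6]                        # (2x + 4)(3x + 9), constant term first
print(content(p), primitive_part(p))   # 6 [6, 5, 1], i.e. 6 * (6 + 5x + x^2)
```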
Naturally there is a huge difference between knowing that polynomials can be factorized and
actually producing a factorization. This is one important application area of Computer Algebra.
It is worth noting that the phrase irreducible polynomial is normally used in the following sense: p
is said to be irreducible if we cannot write it as the product of two other polynomials of smaller
degree. Thus 2x + 4 is an irreducible polynomial of Z[x] even though it is not an irreducible element
of this ring! As another example x² + 1 is irreducible as a polynomial in R[x]. To see this suppose
otherwise, then we must have

    x² + 1 = (a1 x + a0)(b1 x + b0)

for some real numbers a0, a1, b0, b1. Note that a1 ≠ 0 otherwise the degree of the right hand side is
too low. But now this means that −a0/a1 is a root of x² + 1 and this is impossible since the square
of any real number is non-negative. This example brings out a very important fact: the concept
of irreducibility depends on the ring R of coefficients. If we regard x² + 1 as a polynomial in C[x]
then of course we can factorize it as (x + i)(x − i) where i = √−1.
Exercise 4.19 Prove that x² − 2 is irreducible in Z[x] but of course in R[x] it factorises.
We now turn our attention to the gcd and assume that f 6= 0 or g 6= 0 so any gcd is non-zero.
Recall from §4.2 that h is a gcd of two polynomials f , g if h divides both of the polynomials and
any other common divisor of f , g also divides h. Furthermore any two gcd’s are related by an
invertible element, which in the case of polynomial rings must be an invertible constant from the
ring of coefficients R (prove this). It is easy to see that no polynomial of degree strictly larger than
that of h can be a common divisor of f , g. If R is a field then all nonzero constants are invertible
and this means that we can define gcd’s of f , g to be all those common divisors of f , g of largest
possible degree. If R is not a field then this is not quite enough to capture the gcd (we might
be missing some constant non-invertible factor). It is standard practice to abuse notation and use
gcd(f, g) to stand for a greatest common divisor of f , g and even talk of the gcd. This does no
harm because we are not normally bothered about multiples by invertible elements. In many cases
it is possible to remove the ambiguity by insisting on some extra condition that must be satisfied
by a gcd. For example if the coefficients are rationals then we can insist that the leading coefficient
should be 1 (i.e., the gcd is monic).
Just as in the case of the integers there is a very close connection between the gcd and factor-
ization. It is left up to you to work this out if you are interested.
We will also make use of the least common multiple of two polynomials f , g. This is a polynomial
of least possible degree which is divisible by both f and g. It is unique up to constant factors and
is denoted by lcm(f, g). In fact we have lcm(f, g) = f g/ gcd(f, g).
r0 = q1 r1 + r2
r1 = q2 r2 + r3
r2 = q3 r3 + r4
..
.
rs−2 = qs−1 rs−1 + rs
rs−1 = qs rs + rs+1
where rs+1 = 0 and deg(ri ) < deg(ri−1 ) for 1 ≤ i ≤ s. Note that we must eventually have ri = 0
for some i since deg(r0 ) > deg(r1 ) > . . . > deg(ri ) > . . . ≥ 0.
Let us now see how to find q, r so that (5) holds. If f = 0 then we just take q = 0, r = 0.
Otherwise we could try q = 0, r = f . This is fine unless deg(f ) ≥ deg(g); however if this is so then
we can improve matters by (repeatedly) replacing q with q + (a/b)xm−n and r with r − (a/b)xm−n g
where a = lc(r), b = lc(g), m = deg(r) and n = deg(g). Each time this is done the degree of r
drops and so we must eventually have r = 0 or deg(r) < deg(g) as required. The correctness of the
method follows from the loop invariant qg + r = f (check this).
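The loop just described might be coded as follows, a Python sketch over Q using exact Fraction arithmetic (coefficient lists with the constant term first; poly_divmod is our name):

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g over Q, following the loop in the text: while
    deg r >= deg g, add (lc(r)/lc(g)) x^(deg r - deg g) to q and subtract
    the corresponding multiple of g from r.  Returns (q, r), f = q*g + r."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    n = len(g) - 1                      # deg g (we assume lc(g) != 0)
    q = [Fraction(0)] * max(len(f) - n, 1)
    r = f[:]
    while len(r) - 1 >= n and any(r):
        m = len(r) - 1                  # deg r
        coef = r[-1] / g[-1]            # a/b in the text's notation
        q[m - n] += coef
        for i in range(len(g)):         # r -= coef * x^(m-n) * g
            r[m - n + i] -= coef * g[i]
        while len(r) > 1 and r[-1] == 0:
            r.pop()
    return q, r

q, r = poly_divmod([-1, 1, -1, 1], [1, -2, 1])   # (x^3 - x^2 + x - 1) / (x^2 - 2x + 1)
print(q, r)   # quotient x + 1, remainder 2x - 2
```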
Note that we can calculate the quotient and remainder polynomials by hand in a manner similar
to the long division of integers. For example we can divide x³ − x² + x − 1 by x² − 2x + 1 as follows:

                        x + 1
    x² − 2x + 1 ) x³ −  x² + x − 1
                  x³ − 2x² + x
                  ────────────
                        x²     − 1
                        x² − 2x + 1
                        ───────────
                            2x − 2
The quotient is x + 1 and the remainder is 2x − 2. Here we were lucky in that no fractions were
needed, in general the calculations involve fractions even if the inputs have integer coefficients.
We now focus on polynomials with rational coefficients. Note that the Euclidean Algorithm relies
quite heavily on rational arithmetic and this can be quite costly due to the many gcd computations
(on coefficients) carried out. We could avoid this by clearing denominators from the inputs. We may
also use the fact that if f , g are non-zero polynomials with integer coefficients and deg(f ) ≥ deg(g)
then we can find polynomials q, r with integer coefficients such that
    lc(g)^(deg(f)−deg(g)+1) f = q g + r
where r = 0 or deg(r) < deg(g). This leads to an obvious modification of Euclid’s algorithm.
Unfortunately this process can lead to very large integers even when the input consists of small
ones. For example consider

    −15x⁴ + 3x² − 9.

One way to limit the growth is to make each polynomial that arises primitive: a polynomial

    an x^n + an−1 x^(n−1) + · · · + a0

could be replaced by

    (an/d) x^n + (an−1/d) x^(n−1) + · · · + (a0/d)
where
d = gcd(an , an−1 , . . . , a0 ).
But the whole point of avoiding rational arithmetic was because of its need to compute many gcd’s!
The excessive growth of the coefficients is clearly caused by the factor lc(g)^(deg(f)−deg(g)+1) which
enables us to stay with integer arithmetic. In fact this is larger than necessary and it is possible
to construct algorithms, known as sub-resultant polynomial remainder sequences, which avoid this
large growth. We do not go into details here since we will look at a different method later on which
is the one used in practice.
Finally the discussion in §4.6.1 on the Extended Euclidean Algorithm for the integers carries
over directly to the version for polynomials. We can therefore find polynomials u, v such that
uf + vg = d
where d = gcd(f, g). Moreover we can ensure that u = 0 or deg(u) < deg(g) and v = 0 or
deg(v) < deg(f ).
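The Extended Euclidean Algorithm for polynomials over Q can be sketched as follows; the coefficient-list encoding and all helper names here are our own illustration:

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    p = [Fraction(c) for c in p]
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def pmul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def psub(p, q):
    out = [Fraction(0)] * max(len(p), len(q))
    for i, a in enumerate(p):
        out[i] += a
    for i, b in enumerate(q):
        out[i] -= b
    return out

def pdivmod(f, g):
    """Quotient and remainder in Q[x]; g must be trimmed and non-zero."""
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = [Fraction(c) for c in f]
    while len(r) >= len(g) and any(r):
        if r[-1] == 0:
            r.pop()
            continue
        s = len(r) - len(g)
        a = r[-1] / g[-1]
        q[s] += a
        for i, c in enumerate(g):
            r[i + s] -= a * c
    return q, trim(r)

def ext_gcd(f, g):
    """Extended Euclid in Q[x] (f, g not both zero): returns (d, u, v)
    with u*f + v*g = d = gcd(f, g), where d is made monic."""
    r0, r1 = trim(f), trim(g)
    u0, u1 = [Fraction(1)], [Fraction(0)]
    v0, v1 = [Fraction(0)], [Fraction(1)]
    while any(r1):
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        u0, u1 = u1, psub(u0, pmul(q, u1))
        v0, v1 = v1, psub(v0, pmul(q, v1))
    lc = r0[-1]
    return [c / lc for c in r0], [c / lc for c in u0], [c / lc for c in v0]

# gcd(x^2 - 1, x^2 - 2x + 1) = x - 1
d, u, v = ext_gcd([-1, 0, 1], [1, -2, 1])
```

For these inputs the cofactors are u = 1/2 and v = −1/2, and indeed (1/2)f − (1/2)g = x − 1; note that the degree bounds deg(u) < deg(g) and deg(v) < deg(f) hold as claimed.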
f Y + gZ = h
where f , g and h are polynomials in x and Y , Z are unknowns. Find a necessary and sufficient
condition for such an equation to have solutions in Y , Z that are themselves polynomials. Describe
a method for finding such solutions when they exist.
Exercise 4.21 A naïve attempt at extending Euclid’s algorithm to multivariate polynomials does
not succeed; why?
for some q(x) ∈ k[x]. We can prove this directly by showing that
x^i = ((x − α) + α)^i = (x − α)q_i (x) + α^i ,
for some q_i (x) ∈ k[x]. This is fairly obvious and is easy to prove by induction on i. The full result
follows from the fact that p(x) is a linear combination of 1, x, x^2 , . . .. The reason for looking at this
proof is that it readily generalizes to multivariate polynomials. Suppose that f (x1 , x2 , . . . , xn ) ∈
k[x1 , x2 , . . . , xn ] and α1 , α2 , . . . , αn ∈ k. We say that (α1 , α2 , . . . , αn ) is a root of f (x1 , x2 , . . . , xn )
if and only if f (α1 , α2 , . . . , αn ) = 0. Suppose that
x1^{i1} x2^{i2} · · · xn^{in} = ((x1 − α1 ) + α1 )^{i1} · · · ((xn − αn ) + αn )^{in} = (y1 h1 + α1^{i1} ) · · · (yn hn + αn^{in} ),
where yi = xi − αi and hi ∈ k[xi ]. Now it is clear that the final expression, when multiplied out, can be collected so
as to have the form
y1 q1 + y2 q2 + · · · + yn qn + α1^{i1} α2^{i2} · · · αn^{in} , (†)
where q1 , q2 , . . . , qn ∈ k[x1 , x2 , . . . , xn ]. Note that there may be more than one way of collecting
terms so there may be more than one choice for the qi . As an example consider x1^2 x2 with α1 = 1,
α2 = 2 so that y1 = x1 − 1 and y2 = x2 − 2. Then x1^2 x2 can be written as
y1 (y1 y2 + 2y1 + 2y2 + 4) + y2 + 2
or as
y1 (2y1 + 2y2 + 4) + y2 (y1^2 + 1) + 2
as well as many other ways, all of which have the form (†). Now using the fact that f is a linear
combination of power products of the xi we see that
f = y1 g1 + y2 g2 + · · · + yn gn + f (α1 , α2 , . . . , αn )
We will use this fact later on when we look at the solution of simultaneous polynomial equations.
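We can sanity-check the worked example numerically; the function names below are our own, and the loop merely samples points (agreement everywhere is what the algebra guarantees):

```python
from random import randint

def f(x1, x2):
    # the polynomial x1^2 * x2 from the text
    return x1 ** 2 * x2

def collected(x1, x2):
    # the collected form y1*(2y1 + 2y2 + 4) + y2*(y1^2 + 1) + 2 from the
    # text, with y1 = x1 - 1, y2 = x2 - 2; the constant term 2 = f(1, 2)
    y1, y2 = x1 - 1, x2 - 2
    return y1 * (2 * y1 + 2 * y2 + 4) + y2 * (y1 ** 2 + 1) + 2

# sample many points; the two expressions always agree
for _ in range(100):
    a, b = randint(-50, 50), randint(-50, 50)
    assert f(a, b) == collected(a, b)
```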
Before leaving this section it is worth noting that it provides an excellent illustration of a process
that happens quite frequently in mathematical subjects. We have a result for a special case (here
the remainder theorem for univariate polynomials). This result can be derived quite easily from
Euclid’s Algorithm. However the view implied by this proof does not generalize (Euclid’s Algorithm
does not hold for multivariate polynomials). Nevertheless a little more work shows that we can find
another proof that gives us not only another view of the result but also generalizes. Indeed this
process of generalization will reach a very impressive level when we look at Gröbner bases since
these can be seen as a generalization of Euclid’s Algorithm to multivariate polynomials.
4.7.8 Rational Expressions
Let k be any field. Given the polynomial ring k[x1 , . . . , xn ] we can form the set
k(x1 , . . . , xn ) = { p/q : p, q ∈ k[x1 , . . . , xn ], q ≠ 0 }.
Two elements p/q and p′/q′ of k(x1 , . . . , xn ) are said to be equal if and only if pq′ − p′q = 0 as a
polynomial in k[x1 , . . . , xn ]. We define addition and multiplication by
p/q + p′/q′ = (pq′ + p′q)/(qq′),    (p/q)(p′/q′) = (pp′)/(qq′).
(Can you prove that these definitions are unambiguous?) Given these operations k(x1 , . . . , xn )
becomes a field and is called the field of rational expressions (or rational functions or quolynomials)
in x1 , . . . , xn over k.
Just as in the case of polynomials it is common practice to regard an expression such as
(x^2 − 1)/(x − 1) as a function. But this can be rather confusing because here the expression x + 1 does
not denote the same function since the latter is defined at x = 1 while the former is not. (Of course
the two functions are both defined at all other points and have equal values.) Note however that
as elements of k(x) the two expressions are equal since (x^2 − 1) = (x + 1)(x − 1) as polynomials.
Computer Algebra systems work with such expressions in the algebraic rather than the functional
sense. Despite this it is common mathematical practice to call such objects rational functions rather
than rational expressions as we have done. See also [21], pp. 72–74.
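The algebraic notion of equality is easy to implement: p/q and p′/q′ are equal exactly when pq′ − p′q is the zero polynomial. A small sketch, with polynomials as coefficient lists (our own encoding):

```python
def pmul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def rat_equal(pq, rs):
    """(p, q) == (r, s) in k(x) iff p*s - r*q is the zero polynomial."""
    (p, q), (r, s) = pq, rs
    lhs, rhs = pmul(p, s), pmul(r, q)
    n = max(len(lhs), len(rhs))
    lhs += [0] * (n - len(lhs))
    rhs += [0] * (n - len(rhs))
    return lhs == rhs

# (x^2 - 1)/(x - 1) equals (x + 1)/1 as rational expressions,
# even though as functions they differ at x = 1
assert rat_equal(([-1, 0, 1], [-1, 1]), ([1, 1], [1]))
```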
Recursive Representation
This is just an expression of the isomorphism
R[x1 , . . . , xn ] ≅ R[x1 , . . . , xn−1 ][xn ].
For example
f = 3xy^2 + 2y^2 − 4x^2 y + y − 1
may be represented as
(3x + 2)y^2 + (−4x^2 + 1)y + (−1)y^0
where y is the main indeterminate. In general we use Σ_i ci xn^i , where the ci are polynomials
which are themselves represented in a similar format. The Axiom type is UP(y,UP(x,INT)) (here and
elsewhere INT can be replaced by any integral domain). Note that Axiom prints back the polynomial
in expanded form but it is held as indicated, as applying coefficients to f shows.
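The recursive representation is easy to mimic with nested coefficient lists; this is only our illustration, not Axiom's internal data structure:

```python
# recursive representation of f = 3xy^2 + 2y^2 - 4x^2 y + y - 1 with y main:
# a list of coefficients in y (lowest degree first), each itself a list of
# coefficients in x
f_rec = [
    [-1],            # y^0 coefficient: -1
    [1, 0, -4],      # y^1 coefficient: -4x^2 + 1
    [2, 3],          # y^2 coefficient: 3x + 2
]

def eval_rec(p, x, y):
    """Evaluate a two-level recursive representation via Horner's rule."""
    total = 0
    for coeff in reversed(p):
        inner = 0
        for c in reversed(coeff):
            inner = inner * x + c    # evaluate the coefficient polynomial in x
        total = total * y + inner    # then accumulate in the main indeterminate y
    return total

# check against direct evaluation at (x, y) = (2, 3):
# 3*2*9 + 2*9 - 4*4*3 + 3 - 1 = 26
assert eval_rec(f_rec, 2, 3) == 26
```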
Distributive Representation
We consider the power products in the given indeterminates, e.g., x1^2 x3 x5^7 . We pick a total order
on the power products such that 1 (which stands for x1^0 · · · xn^0 ) is least and for each power product
there are only finitely many less than it. We may now write a polynomial p(x1 , . . . , xn ) as
p(x1 , . . . , xn ) = Σ_{t ≤ t̄} ct t
where ct ∈ R for each t. As an example of a suitable order we could first sort according to degree
and within each degree use the lexicographic order: we order the indeterminates, e.g.,
x1 >L x2 >L · · · >L xn
and then
x1^{i1} · · · xn^{in} >L x1^{j1} · · · xn^{jn}
if and only if there is a k such that il = jl for 1 ≤ l < k and ik > jk . This is called the graded
lexicographic order. By contrast if we order power products purely lexicographically then we do not
have a suitable order because there can be infinitely many power products less than a given one
(see the example on p. 69). With f as above, the Axiom type is DMP([x,y],INT) (an alternative
is HDMP([x,y],INT) which imposes a different order on power products).
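The graded lexicographic comparison is straightforward on exponent vectors; grlex_greater is a hypothetical helper name of our own:

```python
def grlex_greater(e, f):
    """Compare exponent vectors (same length) under graded lex: first by
    total degree, ties broken lexicographically with x1 > x2 > ... > xn."""
    if sum(e) != sum(f):
        return sum(e) > sum(f)
    # Python tuple comparison is exactly lex comparison on equal-length tuples
    return e > f

# x1^2 x2 vs x1 x2^2: same degree 3, so lex breaks the tie
assert grlex_greater((2, 1), (1, 2))
# degree dominates: x2^3 > x1^2 even though x1 > x2 lexicographically
assert grlex_greater((0, 3), (2, 0))
```

Dropping the degree comparison gives pure lex, which is not a suitable order in the sense above: x1 >L x2^k for every k, so infinitely many power products lie below x1.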
Dense Representations
In a dense representation we record all the coefficients up to the highest degree of the main inde-
terminate or the highest power product. Thus for a recursive representation we might have
Σ_{i=0}^{m} ci x^i ←→ (c0 , . . . , cm )
where (. . .) denotes a list or array. The problem is that this representation can lead to a great deal
of wasted space, e.g., consider x^1000 + 1 or x^4 y^7 + x + 1.
Exercise 4.22 How many power products of degree d are there in the indeterminates x, y? Gen-
eralize to power products in n indeterminates.
Sparse Representations
For these we drop all zero coefficients. This means that with each non-zero coefficient we must
record the corresponding degree or power product. For example
x^1000 + 1 ←→ ((1, 1000), (1, 0)),
x^4 y^7 + 2x + 1 ←→ ((1, (4, 7)), (2, (1, 0)), (1, (0, 0))).
In the second example a power product x1^{e1} · · · xn^{en} is represented by (e1 , . . . , en ). In general Axiom
uses a sparse representation.
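Arithmetic on a sparse representation touches only the non-zero terms. A minimal sketch, storing a polynomial as a dictionary from exponent vectors to coefficients (our own encoding):

```python
from collections import defaultdict

def sparse_mul(p, q):
    """Multiply sparse polynomials stored as {exponent_vector: coefficient};
    zero coefficients are dropped from the result."""
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))  # add exponents
            out[e] += c1 * c2
    return {e: c for e, c in out.items() if c != 0}

# (x^1000 + 1)(x^1000 - 1) = x^2000 - 1, using only a handful of entries
p = {(1000,): 1, (0,): 1}
q = {(1000,): 1, (0,): -1}
assert sparse_mul(p, q) == {(2000,): 1, (0,): -1}
```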
Rational Expressions
The obvious representation for these is as a pair of polynomials consisting of the numerator and
denominator. If the numerator is in normal form then we also have a normal form for rational
expressions (since f /g = 0 iff f = 0; of course we must have g ≠ 0 for f /g to be a rational
expression at all). It is tempting, by analogy with the rational numbers, to remove gcd(f, g)
from the numerator and denominator of f /g. However this can be very unwise: for example
(1 − x^n )/(1 − x) is very compact and requires little storage even for large n. Removing the gcd
gives us 1 + x + · · · + x^{n−1} which is expensive in storage space and clutters up any output in a
nasty way! The situation becomes even worse when multivariate polynomials are involved. We also
have to bear in mind that polynomial gcds are fairly costly to compute (compared to integer ones).
However the default behaviour of Axiom is to take out the gcd; by contrast, Maple does not.
Exercise 4.23 Consider rational expressions in one indeterminate and rational coefficients. Even
if we remove the gcd from the numerator and denominator we still do not have a canonical form.
Give a method that leads to a canonical form which uses only integer coefficients. (Here we assume
that integers are held in a canonical form, which is always the case.)
using Euclid’s algorithm over the integers leads to rather large integers, even though the input
and output are small (we are justified in claiming that the output is small because any non-zero
number, e.g., 1, is a valid answer; remember that we are working with coefficients over Q). This
phenomenon of intermediate expression swell is such a major problem in computer algebra that it
is worth seeing another example of it.
Some systems (e.g., Maple) use a sum of products representation for polynomials (and other
expressions). In this representation we take a sum over products of polynomials rather than just
a sum over monomials. This representation is compact but it does mean that we cannot tell just
by looking at it whether or not a given representation yields the zero polynomial. In order to
achieve this it appears that we must expand terms—at least nobody knows of any way round this.
Unfortunately this process can lead to an exponential increase in the number of terms produced. The
following is an example of this. The Vandermonde determinant of n indeterminates x1 , x2 , . . . , xn
is defined by:
V (x1 , x2 , . . . , xn ) =
    | 1           1           · · ·  1           |
    | x1          x2          · · ·  xn          |
    | x1^2        x2^2        · · ·  xn^2        |
    | ...         ...                ...         |
    | x1^{n−1}    x2^{n−1}    · · ·  xn^{n−1}    | .
It can be shown that
V (x1 , x2 , . . . , xn ) = Π_{1≤i<j≤n} (xj − xi ).
(Thus V (x1 , x2 , . . . , xn ) = 0 if and only if xi = xj for some i ≠ j.) Now put
Z(x1 , x2 , . . . , xn+1 ) =
    | 1           1           · · ·  1             |
    | 1           1           · · ·  1             |
    | x1          x2          · · ·  xn+1          |
    | x1^2        x2^2        · · ·  xn+1^2        |
    | ...         ...                ...           |
    | x1^{n−1}    x2^{n−1}    · · ·  xn+1^{n−1}    | .
This is 0 since the first two rows are equal. However expanding along the first row we have
Z(x1 , x2 , . . . , xn+1 ) = Σ_{i=1}^{n+1} (−1)^{i+1} V (x1 , . . . , x̂i , . . . , xn+1 )
                           = Σ_{i=1}^{n+1} (−1)^{i+1} Π_{1≤j<k≤n+1, j,k≠i} (xk − xj ),
where x1 , . . . , x̂i , . . . , xn+1 denotes the sequence x1 , x2 , . . . , xn+1 with xi deleted. Now the last
expression is a perfectly good sum of products representation but any attempt at expanding it
leads to n! terms for each summand and cancellation takes place only between summands!
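The product formula for V can be checked directly for small n; the determinant routine below uses the Leibniz formula and is purely illustrative:

```python
from itertools import permutations

def det(m):
    """Determinant via the Leibniz formula; fine for small matrices."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        # the sign of the permutation is (-1)^(number of inversions)
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        prod = 1
        for i in range(n):
            prod *= m[i][perm[i]]
        total += (-1) ** inv * prod
    return total

def vandermonde(xs):
    """Row i holds the i-th powers of the xs, matching the matrix in the text."""
    return [[x ** i for x in xs] for i in range(len(xs))]

xs = [2, 3, 5, 7]
prod = 1
for i in range(len(xs)):
    for j in range(i + 1, len(xs)):
        prod *= xs[j] - xs[i]          # the product over 1 <= i < j <= n
assert det(vandermonde(xs)) == prod    # the product formula for V
```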
We therefore have a difficult choice: systems that expand automatically will face a huge penalty
at some point. Systems that do not expand automatically leave open the possibility of undetected
divisions by 0 and the user must look out for such difficulties. The next exercise shows a probabilistic
approach to deciding equality to 0 without expanding.
Exercise 4.24 Let p(x1 , x2 , . . . , xn ) be a non-zero polynomial with coefficients from Q and put
V(p) = { (a1 , a2 , . . . , an ) ∈ Q^n : p(a1 , a2 , . . . , an ) = 0 }.
It is a fact that if we pick the ai randomly (from some range) then the probability that we obtain a
member of V(p) (i.e., a root of p) is 0. (Think of algebraic curves in 2 dimensions and surfaces in
3 dimensions.) Can you use this to develop a fast randomized test for a polynomial expression being
zero? To be more precise, if the test tells us that the polynomial is non-zero then the answer must
be correct. If on the other hand the test tells us that the polynomial is zero then the answer must
be correct with high probability. (See J. T. Schwartz, Fast Probabilistic Algorithms for Verification
of Polynomial Identities, J. ACM, 27, 4, (1980) 701–717.)
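A sketch of the kind of randomized test the exercise asks for, in the spirit of Schwartz's result; the names, trial count and sampling range are our own choices:

```python
import random

def probably_zero(f, nvars, trials=20, bound=10**6):
    """Randomized zero test: evaluate the (unexpanded) expression f at random
    integer points. A non-zero value proves f != 0; if every trial gives 0
    then f is zero with high probability (by the Schwartz bound, each trial
    errs with probability at most deg(f)/(2*bound + 1))."""
    for _ in range(trials):
        point = [random.randint(-bound, bound) for _ in range(nvars)]
        if f(*point) != 0:
            return False          # certainly non-zero
    return True                   # zero with high probability

# Z(x1, x2, x3): a 3x3 determinant with two equal rows, kept as an
# unexpanded expression; it is identically zero
z = lambda a, b, c: (b - a) * (c - a) * (c - b) - (b - a) * (c - a) * (c - b)
assert probably_zero(z, 3)

# V(x1, x2, x3) itself is not identically zero
v = lambda a, b, c: (b - a) * (c - a) * (c - b)
assert not probably_zero(v, 3)
```

Note the asymmetry the exercise demands: a "non-zero" verdict is always correct, while a "zero" verdict is only correct with high probability.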