Algorithms
Square Roots
A detailed description of the algorithm used in Hewlett-
Packard hand-held calculators to compute square roots.
by William E. Egbert
BEGINNING WITH THE HP-35,¹,² all HP personal calculators have used essentially the same algorithms for computing complex mathematical functions in their BCD (binary-coded decimal) microprocessors. While improvements have been made in newer calculators,³ the changes have affected primarily special cases and not the fundamental algorithms.

This article is the first of a series that examines these algorithms and their implementation. Each article will present in detail the methods used to implement a common mathematical function. For simplicity, rigorous proofs will not be given, and special cases other than those of particular interest will be omitted.

Although tailored for efficiency within the environment of a special-purpose BCD microprocessor, the basic mathematical equations and the techniques used to transform and implement them are applicable to a wide range of computing problems and devices.

The Square Root Algorithm
This article will discuss the algorithm and methods used to implement the square root function.

The core of the square root algorithm is a simple approximation technique tailored to be efficient using the instruction set of a BCD processor. The technique is as follows:

$\sqrt{x}$ is desired
1. Guess an answer $a$.
2. Generate $a^2$.
3. Find $R = x - a^2$.
4. If the magnitude of $R$ is sufficiently small, $a = \sqrt{x}$.
5. If $R$ is a positive number, $a$ is too small.
   If $R$ is a negative number, $a$ is too big.
6. Depending on the result of step 5, modify $a$ and return to step 2.

The magnitude of $R$ will progressively decrease until the desired accuracy is reached.

This procedure is only a rough outline of the actual square root routine used. The first refinement is to avoid having to find $a^2$ and $x - a^2$ each time $a$ is changed. This is done by finding $a$ one decade at a time. In other words, find the hundreds digit of $a$, then the tens digit, the units digit, and so on. Once the hundreds digit is found, it is squared and subtracted from $x$, and the tens digit is found. This process, however, is not exactly straightforward, so some algebra is in order.

The following definitions will be used:

$x$ = the number whose square root is desired
$a$ = most significant digit(s) of $\sqrt{x}$ previously computed
$b$ = the next digit of $\sqrt{x}$ to be found
$j$ = the power of 10 associated with $b$
$R_a$ = $x - a^2$, the current remainder
$a_j$ = the new $a$ when digit $b$ is added in its proper place:
$$a_j = a + (b \times 10^j) \qquad (1)$$
$R_b$ = the portion of remainder $R_a$ that would be removed by adding $b$ to $a$:
$$R_b = a_j^2 - a^2 \qquad (2)$$

For example, let $x = 54756$. Then $\sqrt{x} = 234$.
Let $a = 200$.
$b$ = the digit we are seeking (3, in this case)
$j = 1$ (the 10's digit is being computed)
$R_a = 54756 - (200)^2 = 14756$.
Note that $a_j$ and $R_b$ will vary with the choice of $b$.

The process of finding $\sqrt{x}$ one decade at a time approaches the value of $\sqrt{x}$ from below. That is, at any point in the computation, $a \le \sqrt{x}$. Consequently, $R_a \ge 0$.

With this in mind it is easy to see that for any decade $j$, the value of $b$ is the largest possible digit so that
$$R_a - R_b \ge 0$$
or
$$R_b \le R_a. \qquad (3)$$
Using equations 1 and 2 we have
$$R_b = [a + (b \times 10^j)]^2 - a^2.$$
Expanding and simplifying,
$$R_b = 2ab \times 10^j + (b \times 10^j)^2. \qquad (4)$$
Inserting (4) into (3) yields the following rule for finding digit $b$.

Digit $b$ is the largest possible digit so that
$$2ab \times 10^j + (b \times 10^j)^2 \le R_a \qquad (5)$$
When the digit that satisfies equation 5 is found, a new $a$ is formed by adding $b \times 10^j$ to the old $a$, the decade counter ($j$) is decremented by 1, and a new $R_a$ is created; the new $R_a$ is the old $R_a$ minus $R_b$.
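The digit-by-digit rule of equation 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it works on arbitrary-precision integers rather than BCD registers, and the function name `digit_by_digit_sqrt` and its integer-only interface are choices made for this example, not part of the calculator routine.

```python
def digit_by_digit_sqrt(x: int) -> int:
    """Integer part of sqrt(x), found one decimal digit at a time (illustrative sketch)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    # Find the highest decade j with (10**j)**2 <= x (j = 0 when x < 100).
    j = 0
    while 100 ** (j + 1) <= x:
        j += 1
    a = 0      # digits of the root found so far
    r_a = x    # current remainder R_a = x - a**2
    while j >= 0:
        scale = 10 ** j
        b = 0
        # Equation 5: b is the largest digit with 2ab*10^j + (b*10^j)^2 <= R_a.
        while b < 9:
            trial = b + 1
            if 2 * a * trial * scale + (trial * scale) ** 2 > r_a:
                break
            b = trial
        r_a -= 2 * a * b * scale + (b * scale) ** 2   # new R_a = old R_a - R_b
        a += b * scale                                # new a = a + b*10^j
        j -= 1
    return a


print(digit_by_digit_sqrt(54756))  # the worked example above: 234
```

For $x = 54756$ the loop finds the digits 2, 3, and 4 in turn, with remainders 14756, 1856, and 0, which matches the example in the text.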
Hewlett-Packard Company, 1501 Page Mill Road, Palo Alto, California 94304
Bulk Rate U.S. Postage Paid Hewlett-Packard Company
HEWLETT-PACKARD JOURNAL
MAY 1977 Volume 28 • Number 9
Vector Rotation
An angle can be expressed as a vector having X
and Y components and a resultant R (see Fig. 1).
If R is the unit vector, then $X = \cos\theta$ and $Y = \sin\theta$.
However, regardless of the length of R, $Y/X = \tan\theta$ and
$X/Y = \cot\theta$. This holds true for all values of $\theta$
from 0 to $2\pi$. Thus, if some way could be found
to generate X and Y for a given $\theta$, all the trigonometric
functions could be found.
In vector geometry a useful formula results when one rotates a vector through a given angle. Let us suppose we have a vector whose angle is $\theta_1$, and we know its components $X_1$ and $Y_1$ (see Fig. 2). The $X_2$ and $Y_2$ that result when the vector is rotated an additional angle $\theta_2$ are given by:

$$X_2 = X_1 \cos\theta_2 - Y_1 \sin\theta_2$$
$$Y_2 = Y_1 \cos\theta_2 + X_1 \sin\theta_2$$

Dividing both sides of these equations by $\cos\theta_2$ gives:

$$\frac{X_2}{\cos\theta_2} = X_1 - Y_1 \tan\theta_2 = X_2'$$
$$\frac{Y_2}{\cos\theta_2} = Y_1 + X_1 \tan\theta_2 = Y_2' \qquad (1)$$

Note that $X_2'$ and $Y_2'$, while not the true values of $X_2$ and $Y_2$, both differ by the same factor, $\cos\theta_2$. Thus $Y_2'/X_2' = Y_2/X_2$. From Fig. 2 it is plain that the quotient $Y_2'/X_2'$ is equal to $\tan(\theta_1 + \theta_2)$. Thus the tangent of a large angle can be found by manipulating smaller angles whose sum equals the large one. Returning to equation 1 above, it can be seen that to generate $X_2'$ and $Y_2'$, $X_1$ and $Y_1$ need to be multiplied by $\tan\theta_2$ and added or subtracted as needed. If $\theta_2$ is chosen so that $\tan\theta_2$ is a simple power of 10 (i.e., 1, 0.1, 0.01, ...) then the multiplications simply amount to shifting $X_1$ and $Y_1$. Thus to generate $X_2'$ and $Y_2'$, only a shift and an add or subtract are needed.

Fig. 1.
Fig. 2.

Pseudo-Division
The tangent of $\theta$ is found as follows. First $\theta$ is divided into a sum of smaller angles whose tangents are powers of 10. The angles are $\tan^{-1}(1) = 45°$, $\tan^{-1}(0.1) \approx 5.7°$, $\tan^{-1}(0.01) \approx 0.57°$, $\tan^{-1}(0.001) \approx 0.057°$, $\tan^{-1}(0.0001) \approx 0.0057°$, and so on. This process is called pseudo-division. First, 45° is subtracted from $\theta$ until overdraft, keeping track of the number of subtractions. The remainder is restored by adding 45°. Then 5.7° is repeatedly subtracted, again keeping track of the number of subtractions. This process is repeated with smaller and smaller angles. Thus:

$$\theta = q_0 \tan^{-1}(1) + q_1 \tan^{-1}(0.1) + q_2 \tan^{-1}(0.01) + \dots + r$$

The coefficients $q_j$ refer to the number of subtractions possible in each decade. Each $q_j$ is equal to or less than 10, so it can be stored in a single four-bit digit.

This process of pseudo-division is one reason that all the trigonometric functions are done in radians. For accuracy, $\tan^{-1}(10^{-j})$ needs to be expressed to ten digits. In degrees, these constants are random digits and require considerable ROM (read-only memory) space to store. However, in radians, they become, for the most part, nines followed by sixes. Because of this, they can be generated arithmetically, thus using fewer ROM states. Also, in radians, $\tan^{-1}(1) = \pi/4$, which is needed anyway to generate $\pi$. The problem with using radians is that since $\pi$ is an irrational number, scaling errors occur as discussed earlier. This means cardinal points do not give exact answers. For example, $\sin(720°) \ne 0$ when calculated this way but rather $4 \times 10^{-9}$. See reference 3 for a discussion of this point.
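Pseudo-division and the shift-and-add rotation of equation 1 can be combined into a short Python sketch. Several things here are assumptions made for illustration: floating-point numbers replace BCD registers, `math.atan` stands in for the stored constants $\tan^{-1}(10^{-j})$, the function name `tan_pseudo_division` is hypothetical, and the starting vector $X = 1$, $Y = r$ (using $\tan r \approx r$ for the small remainder) is one plausible way to begin the rotations that the text above does not spell out.

```python
import math


def tan_pseudo_division(theta: float, decades: int = 6) -> float:
    """Approximate tan(theta) for 0 <= theta < pi/2, in radians (illustrative sketch)."""
    # Pseudo-division: q[j] counts how often atan(10**-j) can be subtracted
    # from what is left of the angle before it would go negative.
    remainder = theta
    q = []
    for j in range(decades):
        step = math.atan(10.0 ** -j)
        count = 0
        while remainder >= step:
            remainder -= step
            count += 1
        q.append(count)

    # Assumed starting vector: a small angle r has tan(r) ~ r, so X = 1, Y = r.
    x, y = 1.0, remainder

    # Rotate by atan(10**-j), q[j] times, using equation 1 with tan(theta_2) = 10**-j.
    # On a decimal machine each multiplication by t is a digit shift.
    for j, count in enumerate(q):
        t = 10.0 ** -j
        for _ in range(count):
            x, y = x - y * t, y + x * t

    # Both components are off by the same factor, so the quotient is still the tangent.
    return y / x


print(tan_pseudo_division(1.0), math.tan(1.0))
```

Because every rotation scales X and Y by the same factor, no correction for the accumulated $\cos\theta_2$ terms is needed before taking the quotient.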
HEWLETT-PACKARD JOURNAL
JUNE 1977 Volume 28 • Number 10
To change your address or delete your name from our mailing list please send us your old address label (it peels off).
Send changes to Hewlett-Packard Journal, 1501 Page Mill Road, Palo Alto, California 94304 U.S.A. Allow 60 days.
by William E. Egbert
BEGINNING WITH THE HP-35,¹,² all HP personal calculators have used essentially the same algorithms for computing complex mathematical functions in their BCD (binary-coded decimal) microprocessors. While improvements have been made in newer calculators,³ the changes have affected primarily special cases and not the fundamental algorithms.

This article is the third of a series that examines these algorithms and their implementation. Each article presents in detail the methods used to implement a common mathematical function. For simplicity, rigorous proofs are not given, and special cases other than those of particular interest are omitted.

Although tailored for efficiency within the environment of a special-purpose BCD microprocessor, the basic mathematical equations and the techniques used to transform and implement them are applicable to a wide range of computing problems and devices.

Inverse Trigonometric Functions
This article will discuss the method of generating $\sin^{-1}$, $\cos^{-1}$, and $\tan^{-1}$. An understanding of the trigonometric function algorithm is assumed. This was covered in the second article of this series and the detailed discussion will not be repeated here.⁴

To minimize program length, the function $\tan^{-1} A$ is always computed, regardless of the inverse trigonometric function required. If $\sin^{-1} A$ is desired, $A/\sqrt{1 - A^2}$ is computed first, since

assumed to be positive and the sign of the input argument becomes the sign of the answer. All angles are calculated in radians and converted to degrees or grads if necessary.

General Algorithm
A vector rotation process similar to that used in the trigonometric routine is used in the inverse process as well. A vector expressed in its X and Y components can easily be rotated through certain specific angles using nothing more than shifts and adds of simple integers. In the algorithm for $\tan^{-1}|A|$, the input argument is $|A|$, or $|\tan\theta|$, where $\theta$ is the unknown. Letting $\tan\theta = Y_1/X_1$, $|A|$ can be expressed as $|A|/1$, where $Y_1 = |A|$ and $X_1 = 1$. A vector rotation process (see Fig. 1) is then used to rotate the vector clockwise through a series of successively smaller angles $\theta_j$, counting the number of rotations for each angle, until the $Y_2$ component approaches zero. If $q_j$ denotes the number of rotations for $\theta_j$ then

$$|\theta| = q_0\theta_0 + q_1\theta_1 + \dots + q_j\theta_j + \dots$$

This process is described in detail below.

Vector Rotation
To initialize the algorithm, A and 1 are stored in fixed-point format in registers corresponding to $Y_1$,
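The General Algorithm paragraph above, which starts from $Y_1 = |A|$, $X_1 = 1$ and counts clockwise rotations, can be illustrated with a minimal Python sketch. It rests on assumptions that should be read as such: the angles are taken to be $\theta_j = \tan^{-1}(10^{-j})$ as in the trigonometric article, `math.atan` stands in for stored constants, floating point replaces the fixed-point registers, the leftover angle is approximated by $Y/X$, and the name `atan_by_rotation` is hypothetical.

```python
import math


def atan_by_rotation(a: float, decades: int = 7) -> float:
    """Approximate atan(a) in radians by counting clockwise rotations (illustrative sketch)."""
    x, y = 1.0, abs(a)          # X1 = 1, Y1 = |A|
    angle = 0.0
    for j in range(decades):
        t = 10.0 ** -j          # tangent of the assumed angle theta_j
        theta_j = math.atan(t)  # stand-in for a stored constant
        # Rotate clockwise by theta_j for as long as Y does not go negative.
        while y - x * t >= 0.0:
            x, y = x + y * t, y - x * t
            angle += theta_j    # one more count of q_j
    # Assumption: the small angle that is left is close to its tangent, Y/X.
    angle += y / x
    # The sign of the input argument becomes the sign of the answer.
    return math.copysign(angle, a)


print(atan_by_rotation(0.3), math.atan(0.3))
```

Each pass through the inner loop corresponds to one count of $q_j$ in the sum for $|\theta|$ given above.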
by William E. Egbert
$$\ln(M) = \ln(r) - \ln(P_n)$$

Finally

$$\ln(M) = \ln(r) - (K_0\ln(a_0) + K_1\ln(a_1) + \dots + K_j\ln(a_j) + \dots + K_n\ln(a_n))$$

Thus to find $\ln(M)$ one simply multiplies M by the carefully selected numbers $a_j$ so that the product $MP_n$ is forced to approach 1. If all the logarithms of $a_j$ are added up along the way to form $\ln(P_n)$ then $\ln(M)$ is the logarithm of the remainder r minus this sum. Notice that the remainder r is nothing more than the final product $MP_n$.

Implementation
How is this algorithm implemented in a special-purpose microprocessor? First of all, the terms of $P_n$ were chosen to reduce computation time and minimize the amount of ROM (read-only memory) needed to store $a_j$ and its logarithm. The numbers chosen for the $a_j$ terms are of the form $a_j = (1 + 10^{-j})$, where j = 0-4 (see Table 1).

$P_{-1} = 1$, and $K_j$ is the largest integer such that $A \cdot P_j < 1$. In practice, each $A \cdot P_j$ is formed by multiplying $A \cdot P_{j-1}$ by $(1 + 10^{-j})$, $K_j$ times. There is one intermediate product, $T_i$, for each count of $K_j$, as shown below.

$$T_0 = A(1 + 10^{-0})^0$$
$$T_1 = A(1 + 10^{-0})^1$$
$$\vdots$$
$$T_{K_0} = A(1 + 10^{-0})^{K_0}$$
$$T_{K_0+1} = A(1 + 10^{-0})^{K_0}(1 + 10^{-1})^1$$
$$\vdots$$
$$T_m = A(1 + 10^{-0})^{K_0}(1 + 10^{-1})^{K_1} \cdots (1 + 10^{-n})^{K_n} = AP_n$$
$$m = K_0 + K_1 + \dots + K_n$$
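The multiply-and-count procedure described here can be sketched in Python. This is a hedged illustration, not the calculator's routine: it uses floating point in place of BCD registers, `math.log` stands in for the stored constants $\ln(a_j)$, the final remainder is handled with the approximation $\ln(r) \approx r - 1$, and the function name `ln_by_pseudo_multiplication` is hypothetical.

```python
import math


def ln_by_pseudo_multiplication(a: float, decades: int = 5) -> float:
    """Approximate ln(a) for 0 < a < 1 by forcing a*P_n toward 1 (illustrative sketch)."""
    if not 0.0 < a < 1.0:
        raise ValueError("this sketch expects 0 < a < 1")
    product = a        # running intermediate product T_i
    log_sum = 0.0      # running sum of K_j * ln(a_j), i.e. ln(P_n)
    for j in range(decades):
        a_j = 1.0 + 10.0 ** -j          # 2, 1.1, 1.01, 1.001, 1.0001
        k_j = 0
        # K_j is the largest count that keeps the product below 1.
        while product * a_j < 1.0:
            product *= a_j              # a shift and an add on a decimal machine
            k_j += 1
        log_sum += k_j * math.log(a_j)  # stand-in for a stored constant
    r = product                         # the remainder is the final product
    # Assumption: r is close enough to 1 that ln(r) is approximately r - 1.
    return (r - 1.0) - log_sum


print(ln_by_pseudo_multiplication(0.155), math.log(0.155))
```

With five constants the remainder lies within about $10^{-4}$ of 1, so the error introduced by $\ln(r) \approx r - 1$ is on the order of $10^{-8}$ or smaller in this sketch.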
$0.9996 = A \cdot P_4 = r$        $1.8638 = \sum \ln(a_j)$

*Another ×2 would result in $AP_3 > 1$. Thus $a_j$ is changed to 1.1.
**The 1.01 constant is skipped entirely.

Applying the values found in Table 2 to equation 3 results in

$$\ln(0.155) = (0.9996 - 1) - 1.8638 = -1.8642$$

This answer approximates very closely the correct 10-digit answer of -1.864330162.

This example demonstrates the simplicity of this method of logarithm generation. All that is required is a multiplication (shift and add) and a test for 1.

To implement this process using only three working registers, a pseudo-quotient similar to the one generated in the trigonometric algorithm is formed.⁵ Each digit represents the number of successful multiplications by a particular $a_j$. For the preceding example, the pseudo-quotient would be

  2       5       0       1       1
  ↑       ↑       ↑       ↑       ↑
j = 0   j = 1   j = 2   j = 3   j = 4

With $-\ln(r) \approx -(r - 1)$ as the first term, the appropriate logarithms of ($a_j$) are then summed according to the count in the pseudo-quotient digit corresponding to the proper $a_j$. The final sum is $-\ln(A)$.

At this point one more transformation is needed to optimize this algorithm perfectly to the microprocessor

$$B_j + 1 = (B_{j-1} + 1)(1 + 10^{-j})$$
$$-B_j = -B_{j-1}(1 + 10^{-j}) - 10^{-j} \qquad (5)$$

This expression is now in a very useful form, since the $a_j$ term is the same as before, but the zero test is performed automatically when the $10^{-j}$ subtraction is done. A test for a borrow is all that is required. An additional benefit of this transformation is that accuracy can be increased by shifting $-B_j$ left one digit for each $a_j$ term after it has been applied the maximum number of times possible. This increases accuracy by replacing zeros generated as $B_j$ approaches zero with significant digits that otherwise would have been lost out of the right end of the register. This shifting, which is equivalent to a multiplication by $10^j$, gives yet another benefit. Multiplying equation 5 by $10^j$ and simplifying,

$$-B_j \times 10^j = (-B_{j-1}(1 + 10^{-j}) - 10^{-j}) \times 10^j$$
$$-B_j \times 10^j = -B_{j-1} \times 10^j (1 + 10^{-j}) - 1 \quad \text{for some } j \qquad (6)$$

Notice that the $10^{-j}$ subtraction reduces to a simple -1 regardless of the value of j. The formation of the initial $-B_0$ is also easy since $-B_0 = -(A - 1) = 1 - A$. This is formed by taking the 10's complement of M (the original mantissa), creating $10 - M$. A right shift divides this by 10 to give $1 - M/10 = 1 - A = -B_0$. A final, almost incredible, benefit of the $B_j$ transformation is that the final remainder $-B_m \times 10^j$ is in the exact form required to be the first term of the summation process of equation 4 without further modification. The correct $\ln(a_j)$ constants are added directly to $-B_m \times 10^j$, shifting the sum right one digit after each pseudo-quotient digit to preserve accuracy and restore the proper normalized form disrupted by equation 6. The result is $-\ln(A)$.

Finally, the required $\ln(M)$ is easily found by subtracting the computed result $-\ln(A)$ from $\ln(10)$.
HEWLETT-PACKARD JOURNAL
APRIL 1978 Volume 29 • Number 8