CVE 154 - Statics of Rigid Bodies: Lesson 2: Errors in Numerical Computation
LESSON 2: ERRORS IN NUMERICAL COMPUTATION
SIGNIFICANT FIGURES
The concept of a significant figure, or digit, has been developed to formally
designate the reliability of a numerical value. The significant digits of a number
are those that can be used with confidence. They correspond to the number of
certain digits plus one estimated digit.
[Figure: speedometer and odometer of an automobile]
For example, the speedometer shown has two certain digits, 48. It is conventional to set the estimated digit at one-half of the smallest scale division on the measurement device. Thus the speedometer reading would consist of the three significant figures: 48.5. In a similar fashion, the odometer has six certain digits, 87324.5. Note that the white dial on the odometer corresponds to the tenths decimal place. Including an estimated digit would yield a seven-significant-figure reading of 87,324.45.
SIGNIFICANT FIGURES
The following are rules for identifying significant figures when writing or
interpreting numbers.
Zeros to the left of the first non-zero digit (leading zeros) are not significant.
Zeros appearing between non-zero digits (captive zeros) are significant.
Zeros to the right of the non-zero digits (trailing zeros) are significant if they are to the right of the decimal point, as these are retained only to indicate precision. However, trailing zeros to the left of the decimal point may or may not be significant, depending on the precision of the measurement.
SIGNIFICANT FIGURES
EXAMPLE        No. of Significant Figures
12345.6789     9
1020.304       7
0.0001234      4
1234.0000      8
123000         3, 4, 5, or 6 (depends on whether the trailing zeros are known with confidence based on the precision of the measurement)
To resolve uncertainties due to the last rule, it is better to indicate values in their
scientific notation. For example, 123000 can be written as:
Scientific notation No. of Significant Figures
1.23 × 10^5            3
1.230 × 10^5           4
1.2300 × 10^5          5
1.23000 × 10^5         6
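Scientific notation maps directly onto exponent formatting in code. As a minimal Python sketch (the value 123000 is the one from the example above), printing with n − 1 digits after the decimal point yields n significant figures:

```python
# Writing 123000 in scientific notation with an explicit number of
# significant figures removes the ambiguity of the trailing zeros.
value = 123000

for sig_figs in (3, 4, 5, 6):
    # Exponent formatting keeps (sig_figs - 1) digits after the decimal
    # point, i.e. sig_figs significant digits in total.
    print(f"{value:.{sig_figs - 1}e}")
# 1.23e+05
# 1.230e+05
# 1.2300e+05
# 1.23000e+05
```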
SIGNIFICANT FIGURES
The concept of significant figures has two important implications for our study of
numerical methods:
1. We can use the number of significant figures as a criterion to specify how
confident we are in the approximate result of our numerical calculations. For
example, we might decide that our approximation is acceptable if it is correct
to four significant figures.
2. Although quantities such as π, e, or √7 represent specific quantities, they
cannot be expressed exactly by a limited number of digits. For example,
π = 3.141592653589793238462643...
Because computers retain only a finite number of significant figures, such
numbers can never be represented exactly. The omission of the remaining
significant figures is called round-off error.
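To see this on a real machine, a minimal Python sketch: math.pi is the IEEE double-precision value closest to π, so printing more digits than the format actually stores exposes the round-off.

```python
import math

# math.pi is the closest 64-bit floating-point number to pi; the digits
# beyond the machine's finite precision are simply not stored.
print(f"{math.pi:.20f}")            # 3.14159265358979311600
print("3.14159265358979323846...")  # true digits of pi, for comparison

# The disagreement beyond ~16 significant digits is round-off error.
```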
ACCURACY AND PRECISION
DEFINITION:
Accuracy – refers to how closely a computed or measured value agrees with the
true value
Precision – refers to how closely individual computed or measured values agree
with each other
Inaccuracy – (also called bias) is defined as systematic deviation from the truth
Imprecision – (also called uncertainty) refers to the magnitude of the scatter of
values
ERROR REPRESENTATION
1. True error (Et) – this is simply the difference between the true value and the approximation:
True error (Et) = true value − approximation
2. True fractional relative error – this representation normalizes the error to the true value. This takes into account the order of magnitude of the value under examination. For example, a 1-centimeter error is much more significant if you are measuring a 100-centimeter-long object rather than a 10000-centimeter-long object.
True fractional relative error = true error / true value
3. True percent relative error (εt) – this representation expresses the true
fractional relative error in percent.
εt = (true error / true value) × 100% = (Et / true value) × 100%
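As a minimal Python sketch of εt for the 1-centimeter error discussed above (the measured values 99 cm and 9999 cm are assumed purely for illustration):

```python
# True percent relative error: eps_t = (true error / true value) * 100%.
# A 1 cm error matters far more on a 100 cm object than on a 10000 cm one.
def true_percent_relative_error(true_value, approximation):
    return (true_value - approximation) / true_value * 100

print(true_percent_relative_error(100, 99))      # 1.0   -> 1% error
print(true_percent_relative_error(10000, 9999))  # 0.01  -> 0.01% error
```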
ERROR REPRESENTATION
4. Approximate percent relative error (εa) – this representation, in percent,
normalizes the error to an approximate value as denoted by the subscript a.
In real-world applications, we will obviously not know the true value
beforehand. For these situations, an alternative is to normalize the error using
the best available estimate of the true value, that is, to the approximation
itself.
εa = (approximate error / approximation) × 100%
Most numerical solutions follow an iterative approach to compute answers. In
such an approach, a present approximation is made on the basis of a previous
approximation. This process is performed repeatedly, or iteratively, to
successively compute better approximations. Hence, the approximate error is
often estimated as the difference between previous and current
approximations. Thus, the equation becomes:
εa = ((current approximation − previous approximation) / current approximation) × 100%
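As a sketch of this iterative error estimate, the Maclaurin series for e^x can be summed term by term; the test point x = 0.5 and the six-term cutoff are assumptions chosen only for the demonstration:

```python
import math

# Iteratively approximate e^0.5 with the Maclaurin series
#   e^x = 1 + x + x^2/2! + x^3/3! + ...
# tracking the approximate percent relative error after each new term.
x = 0.5
true_value = math.exp(x)

approximation = 0.0
for n in range(6):
    previous = approximation
    approximation += x**n / math.factorial(n)   # add the next series term
    if n > 0:
        eps_a = (approximation - previous) / approximation * 100
        eps_t = (true_value - approximation) / true_value * 100
        print(f"terms={n + 1}: eps_a={eps_a:.5f}%  eps_t={eps_t:.5f}%")
```

Note how εa, which needs no knowledge of the true value, tracks εt closely as the iteration converges.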
ERROR REPRESENTATION
Based on the equations, errors can either be positive or negative:
A positive error occurs if the approximation is less than the true value (or the previous approximation is less than the current approximation).
A negative error occurs if the approximation is greater than the true value (or the previous approximation is greater than the current approximation). A negative error also occurs when computing relative errors if the denominator is less than zero (negative value).
Numerical round-off errors are directly related to the manner in which numbers are stored in a computer.
word – the fundamental unit whereby information is stored in a computer.
– It consists of a string of binary digits, or bits.
– Numbers are typically stored in one or more words.
NUMBER SYSTEMS
BASE-10 OR DECIMAL NUMBER SYSTEM:
The base-10 system uses the 10 digits—0, 1, 2, 3, 4, 5, 6, 7, 8, 9— to represent
numbers. By themselves, these digits are satisfactory for counting from 0 to 9.
For numbers greater than 9, combinations of these basic digits are used, with
the position or place value specifying the magnitude. The figure below shows
how a number is formulated in the base-10 system.
NUMBER SYSTEMS
BASE-2 OR BINARY NUMBER SYSTEM:
This is the number system that is used by computers. This system only uses 2
digits—0 and 1— to represent numbers. This relates to the fact that the
primary logic units of digital computers are on/off electronic components. The
figure below shows how a number is formulated in the base-2 system. Here,
the number 10101101 in the base-2 system is equivalent to the number 173 in
the base-10 system.
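A minimal Python sketch of the same place-value expansion (the bit string is the one from the text):

```python
# Expanding 10101101 (base 2) by place value reproduces the base-10
# equivalent given in the text.
bits = "10101101"

total = 0
for position, bit in enumerate(reversed(bits)):
    total += int(bit) * 2**position   # each place is a power of 2

print(total)         # 173
print(int(bits, 2))  # 173, using Python's built-in base conversion
```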
INTEGER REPRESENTATION
SIGNED MAGNITUDE METHOD:
This employs the first bit of a word to indicate the sign, with a 0 for positive and
a 1 for negative. The remaining bits are used to store the number. The figure
below shows the representation of the decimal integer −173 on a 16-bit
computer using this method.
This method limits the capacity of computers to represent integers, since the range of integers that can be represented depends on the number of bits in the computer's word.
INTEGER REPRESENTATION
If you are using a 16-bit computer, the first bit holds the sign while the
remaining 15 bits can hold binary numbers from 000000000000000 to
111111111111111. The upper limit can be converted to a decimal integer, as in
1 × 2^14 + 1 × 2^13 + ⋯ + 1 × 2^1 + 1 × 2^0 = 32,767
Thus, a 16-bit computer word can store decimal integers ranging from −32,767 to 32,767. In addition, because zero is already defined as 0000000000000000, it is redundant to use the number 1000000000000000 to define a "minus zero." Therefore, it is usually employed to represent an additional negative number: −32,768. The full range for a 16-bit word is therefore −32,768 to 32,767.
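A quick check of these limits in Python, using nothing beyond the 16-bit word described above:

```python
# Range of integers on a 16-bit word: a sign bit plus 15 magnitude bits.
bits = 16
largest = sum(2**k for k in range(bits - 1))  # 111111111111111 in binary
print(largest)                                # 32767
print(2**(bits - 1) - 1)                      # same value, closed form

# With the "minus zero" pattern reused for -32768, the full range is:
print(-(2**(bits - 1)), "to", 2**(bits - 1) - 1)   # -32768 to 32767
```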
FRACTIONAL REPRESENTATION
Fractional quantities are typically represented in computers in floating-point form, in which the number is expressed as a fractional part, called a mantissa m, multiplied by the base b raised to an integer exponent e, as in m × b^e.
Note that the mantissa is usually normalized if it has leading zero digits. For example, suppose the quantity 1/34 = 0.029411765... was stored in a floating-point base-10 system that allowed only four decimal places to be stored. Thus, 1/34 would be stored as
0.0294 × 10^0
However, in the process of doing this, the inclusion of the useless zero to the
right of the decimal forces us to drop the digit 1 in the fifth decimal place.
FRACTIONAL REPRESENTATION
The number can be normalized to remove the leading zero by multiplying the
mantissa by 10 and lowering the exponent by 1 to give
0.2941 × 10^−1
Thus, we retain an additional significant figure when the number is stored.
The consequence of normalization is that the absolute value of the mantissa m
is limited. That is,
1/b ≤ m < 1
where b = the base. For example, for a base-10 system, m would range between
0.1 and 1, and for a base-2 system, between 0.5 and 1.
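Python's standard library exposes exactly this base-2 normalization: math.frexp decomposes a float into a mantissa in [0.5, 1) and a power-of-two exponent. A minimal sketch (the sample values are arbitrary):

```python
import math

# math.frexp writes a float as m * 2**e with the mantissa normalized so
# that 0.5 <= m < 1, which is the base-2 bound stated above.
for value in (1 / 34, 173.0, 0.0625):
    m, e = math.frexp(value)
    print(f"{value} = {m} * 2**{e}")
# e.g. 0.0625 = 0.5 * 2**-3, matching the sample problem further on.
```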
Floating-point representation allows both fractions and very large numbers to
be expressed on the computer. However, floating-point numbers take up more
room and take longer to process than integer numbers. More significantly,
however, their use introduces a source of error because the mantissa holds only
a finite number of significant figures. Thus, a round-off error is introduced.
FRACTIONAL REPRESENTATION
The following are aspects of the floating-point representation that have
significance regarding computer round-off errors:
1. There Is a Limited Range of Quantities That May Be Represented. Just like the
integer case, there are large positive and negative numbers that cannot be
represented. However, in addition to large quantities, the floating-point
representation has the added limitation that very small numbers cannot be
represented.
2. There Are Only a Finite Number of Quantities That Can Be Represented within the Range. Thus, the degree of precision is limited. Irrational numbers like π and e cannot be exactly represented. Furthermore, rational numbers that do not exactly match one of the values in the set also cannot be represented precisely. The errors introduced by approximating both these cases are referred to as quantizing errors. For example, suppose π = 3.14159265358... is to be stored in a base-10 number system with 7 significant figures:
a. By Chopping. It becomes π = 3.141592, with Et = 0.00000065...
b. By Rounding. It becomes π = 3.141593, with Et = −0.00000035...
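The decimal module can imitate this 7-significant-figure base-10 storage; a sketch (quantizing to six decimal places, since π carries one digit before the point):

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Store pi in a 7-significant-figure base-10 system by chopping and by
# rounding, then compute the true error Et for each, as in the text.
pi = Decimal("3.14159265358979323846")

chopped = pi.quantize(Decimal("0.000001"), rounding=ROUND_DOWN)
rounded = pi.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)

print(chopped, pi - chopped)  # 3.141592, Et ~  0.00000065...
print(rounded, pi - rounded)  # 3.141593, Et ~ -0.00000035...
```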
FRACTIONAL REPRESENTATION
3. The Interval between Numbers, Δx, Increases as the Numbers Grow in
Magnitude. Here, quantizing errors will be proportional to the magnitude of
the number being represented.
All these are demonstrated in the Sample problem in the next slide.
SAMPLE PROBLEM
EXAMPLE 3:
Create a hypothetical floating-point number set for a machine that stores information using 7-bit words. Employ the first bit for the sign of the number, the next three for the sign and the magnitude of the exponent, and the last three for the magnitude of the mantissa. Show all the positive values and their corresponding base-10 numbers.

SOLUTION:
The smallest possible positive number is 0111100. The initial 0 indicates that the quantity is positive. The 1 in the second place designates that the exponent has a negative sign. The 1's in the third and fourth places give a maximum value to the exponent of
1 × 2^1 + 1 × 2^0 = 3
Therefore, the exponent will be −3. Finally, the mantissa is specified by the 100 in the last three places, which conforms to
1 × 2^−1 + 0 × 2^−2 + 0 × 2^−3 = 0.5
Although a smaller mantissa is possible (e.g., 000, 001, 010, 011), the value of 100 is used because of the limit imposed by normalization. Thus, the smallest possible positive number for this system is
+0.5 × 2^−3
which is equal to 0.0625 in the base-10 system.
SAMPLE PROBLEM
The next highest numbers are developed by increasing the mantissa, as in
0111101 = (1 × 2^−1 + 0 × 2^−2 + 1 × 2^−3) × 2^−3 = 0.078125 in base-10
0111110 = (1 × 2^−1 + 1 × 2^−2 + 0 × 2^−3) × 2^−3 = 0.093750 in base-10
0111111 = (1 × 2^−1 + 1 × 2^−2 + 1 × 2^−3) × 2^−3 = 0.109375 in base-10
Notice that the base-10 equivalents are spaced evenly with an interval of 0.015625. At this point, to continue increasing, we must reduce the exponent magnitude to binary 10, which gives a value of
1 × 2^1 + 0 × 2^0 = 2
so the exponent becomes −2. The mantissa is decreased back to its smallest value of 100. Therefore, the next number is
0110100 = (1 × 2^−1 + 0 × 2^−2 + 0 × 2^−3) × 2^−2 = 0.125000 in base-10
This still represents a gap of 0.125000 − 0.109375 = 0.015625. However, now when higher numbers are generated by increasing the mantissa, the gap is lengthened to 0.031250:
0110101 = (1 × 2^−1 + 0 × 2^−2 + 1 × 2^−3) × 2^−2 = 0.156250 in base-10
0110110 = (1 × 2^−1 + 1 × 2^−2 + 0 × 2^−3) × 2^−2 = 0.187500 in base-10
0110111 = (1 × 2^−1 + 1 × 2^−2 + 1 × 2^−3) × 2^−2 = 0.218750 in base-10
This pattern is repeated as each larger quantity is formulated until a maximum number is reached,
0011111 = (1 × 2^−1 + 1 × 2^−2 + 1 × 2^−3) × 2^3 = 7.000000 in base-10
Sign of   Sign of    Exponent    Mantissa
Number    Exponent   2^1  2^0    2^−1  2^−2  2^−3    Number in Base-10    Interval
0 1 1 1 1 0 0 0.062500
0 1 1 1 1 0 1 0.078125 0.015625
0 1 1 1 1 1 0 0.093750 0.015625
0 1 1 1 1 1 1 0.109375 0.015625
0 1 1 0 1 0 0 0.125000 0.015625
0 1 1 0 1 0 1 0.156250 0.031250
0 1 1 0 1 1 0 0.187500 0.031250
0 1 1 0 1 1 1 0.218750 0.031250
0 1 0 1 1 0 0 0.250000 0.031250
0 1 0 1 1 0 1 0.312500 0.062500
0 1 0 1 1 1 0 0.375000 0.062500
0 1 0 1 1 1 1 0.437500 0.062500
0 1 0 0 1 0 0 0.500000 0.062500
0 1 0 0 1 0 1 0.625000 0.125000
0 1 0 0 1 1 0 0.750000 0.125000
0 1 0 0 1 1 1 0.875000 0.125000
0 0 0 1 1 0 0 1.000000 0.125000
0 0 0 1 1 0 1 1.250000 0.250000
0 0 0 1 1 1 0 1.500000 0.250000
0 0 0 1 1 1 1 1.750000 0.250000
0 0 1 0 1 0 0 2.000000 0.250000
0 0 1 0 1 0 1 2.500000 0.500000
0 0 1 0 1 1 0 3.000000 0.500000
0 0 1 0 1 1 1 3.500000 0.500000
0 0 1 1 1 0 0 4.000000 0.500000
0 0 1 1 1 0 1 5.000000 1.000000
0 0 1 1 1 1 0 6.000000 1.000000
0 0 1 1 1 1 1 7.000000 1.000000
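The entire table can be regenerated in a few lines of Python by enumerating the bit fields described in the problem; this is a sketch, with the loop bounds encoding the 7-bit layout above:

```python
# Regenerate the positive values of the 7-bit floating-point set:
# bit layout  s | es e1 e0 | m1 m2 m3,  value = (m1/2 + m2/4 + m3/8) * 2**e
values = []
for exp_sign in (0, 1):            # 0 = positive exponent, 1 = negative
    for exp_mag in range(4):       # exponent magnitude 00..11 (0..3)
        for mantissa in range(4, 8):   # 100..111, normalized (>= 0.5)
            m = mantissa / 8           # m1*2**-1 + m2*2**-2 + m3*2**-3
            e = -exp_mag if exp_sign else exp_mag
            values.append(m * 2**e)

for v in sorted(set(values)):      # set() drops the duplicated exponent 0
    print(v)
# 28 distinct positive values, from 0.0625 up to 7.0, matching the table.
```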
EXTENDED PRECISION
It should be noted at this point that, although round-off errors are present, the
number of significant digits carried on most computers allows most engineering
computations to be performed with more than acceptable precision.
Commercial computers use much larger words and, consequently, allow
numbers to be expressed with more than adequate precision. For example,
computers that use IEEE format allow 24 bits to be used for the mantissa in
single precision, which translates into about seven significant base-10 digits of
precision with a range of about 10^−38 to 10^39.
With this acknowledged, there are still cases where round-off error becomes
critical. For this reason most computers allow the specification of extended
precision. The most common of these is double precision, in which the number
of words used to store floating-point numbers is doubled. It provides about 15
to 16 decimal digits of precision and a range of approximately 10^−308 to 10^308. In
many cases, the use of double-precision quantities can greatly mitigate the
effect of round-off errors. However, a price is paid for such remedies in that
they also require more memory and execution time.
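A small NumPy sketch makes the single/double contrast visible (NumPy is assumed to be available; any value with a non-terminating binary expansion, such as 1/3, works):

```python
import numpy as np

# The same quantity stored in IEEE single and double precision: single
# precision keeps ~7 significant base-10 digits, double keeps ~15-16.
x = 1 / 3
print(np.float32(x))   # 0.33333334         (round-off after ~7 digits)
print(np.float64(x))   # 0.3333333333333333

# Machine epsilon quantifies the precision of each format:
print(np.finfo(np.float32).eps)   # ~1.19e-07
print(np.finfo(np.float64).eps)   # ~2.22e-16
```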
ARITHMETIC COMPUTATIONS
We will use normalized base-10 numbers to demonstrate computations.
ADDITION:
When two floating-point numbers are added, the mantissa of the number with
the smaller exponent is modified so that the exponents are the same. This has
the effect of aligning the decimal points.
For example, suppose we want to add 0.1557 · 10^1 + 0.4381 · 10^−1. The decimal point of the mantissa of the second number is shifted to the left a number of places equal to the difference of the exponents [1 − (−1) = 2], so that 0.4381 · 10^−1 becomes 0.004381 · 10^1. Now the numbers can be added,
0.1557 · 10^1
+ 0.004381 · 10^1
0.160081 · 10^1
and the result chopped to 0.1600 · 10^1. Notice how the last two digits of the second number that were shifted to the right have essentially been lost from the computation.
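A hypothetical chop-to-4-digits helper (not a real machine instruction, just a simulation of the scheme above) reproduces the lost digits:

```python
import math

def chop(x, digits=4):
    """Simulate a machine that keeps `digits` significant figures by chopping."""
    if x == 0:
        return 0.0
    e = math.floor(math.log10(abs(x))) - digits + 1
    return math.trunc(x / 10**e) * 10**e

total = 0.1557e1 + 0.4381e-1   # full-precision sum: 1.60081
print(f"{chop(total):.4g}")    # 1.6 -> stored as 0.1600 * 10^1; "81" is lost
```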
ARITHMETIC COMPUTATIONS
SUBTRACTION:
Subtraction is performed identically to addition except that the sign of the
subtrahend is reversed. For example,
0.7642 · 10^3
− 0.7641 · 10^3
0.0001 · 10^3
For this case the result is not normalized, and so we must shift the decimal three places to the right to give 0.1000 · 10^0 = 0.1000. Notice that the zeros appended to the end of the mantissa are not significant but are merely appended to fill the empty space created by the shift.
This introduces a substantial computational error because subsequent
manipulations would act as if these zeros were significant.
ARITHMETIC COMPUTATIONS
MULTIPLICATION AND DIVISION:
For multiplication, the exponents are added and the mantissas are multiplied.
Because multiplication of two n-digit mantissas will yield a 2n-digit result, most
computers hold intermediate results in a double-length register. For example,
0.1363 · 10^3 × 0.6423 · 10^−1 = 0.08754549 · 10^2
If, as in this case, a leading zero is introduced, the result is normalized,
0.08754549 · 10^2 → 0.8754549 · 10^1 and chopped to give
0.8754 · 10^1
Division is performed in a similar manner, but the mantissas are divided and the
exponents are subtracted. Then the results are normalized and chopped.
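The same hypothetical chop helper from the addition sketch illustrates multiplication and division (again a simulation of the 4-digit machine, not actual hardware behavior):

```python
import math

def chop(x, digits=4):
    """Keep `digits` significant figures by chopping, as defined above."""
    if x == 0:
        return 0.0
    e = math.floor(math.log10(abs(x))) - digits + 1
    return math.trunc(x / 10**e) * 10**e

# Multiplication: mantissas multiply, exponents add; normalize, then chop.
product = 0.1363e3 * 0.6423e-1      # 8.754549... = 0.8754549 * 10^1
print(f"{chop(product):.4g}")       # 8.754 -> 0.8754 * 10^1

# Division: mantissas divide, exponents subtract; normalize, then chop.
print(f"{chop(0.1363e3 / 0.6423e-1):.4g}")   # 2122 -> 0.2122 * 10^4
```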
ARITHMETIC COMPUTATIONS
LARGE COMPUTATIONS:
Certain methods require extremely large numbers of arithmetic manipulations
to arrive at their final results. In addition, these computations are often
interdependent. That is, the later calculations are dependent on the results of
earlier ones. Consequently, even though an individual round-off error could be
small, the cumulative effect over the course of a large computation can be
significant.
ADDING A LARGE NUMBER AND A SMALL NUMBER:
Suppose we add a small number, 0.0010, to a large number, 4000, using a hypothetical computer with a 4-digit mantissa and a 1-digit exponent. We modify the smaller number so that its exponent matches the larger,
0.4000 · 10^4
+ 0.0000001 · 10^4
0.4000001 · 10^4
which is chopped to 0.4000 · 10^4. Thus, we might as well have not performed the addition!
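The identical phenomenon occurs in IEEE double precision once the magnitudes are far enough apart; a one-line check:

```python
# 1.0 falls below the last digit retained at magnitude 1e16, so adding
# it changes nothing: the small addend vanishes from the sum.
large = 1.0e16
small = 1.0
print(large + small == large)   # True -> the addition was lost entirely
```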
ARITHMETIC COMPUTATIONS
SUBTRACTIVE CANCELLATIONS:
This term refers to the round-off induced when subtracting two nearly equal
floating-point numbers.
The easiest example is with the formula for the derivative of f(x) = x^2 with x = 3.253 and h = 0.002, using a 4-digit mantissa with a 1-digit exponent and rounding off:

dy/dx = [f(x + h) − f(x)] / h
      = (3.255^2 − 3.253^2) / 0.002
      = (0.1060 · 10^2 − 0.1058 · 10^2) / (0.2000 · 10^−2)
      = (0.0002 · 10^2) / (0.2000 · 10^−2)
      = (0.2000 · 10^−1) / (0.0200 · 10^−1)
      = 10 · 10^0 = 0.1000 · 10^2 = 10

which is a poor approximation of the actual derivative dy/dx = 2x = 6.506.
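A short Python sketch contrasts the full-precision forward difference with the 4-digit simulation (rounding each square to two decimal places reproduces the 4-significant-figure storage used above):

```python
def f(x):
    return x**2

x, h = 3.253, 0.002

# Full double-precision forward difference: close to the true slope 6.506.
print((f(x + h) - f(x)) / h)   # about 6.508

# Simulating the 4-digit machine: round each square to 4 significant
# figures *before* subtracting, and cancellation wrecks the estimate.
fx_h = round(f(x + h), 2)      # 10.60  (0.1060 * 10^2)
fx   = round(f(x), 2)          # 10.58  (0.1058 * 10^2)
print((fx_h - fx) / h)         # ~10, versus the true derivative 6.506
```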
REFERENCE
Chapra, S. C., & Canale, R. P. (2010). Numerical Methods for
Engineers (6th Edition). McGraw-Hill.
THE END