
CVE 154 – Statics of Rigid Bodies

LESSON 2: ERRORS IN NUMERICAL COMPUTATION
SIGNIFICANT FIGURES
• The concept of a significant figure, or digit, has been developed to formally designate the reliability of a numerical value. The significant digits of a number are those that can be used with confidence. They correspond to the number of certain digits plus one estimated digit.
• For example, the speedometer shown in the figure has two certain digits, 48. It is conventional to set the estimated digit at one-half of the smallest scale division on the measurement device. Thus the speedometer reading would consist of three significant figures: 48.5. In a similar fashion, the odometer has six certain digits, 87324.4. Note that the white dial on the odometer corresponds to the tenths decimal place. Including an estimated digit would yield a seven-significant-figure reading of 87,324.45.

[Figure: speedometer and odometer readings]
SIGNIFICANT FIGURES
The following are rules for identifying significant figures when writing or interpreting numbers.

• All non-zero digits are considered significant.

• Zeros appearing anywhere between two significant figures are significant.

• Zeros to the left of the significant figures (leading zeros) are not significant.

• Zeros to the right of the non-zero digits (trailing zeros) are significant if they are to the right of the decimal point, as these are necessary only to indicate precision. However, trailing zeros to the left of the decimal point may or may not be significant, depending on the precision of the measurement.
SIGNIFICANT FIGURES
EXAMPLE        No. of Significant Figures
12345.6789     9
1020.304       7
0.0001234      4
1234.0000      8
123000         3, 4, 5 or 6 (depends on whether the trailing zeros are known with confidence, based on the precision of the measurement)

To resolve uncertainties due to the last rule, it is better to indicate values in scientific notation. For example, 123000 can be written as:

Scientific notation    No. of Significant Figures
1.23 × 10^5            3
1.230 × 10^5           4
1.2300 × 10^5          5
1.23000 × 10^5         6
SIGNIFICANT FIGURES
The concept of significant figures has two important implications for our study of
numerical methods:
1. We can use the number of significant figures as a criterion to specify how confident we are in the approximate result of our numerical calculations. For example, we might decide that our approximation is acceptable if it is correct to four significant figures.
2. Although quantities such as π, e, or √7 represent specific quantities, they cannot be expressed exactly by a limited number of digits. For example,
π = 3.141592653589793238462643...
Because computers retain only a finite number of significant figures, such
numbers can never be represented exactly. The omission of the remaining
significant figures is called round-off error.
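A minimal Python sketch makes this concrete (Python and its decimal module are chosen here purely for illustration): the 64-bit double that a typical machine stores for π agrees with the true value only to about 16 significant figures.

    import math
    from decimal import Decimal

    # math.pi is the closest 64-bit double to the true value of pi.
    # Printing its exact decimal expansion exposes the round-off error.
    print(Decimal(math.pi))
    # 3.14159265358979311599796346854...  (correct only to ~16 digits)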
ACCURACY AND PRECISION
DEFINITION:
Accuracy – refers to how closely a computed or measured value agrees with the
true value
Precision – refers to how closely individual computed or measured values agree
with each other
Inaccuracy – (also called bias) is defined as systematic deviation from the truth
Imprecision – (also called uncertainty) refers to the magnitude of the scatter of values

• Numerical methods should be sufficiently accurate or unbiased to meet the requirements of a particular engineering problem. They should also be precise or certain enough for adequate engineering design.
• We will use the collective term error to represent both the inaccuracy and the imprecision of our predictions.
ACCURACY AND PRECISION
These target practice results illustrate the concepts of accuracy and precision. The bullet holes on each target can be thought of as the predictions of a numerical technique, whereas the bull's-eye represents the truth.
(a) Inaccurate and imprecise
(b) Accurate and imprecise
(c) Inaccurate and precise
(d) Accurate and precise
Our goal in numerical solutions is to achieve the target results shown in (d).
ERROR REPRESENTATION
Errors can be represented in the following ways:
1. True error (Et) – this is the exact value of the error, with the reference being the true value, as denoted by the subscript t. This requires us to know the true or exact value beforehand.
Et = true value − approximation
2. True fractional relative error – this representation normalizes the error to the true value. This takes into account the order of magnitude of the value under examination. For example, a 1 centimeter error is much more significant if you are measuring a 100 centimeter object rather than a 10000 centimeter object.
True fractional relative error = true error / true value
3. True percent relative error (εt) – this representation expresses the true fractional relative error in percent.
εt = (true error / true value) × 100% = (Et / true value) × 100%
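These definitions translate directly into code. The short Python sketch below (the function names are my own) computes Et and εt for the bridge measurement used in Example 1 later in this lesson:

    def true_error(true_value, approximation):
        """Et = true value - approximation."""
        return true_value - approximation

    def true_percent_relative_error(true_value, approximation):
        """et = (Et / true value) * 100%."""
        return true_error(true_value, approximation) / true_value * 100.0

    # Bridge measurement from Example 1: Et = 1 cm, et = 0.01 %
    print(true_error(10000, 9999))                   # 1
    print(true_percent_relative_error(10000, 9999))  # 0.01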
ERROR REPRESENTATION
4. Approximate percent relative error (εa) – this representation, in percent,
normalizes the error to an approximate value as denoted by the subscript a.
In real-world applications, we will obviously not know the true value
beforehand. For these situations, an alternative is to normalize the error using
the best available estimate of the true value, that is, to the approximation
itself.
εa = (approximate error / approximation) × 100%
Most numerical solutions follow an iterative approach to compute answers. In
such an approach, a present approximation is made on the basis of a previous
approximation. This process is performed repeatedly, or iteratively, to
successively compute better approximations. Hence, the approximate error is
often estimated as the difference between previous and current
approximations. Thus, the equation becomes:
εa = ((current approximation − previous approximation) / current approximation) × 100%
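The iterative form is equally direct to code; a small Python sketch (names are illustrative):

    def approx_percent_relative_error(current, previous):
        """ea = (current - previous) / current * 100%."""
        return (current - previous) / current * 100.0

    # Going from the 1st to the 2nd Maclaurin estimate of e^0.5 (Example 2):
    print(approx_percent_relative_error(1.5, 1.0))   # 33.33... (%)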
ERROR REPRESENTATION
Based on the equations, errors can be either positive or negative:
• A positive error occurs if the approximation is less than the true value (or the previous approximation is less than the current approximation).
• A negative error occurs if the approximation is greater than the true value (or the previous approximation is greater than the current approximation). A negative error also occurs when computing relative errors if the denominator is less than zero (a negative value).

ABSOLUTE VALUE OF RELATIVE ERRORS:
• Since most numerical calculations require numerous iterations, we use errors as the criteria for when to stop repeating the calculations. For such cases, the computation is repeated until a relative error, for example the approximate percent relative error (εa), is lower than a pre-specified percent tolerance (εs). That is why it is important for relative errors to be expressed without a positive or negative sign, by considering the absolute value of the relative error:
|εa| < εs
ERROR REPRESENTATION
PERCENT TOLERANCE (εs):
• It is also convenient to relate these errors to the number of significant figures in the approximation. If the formula below is used as the criterion, we can be assured that the result is correct to at least n significant figures.
εs = (0.5 × 10^(2−n))%
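In code, the criterion is a one-liner; a Python sketch (the helper name is mine), together with the stopping test:

    def percent_tolerance(n_sig):
        """es = (0.5 * 10**(2 - n)) %, the tolerance for n significant figures."""
        return 0.5 * 10.0 ** (2 - n_sig)

    es = percent_tolerance(3)
    print(es)                 # 0.05 (%)
    print(abs(-0.03) < es)    # True: a |ea| of 0.03 % would stop the iteration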
SAMPLE PROBLEM
EXAMPLE 1:
Suppose that you have the task of measuring the lengths of a bridge and a rivet. The resulting measurement values are 9999 cm for the bridge and 9 cm for the rivet. If the true values are 10,000 cm and 10 cm, respectively, compute the following:
a) the true error for each case
b) the true percent relative error for each case

SOLUTION:
a) True errors
For the bridge length: Et = 10000 cm − 9999 cm = 1 cm
For the rivet length: Et = 10 cm − 9 cm = 1 cm

b) True percent relative errors
For the bridge length: εt = (1 cm / 10000 cm) × 100% = 0.01%
For the rivet length: εt = (1 cm / 10 cm) × 100% = 10%
SAMPLE PROBLEM
EXAMPLE 2:
In mathematics, functions can often be represented by infinite series. For example, the exponential function e^x can be computed using:
e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ... + x^n/n!
Thus, as more terms are added in sequence, the approximation becomes a better and better estimate of the true value of e^x. The equation above is called a Maclaurin series expansion. Note that the first two terms of the equation are for n = 0 and n = 1, respectively.
Estimate the value of e^0.5 by continually adding terms to the series. For example,
1st estimate: e^0.5 = 1
2nd estimate: e^0.5 = 1 + x = 1 + 0.5 = 1.5
3rd estimate: e^0.5 = 1 + x + x^2/2! = 1 + 0.5 + 0.5^2/2! = 1.625
After each new term is added, compute the true and approximate percent relative errors. Note that the true value is e^0.5 = 1.648721271... Add terms until the absolute value of the approximate error estimate εa falls below a pre-specified error criterion εs conforming to three significant figures.
SAMPLE PROBLEM
SOLUTION:
First, we need to solve for the error criterion that we are asked to meet, in order for us to stop the calculations. Note that, as required, n = 3 significant figures.
εs = (0.5 × 10^(2−n))% = (0.5 × 10^(2−3))% = 0.05% = 5.000 × 10^−2 %
Thus, we will keep adding terms until |εa| is less than εs.
1st estimate: e^0.5 = 1
εt = ((1.648721 − 1) / 1.648721) × 100% = 39.3% = 3.93 × 10^1 %
2nd estimate: e^0.5 = 1 + x = 1 + 0.5 = 1.5
εt = ((1.648721 − 1.5) / 1.648721) × 100% = 9.02% = 9.02 × 10^0 %
εa = ((1.5 − 1) / 1.5) × 100% = 33.3% = 3.33 × 10^1 %
3rd estimate: e^0.5 = 1 + x + x^2/2! = 1 + 0.5 + 0.5^2/2! = 1.625
εt = ((1.648721 − 1.625) / 1.648721) × 100% = 1.44% = 1.44 × 10^0 %
εa = ((1.625 − 1.5) / 1.625) × 100% = 7.69% = 7.69 × 10^0 %
To speed up our calculations, we use MS Excel as shown in the next slide.
SAMPLE PROBLEM
MACLAURIN SERIES OF EXPONENTIAL FUNCTION e^x
e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ... + x^n/n!
x = 0.5
e^x = 1.648721271 (true value)
εs = 5.00E-02 % (percent tolerance)

Number of Terms   Resulting e^x with n Terms   True Percent Relative Error εt (%)   Approximate Percent Relative Error εa (%)
1                 1.000000000                  3.93E+01                             –
2                 1.500000000                  9.02E+00                             3.33E+01
3                 1.625000000                  1.44E+00                             7.69E+00
4                 1.645833333                  1.75E-01                             1.27E+00
5                 1.648437500                  1.72E-02                             1.58E-01
6                 1.648697917                  1.42E-03                             1.58E-02
When we add six terms of the series, we arrive at an approximate relative error (εa) that is less than εs = 5.00 × 10^−2 %. Hence, the computation is terminated. Notice that the resulting value of e^0.5 is accurate to four significant figures, 1.648, when compared with the true value. This is more than the 3 significant figures that the criterion guarantees, which shows that the formula is conservative; that is fine for our use.
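The spreadsheet results above can be reproduced with a short Python loop; the sketch below (variable names are my own) applies the same stopping rule:

    import math

    x = 0.5
    true_value = math.exp(x)          # 1.648721271...
    es = 0.5 * 10.0 ** (2 - 3)        # 0.05 % tolerance for 3 significant figures

    approx, term = 0.0, 0
    while True:
        previous = approx
        approx += x ** term / math.factorial(term)   # add the next series term
        et = (true_value - approx) / true_value * 100.0
        line = f"{term + 1} term(s): {approx:.9f}  et = {et:.2E} %"
        if term > 0:
            ea = abs((approx - previous) / approx) * 100.0
            line += f"  |ea| = {ea:.2E} %"
        print(line)
        if term > 0 and ea < es:      # stop once |ea| < es
            break
        term += 1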
TYPES OF ERRORS
We encounter the following types of errors in numerical calculations:
1. Truncation errors – these result when approximations are used to represent
exact mathematical procedures.
The Maclaurin series is a good example of truncation: whatever terms are not included in the finite summation make up the truncation error.
e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ... + x^n/n!

2. Round-off errors – these result when numbers having limited significant figures are used to represent exact numbers.
ROUND-OFF ERRORS
SOURCES OF COMPUTER ROUND-OFF ERRORS:
1. Round-off errors originate from the fact that computers retain only a fixed
number of significant figures during a calculation. Hence, numbers such as π
or e cannot be represented exactly by the computer.
2. Round-off errors also occur because computers use a base-2 representation, which cannot precisely represent certain exact base-10 numbers.

• This means that numerical round-off errors are directly related to the manner in which numbers are stored in a computer.
word – it is the fundamental unit whereby information is stored in a computer.
– It consists of a string of binary digits, or bits.
– Numbers are typically stored in one or more words.
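The second source is easy to observe directly. A minimal Python sketch: the base-10 value 0.1 has no finite base-2 expansion, so the stored double is slightly off.

    from decimal import Decimal

    # The exact value of the double closest to 0.1:
    print(Decimal(0.1))       # 0.1000000000000000055511151231257827...
    print(0.1 + 0.2 == 0.3)   # False, due to accumulated binary round-off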
NUMBER SYSTEMS
BASE-10 OR DECIMAL NUMBER SYSTEM:
• The base-10 system uses the 10 digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to represent numbers. By themselves, these digits are satisfactory for counting from 0 to 9. For numbers greater than 9, combinations of these basic digits are used, with the position or place value specifying the magnitude. For example, the number 173 is formulated in the base-10 system as 1 × 10^2 + 7 × 10^1 + 3 × 10^0.
NUMBER SYSTEMS
BASE-2 OR BINARY NUMBER SYSTEM:
• This is the number system used by computers. This system uses only 2 digits, 0 and 1, to represent numbers. This relates to the fact that the primary logic units of digital computers are on/off electronic components. For example, the number 10101101 in the base-2 system is formulated as 1 × 2^7 + 0 × 2^6 + 1 × 2^5 + 0 × 2^4 + 1 × 2^3 + 1 × 2^2 + 0 × 2^1 + 1 × 2^0 = 128 + 32 + 8 + 4 + 1, which is equivalent to the number 173 in the base-10 system.
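The positional expansion can be verified with a few lines of Python (written here only to check the arithmetic):

    bits = "10101101"

    # Sum of digit * 2**place, mirroring the positional notation above
    print(sum(int(b) * 2 ** i for i, b in enumerate(reversed(bits))))  # 173
    print(int(bits, 2))   # 173, using the built-in base-2 parser
    print(bin(173))       # '0b10101101', converting back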
INTEGER REPRESENTATION
SIGNED MAGNITUDE METHOD:
• This employs the first bit of a word to indicate the sign, with a 0 for positive and a 1 for negative. The remaining bits are used to store the number. For example, the decimal integer −173 would be stored on a 16-bit computer as 1000000010101101, where the leading 1 is the sign bit.
• This method limits the capacity of computers to represent integers, since the range of integers that can be represented depends on how many bits the computer uses.
INTEGER REPRESENTATION
• If you are using a 16-bit computer, the first bit holds the sign while the remaining 15 bits can hold binary numbers from 000000000000000 to 111111111111111. The upper limit can be converted to a decimal integer, as in
1 × 2^14 + 1 × 2^13 + ... + 1 × 2^1 + 1 × 2^0 = 32,767
Thus, a 16-bit computer word can store decimal integers ranging from −32,767 to 32,767. In addition, because zero is already defined as 0000000000000000, it is redundant to use the number 1000000000000000 to define a "minus zero." Therefore, it is usually employed to represent an additional negative number, −32,768. Thus:

Range of integers for a 16-bit computer: from −32,768 to 32,767

Integers below the minimum or above the maximum cannot be represented.
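Modern machines actually use two's complement rather than signed magnitude, but the resulting 16-bit range is the same and can be inspected with NumPy (assuming NumPy is available):

    import numpy as np

    info = np.iinfo(np.int16)      # metadata for 16-bit signed integers
    print(info.min, info.max)      # -32768 32767

    # Exceeding the range wraps around (overflow) instead of being stored:
    print(np.int16(32767) + np.int16(1))   # -32768, with an overflow warning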
FRACTIONAL REPRESENTATION
FLOATING POINT FORM:
• This form is used to express fractional quantities of numbers. Here, the number is expressed as a fractional part, called a mantissa or significand, and an integer part, called an exponent or characteristic, as in
m · b^e
where
m = the mantissa
b = the base of the number system being used
e = the exponent
• For instance, the number 156.78 could be represented as 0.15678 × 10^3 in a floating point base-10 system.
FRACTIONAL REPRESENTATION
• The figure below shows one way that a floating-point number could be stored in a word. The first bit is reserved for the sign, the next series of bits for the signed exponent, and the last bits for the mantissa.
• Note that the mantissa is usually normalized if it has leading zero digits. For example, suppose the quantity 1/34 = 0.029411765... was stored in a floating-point base-10 system that allowed only four decimal places to be stored. Thus, 1/34 would be stored as
0.0294 × 10^0
However, in the process of doing this, the inclusion of the useless zero to the right of the decimal forces us to drop the digit 1 in the fifth decimal place.
FRACTIONAL REPRESENTATION
• The number can be normalized to remove the leading zero by multiplying the mantissa by 10 and lowering the exponent by 1 to give
0.2941 × 10^−1
Thus, we retain an additional significant figure when the number is stored.
• The consequence of normalization is that the absolute value of the mantissa m is limited. That is,
1/b ≤ m < 1
where b = the base. For example, for a base-10 system, m would range between 0.1 and 1, and for a base-2 system, between 0.5 and 1 (the sketch after this slide verifies the base-2 range).
• Floating-point representation allows both fractions and very large numbers to be expressed on the computer. However, floating-point numbers take up more room and take longer to process than integer numbers. More significantly, their use introduces a source of error because the mantissa holds only a finite number of significant figures. Thus, a round-off error is introduced.
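Python exposes exactly this normalized base-2 form through math.frexp, which can be used to confirm the stated mantissa range:

    import math

    # math.frexp(x) returns (m, e) with x = m * 2**e and 0.5 <= |m| < 1,
    # the normalized base-2 mantissa range stated above.
    m, e = math.frexp(1 / 34)
    print(m, e)          # 0.9411764705882353 -5
    print(m * 2 ** e)    # 0.029411764705882353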
FRACTIONAL REPRESENTATION
The following are aspects of the floating-point representation that have
significance regarding computer round-off errors:
1. There Is a Limited Range of Quantities That May Be Represented. Just like the
integer case, there are large positive and negative numbers that cannot be
represented. However, in addition to large quantities, the floating-point
representation has the added limitation that very small numbers cannot be
represented.
2. There Are Only a Finite Number of Quantities That Can Be Represented within the Range. Thus, the degree of precision is limited. Irrational numbers like π and e cannot be represented exactly. Furthermore, rational numbers that do not exactly match one of the values in the set also cannot be represented precisely. The errors introduced by approximating both these cases are referred to as quantizing errors. For example, suppose π = 3.14159265358... is to be stored in a base-10 number system carrying 7 significant figures:
a. By chopping, it becomes π = 3.141592, with Et = 0.00000065...
b. By rounding, it becomes π = 3.141593, with Et = −0.00000035...
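Both schemes are easy to mimic; the helpers below are my own construction for keeping a given number of significant base-10 digits:

    import math

    def chop(x, sig):
        """Keep the first `sig` significant digits of x; discard the rest."""
        shift = sig - math.floor(math.log10(abs(x))) - 1
        return math.trunc(x * 10 ** shift) / 10 ** shift

    def round_sig(x, sig):
        """Round x to `sig` significant digits."""
        shift = sig - math.floor(math.log10(abs(x))) - 1
        return round(x * 10 ** shift) / 10 ** shift

    print(chop(math.pi, 7))       # 3.141592  (Et =  0.00000065...)
    print(round_sig(math.pi, 7))  # 3.141593  (Et = -0.00000035...)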
FRACTIONAL REPRESENTATION
3. The Interval between Numbers, Δx, Increases as the Numbers Grow in Magnitude. Here, quantizing errors will be proportional to the magnitude of the number being represented.

• All of these are demonstrated in the sample problem on the next slide.
SAMPLE PROBLEM
EXAMPLE 3:
Create a hypothetical floating-point number set for a machine that stores information using 7-bit words. Employ the first bit for the sign of the number, the next three for the sign and the magnitude of the exponent, and the last three for the magnitude of the mantissa. Show all the positive values and their corresponding base-10 numbers.

SOLUTION:
The smallest possible positive number is 0111100. The initial 0 indicates that the quantity is positive. The 1 in the second place designates that the exponent has a negative sign. The 1's in the third and fourth places give a maximum value to the exponent of
1 × 2^1 + 1 × 2^0 = 3
Therefore, the exponent will be −3. Finally, the mantissa is specified by the 100 in the last three places, which conforms to
1 × 2^−1 + 0 × 2^−2 + 0 × 2^−3 = 0.5
Although a smaller mantissa is possible (e.g., 000, 001, 010, 011), the value of 100 is used because of the limit imposed by normalization. Thus, the smallest possible positive number for this system is
+0.5 × 2^−3
which is equal to 0.0625 in the base-10 system.
SAMPLE PROBLEM
The next highest numbers are developed by increasing the mantissa, as in
0111101 = (1 × 2^−1 + 0 × 2^−2 + 1 × 2^−3) × 2^−3 = 0.078125 in base-10
0111110 = (1 × 2^−1 + 1 × 2^−2 + 0 × 2^−3) × 2^−3 = 0.093750 in base-10
0111111 = (1 × 2^−1 + 1 × 2^−2 + 1 × 2^−3) × 2^−3 = 0.109375 in base-10
Notice that the base-10 equivalents are spaced evenly with an interval of 0.015625. At this point, to continue increasing, we must decrease the magnitude of the exponent to 10, which gives a value of
1 × 2^1 + 0 × 2^0 = 2
so the exponent becomes −2. The mantissa is reset to its smallest value of 100. Therefore, the next number is
0110100 = (1 × 2^−1 + 0 × 2^−2 + 0 × 2^−3) × 2^−2 = 0.125000 in base-10
This still represents a gap of 0.125000 − 0.109375 = 0.015625. However, now when higher numbers are generated by increasing the mantissa, the gap is lengthened to 0.031250, as in
0110101 = (1 × 2^−1 + 0 × 2^−2 + 1 × 2^−3) × 2^−2 = 0.156250 in base-10
0110110 = (1 × 2^−1 + 1 × 2^−2 + 0 × 2^−3) × 2^−2 = 0.187500 in base-10
0110111 = (1 × 2^−1 + 1 × 2^−2 + 1 × 2^−3) × 2^−2 = 0.218750 in base-10
This pattern is repeated as each larger quantity is formulated until a maximum number is reached,
0011111 = (1 × 2^−1 + 1 × 2^−2 + 1 × 2^−3) × 2^3 = 7.000000 in base-10
Sign of Number   Sign of Exponent   2^1   2^0   2^−1   2^−2   2^−3   Number in Base-10   Interval
0 1 1 1 1 0 0 0.062500
0 1 1 1 1 0 1 0.078125 0.015625
0 1 1 1 1 1 0 0.093750 0.015625
0 1 1 1 1 1 1 0.109375 0.015625
0 1 1 0 1 0 0 0.125000 0.015625
0 1 1 0 1 0 1 0.156250 0.031250
0 1 1 0 1 1 0 0.187500 0.031250
0 1 1 0 1 1 1 0.218750 0.031250
0 1 0 1 1 0 0 0.250000 0.031250
0 1 0 1 1 0 1 0.312500 0.062500
0 1 0 1 1 1 0 0.375000 0.062500
0 1 0 1 1 1 1 0.437500 0.062500
0 1 0 0 1 0 0 0.500000 0.062500
0 1 0 0 1 0 1 0.625000 0.125000
0 1 0 0 1 1 0 0.750000 0.125000
0 1 0 0 1 1 1 0.875000 0.125000
0 0 0 1 1 0 0 1.000000 0.125000
0 0 0 1 1 0 1 1.250000 0.250000
0 0 0 1 1 1 0 1.500000 0.250000
0 0 0 1 1 1 1 1.750000 0.250000
0 0 1 0 1 0 0 2.000000 0.250000
0 0 1 0 1 0 1 2.500000 0.500000
0 0 1 0 1 1 0 3.000000 0.500000
0 0 1 0 1 1 1 3.500000 0.500000
0 0 1 1 1 0 0 4.000000 0.500000
0 0 1 1 1 0 1 5.000000 1.000000
0 0 1 1 1 1 0 6.000000 1.000000
0 0 1 1 1 1 1 7.000000 1.000000
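The entire table can be regenerated with a few lines of Python, which also confirms the growing interval between adjacent numbers (the enumeration strategy is my own):

    # Hypothetical 7-bit format: 1 sign bit, 1 exponent-sign bit,
    # 2 exponent-magnitude bits, 3 mantissa bits (normalized: 100 to 111).
    values = sorted({(m / 8) * 2 ** e
                     for e in range(-3, 4)     # exponents -3 .. +3
                     for m in range(4, 8)})    # mantissas 0.500 .. 0.875
    print(values[0], values[-1])               # 0.0625 7.0
    gaps = [b - a for a, b in zip(values, values[1:])]
    print(gaps[0], gaps[-1])                   # 0.015625 1.0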
EXTENDED PRECISION
• It should be noted at this point that, although round-off errors are present, the number of significant digits carried on most computers allows most engineering computations to be performed with more than acceptable precision.
• Commercial computers use much larger words and, consequently, allow numbers to be expressed with more than adequate precision. For example, computers that use the IEEE format allow 24 bits to be used for the mantissa in single precision, which translates into about seven significant base-10 digits of precision, with a range of about 10^−38 to 10^39.
• With this acknowledged, there are still cases where round-off error becomes critical. For this reason most computers allow the specification of extended precision. The most common of these is double precision, in which the number of words used to store floating-point numbers is doubled. It provides about 15 to 16 decimal digits of precision and a range of approximately 10^−308 to 10^308. In many cases, the use of double-precision quantities can greatly mitigate the effect of round-off errors. However, a price is paid for such remedies in that they also require more memory and execution time.
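With NumPy (assumed available), the difference between the two precisions is easy to see:

    import numpy as np

    print(np.finfo(np.float32).precision, np.finfo(np.float32).max)  # 6  ~3.4e38
    print(np.finfo(np.float64).precision, np.finfo(np.float64).max)  # 15 ~1.8e308

    print(np.float32(np.pi))   # 3.1415927         (single precision)
    print(np.float64(np.pi))   # 3.141592653589793 (double precision)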
ARITHMETIC COMPUTATIONS
We will use normalized base-10 numbers to demonstrate the computations.

ADDITION:
• When two floating-point numbers are added, the mantissa of the number with the smaller exponent is modified so that the exponents are the same. This has the effect of aligning the decimal points.
• For example, suppose we want to add 0.1557 · 10^1 + 0.4381 · 10^−1. The decimal point of the mantissa of the second number is shifted to the left a number of places equal to the difference of the exponents [1 − (−1) = 2], so that 0.4381 · 10^−1 becomes 0.004381 · 10^1. Now the numbers can be added,
  0.1557 · 10^1
+ 0.004381 · 10^1
= 0.160081 · 10^1
and the result chopped to 0.1600 · 10^1. Notice how the last two digits of the second number that were shifted to the right have essentially been lost from the computation.
ARITHMETIC COMPUTATIONS
SUBTRACTION:
• Subtraction is performed identically to addition, except that the sign of the subtrahend is reversed. For example,
  0.7642 · 10^3
− 0.7641 · 10^3
= 0.0001 · 10^3
• For this case the result is not normalized, and so we must shift the decimal three places to the right to give 0.1000 · 10^0 = 0.1000. Notice that the zeros appended to the end of the mantissa are not significant but are merely added to fill the empty space created by the shift.
• This introduces a substantial computational error because subsequent manipulations would act as if these zeros were significant.
ARITHMETIC COMPUTATIONS
MULTIPLICATION AND DIVISION:
• For multiplication, the exponents are added and the mantissas are multiplied. Because multiplication of two n-digit mantissas yields a 2n-digit result, most computers hold intermediate results in a double-length register. For example,
0.1363 · 10^3 × 0.6423 · 10^−1 = 0.08754549 · 10^2
• If, as in this case, a leading zero is introduced, the result is normalized, 0.08754549 · 10^2 → 0.8754549 · 10^1, and chopped to give
0.8754 · 10^1
• Division is performed in a similar manner, but the mantissas are divided and the exponents are subtracted. Then the results are normalized and chopped.
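These normalize-and-chop steps can be simulated with a small helper (my own construction) that chops every result to a 4-digit mantissa:

    import math

    def chop4(x):
        """Chop x to a 4-digit mantissa, mimicking the hypothetical machine."""
        if x == 0:
            return 0.0
        shift = 4 - math.floor(math.log10(abs(x))) - 1
        return math.trunc(x * 10 ** shift) / 10 ** shift

    # Addition example above: 0.1557e1 + 0.4381e-1, chopped to 0.1600e1
    print(chop4(0.1557e1 + 0.4381e-1))   # 1.6
    # Multiplication example above: 0.1363e3 * 0.6423e-1, chopped to 0.8754e1
    print(chop4(0.1363e3 * 0.6423e-1))   # 8.754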
ARITHMETIC COMPUTATIONS
LARGE COMPUTATIONS:
• Certain methods require extremely large numbers of arithmetic manipulations to arrive at their final results. In addition, these computations are often interdependent; that is, the later calculations depend on the results of earlier ones. Consequently, even though an individual round-off error may be small, the cumulative effect over the course of a large computation can be significant.

ADDING A LARGE NUMBER AND A SMALL NUMBER:
• Suppose we add a small number, 0.0010, to a large number, 4000, using a hypothetical computer with a 4-digit mantissa and a 1-digit exponent. We modify the smaller number so that its exponent matches the larger,
  0.4000 · 10^4
+ 0.0000001 · 10^4
= 0.4000001 · 10^4
which is chopped to 0.4000 · 10^4. Thus, we might as well have not performed the addition!
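The same effect appears in real single-precision arithmetic, which carries about seven significant digits (NumPy is used here for the 32-bit type):

    import numpy as np

    # At 1e8, adjacent float32 values are 8 apart, so adding 1 is lost
    # entirely, just as 0.0010 vanished next to 4000 above.
    big = np.float32(1.0e8)
    print(big + np.float32(1.0) == big)   # True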
ARITHMETIC COMPUTATIONS
SUBTRACTIVE CANCELLATIONS:
• This term refers to the round-off induced when subtracting two nearly equal floating-point numbers.
• The easiest example uses the forward-difference formula for the derivative of f(x) = x^2, with x = 3.253 and h = 0.002, on a machine with a 4-digit mantissa and a 1-digit exponent that rounds off:
dy/dx ≈ [f(x + h) − f(x)]/h
      = (3.255^2 − 3.253^2)/0.002
      = (0.1060 · 10^2 − 0.1058 · 10^2)/(0.2000 · 10^−2)
      = (0.0002 · 10^2)/(0.2000 · 10^−2)
      = (0.2000 · 10^−1)/(0.2000 · 10^−2)
      = 0.1000 · 10^2 = 10
which is a poor approximation of the actual derivative, dy/dx = 2x = 6.506.
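Subtractive cancellation is not limited to 4-digit machines. Even in full double precision, shrinking h eventually cancels nearly all significant digits of f(x + h) − f(x), as this small Python sketch shows:

    def f(x):
        return x * x

    x = 3.253
    true_deriv = 2 * x        # dy/dx = 2x = 6.506

    for h in (1e-2, 1e-8, 1e-12, 1e-15):
        approx = (f(x + h) - f(x)) / h
        # The error first shrinks (less truncation), then grows (cancellation).
        print(h, approx, abs(true_deriv - approx))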
REFERENCE
Chapra, S. C., & Canale, R. P. (2010). Numerical Methods for Engineers (6th ed.). McGraw-Hill.
THE END
