
MAE 5002 – Advanced Numerical Analysis

Chapter 1: Scientific Computing †

Prof. Mingwu Li
Department of Mechanics and Aerospace Engineering
Southern University of Science and Technology
[email protected]

Feb 22, 2024

† Lecture slides based on the textbook Scientific Computing: An Introductory Survey

by Michael T. Heath, http://www.siam.org/books/cl80 and the slides of Prof. Heath


About myself

• Education background
 2013 B.Eng. in Engineering Mechanics,
Huazhong University of Science and Technology
 2016 M.Eng. in Computational Mechanics,
Dalian University of Technology
 2020 PhD in Mechanical Engineering,
University of Illinois at Urbana-Champaign, USA

• Work experience
 2020 – 2022: Postdoc at ETH Zurich, Switzerland
 Jan 2023 – present: SUSTech
Introduction

• About the course (MAE5002)


• Time and location:
• Thursday 7:00-10:00 pm, weekly, Lecture Hall 3, Room 106
• Office hour (by appointment via email):
• College of Engineering, North Tower, 821

• This course provides an advanced introduction to numerical analysis
suitable for graduate students in mathematics, computer science,
physical sciences, and engineering. The reader is assumed to be
familiar with calculus and to have taken a structured programming course.
• It covers numerous topics including Interpolation and Polynomial
Approximation, Curve Fitting, Numerical Differentiation, Numerical
Integration, and Numerical Optimization. An introduction to recent
developments, such as machine learning techniques, will also be given.
• Intended for students in engineering and computer science fields.
Course contents
Section 1 Introduction
Section 2 Systems of linear equations
Section 3 Linear least squares (curve fitting)
Section 4 Nonlinear equations
Section 5 Optimization
Section 6 Interpolation
Section 7 Numerical integration and differentiation
Section 8 Numerical solution of ODE – initial value problem
Section 9 Numerical solution of ODE – boundary value problem
Section 10 Numerical solution of PDE
Section 11 Introduction to machine learning (if time allows)
Evaluation
• Homework (Problems + Programming) 30%
• Midterm exam 30%
• Final exam 40%

• You will fail the course if you
• cheat in exams
• plagiarize homework
• copy others’ code directly

• Textbook
• M. T. Heath, Scientific Computing: An Introductory Survey
• Baidu, Google, Zhihu, Bilibili, YouTube, Stack Overflow,
Math.StackExchange, ChatGPT, …

• https://bb.sustech.edu.cn/ (Blackboard)
• Lecture slides
• Homework assignments, …
Other useful reading materials

• Mathews et al., Numerical Methods Using MATLAB

• Trefethen and Bau, Numerical Linear Algebra, SIAM, 1997

• Press et al., Numerical Recipes: The Art of Scientific Computing


Teaching Assistants

Yu Gui (桂雨), [email protected]
Zhihong Wu (吴志宏), [email protected]
Qiyao Shi (师启耀), [email protected]
Introduction

• Scientific computing

• Approximations

• Forward and backward error

• Conditioning, stability, and accuracy

• Floating-Point Arithmetic
Scientific Computing
What Is Scientific Computing?
 Design and analysis of algorithms for solving mathematical problems
arising in science and engineering numerically

[Diagram: Scientific Computing at the intersection of Computer Science, Applied Mathematics, and Science & Engineering]

 Also called numerical analysis or computational mathematics


Scientific Computing, continued

 Distinguishing features of scientific computing


o Deals with continuous quantities (e.g., time, distance, velocity,
temperature, density, pressure) typically measured by real
numbers
o Considers effects of approximations

 Why scientific computing?


o Predictive simulation of natural phenomena
o Virtual prototyping of engineering designs
o Analyzing data
Numerical Analysis → Scientific Computing
 Pre-computer era (before ∼1940)
o Foundations and basic methods established by Newton, Euler,
Lagrange, Gauss, and many other mathematicians, scientists,
and engineers

 Pre-integrated circuit era (∼1940-1970): Numerical Analysis


o Programming languages developed for scientific applications
o Numerical methods formalized in computer algorithms and software
o Floating-point arithmetic developed

 Integrated circuit era (since ∼1970): Scientific Computing


o Application problem sizes explode as computing capacity
grows exponentially
o Computation becomes an essential component of modern scientific
research and engineering practice, along with theory and
experiment
Mathematical Problems

 Given mathematical relationship y = f (x), typical problems include
o Evaluate a function: compute output y for given input x
o Solve an equation: find input x that produces given output y
o Optimize: find x that yields extreme value of y over given domain
(a MATLAB sketch of these three tasks is given below)

 Specific type of problem and best approach to solving it depend on


whether variables and function involved are
o discrete or continuous
o linear or nonlinear
o finite or infinite dimensional
o purely algebraic or involve derivatives or integrals
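
As a rough illustration in MATLAB (the function f(x) = x² − 2 is chosen here for illustration and is not from the slides):

% three typical tasks for y = f(x), illustrated with f(x) = x^2 - 2
f = @(x) x.^2 - 2;

y  = f(1.5);             % evaluate: compute output for a given input
xr = fzero(f, 1);        % solve: find x with f(x) = 0 (gives sqrt(2))
xm = fminbnd(f, -1, 1);  % optimize: minimize f over [-1, 1] (gives x = 0)

fprintf('f(1.5) = %g, root = %g, minimizer = %g\n', y, xr, xm);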
General Problem-Solving Strategy

 Replace difficult problem by easier one having same or closely related


solution
o infinite dimensional → finite dimensional
o differential → algebraic
o nonlinear → linear
o complicated → simple

 Solution obtained may only approximate that of original problem

 Our goal is to estimate accuracy and ensure that it suffices


Approximations

Approximations

I’ve learned that, in the description of Nature, one has to


tolerate approximations, and that work with approximations can
be interesting and can sometimes be beautiful.
— P. A. M. Dirac
Sources of Approximation

 Before computation
o modeling
o empirical measurements
o previous computations

 During computation
o truncation or discretization (mathematical approximations)
o rounding (arithmetic approximations)

 Accuracy of final result reflects all of these

 Uncertainty in input may be amplified by problem

 Perturbations during computation may be amplified by algorithm


Example: Approximations

 Computing surface area of Earth using formula A = 4πr² involves


several approximations
o Earth is modeled as a sphere, idealizing its true shape
o Value for radius is based on empirical measurements and previous
computations
o Value for π requires truncating infinite process
o Values for input data and results of arithmetic operations are
rounded by calculator or computer
Absolute Error and Relative Error
 Absolute error: approximate value − true value

 Relative error: (absolute error) / (true value)

 Equivalently, approx value = (true value) × (1 + rel error)

 Relative error can also be expressed as percentage

per cent error = relative error × 100

 True value is usually unknown, so we estimate or bound error rather


than compute it exactly

 Relative error often taken relative to approximate value, rather than


(unknown) true value
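
A quick MATLAB illustration with a value chosen here (approximating π by 22/7, not from the slides):

true_val   = pi;
approx_val = 22/7;
abs_err = approx_val - true_val;     % absolute error
rel_err = abs_err / true_val;        % relative error
pct_err = rel_err * 100;             % percent error
fprintf('abs = %.2e, rel = %.2e, percent = %.2f%%\n', abs_err, rel_err, pct_err);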
Data Error and Computational Error

 Typical problem: evaluate function f : R → R for given argument


o x = true value of input
o f (x) = corresponding output value for true function
o x̂ = approximate (inexact) input actually used
o f̂ = approximate function actually computed

 Total error: f̂(x̂) − f(x) = [f̂(x̂) − f(x̂)] + [f(x̂) − f(x)]
= computational error + propagated data error

Algorithm has no effect on propagated data error


Example: Data Error and Computational Error
 Suppose we need a “quick and dirty” approximation to sin(π/ 8) that
we can compute without a calculator or computer
 Instead of true input x = π/ 8, we use x̂ = 3/ 8

 Instead of true function f (x) = sin(x), we use first term of Taylor


series for sin(x), so that fˆ (x) = x

 We obtain approximate result ŷ = 3/ 8 = 0.3750

 To four digits, true result is y = sin(π/ 8) = 0.3827

 Computational error:

fˆ (x̂) − f (x̂) = 3/ 8 − sin(3/ 8) ≈ 0.3750 − 0.3663 = 0.0087

 Propagated data error:


f (x̂) − f (x) = sin(3/ 8) − sin(π/ 8) ≈ 0.3663 − 0.3827 = −0.0164

 Total error: fˆ (x̂) − f (x) ≈ 0.3750 − 0.3827 = −0.0077
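
The numbers above can be reproduced with a few lines of MATLAB (a sketch following the slide's definitions):

x    = pi/8;  xhat = 3/8;            % true and approximate input
f    = @(t) sin(t);                  % true function
fhat = @(t) t;                       % approximate function (first Taylor term)

comp_err = fhat(xhat) - f(xhat);     % computational error   ≈  0.0087
data_err = f(xhat)    - f(x);        % propagated data error ≈ -0.0164
tot_err  = fhat(xhat) - f(x);        % total error           ≈ -0.0077
fprintf('%.4f  %.4f  %.4f\n', comp_err, data_err, tot_err);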


Truncation Error and Rounding Error
 Truncation error: difference between true result (for actual input)
and result produced by given algorithm using exact arithmetic
o Due to mathematical approximations such as truncating
infinite series, discrete approximation of derivatives or integrals,
or terminating iterative sequence before convergence

 Rounding error: difference between result produced by given


algorithm using exact arithmetic and result produced by same
algorithm using limited precision arithmetic
o Due to inexact representation of real numbers and
arithmetic operations upon them

 Computational error is sum of truncation error and rounding error


One of these usually dominates

( interactive example )
Example: Finite Difference Approximation

 Error in finite difference approximation

f′(x) ≈ [f(x + h) − f(x)] / h
exhibits tradeoff between rounding error and truncation error

 Truncation error bounded by Mh/2, where M bounds |f″(t)| for t


near x

 Rounding error bounded by 2𝜖 /h, where error in function values


bounded by 𝜖

 Total error minimized when h ≈ 2√(𝜖/M)

 Error increases for smaller h because of rounding error and increases


for larger h because of truncation error
Example: Finite Difference Approximation
A demo
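
One possible version of the demo (a MATLAB sketch; the test function f(x) = sin(x) at x = 1 is an assumption, not specified on the slide):

f  = @(x) sin(x);  df = @(x) cos(x);   % test function and its exact derivative
x  = 1;
h  = 10.^(-(1:16));                    % step sizes 1e-1 ... 1e-16
fd = (f(x + h) - f(x)) ./ h;           % forward difference approximations
err = abs(fd - df(x));                 % total error for each h
loglog(h, err, 'o-'); grid on
xlabel('h'); ylabel('|error|')         % V-shaped curve: truncation error for large h,
                                       % rounding error for small h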
Forward and Backward Error
Forward and Backward Error
 Suppose we want to compute y = f (x), where f : R → R, but
obtain approximate value ŷ
 Forward error: Difference between computed result ŷ and true
output y,
∆ y = ŷ − y
 Backward error: Difference between actual input x and input x̂ for
which computed result ŷ is exactly correct (i.e., f (x̂) = ŷ),

∆ x = x̂ − x
Example: Forward and Backward Error

 As approximation to y = √2, ŷ = 1.4 has absolute forward error
|∆y| = |ŷ − y| = |1.4 − 1.41421. . . | ≈ 0.0142,
or relative forward error of about 1 percent

 Since √1.96 = 1.4, absolute backward error is
|∆x| = |x̂ − x| = |1.96 − 2| = 0.04,
or relative backward error of 2 percent

 Ratio of relative forward error to relative backward error is so


important we will shortly give it a name
Backward Error Analysis

 Idea: approximate solution is exact solution to modified problem

 How much must original problem change to give result actually


obtained?

 How much data error in input would explain all error in computed
result?

 Approximate solution is good if it is exact solution to nearby


problem

 If backward error is smaller than uncertainty in input, then


approximate solution is as accurate as problem warrants

 Backward error analysis is useful because backward error is often


easier to estimate than forward error
Example: Backward Error Analysis

 Approximating cosine function f (x) = cos(x) by truncating Taylor


series after two terms gives

ŷ = f̂(x) = 1 − x²/2

 Forward error is given by

∆y = ŷ − y = f̂(x) − f(x) = 1 − x²/2 − cos(x)

 To determine backward error, need value x̂ such that

f (x̂) = fˆ (x)

 For cosine function, x̂ = arccos(fˆ (x)) = arccos(ŷ)


Example, continued

 For x = 1,

y = f (1) = cos(1) ≈ 0.5403


ŷ = f̂(1) = 1 − 1²/2 = 0.5
x̂ = arccos(ŷ) = arccos(0.5) ≈ 1.0472

 Forward error: ∆ y = ŷ − y ≈ 0.5 − 0.5403 = −0.0403

 Backward error: ∆ x = x̂ − x ≈ 1.0472 − 1 = 0.0472
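
A short MATLAB check of these numbers (sketch):

x    = 1;
f    = @(t) cos(t);
fhat = @(t) 1 - t.^2/2;      % two-term Taylor approximation

y    = f(x);                 % 0.5403
yhat = fhat(x);              % 0.5
xhat = acos(yhat);           % 1.0472, the input for which cos(xhat) = yhat exactly

fwd_err = yhat - y;          % forward error  ≈ -0.0403
bwd_err = xhat - x;          % backward error ≈  0.0472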


Conditioning, Stability, and Accuracy
Well-Posed Problems

 Mathematical problem is well-posed if solution


o exists
o is unique
o depends continuously on problem data

Otherwise, problem is ill-posed

 Even if problem is well-posed, solution may still be sensitive to


perturbations in input data

 Stability : Computational algorithm should not make sensitivity


worse
Sensitivity and Conditioning

 Problem is insensitive, or well-conditioned, if relative change in input


causes similar relative change in solution

 Problem is sensitive, or ill-conditioned, if relative change in solution


can be much larger than that in input data

 Condition number:
cond = |relative change in solution| / |relative change in input data|
= |[f(x̂) − f(x)] / f(x)| / |(x̂ − x) / x| = |∆y/y| / |∆x/x|

Problem is sensitive, or ill-conditioned, if cond ≫ 1


Sensitivity and Conditioning

[Diagram: mappings from input x to output y for Ill-Posed, Ill-Conditioned, and Well-Conditioned problems]


Condition Number

 Condition number is amplification factor relating relative forward


error to relative backward error
relative forward error = cond × relative backward error

 Condition number usually is not known exactly and may vary with
input, so rough estimate or upper bound is used for cond, yielding

relative forward error ≤ cond × relative backward error
Example: Evaluating a Function

 Evaluating function f for approximate input x̂ = x + ∆ x instead of


true input x gives
Absolute forward error: f(x + ∆x) − f(x) ≈ f′(x) ∆x

Relative forward error: [f(x + ∆x) − f(x)] / f(x) ≈ f′(x) ∆x / f(x)

Condition number: cond ≈ |f′(x) ∆x / f(x)| / |∆x / x| = |x f′(x) / f(x)|

 Relative error in function value can be much larger or smaller than


that in input, depending on particular f and x

 Note that cond(f⁻¹) = 1/cond(f)


Example: Condition Number
 Consider f(x) = √x

 Since f′(x) = 1/(2√x),

cond ≈ |x f′(x) / f(x)| = (x / (2√x)) / √x = 1/2

 So forward error is about half backward error, consistent with our
previous example with √2

 Similarly, for f(x) = x²,

cond ≈ |x f′(x) / f(x)| = (x · 2x) / x² = 2

which is reciprocal of that for square root, as expected

 Square and square root are both relatively well-conditioned


Example: Sensitivity

 Tangent function is sensitive for arguments near π/ 2


o tan(1.57079) ≈ 1.58058 × 10⁵
o tan(1.57078) ≈ 6.12490 × 10⁴

 Relative change in output is a quarter million times greater than


relative change in input
o For x = 1.57079, cond ≈ 2.48275 × 10⁵
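
A MATLAB sketch estimating this condition number, using cond ≈ |x f′(x)/f(x)| with f′(x) = sec²(x):

x = 1.57079;
cond_tan = abs(x*sec(x)^2/tan(x))        % ≈ 2.48275e5

% compare relative change in output to relative change in input directly
x1 = 1.57078;  x2 = 1.57079;
rel_out = abs((tan(x2) - tan(x1))/tan(x1));
rel_in  = abs((x2 - x1)/x1);
rel_out/rel_in                           % also ≈ 2.5e5, a quarter million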
Stability

 Algorithm is stable if result produced is relatively insensitive to


perturbations during computation

 Stability of algorithms is analogous to conditioning of problems

 From point of view of backward error analysis, algorithm is stable if


result produced is exact solution to nearby problem

 For stable algorithm, effect of computational error is no worse than


effect of small data error in input
Accuracy

 Accuracy: closeness of computed solution to true solution (i.e.,


relative forward error)

 Stability alone does not guarantee accurate results

 Accuracy depends on conditioning of problem as well as stability of


algorithm

 Inaccuracy can result from


o applying stable algorithm to ill-conditioned problem
o applying unstable algorithm to well-conditioned problem
o applying unstable algorithm to ill-conditioned problem (yikes!)

 Applying stable algorithm to well-conditioned problem yields


accurate solution
Summary – Error Analysis

 Scientific computing involves various types of approximations that


affect accuracy of results

 Conditioning: Does problem amplify uncertainty in input?

 Stability: Does algorithm amplify computational errors?

 Accuracy of computed result depends on both conditioning of


problem and stability of algorithm

 Stable algorithm applied to well-conditioned problem yields accurate


solution
Floating-Point Numbers
Floating-Point Numbers

 Similar to scientific notation

 Floating-point number system characterized by four integers


β base or radix
p precision
[L, U ] exponent range

 Real number x is represented as
x = ±(d₀ + d₁/β + d₂/β² + ··· + d_{p−1}/β^{p−1}) β^E
where 0 ≤ dᵢ ≤ β − 1, i = 0, . . . , p − 1, and L ≤ E ≤ U
Floating-Point Numbers, continued

 Portions of floating-point number designated as follows

 exponent: E
 mantissa (尾数或有效数字): d₀d₁···d_{p−1}
 fraction (小数): d₁d₂···d_{p−1}

 Sign, exponent, and mantissa are stored in separate fixed-width


fields of each floating-point word
Typical Floating-Point Systems

Parameters for typical floating-point systems


system β p L U
IEEE SP(单精度) 2 24 −126 127
IEEE DP(双精度) 2 53 −1022 1023
Cray-1 2 48 −16383 16384
HP calculator 10 12 −499 499
IBM mainframe 16 6 −64 63

 Modern computers use binary (β = 2) arithmetic

 IEEE floating-point systems are now almost universal in digital


computers
Normalization

 Floating-point system is normalized if leading digit d0 is


always nonzero unless number represented is zero

 In normalized system, mantissa m of nonzero floating-point


number always satisfies 1 ≤ m < β

 Reasons for normalization


o representation of each number unique
o no digits wasted on leading zeros
o leading bit need not be stored (in binary system) because it is
always 1
Properties of Floating-Point Systems

 Floating-point number system is finite and discrete

 Total number of normalized floating-point numbers is
2(β − 1)β^{p−1}(U − L + 1) + 1

 Smallest positive normalized number: UFL = β^L

 Largest floating-point number: OFL = β^{U+1}(1 − β^{−p})

 Floating-point numbers equally spaced only between successive


powers of β

 Not all real numbers exactly representable; those that are are called
machine numbers
Example: Floating-Point System

 Tick marks indicate all 25 numbers in floating-point system having


β = 2, p = 3, L = −1, and U = 1
o OFL = (1.11)₂ × 2¹ = (3.5)₁₀
o UFL = (1.00)₂ × 2⁻¹ = (0.5)₁₀

 At sufficiently high magnification, all normalized floating-point


systems look grainy and unequally spaced
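
The 25 numbers of this toy system can be enumerated directly (a MATLAB sketch):

% all normalized numbers for beta = 2, p = 3, L = -1, U = 1, plus zero
beta = 2; p = 3; L = -1; U = 1;
nums = 0;                               % zero is included separately
for E = L:U
    for m = beta^(p-1) : beta^p - 1     % mantissas 1.00 to 1.11 in binary
        x = m / beta^(p-1) * beta^E;    % value of the normalized number
        nums = [nums, x, -x];           % both signs
    end
end
nums = sort(nums);
length(nums)                            % 25 = 2*(beta-1)*beta^(p-1)*(U-L+1) + 1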
Rounding Rules
 If real number x is not exactly representable, then it is approximated
by “nearby” floating-point number fl(x)

 This process is called rounding, and error introduced is called


rounding error

 Two commonly used rounding rules


o chop (截断): truncate base-β expansion of x after (p − 1)st
digit; also called round toward zero
o round to nearest : fl(x ) is nearest floating-point number to x ,
using floating-point number whose last stored digit is even (偶数)
in case of tie; also called round to even

 Round to nearest is most accurate, and is default rounding rule in


IEEE systems
Example

 Example 1.10 Rounding the following decimal numbers (十进制数)


to two digits:

 Type help round in MATLAB


Machine Precision

 Accuracy of floating-point system characterized by unit roundoff (or


machine precision or machine epsilon) denoted by 𝜖mach
o With rounding by chopping, 𝜖mach = β^{1−p}
o With rounding to nearest, 𝜖mach = (1/2) β^{1−p}

 Alternative definition is smallest number 𝜖 such that fl(1+ 𝜖 ) > 1

 Maximum relative error in representing real number x within range
of floating-point system is given by
|fl(x) − x| / |x| ≤ 𝜖mach
 Type eps in MATLAB and see what you get (For IEEE DP, 𝛽 = 2, 𝑝
= 53)
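
For example, in IEEE double precision one might check (a sketch):

eps                        % 2.2204e-16 = 2^(1-53) = beta^(1-p)
eps/2                      % unit roundoff eps_mach under round-to-nearest
(1 + eps/2) == 1           % true: eps/2 alone does not change 1 (tie rounds to even)
(1 + 0.51*eps) > 1         % true: anything just above eps_mach does change 1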
Machine Precision, continued

 For toy system illustrated earlier


o 𝜖mach = (0.01)₂ = (0.25)₁₀ with rounding by chopping
o 𝜖mach = (0.001)₂ = (0.125)₁₀ with rounding to nearest

 For IEEE floating-point systems


o 𝜖mach = 2⁻²⁴ ≈ 10⁻⁷ in single precision
o 𝜖mach = 2⁻⁵³ ≈ 10⁻¹⁶ in double precision
o 𝜖mach = 2⁻¹¹³ ≈ 10⁻³⁴ in quadruple precision

 So IEEE single, double, and quadruple precision systems have about


7, 16, and 34 decimal digits of precision, respectively
Machine Precision, continued

 Though both are “small,” unit roundoff 𝜖mach should not be


confused with underflow level UFL
o 𝜖mach determined by number of digits in mantissa
o UFL determined by number of digits in exponent

 In practical floating-point systems,

0 < UFL < 𝜖mach < OFL


Exceptional Values

 IEEE floating-point standard provides special values to indicate two


exceptional situations
o Inf, which stands for “infinity,” results from dividing a finite
number by zero, such as 1/0
o NaN, which stands for “not a number,” results from undefined
or indeterminate operations such as 0/0, 0 ∗ Inf, or Inf/Inf

 Inf and NaN are implemented in IEEE arithmetic through special


reserved values of exponent field
Floating-Point Arithmetic
Floating-Point Arithmetic

 Addition or subtraction : Shifting mantissa to make exponents


match may cause loss of some digits of smaller number, possibly all
of them (see later examples)

 Multiplication : Product of two p-digit mantissas contains up to 2p


digits, so result may not be representable

 Division : Quotient of two p-digit mantissas may contain more than


p digits, such as nonterminating binary expansion of 1/ 10

 Result of floating-point arithmetic operation may differ from result


of corresponding real arithmetic operation on same operands

1 = (1)₂, 10 = (1010)₂, and (0.0001)₂ < 0.1 < (0.001)₂, so 1/10 has a nonterminating binary expansion


Example: Floating-Point Arithmetic

 Assume β = 10, p = 6 ( 6位有效数字)

 Let x = 1.92403 × 10², y = 6.35782 × 10⁻¹

 Floating-point addition gives x + y = 1.93039 × 10², assuming


rounding to nearest

 Last two digits of y do not affect result, and with even smaller
exponent, y could have had no effect on result

 Floating-point multiplication gives x ∗ y = 1.22326 × 10², which


discards half of digits of true product

Floating-Point Arithmetic, continued

 Real result may also fail to be representable because its exponent is


beyond available range

 Overflow is usually more serious than underflow because there is no


good approximation to arbitrarily large magnitudes in floating-point
system, whereas zero is often reasonable approximation for
arbitrarily small magnitudes

 On many computer systems overflow is fatal, but an underflow may


be silently set to zero

Example: Summing a Series


 Infinite series
∑_{n=1}^{∞} 1/n
is divergent, yet has finite sum in floating-point arithmetic

 Possible explanations
o Partial sum eventually overflows
o 1/ n eventually underflows
o Partial sum ceases to change once 1/ n becomes negligible relative to
partial sum

( interactive example )
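
A minimal MATLAB sketch of the third explanation, summing in single precision until the partial sum stops changing:

s = single(0); n = 0;
while true
    n = n + 1;
    snew = s + single(1)/single(n);   % next partial sum in single precision
    if snew == s, break; end          % 1/n is now negligible relative to the sum
    s = snew;
end
fprintf('sum stops changing at n = %d, s = %.4f\n', n, s)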

Floating-Point Arithmetic, continued

 Ideally, x flop y = fl(x op y), i.e., floating-point arithmetic


operations produce correctly rounded results

 Computers satisfying IEEE floating-point standard achieve this ideal


provided x op y is within range of floating-point system

 But some familiar laws of real arithmetic not necessarily valid in


floating-point system

 Floating-point addition and multiplication are commutative but not


associative

 Example: if 𝜖 is a positive floating-point number slightly smaller than


𝜖mach, then (1 + 𝜖) + 𝜖 = 1, but 1 + (𝜖 + 𝜖) > 1
A demo

In MATLAB, eps = β^{1−p}, so 𝜖mach = 0.5·eps
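
One possible version of the demo (sketch), taking 𝜖 slightly smaller than 𝜖mach = 0.5·eps:

e = 0.4*eps;       % a positive number slightly smaller than eps_mach = eps/2
(1 + e) + e == 1   % true: each addition of e rounds back to 1
1 + (e + e) > 1    % true: e + e = 0.8*eps exceeds eps_mach, so the sum rounds up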



Cancellation

 Subtraction between two p-digit numbers having same sign and


similar magnitudes yields result with fewer than p digits, so it is
usually exactly representable

 Reason is that leading digits of two numbers cancel (i.e., their


difference is zero)

 For example,

1.92403 × 10² − 1.92275 × 10² = 1.28000 × 10⁻¹

which is correct, and exactly representable, but has only three


significant digits

Cancellation, continued

 Despite exactness of result, cancellation often implies serious loss of


information

 Operands are often uncertain due to rounding or other previous


errors, so relative uncertainty in difference may be large

 Example: if 𝜖 is positive floating-point number slightly smaller than


𝜖mach, then
(1 + 𝜖) − (1 − 𝜖) = 1 − 1 = 0
in floating-point arithmetic, which is correct for actual operands of
final subtraction, but true result of overall computation, 2 𝜖, has
been completely lost

 Subtraction itself is not at fault: it merely signals loss of information


that had already occurred
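
A MATLAB sketch (here 𝜖 is chosen small enough that both 1 + 𝜖 and 1 − 𝜖 round to 1):

e = 0.1*eps;        % well below eps_mach, so 1+e and 1-e both round to 1
(1 + e) - (1 - e)   % 0 in floating-point arithmetic
2*e                 % the true result, completely lost to cancellation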

Cancellation, continued

 Digits lost to cancellation are most significant, leading digits,


whereas digits lost in rounding are least significant, trailing digits

 Because of this effect, it is generally bad to compute any small


quantity as difference of large quantities, since rounding error is
likely to dominate result

 For example, summing alternating series, such as

e^x = 1 + x + x²/2! + x³/3! + ···
for x < 0, may give disastrous results due to catastrophic
cancellation

Example: Cancellation

Total energy of helium atom (氦原子) is sum of kinetic and potential


energies, which are computed separately and have opposite signs, so
suffer cancellation
Year Kinetic Potential Total
1971 13.0 −14.0 −1.0
1977 12.76 −14.02 −1.26
1980 12.22 −14.35 −2.13
1985 12.28 −14.65 −2.37
1988 12.40 −14.84 −2.44

Although computed values for kinetic and potential energies changed by


only 6% or less, resulting estimate for total energy changed by 144%
Example: Quadratic Formula
 Two solutions of quadratic equation ax² + bx + c = 0 are given by
x = (−b ± √(b² − 4ac)) / (2a)
 Naive use of formula can suffer overflow, or underflow, or severe


cancellation
 Rescaling coefficients avoids overflow or harmful underflow
 Cancellation between −b and square root can be avoided by
computing one root using alternative formula
x = 2c / (−b ∓ √(b² − 4ac))

 Cancellation inside square root cannot be easily avoided without


using higher precision

( interactive example ) Example 1.15
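
A MATLAB sketch comparing the formulas on a case with severe cancellation (the coefficients a = 1, b = −10⁸, c = 1 are chosen here, not taken from the slides); the true roots are approximately 10⁸ and 10⁻⁸:

a = 1; b = -1e8; c = 1;
d = sqrt(b^2 - 4*a*c);

x_small_naive = (-b - d) / (2*a)   % cancellation between -b and the square root: poor accuracy
x_small_alt   = 2*c / (-b + d)     % alternative formula: accurate, ≈ 1e-8
x_large       = (-b + d) / (2*a)   % large root is computed accurately either way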


Summary – Floating-Point Arithmetic
 On computers, infinite continuum of real numbers is approximated
by finite and discrete floating-point number system, with sign,
exponent, and mantissa fields within each floating-point word

 Exponent field determines range of representable magnitudes,


characterized by underflow and overflow levels

 Mantissa field determines precision, and hence relative accuracy, of


floating-point approximation, characterized by unit roundoff 𝜖mach

 Rounding error is loss of least significant, trailing digits when


approximating true real number by nearby floating-point number

 More insidiously, cancellation is loss of most significant, leading


digits when numbers of similar magnitude are subtracted, resulting
in fewer significant digits in finite precision
