Floating-Point Numbers and Round-Off Errors by Kusal Kaluarachchi Medium
Floating-Point Numbers and Round-Off Errors by Kusal Kaluarachchi Medium
You don’t. In fact, they are not always correct, and you must understand the
limitations.In this article I’m going to talk about one limitation.
First, consider a number like 25. The computer doesn’t directly represent 2S
or 5S. Everything is converted into binary.
B nary
Dec mal.EE
As you can see, there is no finite binary representation for 0.1. Computer
systems have finite memory. But we need to represent numbers that take an
infinite number of bits. So how do we do that?
kayannokta
Most computers these days implement IEEE 754 binary floating-point
arithmetic. IEEE 754 specifies very careful rules to ensure accurate results at
kesinlik
the available precision. It also specifies different degrees of precision for
different applications. For example, there’s 32-bit single precision, 43-bit
extended precision, and 64-bit double precision. It has three main
components, as follows:
s m
3. The exponent field contains 127 plus the true exponent for single-
precision, or 1023 plus the true exponent for double precision.
Bit representation
First, we convert it into binary format. We have to find the number 9 and 0.1
binary format separately.
9L
9 1001 base
1001
forthetract on 0.1
0.4 0
0.2
0.4 0.8 0
0.8 16 1
1.2 1
0.6
nf n ty 0
0.2 0.4
0
0.4 0.8
As I’ve showed in the above figure, it never going to end.The marked part is
going to repeat again and again. It is a infinite binary representation.
0001100110011001100...
so for 9.1
1001.0001100110011001100...
1.0010001100110011001100… x 2³
397 D'sından
L
Now let’s write this in the IEEE standard. To allow for negative exponents,
127 is added to that exponent to get the representation. We say that the
exponent is “biased” by 127.So the range of possible exponents is -128 to
127 . Because we have a positive power ( 2³) it should be added with Exponent
Bias (127). exponentb as127
3+127 = 130
Now lets look at the Mantissa, The mantissa is the next 23 bits(for single
precision) .so the binary value of 9.1 is 1.0010001100110011001100… x ²³ and
the Mantissa will be,
00100011001100110011001
Now lets write the the value, since 9.1 is positive number so the sign bit is
zero .The total representation of 9.1 in single precision will be like this
24ᵗʰG t
01000001000100011001100110011001
But what computer does is, if the 24th bit is 1, it will add 1 to the 23rd bit (if 0
it doesn’t do anything) and store.
01000001000100011001100110011010
If you check this with IEEE 754 calculator you will get this same answer.
https://www.binaryconvert.com/result_float.html?decimal=057046049
As you can see, there is a small difference at the end of the mantissa. This
difference is because of the rounding error. Why the rounding error happens
is that there is a limit to the number of bits we can represent the mantissa.
ava l
2
31
As you can see first we convert 9.1 to IEEE 754 standard and then IEEE 754
value to decimal value. we got 9.10000038.So this is the floating point
rounding problem.
https://www.smbc-comics.com/?id=2999
So, if we are writing an algorithm, we should handle this floating point error.
Sinking of an oil rig — In 1992, the Sleipner A oil and gas platform sank in
the North Sea near Norway. Numerical issues in modelling the structure
caused shear stresses to be underestimated by 47%. As a result, concrete
walls were not built thick enough.
Explosion of the Ariane 5 rocket — just after lift-off on its maiden voyage
off French Guiana, on June 4, 1996, was ultimately the consequence of a
simple overflow.
95% of folks out there are completely clueless about floating-point -James Gosling
(in JavaOne keynote address)
BigDecimal in Java
The java.math.BigDecimal class provides operations for arithmetic,
manipulation, rounding, comparison and format conversion.
For instance, BigDecimal 2.18 has the unscaled value of 218 and the scale of
2.
Output
10.0
9.9
9.8
9.700000000000001
9.600000000000001
9.500000000000002
9.400000000000002
9.300000000000002
9.200000000000003
9.100000000000003
9.000000000000004
8.900000000000004
8.800000000000004
8.700000000000005
8.600000000000005
8.500000000000005
8.400000000000006
8.300000000000006
8.200000000000006
8.100000000000007
8.000000000000007
7.9000000000000075
7.800000000000008
7.700000000000008
7.6000000000000085
7.500000000000009
7.400000000000009
7.30000000000001
7.20000000000001
7.10000000000001
7.000000000000011
6.900000000000011
6.800000000000011
6.700000000000012
6.600000000000012
6.500000000000012
6.400000000000013
6.300000000000013
6.2000000000000135
6.100000000000014
6.000000000000014
5.900000000000015
5.800000000000015
5.700000000000015
5.600000000000016
5.500000000000016
5.400000000000016
5.300000000000017
5.200000000000017
5.100000000000017
5.000000000000018
4.900000000000018
4.8000000000000185
4.700000000000019
4.600000000000019
4.5000000000000195
4.40000000000002
4.30000000000002
4.200000000000021
4.100000000000021
4.000000000000021
3.9000000000000212
3.800000000000021
3.700000000000021
3.600000000000021
3.500000000000021
3.400000000000021
3.3000000000000207
3.2000000000000206
3.1000000000000205
3.0000000000000204
2.9000000000000203
2.8000000000000203
2.70000000000002
2.60000000000002
2.50000000000002
IEEE 754 number representat on 32b t float ngpo ntnotat on
263.3
1
EE
i
263 100000101
0.3
I IE
010011001100110011
11 263.3 01001100110011
170 n 1
t.EEFn Y
ttrom12 0
163
EFiI'YEEIE.IE b tsfollow ngdec mal
IE Isma lov nary
po ntnscnotat on
iii
3 0 00000 1 00000 0 0 001 001 001 0