Floating Point
15-213: Introduction to Computer Systems
4th Lecture, Sep 5, 2013
Carnegie Mellon
What is 1011.101_2?
Representation
  Bits to right of binary point represent fractional powers of 2
  Represents rational number: sum of b_k x 2^k for k = -j … i (bit b_k has weight 2^k; the rightmost bit has weight 2^-j)
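The weighted-sum reading of a fractional binary number can be checked with a short C sketch. The helper name `binary_to_double` is hypothetical, and the function assumes its input holds only the characters 0, 1, and at most one binary point.

```c
#include <stddef.h>

/* Evaluate a binary string such as "1011.101" as the sum of b_k * 2^k.
   Hypothetical helper: assumes only '0', '1', and at most one '.'. */
double binary_to_double(const char *s)
{
    double value = 0.0;
    double weight = 0.5;   /* weight of the next fractional bit, starting at 2^-1 */
    int seen_point = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '.') {
            seen_point = 1;
        } else if (!seen_point) {
            value = value * 2.0 + (s[i] - '0');   /* bits left of the point */
        } else {
            value += (s[i] - '0') * weight;       /* bits right of the point */
            weight /= 2.0;
        }
    }
    return value;
}
```

Calling `binary_to_double("1011.101")` gives 8 + 2 + 1 + 1/2 + 1/8 = 11.625.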
Observations
  Divide by 2 by shifting right (unsigned)
  Multiply by 2 by shifting left
  Numbers of form 0.111111…_2 are just below 1.0
Representable Numbers
Limitation #1
  Other rational numbers have repeating bit representations
  Value    Representation
  1/3      0.0101010101[01]…_2
  1/5      0.001100110011[0011]…_2
  1/10     0.0001100110011[0011]…_2
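A small C sketch makes the consequence of repeating representations visible. It assumes `double` is an IEEE 754 double evaluated without extra precision; the helper name `repeated_sum` is hypothetical.

```c
/* Add x to itself n times using double arithmetic.
   Hypothetical helper used only to illustrate rounding of 1/10. */
double repeated_sum(double x, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += x;
    return total;
}
```

Under those assumptions, `repeated_sum(0.1, 10)` does not compare equal to 1.0, because 1/10 has no finite binary representation, while `repeated_sum(0.125, 8)` is exactly 1.0, because 1/8 does.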
Limitation #2

IEEE Floating Point
  Nice standards for rounding, overflow, underflow
  Hard to make fast in hardware
Floating Point Representation
Numerical form: (-1)^s x M x 2^E
  Sign bit s determines whether number is negative or positive
  Significand M normally a fractional value in range [1.0,2.0)
  Exponent E weights value by power of two
Encoding
  MSB s is sign bit s
  exp field encodes E (but is not equal to E)
  frac field encodes M (but is not equal to M)
  Layout: | s | exp | frac |
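Precision options
  Single precision: 32 bits (s: 1 bit, exp: 8 bits, frac: 23 bits)
  Double precision: 64 bits (s: 1 bit, exp: 11 bits, frac: 52 bits)
  Extended precision: 80 bits, Intel only (s: 1 bit, exp: 15 bits, frac: 63 or 64 bits)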
Normalized Values
When: exp ≠ 000…0 and exp ≠ 111…1
Exponent coded as a biased value: E = Exp - Bias
  Exp: unsigned value of exp field
  Bias = 2^(k-1) - 1, where k is number of exponent bits
  Single precision: 127 (Exp: 1…254, E: -126…127)
  Double precision: 1023 (Exp: 1…2046, E: -1022…1023)
Significand coded with implied leading 1: M = 1.xxx…x_2
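The bias formula and the resulting exponent ranges can be reproduced in a few lines of C. The function names are hypothetical, and k is assumed to be small enough that the shifts fit in an `int`.

```c
/* Bias for a k-bit exponent field: 2^(k-1) - 1. */
int exponent_bias(int k)
{
    return (1 << (k - 1)) - 1;
}

/* Smallest E for a normalized value: Exp = 1. */
int min_normalized_E(int k)
{
    return 1 - exponent_bias(k);
}

/* Largest E for a normalized value: Exp = 2^k - 2 (all ones is reserved). */
int max_normalized_E(int k)
{
    return ((1 << k) - 2) - exponent_bias(k);
}
```

With k = 8 this gives a bias of 127 and E from -126 to 127; with k = 11 it gives 1023 and E from -1022 to 1023.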
Exponent
  Exp = 10001100_2 = 140, so E = Exp - Bias = 140 - 127 = 13
Result:
  0 10001100 11011011011010000000000
  s exp      frac
  Value = 1.1101101101101_2 x 2^13 = 15213.0
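The bit pattern above can be checked by splitting a `float` into its three fields. This is a sketch that assumes `float` is a 32-bit IEEE 754 single-precision value; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a single-precision value. */
typedef struct {
    unsigned s;      /* 1-bit sign */
    unsigned exp;    /* 8-bit biased exponent field */
    uint32_t frac;   /* 23-bit fraction field */
} float_fields;

/* Split a float into s, exp, and frac (assumes 32-bit IEEE 754 float). */
float_fields split_float(float f)
{
    uint32_t bits;
    float_fields r;

    memcpy(&bits, &f, sizeof bits);   /* copy the bytes instead of casting pointers */
    r.s = bits >> 31;
    r.exp = (bits >> 23) & 0xFF;
    r.frac = bits & 0x7FFFFF;
    return r;
}
```

For 15213.0f the fields come out as s = 0, exp = 140 (10001100_2), and frac = 0x6DB400 (11011011011010000000000_2).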
Denormalized Values
Condition: exp = 000…0
Exponent value: E = -Bias + 1 (instead of E = 0 - Bias)
Significand coded with implied leading 0: M = 0.xxx…x_2
Cases
  exp = 000…0, frac = 000…0
    Represents zero value
    Note distinct values: +0 and -0 (why?)
  exp = 000…0, frac ≠ 000…0
    Numbers closest to 0.0
    Equispaced
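The denormalized cases can be explored by building values directly from bit patterns. The sketch below assumes a 32-bit IEEE 754 `float` with denormals enabled (no flush-to-zero mode); both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes 32-bit IEEE 754 float). */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Nonzero denormalized value: exp field all zeros, frac field nonzero. */
int is_denormalized(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return ((bits >> 23) & 0xFF) == 0 && (bits & 0x7FFFFF) != 0;
}
```

Pattern 0x00000001 is the smallest positive denormalized value, 2^-23 x 2^-126 = 2^-149. Patterns 0x00000000 and 0x80000000 are +0 and -0, which compare equal even though their bits differ.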
Special Values
Condition: exp = 111…1
Case: exp = 111…1, frac = 000…0
  Represents value ∞ (infinity)
  Operation that overflows
  Both positive and negative
  E.g., 1.0/0.0 = -1.0/-0.0 = +∞, 1.0/-0.0 = -∞
Case: exp = 111…1, frac ≠ 000…0
  Not-a-Number (NaN)
  Represents case when no numeric value can be determined
  E.g., sqrt(-1), ∞ - ∞, ∞ x 0
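The two special cases can be recognized from the exp and frac fields alone. This sketch assumes a 32-bit IEEE 754 `float` and an implementation in which division by zero follows IEEE semantics; all four function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* exp = 111...1 and frac = 000...0 (assumes 32-bit IEEE 754 float). */
int is_infinity(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return ((bits >> 23) & 0xFF) == 0xFF && (bits & 0x7FFFFF) == 0;
}

/* exp = 111...1 and frac != 000...0. */
int is_nan(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return ((bits >> 23) & 0xFF) == 0xFF && (bits & 0x7FFFFF) != 0;
}

/* Plain wrappers so the operands are ordinary run-time values. */
float divide_f(float a, float b)   { return a / b; }
float subtract_f(float a, float b) { return a - b; }
```

Here `divide_f(1.0f, 0.0f)` is +∞, `divide_f(1.0f, -0.0f)` is -∞, and subtracting +∞ from +∞ with `subtract_f` gives NaN.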
[Figure: number line of encodings, left to right: NaN | -∞ | -Normalized | -Denorm | -0 +0 | +Denorm | +Normalized | +∞ | NaN]
8-bit floating point format (1 + 4 + 3 bits)
  the sign bit is in the most significant bit
  the next four bits are the exponent, with a bias of 7
  the last three bits are the frac
  Largest norm: 0 1110 111, value 15/8 x 2^7 = 240
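A decoder for this 8-bit format fits in a short C function. This is a sketch, not a reference implementation: the name `tiny_to_double` is hypothetical, and it applies the normalized, denormalized, and special-value rules to 4 exponent bits and 3 fraction bits.

```c
#include <math.h>   /* INFINITY and NAN macros */

/* Decode an 8-bit value laid out as s (1 bit), exp (4 bits), frac (3 bits). */
double tiny_to_double(unsigned char b)
{
    const int bias = 7;
    int s = (b >> 7) & 0x1;
    int exp = (b >> 3) & 0xF;
    int frac = b & 0x7;
    double sign = s ? -1.0 : 1.0;
    double value;
    int E;

    if (exp == 0xF)                       /* special values */
        return frac == 0 ? sign * INFINITY : NAN;

    if (exp == 0) {                       /* denormalized: M = 0.xxx, E = 1 - Bias */
        value = frac / 8.0;
        E = 1 - bias;
    } else {                              /* normalized: M = 1.xxx, E = Exp - Bias */
        value = 1.0 + frac / 8.0;
        E = exp - bias;
    }

    for (; E > 0; E--) value *= 2.0;      /* scale by 2^E without needing libm */
    for (; E < 0; E++) value /= 2.0;
    return sign * value;
}
```

For example, 0x77 (0 1110 111) decodes to 240, 0x38 (0 0111 000) to 1.0, and 0x01 (0 0000 001) to 1/512.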
Distribution of Values
[Figure: number line from -15 to 15 marking denormalized, normalized, and infinity values]
[Figure: close-up of the same number line from -1 to 1]
NaNs problematic when encodings are compared as unsigned integers
  Will be greater than any other values
  What should comparison yield?
Otherwise OK
  Denorm vs. normalized
  Normalized vs. infinity
Rounding
Rounding Modes (illustrate with $ rounding)
                          $1.40   $1.60   $1.50   $2.50   -$1.50
  Towards zero            $1      $1      $1      $2      -$1
  Round down (-∞)         $1      $1      $1      $2      -$2
  Round up (+∞)           $2      $2      $2      $3      -$1
  Nearest Even (default)  $1      $2      $2      $2      -$2
Round-To-Even
Default rounding mode
  Hard to get any other kind without dropping into assembly
  All others are statistically biased
Applying to other decimal places
  Round so that least significant digit is even
  E.g., round to nearest hundredth
    1.2349999   1.23   (Less than half way)
    1.2350001   1.24   (Greater than half way)
    1.2350000   1.24   (Half way: round up)
    1.2450000   1.24   (Half way: round down)
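Rounding Binary Numbers
Binary fractional numbers
  "Even" when least significant bit is 0
  "Half way" when bits to right of rounding position = 100…_2
Examples
  Round to nearest 1/4 (2 bits right of binary point)
  Value    Binary       Rounded   Action        Rounded Value
  2 3/32   10.00011_2   10.00_2   (<1/2: down)  2
  2 3/16   10.00110_2   10.01_2   (>1/2: up)    2 1/4
  2 7/8    10.11100_2   11.00_2   (1/2: up)     3
  2 5/8    10.10100_2   10.10_2   (1/2: down)   2 1/2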
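Round-to-even on binary fractions reduces to integer bit manipulation. The sketch below treats its input as a fixed-point number with k bits to drop; the name `round_to_even` is hypothetical, and 1 <= k < 32 is assumed.

```c
/* Round x / 2^k to the nearest integer, with ties going to the even result.
   Hypothetical helper: assumes 1 <= k < 32. */
unsigned round_to_even(unsigned x, int k)
{
    unsigned kept = x >> k;                  /* bits that survive */
    unsigned rest = x & ((1u << k) - 1);     /* bits being dropped */
    unsigned half = 1u << (k - 1);           /* pattern 100...0 = exactly half way */

    if (rest > half || (rest == half && (kept & 1u)))
        kept++;                              /* above half, or a tie with odd LSB */
    return kept;
}
```

Taking the example values 2 3/32, 2 3/16, 2 7/8, and 2 5/8 in units of 1/32 (67, 70, 92, 84) and assuming the rounding position is the nearest 1/4 (k = 3), the results in units of 1/4 are 8, 9, 12, and 10, that is 2, 2 1/4, 3, and 2 1/2.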
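FP Multiplication
(-1)^s1 M1 2^E1 x (-1)^s2 M2 2^E2
Exact Result: (-1)^s M 2^E
  Sign s: s1 ^ s2
  Significand M: M1 x M2
  Exponent E: E1 + E2
Fixing
  If M ≥ 2, shift M right, increment E
  If E out of range, overflow
  Round M to fit frac precision
Implementation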
FP Addition
(-1)^s1 M1 2^E1 + (-1)^s2 M2 2^E2
Exact Result: (-1)^s M 2^E
  Sign s, significand M: result of signed align and add
Fixing
  If M ≥ 2, shift M right, increment E
  If M < 1, shift M left k positions, decrement E by k
  Overflow if E out of range
  Round M to fit frac precision
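The first two fixing rules can be sketched as a renormalization loop. This is an illustration only: it keeps M as a `double` rather than a bit field, skips the overflow check and the final rounding, and uses hypothetical names.

```c
/* A significand/exponent pair representing M x 2^E. */
typedef struct {
    double M;
    int E;
} sig_exp;

/* Bring M back into [1.0, 2.0) while keeping M x 2^E unchanged.
   Assumes M > 0; overflow detection and rounding are left out. */
sig_exp fix_significand(double M, int E)
{
    sig_exp r = { M, E };

    while (r.M >= 2.0) {               /* M >= 2: shift right, increment E */
        r.M /= 2.0;
        r.E++;
    }
    while (r.M > 0.0 && r.M < 1.0) {   /* M < 1: shift left, decrement E */
        r.M *= 2.0;
        r.E--;
    }
    return r;
}
```

Adding 1.5 x 2^0 to itself gives M = 3.0, which is fixed to 1.5 x 2^1. A cancellation that leaves M = 0.25 with E = 5 is fixed to 1.0 x 2^3.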
Floating Point in C
Conversions/Casting
  Casting between int, float, and double changes bit representation
  double/float → int
    Truncates fractional part
    Like rounding toward zero
    Not defined when out of range or NaN: Generally sets to TMin
  int → double
    Exact conversion, as long as int has ≤ 53 bit word size
  int → float
    Will round according to rounding mode
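The conversion rules can be tried out with three one-line wrappers. The sketch assumes a 32-bit `int`, IEEE 754 `float` and `double`, and the default round-to-nearest-even mode; the function names are hypothetical.

```c
/* double -> int: truncates the fractional part (rounds toward zero). */
int double_to_int(double d)
{
    return (int)d;
}

/* int -> double: exact when int has at most 53 bits. */
double int_to_double(int x)
{
    return (double)x;
}

/* int -> float: rounded when x needs more than 24 significant bits. */
float int_to_float(int x)
{
    return (float)x;
}
```

`double_to_int(2.9)` is 2 and `double_to_int(-2.9)` is -2. The value 16777217 = 2^24 + 1 survives `int_to_double` unchanged, but `int_to_float` rounds it to 16777216.0f.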
Summary
IEEE Floating Point has clear mathematical properties
  Represents numbers of form M x 2^E
  One can reason about operations independent of implementation
Floating point arithmetic violates associativity/distributivity
  Makes life difficult for compilers & serious numerical applications
Puzzles: is each C expression always true? (int x, float f, double d)
  d == (double)(float) d
  f == -(-f);
  2/3 == 2/3.0
  d < 0.0 ⇒ ((d*2) < 0.0)
  d > f ⇒ -f > -d
  d * d >= 0.0
  x == (int)(float) x
  x == (int)(double) x
  f == (float)(double) f
  (d+f)-d == f
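Some of these expressions can be probed with concrete inputs. The sketch assumes x is a 32-bit `int`, f an IEEE 754 `float`, and d an IEEE 754 `double`; the wrapper names are hypothetical, and the inputs mentioned below are only sample counterexamples, not a full answer key.

```c
/* Each function returns 1 when the expression holds for the given input. */

int int_float_roundtrip(int x)
{
    return x == (int)(float)x;
}

int int_double_roundtrip(int x)
{
    return x == (int)(double)x;
}

int add_then_subtract(double d, float f)
{
    return (d + f) - d == f;
}

int two_thirds_equal(void)
{
    return 2/3 == 2/3.0;   /* integer division on the left, floating point on the right */
}
```

`int_float_roundtrip(16777217)` is 0 because a float has only 24 significant bits, `add_then_subtract(1e20, 1.0f)` is 0 because f is lost when added to a much larger d, and `two_thirds_equal()` is 0 because 2/3 is 0 in integer arithmetic.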
Creating Floating Point Number
Steps
  Normalize to have leading 1
  Round to fit within fraction
  Postnormalize to deal with effects of rounding
Format: s (1 bit), exp (4 bits), frac (3 bits)
Case Study
  Example values: 128, 13, 17, 19, 138, 63
Normalize
Format: s (1 bit), exp (4 bits), frac (3 bits)
Requirement
  Set binary point so that numbers of form 1.xxxxx
  Adjust all to have leading one
  Decrement exponent as shift left
  Value   Binary     Fraction    Exponent
  128     10000000   1.0000000   7
  13      00001101   1.1010000   3
  17      00010001   1.0001000   4
  19      00010011   1.0011000   4
  138     10001010   1.0001010   7
  63      00111111   1.1111100   5
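Rounding
1.BBGRXXX
  Guard bit: LSB of result
  Round bit: 1st bit removed
  Sticky bit: OR of remaining bits
Round up conditions
  Round = 1, Sticky = 1: more than half way
  Guard = 1, Round = 1, Sticky = 0: round to even
  Value   Fraction    GRS   Incr?   Rounded
  128     1.0000000   000   N       1.000
  13      1.1010000   100   N       1.101
  17      1.0001000   010   N       1.000
  19      1.0011000   110   Y       1.010
  138     1.0001010   011   Y       1.001
  63      1.1111100   111   Y       10.000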
Postnormalize
Issue
  Rounding may have caused overflow
  Handle by shifting right once & incrementing exponent
  Value   Rounded   Exp   Adjusted
  128     1.000     7
  13      1.101     3
  17      1.000     4
  19      1.010     4
  138     1.001     7
  63      10.000    5     1.000/6
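The three steps (normalize, round, postnormalize) can be combined into one C sketch for 8-bit unsigned inputs. The names `tiny_parts` and `to_tiny` are hypothetical; the significand is returned in units of 1/8, so 8 means 1.000_2 and 13 means 1.101_2.

```c
/* Rounded significand (in eighths, 8..15) and exponent E of the result. */
typedef struct {
    unsigned sig;
    int e;
} tiny_parts;

/* Convert v (1..255) to a 1.xxx significand with 3 fraction bits.
   Returns {0, 0} for inputs outside that range. */
tiny_parts to_tiny(unsigned v)
{
    tiny_parts r = { 0, 0 };
    unsigned x = v;
    int e = 7;

    if (v == 0 || v > 255)
        return r;

    /* Normalize: shift left until bit 7 is the leading 1, giving 1.BBGRXXX. */
    while ((x & 0x80) == 0) {
        x <<= 1;
        e--;
    }

    /* Round: guard is the last kept bit, round the first dropped bit,
       sticky the OR of the remaining dropped bits. */
    unsigned guard = (x >> 4) & 1;
    unsigned round = (x >> 3) & 1;
    unsigned sticky = (x & 0x7) != 0;
    unsigned sig = x >> 4;
    if (round && (sticky || guard))
        sig++;

    /* Postnormalize: 10.000 becomes 1.000 with the exponent incremented. */
    if (sig == 16) {
        sig = 8;
        e++;
    }

    r.sig = sig;
    r.e = e;
    return r;
}
```

For 63 the significand rounds up to 10.000_2 and is adjusted to 1.000_2 with exponent 6. For 19 the tie rounds up to 1.010_2, and for 17 the tie rounds down to 1.000_2.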
More Slides
Interesting Numbers                                  {single,double}
  Description                exp     frac    Numeric Value
  Zero                       00…00   00…00   0.0
  Smallest Pos. Denorm.      00…00   00…01   2^-{23,52} x 2^-{126,1022}
    Single ≈ 1.4 x 10^-45
    Double ≈ 4.9 x 10^-324
  Largest Denormalized       00…00   11…11   (1.0 - ε) x 2^-{126,1022}
    Single ≈ 1.18 x 10^-38
    Double ≈ 2.2 x 10^-308
  Smallest Pos. Normalized   00…01   00…00   1.0 x 2^-{126,1022}
    Just larger than largest denormalized
  One                        01…11   00…00   1.0
  Largest Normalized         11…10   11…11   (2.0 - ε) x 2^{127,1023}
    Single ≈ 3.4 x 10^38
    Double ≈ 1.8 x 10^308
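The single-precision rows can be cross-checked against the constants in `<float.h>`. The sketch assumes a 32-bit IEEE 754 `float`; the helper name `bits_of_float` is hypothetical.

```c
#include <float.h>    /* FLT_MIN, FLT_MAX */
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float (assumes 32-bit IEEE 754 float). */
uint32_t bits_of_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under that assumption `FLT_MIN` (smallest positive normalized) has pattern 0x00800000, 1.0f has 0x3F800000, and `FLT_MAX` (largest normalized) has 0x7F7FFFFF, matching exp = 00…01, 01…11, and 11…10.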
FP addition
  Monotonicity: a ≥ b ⇒ a+c ≥ b+c? Almost
FP multiplication
  Closed under multiplication? Yes
    But may generate infinity or NaN
  Multiplication Commutative? Yes
  Multiplication is Associative? No
    Possibility of overflow, inexactness of rounding
  1 is multiplicative identity? Yes
  Multiplication distributes over addition? No
    Possibility of overflow, inexactness of rounding
  Monotonicity: a ≥ b & c ≥ 0 ⇒ a * c ≥ b * c? Almost
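Loss of associativity is easy to observe by grouping the same three operands in two ways. The sketch assumes IEEE 754 `double` arithmetic compiled without reassociating optimizations (for example, no fast-math mode); the function names and the sample operands below are illustrative choices.

```c
/* (a + b) + c versus a + (b + c) */
double add_left_first(double a, double b, double c)
{
    return (a + b) + c;
}

double add_right_first(double a, double b, double c)
{
    return a + (b + c);
}

/* (a * b) * c versus a * (b * c) */
double mul_left_first(double a, double b, double c)
{
    return (a * b) * c;
}

double mul_right_first(double a, double b, double c)
{
    return a * (b * c);
}
```

With a = 3.14, b = 1e20, c = -1e20, the left-first sum is 0.0 because 3.14 is rounded away, while the right-first sum is 3.14. With a = b = 1e200 and c = 1e-200, the left-first product overflows to infinity while the right-first product stays finite.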