0% found this document useful (0 votes)
65 views

Complete Floating Point (Blog)

Floating point numbers represent real numbers in a format that allows for a wide range of values. They have three main parts - a sign bit, exponent, and significand. The exponent and significand are used to represent the number as (-1)^s * (1 + fraction) * 2^exponent. Common formats are single precision which stores numbers with about 6 digits of precision, and double precision with about 16 digits. Floating point operations like addition and multiplication align the decimal points, perform the calculation on the significands, and adjust the exponent accordingly.

Uploaded by

miaahurriff
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
65 views

Complete Floating Point (Blog)

Floating point numbers represent real numbers in a format that allows for a wide range of values. They have three main parts - a sign bit, exponent, and significand. The exponent and significand are used to represent the number as (-1)^s * (1 + fraction) * 2^exponent. Common formats are single precision which stores numbers with about 6 digits of precision, and double precision with about 16 digits. Floating point operations like addition and multiplication align the decimal points, perform the calculation on the significands, and adjust the exponent accordingly.

Uploaded by

miaahurriff
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 18

Created by: Amira Hurriff

FLOATING POINT : )

What is mean by floating point?


The following are example of floating-point numbers: 3.0 , -111.5 , , 3E-5 it is represent for non-integral numbers.(including very small and very large numbers ) In essence, computers are machines and are capable of representing real numbers only by using complex codes. The most popular code for representing real numbers is called the

integer

*IEEE Floating-Point Standard.

* we will learn more about IEEE floating point standard,after this. :D

Types float and double in c++ , and c ( programming language) in scientific notation i. -4.44 x 10^77 normalized ii. +9.943 x 10^-5 iii. 0.001 x 10^ 3 not normalized

The term floating point is derived from the fact that there is no fixed number of digits before and after the decimal point; that why it is, is called float.

IEEE FLOATING-POINT FORMAT

Example of table IEEE floating-point

x = (-1)^s x (1 + fraction) x 2^(exponent bias)

The Sign Bit


The sign bit is as simple as it gets. 0 denotes a positive number; 1 denotes a negative number. Flipping the value of this bit flips the sign of the number. Normalize significant : 1.0 <= | significant | < 2.0
- always has a leading pre-binary-point 1 bit,so no need to represent it explicitly (hidden bit) Exponent: excess representation: actual exponent + Bias ensures exponent is unsigned . Single : bias = 127 ; Double : bias : 1203

Videos about IEEE floating point ,its help you to more understand about it. Hopefully

smallest value - exponent : 0000001 - actual exponent = 1 27 = -126 - Fraction : 0000000 significand = 1.0 1.0 x 2^-126 1.2 x 10^-38 largest value - Exponents : 11111110 - Actual exponent = 254 127 = +127 - Fraction : 111.11 significand 2.0 2.0 x 2^127 3.4 x 10^38

Single-precision range

Double-Precision Range
smallest value exponent : 00000000000001 Actual exponent = 1 1023 = -1022 Fraction : 00000 , significand = 1.0 1.0 x 2^ -1022 2.2 x 10 ^ -308

largest value
Exponent : 1111111111110 Actual exponent = 2046 1023 = +1023 Fraction : 11111 , significand 2.0 2.0 x 2^ 1023 1.8 x 10^ 308

FLOATING-POINT PRECISION
Relative precision
single : approx 2^-23

- equivalent to 23 x log10 2 23 x 0.3 6 decimal digits of precision


Double : approx 2^-52

- Equivalent to 52 x log10 2 16 decimal digits of precision

Teach about how to calculate single precision and etc

Floating-point example
represent -0.75 -0.75 = (-1) x 1.1 x 2 S=1 Fraction = 100000 Exponent = -1 + Bias single : -1 + 127 = 126 = 01111110 double : -1 + 1023 = 1022 = 01111111110 single : 101111110100000 double : 101111111110100000

What number is represented by the singleprecision float 1100000010100000 S=1 Fraction = 01000.00 Fxponent = 10000001 = 129 x= (-1) x (1 + 01) x 2^(129-127) = (-1) x 1.25 x 2 = -5.0

Floating-point addition
consider a 4-digit decimal example - 9.999 x 10 + 1.610 x 10 1. align decimal points
Shift number with smaller exponent 9.999 x 10 + 0.016 x 10 = 10.015 x 10 2. add significands 9.999 x 10 + 0.016 x 10 = 10.015 x 10 3. normalize result & check for over/underflow 1.0015 x 10 4. Round and renormalize if necessary 1.002 x 10

4-digit binary example 1.000 x 2 + -1.110 x 2 ( 0.5 + - 0.4375) 1. Align binary points
shift number with smaller exponent 1.000 x 2 + -0.111 x 2 2. Add significands 1.000 x 2 + -0.111 x 2 = 0.001 x 2 3. Normalize result & check for over/underflow 1.000 x 2 (no change) = 0.0625

Floating-point adder hardware

Floating point arithmetic hardware(FP ADDER HARDWARE) usually does - Addition , subtraction , multiplication,division, reciprocal, square-root FP= integer conversion Operation usually takes several cycles

Floating-point multiplication
Consider a 4- digit decimal Example : 1.110 x 10 x 9.200 x 10 1. Add exponents For biased exponents , subtract bias from sum New exponent = 10+ -5 =5 2. Multiply significands 1.110 x 9.200 = 10.212 , (10.212 x 10 ) 3. Normalize result & check for over/underflow 1.0212 x 10 4. Round and renormalize if necessary 1.021 x 10 5. Determine sign of result from signs of operands +1.021 x 10

You might also like