Unit 3

This document discusses computer arithmetic and floating point representation. It begins by explaining integer representation using binary, sign-magnitude, and two's complement methods. It then covers integer arithmetic operations like addition, subtraction, multiplication, and division. Next, it discusses floating point representation using the IEEE 754 standard with single and double precision formats. It concludes by explaining floating point arithmetic operations like addition, subtraction, multiplication, and division.

Uploaded by

zccoffin007
Copyright
© © All Rights Reserved
UNIT 3-COMPUTER ARITHMETIC

Ms. Shital P. Shinde


(Subject Co-ordinator)
Topic:

• The arithmetic and logic Unit,


• Integer representation,
• Integer arithmetic,
• Floating point representation,
• Floating point arithmetic,
• Introduction of arithmetic co-processor.
Arithmetic & Logic Unit
• Does the calculations
• Everything else in the computer is there to service this unit
• Handles integers
• May handle floating point (real) numbers
• May be separate FPU (maths co-processor)
• May be on chip separate FPU (486DX +)
ALU Inputs and Outputs
Integer Representation
• Only have 0 & 1 to represent everything
• Positive numbers stored in binary
• e.g. 41=00101001
• No minus sign
• No period
• Sign-Magnitude
• Two’s complement
Sign-Magnitude
• Left most bit is sign bit
• 0 means positive
• 1 means negative
• +18 = 00010010
• -18 = 10010010
• Problems
• Need to consider both sign and magnitude in arithmetic
• Two representations of zero (+0 and -0)
Two’s Complement
• +3 = 00000011
• +2 = 00000010
• +1 = 00000001
• +0 = 00000000
• -1 = 11111111
• -2 = 11111110
• -3 = 11111101
Benefits
• One representation of zero
• Arithmetic works easily (see later)
• Negating is fairly easy
• 3 = 00000011
• Boolean complement gives 11111100
• Add 1 to LSB 11111101
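The invert-and-add-one rule above can be sketched in Python. This is a minimal illustration that assumes an 8-bit word; the helper names `to_twos` and `negate` are hypothetical and do not come from the slides.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF for an 8-bit word

def to_twos(value: int) -> str:
    """Return the 8-bit two's complement bit pattern of a signed integer."""
    return format(value & MASK, f"0{BITS}b")

def negate(pattern: str) -> str:
    """Negate a bit pattern: Boolean complement, then add 1 (carry out is dropped)."""
    inverted = ~int(pattern, 2) & MASK
    return format((inverted + 1) & MASK, f"0{BITS}b")

print(to_twos(3))          # 00000011
print(negate(to_twos(3)))  # 11111101
print(to_twos(-3))         # 11111101
```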
Geometric Depiction of Twos Complement Integers
Negation Special Case 1
• 0= 00000000
• Bitwise not 11111111
• Add 1 to LSB +1
• Result 1 00000000
• Overflow is ignored, so:
• -0=0
Negation Special Case 2
• -128 = 10000000
• bitwise not 01111111
• Add 1 to LSB +1
• Result 10000000
• So:
• -(-128) = -128 X
• Monitor MSB (sign bit)
• It should change during negation
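The two special cases can be checked with a small sketch. It assumes 8-bit patterns held as Python integers in the range 0..255, and the function name `negate8` is made up for illustration. Overflow is reported when a non-zero operand keeps its sign bit after negation.

```python
def negate8(x: int) -> tuple[int, bool]:
    """Negate an 8-bit pattern (0..255); return (result, overflow)."""
    result = (~x + 1) & 0xFF
    # For any non-zero operand the sign bit must flip; if it does not, the
    # true result (+128) does not fit in 8 bits.
    overflow = x != 0 and (x & 0x80) == (result & 0x80)
    return result, overflow

print(negate8(0b00000000))  # (0, False): -0 = 0, the carry out is ignored
print(negate8(0b10000000))  # (128, True): -(-128) cannot be represented
print(negate8(0b00000001))  # (255, False): -1 = 11111111
```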
Range of Numbers
• 8 bit 2s complement
• +127 = 01111111 = 2^7 - 1
• -128 = 10000000 = -2^7
• 16 bit 2s complement
• +32767 = 01111111 11111111 = 2^15 - 1
• -32768 = 10000000 00000000 = -2^15
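The same limits follow for any word length n: the range is -2^(n-1) to 2^(n-1) - 1. A short sketch, with `twos_complement_range` as a hypothetical helper name:

```python
def twos_complement_range(bits: int) -> tuple[int, int]:
    """Return (min, max) for a two's complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

for bits in (8, 16):
    low, high = twos_complement_range(bits)
    print(bits, low, high)  # 8 -128 127, then 16 -32768 32767
```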
Conversion Between Lengths
• Positive number pack with leading zeros
• +18 = 00010010
• +18 = 00000000 00010010
• Negative numbers pack with leading ones
• -18 = 11101110
• -18 = 11111111 11101110
• i.e. pack with MSB (sign bit)
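Packing with the sign bit (sign extension) can be sketched on bit strings. The helper names `sign_extend` and `to_signed` are illustrative only. Note that -18 in 8-bit two's complement is 11101110.

```python
def sign_extend(pattern: str, width: int) -> str:
    """Widen a two's complement bit string by replicating its MSB."""
    return pattern[0] * (width - len(pattern)) + pattern

def to_signed(pattern: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(pattern, 2)
    return value - (1 << len(pattern)) if pattern[0] == "1" else value

print(sign_extend("00010010", 16))             # 0000000000010010 (+18)
print(sign_extend("11101110", 16))             # 1111111111101110 (-18)
print(to_signed(sign_extend("11101110", 16)))  # -18
```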
Addition and Subtraction
• Normal binary addition
• Monitor sign bit for overflow

• Subtraction: take the twos complement of the subtrahend and add it to the minuend


• i.e. a - b = a + (-b)

• So we only need addition and complement circuits
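As a rough sketch of how one adder serves both operations, the code below models an 8-bit adder with a carry-in. Subtraction feeds in the complemented subtrahend with carry-in 1, which is the same as adding its two's complement. The names `add8` and `sub8` are assumptions for illustration. Overflow is flagged when both adder inputs share a sign that the result does not.

```python
MASK, SIGN = 0xFF, 0x80

def add8(a: int, b: int, carry_in: int = 0) -> tuple[int, bool]:
    """Add two 8-bit patterns; return (result, overflow)."""
    result = (a + b + carry_in) & MASK
    # Overflow: both inputs have the same sign bit and the result's differs.
    overflow = bool((a ^ result) & (b ^ result) & SIGN)
    return result, overflow

def sub8(a: int, b: int) -> tuple[int, bool]:
    """a - b computed as a + (complement of b) + 1, using the same adder."""
    return add8(a, ~b & MASK, 1)

print(sub8(7, 5))       # (2, False)
print(add8(0x7F, 1))    # (128, True): 127 + 1 overflows
print(sub8(0x80, 1))    # (127, True): -128 - 1 overflows
```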


Hardware for Addition and Subtraction
Multiplication
• Complex
• Work out partial product for each digit
• Take care with place value (column)
• Add partial products
Multiplication Example
•     1011 Multiplicand (11 dec)
• x   1101 Multiplier (13 dec)
•     1011 Partial products
•    0000
•   1011
•  1011
• 10001111 Product (143 dec)
• Note: if the multiplier bit is 1, copy the multiplicand (shifted to its place value); otherwise the partial product is zero
• Note: need double length result
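The partial-product method above amounts to shift-and-add. A minimal sketch for unsigned operands follows; `multiply_unsigned` is a hypothetical name and the 4-bit default simply matches the example.

```python
def multiply_unsigned(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Shift-and-add multiplication of two unsigned `bits`-wide integers."""
    product = 0
    for i in range(bits):
        if (multiplier >> i) & 1:            # multiplier bit is 1:
            product += multiplicand << i     # add the multiplicand at place value i
    return product                           # needs up to 2 * bits bits

result = multiply_unsigned(0b1011, 0b1101)
print(format(result, "08b"), result)  # 10001111 143
```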
Unsigned Binary Multiplication
Execution of Unsigned Binary Multiplication
Flowchart for Unsigned Binary Multiplication
Multiplying Negative Numbers
• This does not work!
• Solution 1
• Convert to positive if required
• Multiply as above
• If signs were different, negate answer
• Solution 2
• Booth’s algorithm
Booth’s Algorithm
Example of Booth’s Algorithm
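The slides name Booth's algorithm without listing its steps here, so the following is only a sketch of the common register formulation (A, Q, Q-1, M), not necessarily the version shown in the lecture. The function name and the 4-bit default are assumptions. It does not handle the most negative multiplicand (for example -8 with 4 bits), because A is kept to the same width as M.

```python
def booth_multiply(m: int, q: int, bits: int = 4) -> int:
    """Multiply two signed `bits`-wide integers with Booth's algorithm."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, q_reg, q_1, m_reg = 0, q & mask, 0, m & mask
    for _ in range(bits):
        pair = (q_reg & 1, q_1)
        if pair == (1, 0):
            a = (a - m_reg) & mask           # A = A - M
        elif pair == (0, 1):
            a = (a + m_reg) & mask           # A = A + M
        # Arithmetic shift right of the combined A, Q, Q-1 register.
        q_1 = q_reg & 1
        q_reg = (q_reg >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & sign)            # keep the sign bit of A
    product = (a << bits) | q_reg
    if product & (1 << (2 * bits - 1)):      # reinterpret as signed 2*bits value
        product -= 1 << (2 * bits)
    return product

print(booth_multiply(7, 3))    # 21
print(booth_multiply(-3, 5))   # -15
print(booth_multiply(-3, -5))  # 15
```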
Division
• More complex than multiplication
• Negative numbers are really bad!
• Based on long division
Division of Unsigned Binary Integers
               00001101  Quotient
Divisor 1011 ) 10010011  Dividend
                1011
               001110    Partial remainder
                 1011
                 001111  Partial remainder
                   1011
                    100  Remainder
Flowchart for Unsigned Binary Division
Example of Unsigned Binary Division
Dividend Q = 1010, divisor M = 0011
A has N+1 = 5 bits, so M = 00011 and the 2’s complement of M = 11101

A      Q     M      Operation
00000  1010  00011  Initial values
00001  010-  00011  Shift left
11110  010-  00011  A = A - M
00001  0100  00011  A is negative: set Q0 = 0 and restore A = A + M (discard the carry out)
00010  100-  00011  Shift left
11111  100-  00011  A = A - M
00010  1000  00011  A is negative: set Q0 = 0 and restore A = A + M (discard the carry out)
00101  000-  00011  Shift left
00010  000-  00011  A = A - M (discard the carry out)
00010  0001  00011  A is not negative: set Q0 = 1
00100  001-  00011  Shift left
00001  001-  00011  A = A - M (discard the carry out)
00001  0011  00011  A is not negative: set Q0 = 1

Remainder = 00001
Quotient = 0011
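The shift, subtract, and restore steps traced above can be sketched as follows. This is an illustrative version that uses Python's signed integers for A rather than a 5-bit two's complement register, and it assumes a non-zero divisor; `restoring_divide` is a made-up name.

```python
def restoring_divide(dividend: int, divisor: int, bits: int = 4) -> tuple[int, int]:
    """Restoring division of unsigned integers; returns (quotient, remainder)."""
    mask = (1 << bits) - 1
    a, q = 0, dividend & mask
    for _ in range(bits):
        # Shift A and Q left together: the MSB of Q moves into A.
        a = (a << 1) | (q >> (bits - 1))
        q = (q << 1) & mask
        a -= divisor                 # A = A - M
        if a < 0:
            a += divisor             # restore A, Q0 stays 0
        else:
            q |= 1                   # set Q0 = 1
    return q, a

print(restoring_divide(0b1010, 0b0011))              # (3, 1)
print(restoring_divide(0b10010011, 0b1011, bits=8))  # (13, 4)
```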
Real Numbers
• Numbers with fractions
• Convert a binary number to decimal

(1101.01101)

Integer part: 1101
= 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0
= 8 + 4 + 1
= 13

Fraction part: .01101
= 1*2^-2 + 1*2^-3 + 1*2^-5
= 1/2^2 + 1/2^3 + 1/2^5
= 1/4 + 1/8 + 1/32
= 0.25 + 0.125 + 0.03125
= 0.40625

Ans-(13.40625)
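The positional expansion can be checked with a short sketch; `binary_to_decimal` is a hypothetical helper that sums each fraction bit times its negative power of two.

```python
def binary_to_decimal(text: str) -> float:
    """Convert an unsigned binary string such as '1101.01101' to a number."""
    whole, _, frac = text.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    for i, bit in enumerate(frac, start=1):
        value += int(bit) * 2.0 ** -i    # bit i after the point weighs 2^-i
    return value

print(binary_to_decimal("1101.01101"))  # 13.40625
```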
Convert Decimal to Binary

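• (28.125)

28 = 11100

(.125): multiply by 2 repeatedly and record the integer part each time
0.125*2 = 0.250 -> 0
0.250*2 = 0.500 -> 0
0.500*2 = 1.000 -> 1
Reading the integer parts from top to bottom gives .001

Ans-(28.125)=(11100.001)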
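The repeated multiply-by-2 procedure can be sketched as below for non-negative values. The function names and the `max_bits` cut-off are assumptions; the cut-off is needed because many decimal fractions (0.3, for example) never terminate in binary.

```python
def fraction_to_binary(frac: float, max_bits: int = 8) -> str:
    """Binary digits of a fraction in [0, 1), most significant first."""
    bits = []
    while frac and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)          # the integer part is the next binary digit
        bits.append(str(bit))
        frac -= bit
    return "".join(bits)

def decimal_to_binary(value: float, max_bits: int = 8) -> str:
    whole = int(value)
    return f"{whole:b}.{fraction_to_binary(value - whole, max_bits) or '0'}"

print(decimal_to_binary(28.125))  # 11100.001
```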
Single-Precision Floating Point

• Total 32 bits:
• Bits 0-22 = Mantissa
• Bits 23-30 = E’ = E + Bias (bias for single precision = 127)
• Bit 31 = Sign bit (0 = positive, 1 = negative)
Double-Precision Floating Point

• Total 64 bits:
• Bits 0-51 = Mantissa
• Bits 52-62 = E’ = E + Bias (bias for double precision = 1023)
• Bit 63 = Sign bit (0 = positive, 1 = negative)
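The field layouts can be inspected with Python's standard `struct` module, which packs a value as an IEEE 754 single (`f`) or double (`d`). This is a sketch for exploration; the helper names are made up. Encoding 1.0 = 1.0*2^0 shows the stored exponent equal to the bias: 127 for single precision and 1023 for double precision.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa) of x stored in 32 bits."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF

def float64_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa) of x stored in 64 bits."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return raw >> 63, (raw >> 52) & 0x7FF, raw & ((1 << 52) - 1)

print(float32_fields(1.0))   # (0, 127, 0)
print(float64_fields(1.0))   # (0, 1023, 0)
print(float32_fields(-2.0))  # (1, 128, 0)
```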
Signs for Floating Point
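• Mantissa sign: IEEE 754 uses a separate sign bit with the mantissa stored as a magnitude (some other formats store the mantissa in 2s complement)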
• Exponent is in excess or biased notation
• e.g. Excess (bias) 128 means
• 8 bit exponent field
• Pure value range 0-255
• Subtract 128 to get correct value
• Range -128 to +127
Normalization
• FP numbers are usually normalized
• i.e. exponent is adjusted so that leading bit (MSB) of mantissa is 1
• Since it is always 1 there is no need to store it
• (c.f. scientific notation, where numbers are normalized to give a single digit before the decimal point, e.g. 3.123 x 10^3)
FP Ranges
• For a 32 bit number
• 8 bit exponent
• +/- 2^256 ≈ 1.2 x 10^77
• Accuracy
• The effect of changing lsb of mantissa
• 23 bit mantissa 2^-23 ≈ 1.2 x 10^-7
• About 6 decimal places
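The effect of changing the least significant mantissa bit can be observed directly. The sketch below assumes IEEE 754 single precision via the standard `struct` module; it increments the stored bit pattern of 1.0 by one and measures the gap to the next representable value. The name `next_float32` is hypothetical.

```python
import struct

def next_float32(x: float) -> float:
    """Return the next larger single-precision value after a positive x."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    (bumped,) = struct.unpack(">f", struct.pack(">I", raw + 1))  # flip in one LSB step
    return bumped

gap = next_float32(1.0) - 1.0
print(gap == 2 ** -23)  # True
print(f"{gap:.2e}")     # 1.19e-07
```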
IEEE 754
• Standard for floating point storage
• 32 and 64 bit standards
• 8 and 11 bit exponent respectively
• Extended formats (both mantissa and exponent) for
intermediate results
IEEE 754 Formats
Convert a number to IEEE 754 (32 bit)

Example:- 263.3
1. Convert the number to binary
263.3 = (100000111.0100110011001100…)

2. Write it in normalized scientific notation
1.000001110100110011001100… * 2^8

3. Split it into the IEEE 754 fields
Sign bit = 0 because the number is positive
E’ = 127 + 8 = 135 = 10000111
M = 00000111010011001100110 (the 23 bits after the leading 1, rounded to nearest)
Ans- 0 10000111 00000111010011001100110
Convert a number to IEEE 754 (64 bit)

Example:- 263.3
1. Convert the number to binary
263.3 = (100000111.0100110011001100…)

2. Write it in normalized scientific notation
1.000001110100110011001100… * 2^8

3. Split it into the IEEE 754 fields
Sign bit = 0 because the number is positive
E’ = 1023 + 8 = 1031 = 10000000111
M = 0000011101001100110011001100110011001100110011001101 (the 52 bits after the leading 1, rounded to nearest)
Ans- 0 10000000111 0000011101001100110011001100110011001100110011001101
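The three conversion steps (normalize, bias the exponent, take the bits after the leading 1) can be sketched for both widths and compared against the machine encoding from Python's `struct` module. The function `ieee754_fields` is a hypothetical helper that only handles normal, finite, non-zero values.

```python
import math
import struct

def ieee754_fields(x: float, exp_bits: int, man_bits: int) -> str:
    """Return 'sign exponent mantissa' bit strings for a normal, non-zero x."""
    bias = (1 << (exp_bits - 1)) - 1            # 127 for 8 bits, 1023 for 11 bits
    sign = "1" if x < 0 else "0"
    frac, exp = math.frexp(abs(x))              # abs(x) == frac * 2**exp, 0.5 <= frac < 1
    significand, e = frac * 2, exp - 1          # rewrite as 1.xxx * 2**e
    mantissa = round((significand - 1) * (1 << man_bits))  # nearest, ties to even
    if mantissa == 1 << man_bits:               # rounding carried into the leading 1
        mantissa, e = 0, e + 1
    return f"{sign} {e + bias:0{exp_bits}b} {mantissa:0{man_bits}b}"

print(ieee754_fields(263.3, 8, 23))   # 0 10000111 00000111010011001100110
print(ieee754_fields(263.3, 11, 52))

# Cross-check against the hardware encodings.
(raw32,) = struct.unpack(">I", struct.pack(">f", 263.3))
(raw64,) = struct.unpack(">Q", struct.pack(">d", 263.3))
print(ieee754_fields(263.3, 8, 23).replace(" ", "") == format(raw32, "032b"))   # True
print(ieee754_fields(263.3, 11, 52).replace(" ", "") == format(raw64, "064b"))  # True
```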
Addition & Subtraction of Binary Floating Numbers

• Rules
• 1. Compare the magnitudes of the two exponents and align the number with the smaller exponent by shifting its mantissa.
• 2. Perform addition/subtraction on the mantissas.
• 3. Normalize the result by shifting the resulting mantissa and adjusting the resulting exponent.
Addition
• Add (1.1100*2^4 & 1.100*2^2)
1. Alignment: 1.100*2^2 is aligned to 0.01100*2^4

2. Addition:
=1.1100*2^4+0.01100*2^4
=10.0010*2^4

3. Normalization:
Final normalized result is
=0.100010*2^6
=0.1000*2^6 (assuming 4 bits are allowed after the radix point)
Ans=0.1000*2^6
Subtraction
• Subtract (1.1100*2^4 - 1.100*2^2)
1. Alignment: 1.100*2^2 is aligned to 0.01100*2^4
2. Subtraction:
=1.1100*2^4-0.01100*2^4
=1.1100*2^4+(-0.01100*2^4)
=1.1100*2^4+1.10100*2^4 (2’s complement of 0.01100)
=11.01100*2^4
=1.01100*2^4 (the carry out is discarded because of 2’s complement)
3. Normalization:
Final normalized result is
=0.101100*2^5
=0.1011*2^5 (assuming 4 bits are allowed after the radix point)
Ans=0.1011*2^5
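The align, add, and normalize steps of the two examples above can be sketched on (mantissa, exponent) pairs. This is an illustration under stated assumptions: mantissas are held as Python floats (exact for these short binary fractions), results are normalized to the 0.1xxx form used in the worked examples rather than the IEEE 1.xxx form, and extra bits are truncated. All function names are hypothetical.

```python
import math

def normalize(mantissa: float, exponent: int, frac_bits: int = 4) -> tuple[float, int]:
    """Scale to 0.1xxx form (0.5 <= |m| < 1), then truncate to frac_bits."""
    if mantissa == 0:
        return 0.0, 0
    while abs(mantissa) >= 1:
        mantissa, exponent = mantissa / 2, exponent + 1
    while abs(mantissa) < 0.5:
        mantissa, exponent = mantissa * 2, exponent - 1
    scale = 1 << frac_bits
    return math.trunc(mantissa * scale) / scale, exponent

def fp_add(m1: float, e1: int, m2: float, e2: int) -> tuple[float, int]:
    if e1 < e2:                      # make operand 1 the one with the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2 / 2 ** (e1 - e2)         # step 1: align the smaller operand
    return normalize(m1 + m2, e1)    # steps 2 and 3: add, then normalize

def fp_sub(m1: float, e1: int, m2: float, e2: int) -> tuple[float, int]:
    return fp_add(m1, e1, -m2, e2)

# 1.1100 (binary) = 1.75 and 1.100 (binary) = 1.5
print(fp_add(1.75, 4, 1.5, 2))  # (0.5, 6)    i.e. 0.1000 * 2^6
print(fp_sub(1.75, 4, 1.5, 2))  # (0.6875, 5) i.e. 0.1011 * 2^5
```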
FP Addition & Subtraction Flowchart
FP Arithmetic x/÷
• Multiplication of a pair of floating point numbers
• x = mx*2^a and y = my*2^b give x*y = (mx*my)*2^(a+b)

A general algorithm for floating point multiplication

There are 3 basic steps:
a. Compute the exponent of the product by adding the exponents together.
b. Multiply the two mantissas.
c. Normalize and round the final product.
Multiplication
• Example X=1.000*2^-2 Y=-1.010*2^-1

a. Add exponents = (-2)+(-1) = -3

b. Multiply mantissas = 1.000*-1.010
= -1.010000
c. Normalize and round the product
= -1.010000*2^-3
= -0.1010000*2^-2
= -0.1010*2^-2 (assuming 4 bits are allowed after the radix point)

Ans=-0.1010*2^-2
Division
• Ex. X = 91.34375 = 1011011.01011 = 1.01101101011*2^6
Y = 0.14453125 = 0.00100101 = 1.00101*2^-3

a. X/Y = (Xs/Ys)*2^(Xe-Ye)
= (Xs/Ys)*2^(6-(-3))

b. = (Xs/Ys)*2^9
= 1.001111*2^9 // (Xs/Ys) = 1.01101101011/1.00101 = 1.001111

c. Normalization
= 1.001111*2^9
= 0.1001111*2^10
= 0.1001*2^10 (assuming 4 bits are allowed after the radix point)
Ans= 0.1001*2^10
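Multiplication and division follow the same pattern: combine the exponents, combine the mantissas, then normalize. The sketch below makes the same assumptions as the worked examples (results in 0.1xxx form, truncated to 4 bits after the radix point) and keeps mantissas as Python floats, which are exact for these operands. The function names are hypothetical.

```python
import math

def normalize(mantissa: float, exponent: int, frac_bits: int = 4) -> tuple[float, int]:
    """Scale to 0.1xxx form (0.5 <= |m| < 1), then truncate to frac_bits."""
    if mantissa == 0:
        return 0.0, 0
    while abs(mantissa) >= 1:
        mantissa, exponent = mantissa / 2, exponent + 1
    while abs(mantissa) < 0.5:
        mantissa, exponent = mantissa * 2, exponent - 1
    scale = 1 << frac_bits
    return math.trunc(mantissa * scale) / scale, exponent

def fp_mul(m1: float, e1: int, m2: float, e2: int) -> tuple[float, int]:
    return normalize(m1 * m2, e1 + e2)   # add exponents, multiply mantissas

def fp_div(m1: float, e1: int, m2: float, e2: int) -> tuple[float, int]:
    return normalize(m1 / m2, e1 - e2)   # subtract exponents, divide mantissas

# X = 1.000 * 2^-2, Y = -1.010 * 2^-1 (binary 1.010 = 1.25)
print(fp_mul(1.0, -2, -1.25, -1))                    # (-0.625, -2) i.e. -0.1010 * 2^-2
# X = 91.34375 = Xs * 2^6, Y = 0.14453125 = Ys * 2^-3
print(fp_div(91.34375 / 64, 6, 0.14453125 * 8, -3))  # (0.5625, 10) i.e. 0.1001 * 2^10
```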
Floating Point Multiplication
Floating Point Division
Required Reading
• Stallings Chapter 9
• IEEE 754 on IEEE Web site
