Computer Architecture & Organization Unit 2
There are various techniques for representing numbers digitally, for example the binary, octal, decimal, and hexadecimal number systems. Of these, the binary number system is the most relevant and popular for representing numbers in a digital computer system.
There are two major approaches to storing real numbers (i.e., numbers with a fractional component) in modern computing: (i) fixed-point notation and (ii) floating-point notation. In fixed-point notation there is a fixed number of digits after the radix point, whereas floating-point notation allows a varying number of digits after the radix point.
Fixed-Point Representation −
This representation uses a fixed number of bits for the integer part and for the fractional part. For example, if the fixed-point format is IIII.FFFF, then the smallest value that can be stored is 0000.0001 and the largest is 9999.9999. A fixed-point number representation has three parts: the sign field, the integer field, and the fractional field.
Example − Assume a number uses a 32-bit format that reserves 1 bit for the sign, 15 bits for the integer part, and 16 bits for the fractional part.
In this format, the smallest positive number that can be stored is 2^-16 ≈ 0.000015, and the largest positive number is (2^15 − 1) + (1 − 2^-16) = 2^15 − 2^-16 ≈ 32768. The gap between consecutive numbers is a constant 2^-16.
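These bounds can be checked with a short Python sketch (field widths as in the example above):

```python
# Range and gap of the fixed-point format above: 1 sign bit,
# 15 integer bits, 16 fractional bits.
INT_BITS = 15
FRAC_BITS = 16

smallest = 2 ** -FRAC_BITS                              # 000...0.000...01
largest = (2 ** INT_BITS - 1) + (1 - 2 ** -FRAC_BITS)   # all magnitude bits set
gap = 2 ** -FRAC_BITS                                   # spacing is uniform

print(smallest)         # 1.52587890625e-05, i.e. about 0.000015
print(gap == smallest)  # True: the gap equals the smallest positive number
```

Note that in fixed point the gap between consecutive numbers equals the smallest positive number, which is exactly the property floating point gives up in exchange for range.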
Floating-point representation removes this limitation: the radix point can be moved left or right, with the integer field reduced to a single digit 1.
The floating-point representation of a number has two parts. The first part is a signed fixed-point number called the mantissa. The second part designates the position of the decimal (or binary) point and is called the exponent. The fixed-point mantissa may be a fraction or an integer. A floating-point number is always interpreted to represent a number of the form M × r^e.
Only the mantissa m and the exponent e are physically represented in the register (including their signs). A floating-point binary number is represented in a similar manner, except that it uses base 2 for the exponent. A floating-point number is said to be normalized if the most significant digit of the mantissa is 1.
So, the actual number is (-1)^s × (1 + m) × 2^(e − Bias), where s is the sign bit, m is the mantissa, e is the exponent value, and Bias is the bias number.
Note that signed integers and exponents may be represented in sign-magnitude, one's complement, or two's complement representation.
The floating-point representation is more flexible: any non-zero number x can be written in the normalized form ±(1.b1b2b3 ...)_2 × 2^n.
Example − Suppose a number uses a 32-bit format: 1 sign bit, 8 bits for the signed exponent, and 23 bits for the fractional part. The leading 1 is not stored (as it is always 1 for a normalized number) and is referred to as the "hidden bit".
The precision of a floating-point format is the number of positions reserved for binary digits plus one (for the hidden bit). In the example considered here the precision is 23 + 1 = 24.
The gap between 1 and the next normalized floating-point number is known as the machine epsilon. For the above example the gap is (1 + 2^-23) − 1 = 2^-23. Note that this is not the same as the smallest positive floating-point number, because floating-point numbers are non-uniformly spaced, unlike in the fixed-point scenario.
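The machine epsilon of single precision can be verified by incrementing the bit pattern of 1.0 (a Python sketch using the standard struct module):

```python
import struct

def next_float32_after_one():
    """Return the single-precision number immediately above 1.0."""
    bits = struct.unpack('<I', struct.pack('<f', 1.0))[0]   # bit pattern of 1.0
    return struct.unpack('<f', struct.pack('<I', bits + 1))[0]  # bump mantissa LSB

eps = next_float32_after_one() - 1.0
print(eps == 2.0 ** -23)  # True: the gap at 1.0 is exactly 2^-23
```

Incrementing the 32-bit pattern by one flips the least significant mantissa bit, which is worth 2^-23 at this exponent; near larger numbers the same bit is worth more, which is the non-uniform spacing mentioned above.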
So, the actual number is (-1)^s × (1 + m) × 2^(e − Bias), where s is the sign bit, m is the mantissa, e is the exponent value, and Bias is the bias number. The sign bit is 0 for a positive number and 1 for a negative number. In IEEE 754, exponents are stored in biased (excess) representation.
According to the IEEE 754 standard, floating-point numbers are represented in the following formats:
1) Half Precision (16 bit): 1 sign bit, 5 bit exponent, and 10 bit mantissa
2) Single Precision (32 bit): 1 sign bit, 8 bit exponent, and 23 bit mantissa
3) Double Precision (64 bit): 1 sign bit, 11 bit exponent, and 52 bit mantissa
4) Quadruple Precision (128 bit): 1 sign bit, 15 bit exponent, and 112 bit mantissa
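The single-precision layout can be unpacked with a short Python sketch (the helper name is illustrative):

```python
import struct

def decode_float32(x):
    """Split an IEEE 754 single-precision number into its three fields."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent (Bias = 127)
    mantissa = bits & 0x7FFFFF        # 23 fraction bits (hidden bit not stored)
    return sign, exponent, mantissa

# -6.5 = (-1)^1 x 1.625 x 2^2, so the stored exponent is 2 + 127 = 129
# and the mantissa field holds the fraction 0.625 scaled by 2^23.
print(decode_float32(-6.5))  # (1, 129, 5242880)
```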
By: Namrata Singh
Special Value Representation −
In the IEEE 754 standard, some special values are encoded by particular combinations of the exponent and mantissa fields.
1) All exponent bits 0 with all mantissa bits 0 represents 0. If the sign bit is 0, then
+0, else -0.
2) All exponent bits 1 with all mantissa bits 0 represents infinity. If the sign bit is 0,
then +∞, else -∞.
3) All exponent bits 0 with non-zero mantissa bits represents a denormalized
number.
4) All exponent bits 1 with non-zero mantissa bits represents NaN (Not a Number).
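These four rules can be sketched as a classifier over the raw 32-bit pattern:

```python
def classify_float32(bits):
    """Classify a 32-bit pattern per the IEEE 754 special-value rules above."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        return 'zero' if mantissa == 0 else 'denormalized'
    if exponent == 0xFF:
        return 'infinity' if mantissa == 0 else 'NaN'
    return 'normalized'

print(classify_float32(0x00000000))  # zero
print(classify_float32(0x7F800000))  # infinity (all exponent bits 1)
print(classify_float32(0x7FC00000))  # NaN (exponent all 1, mantissa non-zero)
print(classify_float32(0x00000001))  # denormalized
```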
HALF-ADDER
A half-adder is a logic circuit to add two binary bits X and Y. Its outputs are SUM (S) and CARRY (C).
X Y C S
0 0 0 0
0 1 0 1 (X'Y)
1 0 0 1 (XY')
1 1 1 (XY) 0
The minterms for SUM and CARRY are shown in the brackets.
The Sum-Of-Products (SOP) equation for SUM is:
S = X'Y + XY' = X ⊕ Y …..………… ( 1 )
C = XY …………. ………………………( 2 )
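Equations (1) and (2) map directly onto an XOR gate and an AND gate; a short Python sketch:

```python
def half_adder(x, y):
    """SUM = X xor Y (eq. 1), CARRY = X and Y (eq. 2)."""
    return x ^ y, x & y  # (sum, carry)

# Reproduce the truth table above
for x in (0, 1):
    for y in (0, 1):
        s, c = half_adder(x, y)
        print(x, y, c, s)
```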
FULL-ADDER
A full-adder is a logic circuit to add three binary bits. Its outputs
are SUM and CARRY. In the following truth table X, Y, Z are the
inputs and C and S are the CARRY and SUM.
X Y Z C S
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 (XYZ) 1 (XYZ)
The SOP equation for SUM is:
SUM = X'Y'Z + X'YZ' + XY'Z' + XYZ
    = X'(Y'Z + YZ') + X(Y'Z' + YZ)
    = X'S + XS' ……………………. (3)
And for CARRY:
CARRY = X'YZ + XY'Z + XYZ' + XYZ
      = YZ + X(Y'Z + YZ')
      = YZ + XS
      = C + XS ................................. (4)
where S = Y ⊕ Z and C = YZ are the half-adder sum and carry of Y and Z.
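Equations (3) and (4) show that a full-adder can be built from two half-adders and an OR gate; a Python sketch:

```python
def half_adder(x, y):
    return x ^ y, x & y  # (sum, carry)

def full_adder(x, y, z):
    """Full-adder from two half-adders, following equations (3) and (4)."""
    s, c = half_adder(y, z)        # first half-adder: S = Y xor Z, C = YZ
    total, c2 = half_adder(x, s)   # SUM = X'S + XS' = X xor S   (eq. 3)
    return total, c | c2           # CARRY = C + XS              (eq. 4)

print(full_adder(1, 1, 1))  # (1, 1): sum 1 with carry-out 1
```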
HALF-SUBTRACTOR
A half-subtractor computes X − Y; its outputs are DIFFERENCE (D) and BORROW (B).
X Y B D
0 0 0 0
0 1 1 1 (X'Y)
1 0 0 1 (XY')
1 1 0 0
D = X'Y + XY'
  = X ⊕ Y ................................. (5)
B = X'Y ................................. (6)
FULL-SUBTRACTOR
A full-subtractor computes X − Y − Z, where Z is the borrow-in; its outputs are DIFFERENCE (Diff) and BORROW (B).
X Y Z B Diff
0 0 0 0 0
0 0 1 1 1 (X'Y'Z)
0 1 0 1 1 (X'YZ')
0 1 1 1 0
1 0 0 0 1 (XY'Z')
1 0 1 0 0
1 1 0 0 0
1 1 1 1 (XYZ) 1 (XYZ)
The SOP equation for DIFFERENCE is:
Diff = X'Y'Z + X'YZ' + XY'Z' + XYZ
     = (X'Y' + XY)Z + (X'Y + XY')Z'
     = D'Z + DZ' ................................. (7)
And the SOP equation for BORROW is:
B = X'Y'Z + X'YZ' + X'YZ + XYZ
  = X'Y(Z + Z') + (X'Y' + XY)Z
  = X'Y + D'Z ................................. (8)
where D = X ⊕ Y is the half-subtractor difference.
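The difference and borrow equations above map onto XOR, AND, and NOT gates; a Python sketch (bit values 0/1 assumed):

```python
def full_subtractor(x, y, z):
    """Compute X - Y - Z with borrow-in Z, returning (difference, borrow)."""
    d = x ^ y                                # half-subtractor difference D = X xor Y
    diff = d ^ z                             # D'Z + DZ' = D xor Z
    borrow = ((1 - x) & y) | ((1 - d) & z)   # X'Y + D'Z
    return diff, borrow

print(full_subtractor(0, 1, 1))  # (0, 1): 0 - 1 - 1 needs a borrow
```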
In ripple carry adders, for each adder block, the two bits that are to be added are available instantly. However,
each adder block waits for the carry to arrive from its previous block. So, it is not possible to generate the sum
and carry of any block until the input carry is known.
The ith block waits for the (i−1)th block to produce its carry, so there is a considerable time delay, known as the carry propagation delay.
Consider the above 4-bit ripple carry adder. The sum S4 is produced by the corresponding full adder as
soon as the input signals are applied to it. But the carry input C4 is not available on its final steady state
value until carry C3 is available at its steady state value. Similarly C3 depends on C2 and C2 on C1.
Thus, the carry must propagate through all the stages in order for the sum S3 and carry C4 to settle to their final steady-state values. The total propagation time is equal to the propagation delay of each adder block multiplied by the number of adder blocks in the circuit. For example, if each full-adder stage has a propagation delay of 20 nanoseconds, then S3 will reach its final correct value after 60 (20 × 3) nanoseconds. The situation gets worse if we extend the number of stages to add more bits.
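The ripple-carry behaviour can be sketched in Python, with each stage consuming the carry of the previous one:

```python
def full_adder(x, y, cin):
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout

def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length bit lists (LSB first). Each stage must wait
    for the previous stage's carry, which is the source of the delay."""
    carry = 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

# 6 (0110) + 7 (0111) = 13 (1101), bits listed LSB first
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0]))  # ([1, 0, 1, 1], 0)
```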
Figure − Hardware design for addition and subtraction of signed-magnitude numbers.
The hardware consists of two registers A and B to store the magnitudes, and two flip-flops As and Bs to store the corresponding signs. The result can be stored in register A and flip-flop As, which together act as an accumulator. Subtraction is performed by adding A to the 2's complement of B. The output carry is transferred to flip-flop E. An overflow may occur during the add operation; it is stored in the add-overflow flip-flop AVF. When m = 0, the content of B is transferred to the adder without any change, along with an input carry of 0.
The output of the parallel adder is then equal to A + B, which is an add operation. When m = 1, the content of register B is complemented and transferred to the parallel adder along with an input carry of 1. Therefore, the output of the parallel adder is equal to A + B' + 1 = A − B, which is a subtract operation.
As and Bs are compared by an exclusive-OR gate. If its output is 0 the signs are identical; if 1, the signs are different.
The two magnitudes are added if the signs are identical for an add operation or different for a subtract operation. Magnitudes are added with the microoperation EA ← A + B.
The two magnitudes are subtracted if the signs are different for an add operation or identical for a subtract operation. Magnitudes are subtracted with the microoperation EA ← A + B' + 1. If E = 1, then A ≥ B and the number in A is the correct result (it is checked for zero, in which case the sign is made positive, As = 0). E = 0 indicates A < B, so we take the 2's complement of A.
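The algorithm above can be sketched in Python (the register width and the function name are assumptions for illustration):

```python
def signmag_addsub(As, A, Bs, B, subtract, bits=8):
    """Signed-magnitude add/subtract: As, Bs are sign bits (0 = plus),
    A, B are magnitudes held in 'bits'-wide registers."""
    if subtract:
        Bs ^= 1                          # subtraction complements the sign of B
    if As == Bs:                         # identical signs: add magnitudes, EA <- A + B
        A = A + B
    else:                                # different signs: EA <- A + B' + 1
        E, A = divmod(A + ((2 ** bits - 1) ^ B) + 1, 2 ** bits)
        if E == 0:                       # E = 0 means A < B: take 2's complement of A
            A = 2 ** bits - A
            As ^= 1                      # and the sign of the result flips
    if A == 0:
        As = 0                           # normalize -0 to +0
    return As, A

print(signmag_addsub(0, 5, 1, 9, subtract=False))  # (+5) + (-9) = -4 -> (1, 4)
```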
Multiplication
Hardware Implementation and Algorithm
Generally, the multiplication of two fixed-point binary numbers in signed-magnitude representation is performed by a process of successive shift and add operations. The process consists of looking at successive bits of the multiplier, least significant bit first. If the multiplier bit is 1, the multiplicand is copied down; otherwise, 0's are copied. The numbers copied down in successive lines are shifted one position to the left from the previous number, and finally the numbers are added to form the product.
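The shift-and-add process can be sketched as:

```python
def shift_add_multiply(multiplicand, multiplier, bits=8):
    """Examine multiplier bits LSB first; when a bit is 1, add the
    multiplicand shifted left by that bit's position."""
    product = 0
    for i in range(bits):
        if (multiplier >> i) & 1:
            product += multiplicand << i  # multiplicand shifted i places left
    return product

print(shift_add_multiply(13, 11))  # 143
```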
Division Algorithm
The division of two fixed-point signed numbers can be done by a process of successive compare, shift, and subtract operations. When implemented in a digital computer, instead of shifting the divisor to the right, the dividend or the partial remainder is shifted to the left. The subtraction is obtained by adding the number A to the 2's complement of the number B. Information about the relative magnitudes of the numbers is obtained from the end carry.
Hardware Implementation
The hardware implementation for the division of signed numbers is shown in the figure.
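The compare-by-end-carry idea can be sketched as restoring division in Python (register width and names are assumptions for illustration):

```python
def restoring_divide(dividend, divisor, bits=8):
    """Shift the partial remainder left and trial-subtract the divisor by
    adding its 2's complement; the end carry says whether remainder >= divisor."""
    remainder, quotient = 0, 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # shift in next bit
        trial = remainder + ((2 ** bits - 1) ^ divisor) + 1   # remainder + divisor' + 1
        if trial >= 2 ** bits:            # end carry 1: subtraction succeeded
            remainder = trial - 2 ** bits
            quotient |= 1 << i            # quotient bit is 1
        # end carry 0: remainder < divisor, keep the (restored) remainder
    return quotient, remainder

print(restoring_divide(100, 7))  # (14, 2): 100 = 14 * 7 + 2
```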
For example, to add 2.9400 × 10^2 to 4.3100 × 10^4, we first rewrite 2.9400 × 10^2 as 0.0294 × 10^4 (aligning the exponents) and then perform addition of the mantissas to get 4.3394 × 10^4.
Add/Subtract Rule
The steps in addition (FA) or subtraction (FS) of floating-point numbers (s1, e1, f1) and (s2, e2, f2) are as follows.
1. Unpack the sign, exponent, and fraction fields. Handle special operands such as zero,
infinity, or NaN (Not a Number).
2. Shift the significand of the number with the smaller exponent right by |e1 − e2| bits.
3. Set the result exponent er to max(e1, e2).
4. If the instruction is FA and s1 = s2, or if the instruction is FS and s1 ≠ s2, then add the
significands; otherwise subtract them.
5. Count the number z of leading zeros. A carry can make z = -1. Shift the result
significand left z bits or right 1 bit if z = -1.
6. Round the result significand, and shift right and adjust z if there is rounding overflow,
which is a carry-out of the leftmost digit upon rounding.
7. Adjust the result exponent by er = er - z, check for overflow or underflow, and pack
the result sign, biased exponent, and fraction bits into the result word.
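Steps 2 to 4 for two positive operands can be sketched as follows (a simplified sketch with integer significands; the normalization and rounding of steps 5 to 7 are omitted):

```python
def fp_align_and_add(e1, f1, e2, f2):
    """Align and add two positive floating-point operands, where each
    operand's value is f x 2**e with an integer significand f."""
    if e1 < e2:
        e1, f1, e2, f2 = e2, f2, e1, f1   # make operand 1 the larger exponent
    f2 >>= (e1 - e2)      # step 2: shift the smaller operand's significand right
    er = e1               # step 3: result exponent er = max(e1, e2)
    fr = f1 + f2          # step 4: signs equal, so add the significands
    return er, fr

# (10 x 2^4) + (12 x 2^2) = 160 + 48 = 208 = 13 x 2^4
print(fp_align_and_add(4, 10, 2, 12))  # (4, 13)
```

The right shift in step 2 can discard low-order bits of the smaller operand, which is exactly why real implementations carry guard bits into the rounding of step 6.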
Multiplication and division are somewhat easier than addition and subtraction, in that
no alignment of mantissas is needed.