Computer Arithmetic | Set - 2
Last Updated: 19 Apr, 2023
FLOATING POINT ADDITION AND SUBTRACTION
To understand floating point addition, we first look at the addition of real numbers in decimal scientific notation, because the same logic applies in both cases.
For example, suppose we have to add 1.1 * 10^3 and 50.
We cannot add these numbers directly. First we need to align the exponents, and then we can add the significands.
After aligning the exponent, we get 50 = 0.05 * 10^3
Now adding the significands, 0.05 + 1.1 = 1.15
So, finally we get (1.1 * 10^3 + 50) = 1.15 * 10^3
Here, notice that we shifted the decimal point of 50 to write it as 0.05 * 10^3, so that both numbers share the same exponent.
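The decimal example above can be sketched in a few lines of Python. This is only an illustration: the function name `add_decimal_sci` and the (significand, exponent) pair representation are assumptions made for this sketch, not part of the original article.

```python
from decimal import Decimal

def add_decimal_sci(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a * 10**exp_a and sig_b * 10**exp_b by aligning exponents first."""
    # Make (sig_a, exp_a) the operand with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Align: rescale the smaller operand's significand to the larger exponent.
    aligned_b = sig_b * Decimal(10) ** (exp_b - exp_a)
    # Add the aligned significands; the result keeps the larger exponent.
    return sig_a + aligned_b, exp_a

# 1.1 * 10^3 + 5 * 10^1 (that is, 1100 + 50)
sig, exp = add_decimal_sci(Decimal("1.1"), 3, Decimal("5"), 1)
print(sig, exp)  # 1.15 3
```

The sketch skips the final normalization step because the sum 1.15 already has a single non-zero digit before the decimal point.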
Now let us take an example of floating point number addition.
We follow these steps to add two numbers:
1. Align the significands
2. Add the significands
3. Normalize the result
Let the two numbers be
x = 9.75
y = 0.5625
Converting them into 32-bit floating point representation,
9.75’s representation in 32-bit format = 0 10000010 00111000000000000000000
0.5625’s representation in 32-bit format = 0 01111110 00100000000000000000000
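The two bit patterns can be checked with a short Python sketch. It assumes the 32-bit format used here is IEEE 754 single precision (which the bit patterns above match) and uses the standard `struct` module; the helper name `float32_fields` is made up for this example.

```python
import struct

def float32_fields(value):
    """Return (sign, exponent, fraction) bit strings of value as a 32-bit float."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes
    # as an unsigned 32-bit integer to get at the raw bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    text = format(bits, "032b")
    # 1 sign bit, 8 exponent bits, 23 fraction (mantissa) bits.
    return text[0], text[1:9], text[9:]

print(float32_fields(9.75))    # ('0', '10000010', '00111000000000000000000')
print(float32_fields(0.5625))  # ('0', '01111110', '00100000000000000000000')
```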
Now we take the difference of the exponents to know how much shifting is required.
(10000010 - 01111110)_2 = (4)_10
Now, we shift the mantissa of the smaller number right by 4 bits.
Mantissa of 0.5625 = 1.00100000000000000000000
(note that the 1 before the binary point is implicit in the 32-bit representation)
Shifting right by 4 bits, we get 0.00010010000000000000000
Mantissa of 9.75 = 1.00111000000000000000000
Adding the mantissas of both numbers:
  0.00010010000000000000000
+ 1.00111000000000000000000
---------------------------
  1.01001010000000000000000
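The sum is already of the form 1.xxx, so no further normalization is needed, and we take the exponent of the bigger number.
So, the final answer consists of:
Sign bit = 0
Exponent of bigger number = 10000010
Mantissa = 01001010000000000000000
32-bit representation of answer = x + y = 0 10000010 01001010000000000000000
This is 1.0100101 * 2^3 in binary, that is 10.3125 in decimal, which matches 9.75 + 0.5625.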
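The align / add / normalize steps can be sketched in Python as follows. This is a simplified illustration, not a full floating point adder: it assumes both inputs are positive, normalized IEEE 754 single-precision values, it truncates bits shifted out during alignment instead of rounding, and it ignores zero, subnormals, infinities, NaN and exponent overflow. The helper names are hypothetical.

```python
import struct

def to_bits(value):
    """Raw 32-bit pattern of value stored as a single-precision float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits

def from_bits(bits):
    """Float value of a raw 32-bit single-precision pattern."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

def add_positive_float32(a, b):
    """Add two positive normalized floats using align / add / normalize."""
    bits_a, bits_b = to_bits(a), to_bits(b)
    exp_a, exp_b = (bits_a >> 23) & 0xFF, (bits_b >> 23) & 0xFF
    # Restore the implicit leading 1 to get 24-bit significands.
    sig_a = (bits_a & 0x7FFFFF) | 0x800000
    sig_b = (bits_b & 0x7FFFFF) | 0x800000
    # Make a the operand with the larger exponent.
    if exp_a < exp_b:
        exp_a, exp_b, sig_a, sig_b = exp_b, exp_a, sig_b, sig_a
    # 1. Align: shift the smaller operand right by the exponent difference.
    sig_b >>= exp_a - exp_b
    # 2. Add the aligned significands.
    total = sig_a + sig_b
    exp = exp_a
    # 3. Normalize: a carry out of bit 23 means the sum is >= 2.0.
    if total & 0x1000000:
        total >>= 1
        exp += 1
    # Drop the implicit 1 again and reassemble (sign bit is 0).
    return (exp << 23) | (total & 0x7FFFFF)

result = add_positive_float32(9.75, 0.5625)
print(format(result, "032b"))  # 01000001001001010000000000000000
print(from_bits(result))       # 10.3125
```

For 9.75 + 0.5625 the normalization branch is not taken; adding 9.75 + 9.75 would exercise it, since the sum of the two significands then produces a carry.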
FLOATING POINT SUBTRACTION
Subtraction is similar to addition, with two differences: we subtract the mantissas instead of adding them, and the sign bit of the result is the sign of the number with the greater magnitude.
Let the two numbers be
x = 9.75
y = - 0.5625
Converting them into 32-bit floating point representation
9.75’s representation in 32-bit format = 0 10000010 00111000000000000000000
- 0.5625’s representation in 32-bit format = 1 01111110 00100000000000000000000
Now, we find the difference of the exponents to know how much shifting is required.
(10000010 - 01111110)_2 = (4)_10
Now, we shift the mantissa of the smaller number right by 4 bits.
Mantissa of -0.5625 = 1.00100000000000000000000
(note that the 1 before the binary point is implicit in the 32-bit representation)
Shifting right by 4 bits, we get 0.00010010000000000000000
Mantissa of 9.75 = 1.00111000000000000000000
Subtracting the smaller (shifted) mantissa from the larger one:
  1.00111000000000000000000
- 0.00010010000000000000000
---------------------------
  1.00100110000000000000000
Sign bit of bigger number = 0
So, finally the answer = x + y = 9.75 + (-0.5625) = 0 10000010 00100110000000000000000
This is 1.0010011 * 2^3 in binary, that is 9.1875 in decimal.
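The subtraction case can be sketched the same way. The code below is an illustration under stated assumptions: both inputs are normalized IEEE 754 single-precision values with opposite signs, shifted-out bits are truncated rather than rounded, and subnormals, infinities, NaN and exponent underflow are ignored. The function name `add_opposite_signs_float32` is hypothetical.

```python
import struct

def to_bits(value):
    """Raw 32-bit pattern of value stored as a single-precision float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits

def from_bits(bits):
    """Float value of a raw 32-bit single-precision pattern."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

def add_opposite_signs_float32(x, y):
    """Add two normalized floats of opposite sign by subtracting mantissas."""
    bx, by = to_bits(x), to_bits(y)
    # Order by magnitude: with the sign bit masked off, the remaining
    # 31 bits compare like unsigned integers.
    if (bx & 0x7FFFFFFF) < (by & 0x7FFFFFFF):
        bx, by = by, bx
    sign = bx >> 31  # the result takes the sign of the larger magnitude
    exp_big, exp_small = (bx >> 23) & 0xFF, (by >> 23) & 0xFF
    sig_big = (bx & 0x7FFFFF) | 0x800000
    sig_small = (by & 0x7FFFFF) | 0x800000
    # Align the smaller operand, then subtract it from the larger one.
    sig_small >>= exp_big - exp_small
    diff = sig_big - sig_small
    if diff == 0:
        return 0  # equal magnitudes cancel to +0.0
    exp = exp_big
    # Normalize: shift left until the leading 1 is back at bit 23.
    while not (diff & 0x800000):
        diff <<= 1
        exp -= 1
    return (sign << 31) | (exp << 23) | (diff & 0x7FFFFF)

result = add_opposite_signs_float32(9.75, -0.5625)
print(format(result, "032b"))  # 01000001000100110000000000000000
print(from_bits(result))       # 9.1875
```

Unlike addition, subtraction can cancel leading bits, so normalization here shifts left and decreases the exponent rather than shifting right.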

This article has been contributed by Anuj Batham.
In this continuation of the previous set, we will cover some more concepts and operations involved in computer arithmetic:
- Signed vs. unsigned numbers: In computer arithmetic, numbers can be either signed or unsigned. Unsigned numbers represent only non-negative values (zero and positive), while signed numbers can represent both positive and negative values. Signed numbers are typically represented using two's complement notation.
- Shift operations: Shift operations are used to move the bits of a number left or right. A left shift by n bits multiplies the number by 2^n (as long as no significant bits are shifted out), while a right shift by n bits divides the number by 2^n, discarding the remainder.
- Logical vs. arithmetic shifts: Shift operations can be either logical or arithmetic. Logical shifts insert zeros into the vacant bit positions, while an arithmetic right shift preserves the sign of a signed number by copying the sign bit into the vacant positions.
- Carry and borrow: In addition, a carry is the bit passed to the next higher bit position when the sum at a position is too large to fit in one bit. In subtraction, a borrow is taken from the next higher bit position when the digit being subtracted is larger than the digit it is subtracted from. A carry or borrow out of the most significant bit means the result does not fit in the available number of bits.
- Boolean algebra: Boolean algebra is a branch of mathematics that deals with logical operations on binary variables. It is widely used in computer arithmetic for operations such as bitwise AND, OR, and XOR.
- Fixed-point arithmetic: Fixed-point arithmetic is used to perform arithmetic operations on numbers with a fixed number of digits after the radix point. It is commonly used in applications such as digital signal processing.
- Division algorithms: There are various algorithms used to perform division in computer arithmetic, including the restoring division algorithm, the non-restoring division algorithm, and the SRT division algorithm.
- Multiplication algorithms: There are also various algorithms used to perform multiplication in computer arithmetic, including Booth's algorithm, the array multiplier, and the Wallace tree multiplier.
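The signed/unsigned and shift items in the list above can be illustrated with a small Python sketch. The 8-bit width and the helper names are assumptions chosen for this example; Python integers are unbounded, so the width is emulated with masks.

```python
def to_signed(value, width=8):
    """Interpret the low `width` bits of value as a two's complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value & (1 << (width - 1)) else value

def logical_shift_right(value, amount, width=8):
    """Shift right, filling the vacated high bits with zeros."""
    return (value & ((1 << width) - 1)) >> amount

def arithmetic_shift_right(value, amount, width=8):
    """Shift right, copying the sign bit into the vacated high bits."""
    # Python's >> on a negative int already replicates the sign.
    return (to_signed(value, width) >> amount) & ((1 << width) - 1)

pattern = 0b11110000
print(pattern, to_signed(pattern))                         # 240 -16
print(format(logical_shift_right(pattern, 2), "08b"))      # 00111100
print(format(arithmetic_shift_right(pattern, 2), "08b"))   # 11111100
print(to_signed(arithmetic_shift_right(pattern, 2)))       # -4
print((0b00000101 << 3) & 0xFF)                            # 40
```

The same bit pattern 11110000 is 240 as an unsigned number and -16 in two's complement. The arithmetic shift by 2 gives -4 (that is, -16 / 4), while the logical shift gives 60 (240 / 4), and the left shift of 5 by 3 gives 5 * 2^3 = 40.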
Overall, computer arithmetic is a complex and important field that underlies many aspects of modern computing. It involves a wide range of concepts and operations, from basic addition and subtraction to advanced algorithms for multiplication and division.