Computer Organization and Architecture | Pipelining | Set 1 (Execution, Stages and Throughput)
Last Updated :
13 Sep, 2024
Pipelining is a technique used in modern processors to improve performance by executing multiple instructions simultaneously. It breaks down the execution of instructions into several stages, where each stage completes a part of the instruction. These stages can overlap, allowing the processor to work on different instructions at various stages of completion, similar to an assembly line in manufacturing.
In this article, you will get a detailed overview of Pipeline in Computer Organization and Architecture.
What is Pipelining?
Pipelining is an arrangement of the CPU's hardware components to raise the CPU's general performance. In a pipelined processor, procedures called 'stages’ are accomplished in parallel, and the execution of more than one line of instruction occurs. Now let us look at a real-life example that should operate based on the pipelined operation concept. Consider a water bottle packaging plant. For this case, let there be 3 processes that a bottle should go through, ensing the bottle(I), Filling water in the bottle(F), Sealing the bottle(S).
It will be helpful for us to label these stages as stage 1, stage 2, and stage 3. Let each stage take 1 minute to complete its operation. Now, in a non-pipelined operation, a bottle is first inserted in the plant, and after 1 minute it is moved to stage 2 where water is filled. Now, in stage 1 nothing is happening. Likewise, when the bottle is in stage 3 both stage 1 and stage 2 are inactive. But in pipelined operation, when the bottle is in stage 2, the bottle in stage 1 can be reloaded. In the same way, during the bottle 3 there could be one bottle in the 1st and 2nd stage accordingly. Therefore at the end of stage 3, we receive a new bottle for every minute. Hence, the average time taken to manufacture 1 bottle is:
Therefore, the average time intervals of manufacturing each bottle is:
Without pipelining = 9/3 minutes = 3m
I F S | | | | | |
| | | I F S | | |
| | | | | | I F S (9 minutes)
With pipelining = 5/3 minutes = 1.67m
I F S | |
| I F S |
| | I F S (5 minutes)
Thus, pipelined operation increases the efficiency of a system.
Design of a basic Pipeline
- In a pipelined processor, a pipeline has two ends, the input end and the output end. Between these ends, there are multiple stages/segments such that the output of one stage is connected to the input of the next stage and each stage performs a specific operation.
- Interface registers are used to hold the intermediate output between two stages. These interface registers are also called latch or buffer.
- All the stages in the pipeline along with the interface registers are controlled by a common clock.
Execution in a pipelined processor Execution sequence of instructions in a pipelined processor can be visualized using a space-time diagram. For example, consider a processor having 4 stages and let there be 2 instructions to be executed. We can visualize the execution sequence through the following space-time diagrams:
Non-Overlapped Execution
Stage / Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|
S1 | I1 | | | | I2 | | | |
S2 | | I1 | | | | I2 | | |
S3 | | | I1 | | | | I2 | |
S4 | | | | I1 | | | | I2 |
Total time = 8 Cycle
Overlapped Execution
Stage / Cycle | 1 | 2 | 3 | 4 | 5 |
---|
S1 | I1 | I2 | | | |
S2 | | I1 | I2 | | |
S3 | | | I1 | I2 | |
S4 | | | | I1 | I2 |
Total time = 5 Cycle Pipeline Stages RISC processor has 5 stage instruction pipeline to execute all the instructions in the RISC instruction set. Following are the 5 stages of the RISC pipeline with their respective operations:
- Stage 1 (Instruction Fetch): In this stage the CPU fetches the instructions from the address present in the memory location whose value is stored in the program counter.
- Stage 2 (Instruction Decode): In this stage, the instruction is decoded and register file is accessed to obtain the values of registers used in the instruction.
- Stage 3 (Instruction Execute): In this stage some of activities are done such as ALU operations.
- Stage 4 (Memory Access): In this stage, memory operands are read and written from/to the memory that is present in the instruction.
- Stage 5 (Write Back): In this stage, computed/fetched value is written back to the register present in the instructions.
Performance of a pipelined processor Consider a 'k' segment pipeline with clock cycle time as 'Tp'. Let there be 'n' tasks to be completed in the pipelined processor. Now, the first instruction is going to take 'k' cycles to come out of the pipeline but the other 'n – 1' instructions will take only '1' cycle each, i.e, a total of 'n – 1' cycles. So, time taken to execute 'n' instructions in a pipelined processor:
ETpipeline = k + n – 1 cycles
= (k + n – 1) Tp
In the same case, for a non-pipelined processor, the execution time of 'n' instructions will be:
ETnon-pipeline = n * k * Tp
So, speedup (S) of the pipelined processor over the non-pipelined processor, when 'n' tasks are executed on the same processor is:
S = Performance of non-pipelined processor /
Performance of pipelined processor
As the performance of a processor is inversely proportional to the execution time, we have,
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]
When the number of tasks 'n' is significantly larger than k, that is, n >> k
S = n * k / n
S = k
where 'k' are the number of stages in the pipeline. Also, Efficiency = Given speed up / Max speed up = S / Smax We know that Smax = k So, Efficiency = S / k Throughput = Number of instructions / Total time to complete the instructions So, Throughput = n / (k + n – 1) * Tp Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1 Please see Set 2 for Dependencies and Data Hazard and Set 3 for Types of pipeline and Stalling.
Performance of pipeline is measured using two main metrices as Throughput and latency.
What is Throughout?
- It measure number of instruction completed per unit time.
- It represents overall processing speed of pipeline.
- Higher throughput indicate processing speed of pipeline.
- Calculated as, throughput= number of instruction executed/ execution time.
- It can be affected by pipeline length, clock frequency. efficiency of instruction execution and presence of pipeline hazards or stalls.
What is Latenecy?
- It measure time taken for a single instruction to complete its execution.
- It represents delay or time it takes for an instruction to pass through pipeline stages.
- Lower latency indicates better performance .
- It is calculated as, Latency= Execution time/ Number of instruction executed.
- It in influenced by pipeline length, depth, clock cycle time, instruction dependencies and pipeline hazards.
Advantages of Pipelining
- Increased Throughput: Pipelining enhance the throughput capacity of a CPU and enables a number of instruction to be processed at the same time at different stages. This leads to the improvement of the amount of instructions accomplished in a given period of time, thus improving the efficiency of the processor.
- Improved CPU Utilization: From superimposing of instructions, pipelining helps to ensure that different sections of the CPU are useful. This gives no time for idling of the various segments of the pipeline and optimally utilizes hardware resources.
- Higher Instruction Throughput: Pipelining occurring because when one particular instruction is in the execution stage it is possible for other instructions to be at varying stages of fetch, decode, execute, memory access, and write-back. In this manner there is concurrent processing going on and the CPU is able to process more number of instructions in a given time frame than in non pipelined processors.
- Better Performance for Repeated Tasks: Pipelining is particularly effective when all the tasks are accompanied by repetitive instructions, because the use of the pipeline shortens the amount of time each task takes to complete.
- Scalability: Pipelining is RSVP implemented in different types of processors hence it is scalable from simple CPU’s to an advanced multi-core processor.
Disadvantages of Pipelining
- Pipeline Hazards: Pipelining may result to data hazards whereby instructions depends on other instructions; control hazards, which arise due to branch instructions; and structural hazards whereby there are inadequate hardware facilities. Some of these hazards may lead to delays hence tough strategies to manage them to ensure progress is made.
- Increased Complexity: Pipelining enhances the complexity of processor design as well as its application as compared to non-pipelined structures. Pipelining stages management, dealing with the risks and correct instruction sequence contribute to the design and control considerations.
- Stall Cycles: When risks are present, pipeline stalls or bubbles can be brought about, and this produces idle times in certain stages in the pipeline. These stalls can actually remove some of the cycles acquired by pipelining, thus reducing the latter’s efficiency.
- Instruction Latency: While pipelining increases the throughput of instructions the delay of each instruction may not necessarily be reduced. Every instruction must still go through all the pipeline stages and the time it takes for a single instruction to execute can neither reduce nor decrease significantly due to overheads.
- Hardware Overhead: It increases the complexity in designing the pipelining due to the presence of pipeline registers and the control logic used in managing the pipe stages and the data. This not only increases the cost of the wares but also forces integration of more complicated, and thus costly, hardware.
Conclusion
Pipelining is one of the most essential concepts and it improves CPU’s capability to process several instructions at the same time across various stages. It increases immensely the system’s throughput and overall efficiency by effectively determining the optimum use of hardware. On its own it enhances the processing speed but handling of pipeline hazards is critical for enhancing efficiency. It is thus crucial for any architect developing systems that will support HPC to have a war chest of efficient pipelining strategies that they can implement.
Similar Reads
Machine instructions and addressing modes
Computer Organization is like understanding the "blueprint" of how a computer works internally. One of the most important models in this field is the Von Neumann architecture, which is the foundation of most modern computers. Named after John von Neumann, this architecture introduced the concept of
6 min read
Computer organization refers to the way in which the components of a computer system are organized and interconnected to perform specific tasks. One of the most fundamental aspects of computer organization is the set of basic computer instructions that the system can execute.Basic Computer Instructi
6 min read
Instruction formats refer to the way instructions are encoded and represented in machine language. There are several types of instruction formats, including zero, one, two, and three-address instructions. Each type of instruction format has its own advantages and disadvantages in terms of code size,
11 min read
Based on the number of address fields, CPU organization is of three types: Single Accumulator organization, register based organization and stack based CPU organization.Stack-Based CPU OrganizationThe computers which use Stack-based CPU Organization are based on a data structure called a stack. The
4 min read
When we are using multiple general-purpose registers, instead of a single accumulator register, in the CPU Organization then this type of organization is known as General register-based CPU Organization. In this type of organization, the computer uses two or three address fields in their instruction
3 min read
The computers, present in the early days of computer history, had accumulator-based CPUs. In this type of CPU organization, the accumulator register is used implicitly for processing all instructions of a program and storing the results into the accumulator. The instruction format that is used by th
2 min read
Prerequisite - Basic Computer Instructions, Instruction Formats Problem statement: Consider a computer architecture where instructions are 16 bits long. The first 6 bits of the instruction are reserved for the opcode, and the remaining 10 bits are used for the operands. There are three addressing mo
7 min read
Addressing modes are the techniques used by the CPU to identify where the data needed for an operation is stored. They provide rules for interpreting or modifying the address field in an instruction before accessing the operand.Addressing modes for 8086 instructions are divided into two categories:
7 min read
Machine Instructions are commands or programs written in the machine code of a machine (computer) that it can recognize and execute. A machine instruction consists of several bytes in memory that tell the processor to perform one machine operation. The processor looks at machine instructions in main
5 min read
In assembly language as well as in low level programming CALL and JUMP are the two major control transfer instructions. Both instructions enable a program to go to different other parts of the code but both are different. CALL is mostly used to direct calls to subroutine or a function and regresses
5 min read
Simplified Instructional Computer (SIC) is a hypothetical computer that has hardware features that are often found in real machines. There are two versions of this machine: SIC standard ModelSIC/XE(extra equipment or expensive)Object programs for SIC can be properly executed on SIC/XE which is known
4 min read
Let's discuss about parallel computing and hardware architecture of parallel computing in this post. Note that there are two types of computing but we only learn parallel computing here. As we are going to learn parallel computing for that we should know following terms. Era of computing - The two f
3 min read
Parallel computing is a computing where the jobs are broken into discrete parts that can be executed concurrently. Each part is further broken down to a series of instructions. Instructions from each part execute simultaneously on different CPUs. Parallel systems deal with the simultaneous use of mu
4 min read
The generation of computers refers to the progression of computer technology over time, marked by key advancements in hardware and software. These advancements are divided into five generations, each defined by improvements in processing power, size, efficiency, and overall capabilities. Starting wi
6 min read
It is named after computer scientist Gene Amdahl( a computer architect from IBM and Amdahl corporation) and was presented at the AFIPS Spring Joint Computer Conference in 1967. It is also known as Amdahl's argument. It is a formula that gives the theoretical speedup in latency of the execution of a
6 min read
ALU, dataââ¬Âpath and control unit
Instruction pipelining
Pipelining is a technique used in modern processors to improve performance by executing multiple instructions simultaneously. It breaks down the execution of instructions into several stages, where each stage completes a part of the instruction. These stages can overlap, allowing the processor to wo
9 min read
Please see Set 1 for Execution, Stages and Performance (Throughput) and Set 3 for Types of Pipeline and Stalling. Dependencies in a pipelined processor There are mainly three types of dependencies possible in a pipelined processor. These are : 1) Structural Dependency 2) Control Dependency 3) Data D
6 min read
Please see Set 1 for Execution, Stages and Performance (Throughput) and Set 2 for Dependencies and Data Hazard. Types of pipeline Uniform delay pipeline In this type of pipeline, all the stages will take same time to complete an operation. In uniform delay pipeline, Cycle Time (Tp) = Stage Delay If
3 min read
Introduction : Prerequisite - Execution, Stages and Throughput Registers Involved In Each Instruction Cycle: Memory address registers(MAR) : It is connected to the address lines of the system bus. It specifies the address in memory for a read or write operation.Memory Buffer Register(MBR) : It is co
11 min read
In computer organization, performance refers to the speed and efficiency at which a computer system can execute tasks and process data. A high-performing computer system is one that can perform tasks quickly and efficiently while minimizing the amount of time and resources required to complete these
5 min read
In computer organization, a micro-operation refers to the smallest tasks performed by the CPU's control unit. These micro-operations helps to execute complex instructions. They involve simple tasks like moving data between registers, performing arithmetic calculations, or executing logic operations.
3 min read
RISC is the way to make hardware simpler whereas CISC is the single instruction that handles multiple work. In this article, we are going to discuss RISC and CISC in detail as well as the Difference between RISC and CISC, Let's proceed with RISC first. Reduced Instruction Set Architecture (RISC) The
5 min read
Cache Memory
In the Computer System Design, Memory Hierarchy is an enhancement to organize the memory such that it can minimize the access time. The Memory Hierarchy was developed based on a program behavior known as locality of references (same data or nearby data is likely to be accessed again and again). The
6 min read
Cache memory is a small, high-speed storage area in a computer. The cache is a smaller and faster memory that stores copies of the data from frequently used main memory locations. There are various independent caches in a CPU, which store instructions and data. The most important use of cache memory
11 min read
Cache is close to CPU and faster than main memory. But at the same time is smaller than main memory. The cache organization is about mapping data in memory to a location in cache. A Simple Solution: One way to go about this mapping is to consider last few bits of long memory address to find small ca
3 min read
Caches are the faster memories that are built to deal with the Processor-Memory gap in data read operation, i.e. the time difference in a data read operation in a CPU register and that in the main memory. Data read operation in registers is generally 100 times faster than in the main memory and it k
7 min read
The CPU Cache and Translation Lookaside Buffer (TLB) are two important microprocessor hardware components that improve system performance, although they have distinct functions. Even though some people may refer to TLB as a kind of cache, it's important to recognize the different functions they serv
4 min read
A memory unit stores binary information in groups of bits called words. Data input lines provide the information to be stored into the memory, Data output lines carry the information out from the memory. The control lines Read and write specifies the direction of transfer of data. Basically, in the
3 min read
Prerequisite - Virtual Memory Abstraction is one of the most important aspects of computing. It is a widely implemented Practice in the Computational field. Memory Interleaving is less or More an Abstraction technique. Though it's a bit different from Abstraction. It is a Technique that divides memo
3 min read
Memory is required to save data and instructions. Memory is divided into cells, and they are stored in the storage space present in the computer. Every cell has its unique location/address. Memory is very essential for a computer as this is the way it becomes somewhat more similar to a human brain.
11 min read
Memory is a fundamental component of computing systems, essential for performing various tasks efficiently. It plays a crucial role in how computers operate, influencing speed, performance, and data management. In the realm of computer memory, two primary types stand out: Random Access Memory (RAM)
8 min read
In the computer world, memory plays an important component in determining the performance and efficiency of a system. In between various types of memory, Random Access Memory (RAM) stands out as a necessary component that enables computers to process and store data temporarily. In this article, we w
8 min read
Memory is an important part of the Computer which is responsible for storing data and information on a temporary or permanent basis. Memory can be classified into two broad categories: Primary Memory Secondary Memory What is Primary Memory? Primary Memory is a type of Computer Memory that the Prepro
7 min read
I/O interface (Interrupt and DMA mode)
The method that is used to transfer information between internal storage and external I/O devices is known as I/O interface. The CPU is interfaced using special communication links by the peripherals connected to any computer system. These communication links are used to resolve the differences betw
6 min read
The DMA mode of data transfer reduces the CPU's overhead when handling I/O operations. It also allows parallel processing between CPU and I/O operations. This parallelism is necessary to avoid the wastage of valuable CPU time when handling I/O devices whose speeds are much slower as compared to CPU.
5 min read
The kernel provides many services related to I/O. Several services such as scheduling, caching, spooling, device reservation, and error handling - are provided by the kernel's I/O subsystem built on the hardware and device-driver infrastructure. The I/O subsystem is also responsible for protecting i
7 min read
CPU needs to communicate with the various memory and input-output devices (I/O). Data between the processor and these devices flow with the help of the system bus. There are three ways in which system bus can be allotted to them:Separate set of address, control and data bus to I/O and memory.Have co
5 min read
Introduction : In a computer system, multiple devices, such as the CPU, memory, and I/O controllers, are connected to a common communication pathway, known as a bus. In order to transfer data between these devices, they need to have access to the bus. Bus arbitration is the process of resolving conf
7 min read
In I/O Interface (Interrupt and DMA Mode), we have discussed the concept behind the Interrupt-initiated I/O. To summarize, when I/O devices are ready for I/O transfer, they generate an interrupt request signal to the computer. The CPU receives this signal, suspends the current instructions it is exe
5 min read
Introduction : Asynchronous input/output (I/O) synchronization is a technique used in computer organization to manage the transfer of data between the central processing unit (CPU) and external devices. In asynchronous I/O synchronization, data transfer occurs at an unpredictable rate, with no fixed
7 min read
A port is basically a physical docking point which is basically used to connect the external devices to the computer, or we can say that A port act as an interface between the computer and the external devices, e.g., we can connect hard drives, printers to the computer with the help of ports. Featur
3 min read
A cluster is a set of loosely or tightly connected computers working together as a unified computing resource that can create the illusion of being one machine. Computer clusters have each node set to perform the same task, controlled and produced by the software. Clustered Operating Systems work si
7 min read
Introduction - The advent of a technological marvel called the âcomputerâ has revolutionized life in the twenty-first century. From IoT to self-driving cars to smart cities, computers have percolated through the fabric of society. Unsurprisingly the methods with which we interact with computers have
4 min read