Unit 5
AND ARCHITECTURE
• The total time required to execute a program is called the elapsed time. It is a
measure of the performance of the entire computer system and is affected by the
speed of the processor, the disk and the printer. The time the processor spends
executing the program's instructions is called the processor time.
09/08/2024 2
• While the elapsed time for the execution of a program depends on all units in a
computer system, the processor time depends on the hardware involved in the
execution of individual machine instructions. This hardware comprises the
processor and the memory, which are usually connected by a bus.
• Let us examine the flow of program instructions and data between the memory
and the processor. At the start of execution, all program instructions and the
required data are stored in the main memory. As execution proceeds,
instructions are fetched one by one over the bus into the processor, and a copy is
placed in the cache. Later, if the same instruction or data item is needed a second
time, it is read directly from the cache.
• The processor and a relatively small cache memory can be fabricated on a single IC
chip. The internal speed of performing the basic steps of instruction processing
on the chip is very high, considerably faster than the speed at which
instructions and data can be fetched from the main memory. A program will be
executed faster if the movement of instructions and data between the main
memory and the processor is minimized, which is achieved by using the cache.
• Processor Clock
Processor circuits are controlled by a timing signal called the clock. The clock defines
regular time intervals called clock cycles. To execute a machine instruction, the
processor divides the action to be performed into a sequence of basic steps such that
each step can be completed in one clock cycle. The length of one clock cycle is an
important parameter that affects processor performance.
• Clock rate
The number of clock cycles a processor completes in one second, usually expressed
in hertz. Generally, the higher the clock rate, the faster the CPU, but clock rate is
not the only factor. Others include the number of processors, the speed of RAM, the
bus speed and the size of the cache. Some instructions also require more clock
cycles than others to complete. Depending on the architecture of the CPU, the
clock rate can be more or less important.
MIPS: Million instructions per second
A measure of the execution speed of a computer. It gives the approximate number
of machine instructions, in millions, that the computer can execute in one
second.
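The MIPS figure can be sketched as a small calculation. The function names, and the idea of deriving MIPS from a clock rate and an average cycles-per-instruction value, are illustrative assumptions and do not come from the slides.

```python
def mips(instruction_count, execution_time_s):
    """MIPS from a measured run: instructions executed / (seconds * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_rate_hz, cycles_per_instruction):
    """MIPS estimated from the clock rate and an assumed average number
    of clock cycles per instruction."""
    return clock_rate_hz / (cycles_per_instruction * 1e6)


# Hypothetical figures: 50 million instructions executed in 2 seconds.
print(mips(50_000_000, 2.0))
# Hypothetical figures: 2 GHz clock, 4 cycles per instruction on average.
print(mips_from_clock(2e9, 4))
```

Because different instructions take different numbers of cycles, the result depends on the instruction mix of the program being measured.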
Parallel processing and Pipelining
• Parallel processing covers a large class of techniques that perform data-processing
tasks simultaneously in order to increase the computational speed of a computer.
• In pipelining, each segment performs partial processing and the result obtained
from one segment is transferred to the next segment.
• The final result is obtained after the data has passed through all the segments.
• The simplest example of a pipeline uses an input register and a combinational
circuit in each segment. The register holds the data and the combinational
circuit performs the sub-operation. The output of the circuit is then fed to the
input register of the next segment.
Space time diagram
Used to illustrate the behaviour of a pipeline by showing the segment utilization as
a function of time.
Assume a k-segment pipeline with clock cycle time Tp that executes n tasks.
The first task T1 needs time kTp to be completely executed.
The remaining (n-1) tasks then complete one per clock cycle, after a further time (n-1)Tp.
Total number of clock cycles required = k+(n-1)
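The cycle count above can be checked with a minimal sketch. The function names and the sample values of k, n and Tp are assumptions chosen only for illustration.

```python
def pipeline_cycles(k, n):
    """Clock cycles for n tasks on a k-segment pipeline: k + (n - 1)."""
    return k + (n - 1)


def pipeline_time(k, n, tp):
    """Total time for n tasks, given clock cycle time tp."""
    return pipeline_cycles(k, n) * tp


# Assumed example: 4 segments, 6 tasks, Tp = 20 ns.
print(pipeline_cycles(4, 6))    # 9 cycles
print(pipeline_time(4, 6, 20))  # 180 ns
```

The first task fills the pipeline in k cycles; after that, one task finishes in every cycle.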
Each operand needs to pass through all four segments in a fixed sequence.
Arithmetic Pipeline
• Pipelined arithmetic units are found in high speed computers.
• Used to implement floating point operations, multiplication of fixed point
numbers and similar computations found in scientific problems.
• e.g. two floating point numbers need to be added.
The addition can be broken down into the following pipeline sub-operations:
1) Compare the exponents
2) Align the mantissas
3) Add the mantissas
4) Normalize the result
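The four sub-operations can be sketched in software as follows. This is only an illustration under stated assumptions: numbers are represented as (mantissa, exponent) pairs in base 10, mantissas are exact fractions, and the function name `fp_add` and the sample operands are hypothetical. A hardware pipeline would run each step in its own segment, on different operand pairs at the same time.

```python
from fractions import Fraction


def fp_add(x, y, base=10):
    """Add two numbers given as (mantissa, exponent), value = mantissa * base**exponent."""
    (ma, ea), (mb, eb) = x, y

    # Segment 1: compare the exponents.
    diff = ea - eb
    exp = max(ea, eb)

    # Segment 2: align the mantissa of the operand with the smaller exponent.
    if diff > 0:
        mb = mb / base ** diff
    elif diff < 0:
        ma = ma / base ** (-diff)

    # Segment 3: add the mantissas.
    m = ma + mb

    # Segment 4: normalize the result so that 1/base <= |m| < 1.
    while abs(m) >= 1:
        m = m / base
        exp += 1
    while m != 0 and abs(m) < Fraction(1, base):
        m = m * base
        exp -= 1
    return m, exp


# Assumed operands: 0.9504 * 10^3 and 0.8200 * 10^2.
x = (Fraction(9504, 10000), 3)
y = (Fraction(8200, 10000), 2)
mantissa, exponent = fp_add(x, y)
print(float(mantissa), exponent)
```

Exact fractions are used here only to keep the arithmetic free of rounding noise; real hardware works on fixed-width binary mantissas.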
Instruction Pipeline
• An instruction pipeline reads consecutive instructions from memory while
previous instructions are being executed in other segments.
This causes the instruction fetch and execute phases to overlap and operate
simultaneously.
• Consider a two-segment pipeline with instruction fetch and execution units.
The fetch segment can be implemented using a FIFO queue. Whenever the execution
unit is not using the memory, the control increments the PC and uses its address
to fetch the next instruction from memory and store it in the queue.
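A toy simulation of the two-segment scheme is sketched below. It makes several simplifying assumptions not stated in the slides: one fetch and one execution per cycle, instructions written as strings, and only instructions beginning with LOAD or STORE occupying memory during execution. The function name `run_two_segment` and the sample program are hypothetical.

```python
from collections import deque


def run_two_segment(program, queue_size=4):
    """Simulate a fetch unit (FIFO queue) feeding an execution unit."""
    queue = deque()   # FIFO instruction buffer
    pc = 0            # index of the next instruction to fetch
    trace = []        # (cycle, executed instruction or None, queue after the cycle)
    cycle = 0
    while pc < len(program) or queue:
        cycle += 1
        # Execution unit takes the oldest instruction from the queue, if any.
        current = queue.popleft() if queue else None
        uses_memory = current is not None and current.startswith(("LOAD", "STORE"))
        # Fetch unit prefetches only when the execution unit leaves memory free.
        if not uses_memory and pc < len(program) and len(queue) < queue_size:
            queue.append(program[pc])
            pc += 1
        trace.append((cycle, current, list(queue)))
    return trace


for step in run_two_segment(["ADD", "LOAD", "SUB"]):
    print(step)
```

In this model the fetch of SUB is delayed by one cycle because LOAD occupies memory, which is the kind of contention the queue is meant to absorb when it has been filled ahead of time.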
In the most general case, the steps needed to process each instruction are:
Fetch the instruction from memory
Decode the instruction
Calculate the effective address
Fetch operands from memory
Execute the instruction
Store the results
There are certain difficulties that prevent an instruction pipeline from operating at its
maximum rate:
• Different segments take different times to operate on the incoming information.
• Some segments are skipped for certain operations.
• Two or more segments may require memory access at the same time, causing one
segment to wait.
As an example, take a 4-segment pipeline for instruction execution.
The four segments of the instruction pipeline can be
FI: Segment that fetches an instruction
DA: Segment that decodes instruction and calculates effective address
FO: Segment that fetches the operand
EX: Segment that executes the instruction
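The timing of these four segments can be tabulated with a short sketch. It assumes an idealized pipeline in which every segment takes one clock cycle and no conflicts occur; the function name `space_time` and the choice of three instructions are illustrative.

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]


def space_time(n_instructions, segments=SEGMENTS):
    """Return one row per instruction; column c holds the segment that
    instruction occupies in clock cycle c + 1 (empty string if none)."""
    total_cycles = len(segments) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(segments):
            row[i + s] = name  # instruction i enters segment s in cycle i + s + 1
        rows.append(row)
    return rows


for i, row in enumerate(space_time(3), start=1):
    print(f"I{i}: " + " ".join(f"{cell:>2}" for cell in row))
```

Each row is shifted one cycle to the right of the previous one, so once the pipeline is full one instruction completes in every cycle.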
The major difficulties that cause an instruction pipeline to deviate from normal
operation are:
1) Resource conflict: Two segments access memory at the same time.
2) Data dependency: An instruction depends on the result of a previous instruction,
but that result is not yet available.
3) Branch difficulties: Arise when branch and other instructions change the
value of the PC.
Vector Processing
• Used in science and engineering problems that require a vast number of
calculations, which might take days or weeks to complete on a conventional computer.
• In scientific problems, the data is usually formulated as vectors and matrices of
floating point numbers.
• To access each element of these vectors, a conventional program uses loops.
• A vector instruction includes the initial addresses of the operands, the length of
the vectors and the operation to be performed, all in one instruction.
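The contrast between a program loop and a single vector instruction can be sketched as below. Memory is modelled as a Python list, and the function names, the destination address parameter and the sample data are assumptions made for illustration.

```python
import operator


def scalar_loop_add(memory, a, b, c, length):
    """Conventional approach: one loop iteration per element."""
    for i in range(length):
        memory[c + i] = memory[a + i] + memory[b + i]


def vector_instruction(memory, op, a, b, c, length):
    """One 'instruction' carrying the operation, the initial operand
    addresses a and b, an assumed destination address c, and the length."""
    results = [op(x, y) for x, y in zip(memory[a:a + length], memory[b:b + length])]
    memory[c:c + length] = results


# Two 3-element vectors at addresses 0 and 3; result stored at address 6.
memory = [1, 2, 3, 10, 20, 30, 0, 0, 0]
vector_instruction(memory, operator.add, 0, 3, 6, 3)
print(memory[6:9])
```

On a vector processor the loop control overhead disappears, because fetching and decoding happen once for the whole vector rather than once per element.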
Pipeline Hazards
• Pipeline hazards are situations that prevent the next instruction in the instruction
stream from executing during its designated clock cycles.
• Any condition that causes a stall in the pipeline operations can be called a hazard.
• There are primarily three types of hazards:
1) Data hazards
2) Control hazards (also called instruction hazards)
3) Structural hazards
• Data Hazard
Any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. As a result,
some operation has to be delayed and the pipeline stalls. This occurs whenever
one of two instructions depends on data produced by the other.
• Structural Hazard
This situation arises when two instructions require a given hardware resource at
the same time, so the pipeline must be stalled for one of the instructions.
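For the data hazard case, a dependency check can be sketched as follows. The instruction format (opcode, destination register, source registers), the function name `find_data_hazards` and the `window` parameter are assumptions for illustration; the check only looks for an instruction reading a register written by an instruction shortly before it.

```python
def find_data_hazards(instructions, window=1):
    """Return (producer index, consumer index, register) for every instruction
    that reads a register written by one of the `window` instructions before it."""
    hazards = []
    for i, (_, dest, _) in enumerate(instructions):
        last = min(i + 1 + window, len(instructions))
        for j in range(i + 1, last):
            sources = instructions[j][2]
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards


# Hypothetical program: SUB needs R1, which ADD has not yet produced.
program = [
    ("ADD", "R1", ("R2", "R3")),
    ("SUB", "R4", ("R1", "R5")),
    ("MUL", "R6", ("R7", "R8")),
]
print(find_data_hazards(program))
```

A real pipeline would respond to each reported pair by stalling the consumer until the producer's result is available.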
• Control Hazard
The instruction fetch unit of the CPU is responsible for providing a stream of
instructions to the execution unit. The fetch unit normally fetches instructions
from consecutive memory locations, and they are executed in that order. A problem
arises when one of the instructions is a branch to some other memory location.
All the instructions already fetched into the pipeline from the consecutive memory
locations are then invalid and must be discarded. This causes a stall until new
instructions are fetched from the memory address specified in the branch
instruction.
Multi-processors
• A multi-processor system is an interconnection of 2 or more CPUs with memory
and I/O devices
• A multi-processor system with common shared memory is called a shared-memory
or tightly-coupled multi-processor.
• The alternative is a distributed-memory or loosely-coupled system, in which
each processor element has its own private local memory. The processors are
tied together by a switching scheme designed to route information from one
processor to another through a message-passing scheme.
• Multi-processing improves the reliability of a system, so that a failure in one part
has a limited effect on the rest of the system.
Interconnection structures
• Interconnection structures are the physical forms available for establishing an
interconnection network between the various components of the computer system.
• In a single common bus system, only one processor can communicate with memory
or another processor at any given time. Transfer operations are conducted by the
processor that is in control of the bus at the time.
• Any other processor wishing to initiate a transfer must first determine the
availability status of the bus. When the bus becomes available, the processor
can address the destination unit to initiate the transfer.
• A single bus system is restricted to one transfer at a time, i.e. when one processor
is communicating with the memory, all other processors that need the bus must
wait for it.
Multi-port memory
A multiport memory structure employs separate buses between each memory module
and each CPU. Every processor in a multiport memory system is connected to each
memory module.
The processor bus consists of the address, data and control lines required to
communicate with the memory.
Each memory module has multiple ports, and each port accommodates one of the
buses.
The module must have internal control logic to determine which port will have access
to memory at any given time.
Memory access conflicts are resolved by assigning a fixed priority to each memory port.
The disadvantage of this technique is that it requires expensive memory control logic
and a large number of connectors.
Crossbar switch
This organization consists of a number of crosspoints placed at the intersections
between processor buses and memory module paths.
Each crosspoint consists of a switch that determines the path from a processor to a
memory module.
Each crosspoint has control logic to set up the transfer path between processor
and memory. It examines the address that is placed on the bus to determine
whether its particular module is being addressed.
It also resolves multiple requests for access to the same memory module on the
basis of a predetermined priority.
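The arbitration performed at the crosspoints can be sketched as follows. The function name `crossbar_grant`, the processor and module labels, and the representation of the priority as an ordered list are all assumptions for illustration.

```python
def crossbar_grant(requests, priority):
    """Decide which processor is connected to each memory module this cycle.

    requests: dict mapping processor -> requested memory module
    priority: list of processors, highest priority first
    Returns a dict mapping memory module -> processor granted access.
    """
    granted = {}
    for cpu in priority:
        module = requests.get(cpu)
        # A module is given to the first (highest-priority) requester only.
        if module is not None and module not in granted:
            granted[module] = cpu
    return granted


# P0 and P1 both want module M1; P2 wants M0 and is served in parallel.
requests = {"P0": "M1", "P1": "M1", "P2": "M0"}
print(crossbar_grant(requests, priority=["P0", "P1", "P2"]))
```

Requests to different modules are granted simultaneously, while a conflict on the same module leaves the lower-priority processor waiting for a later cycle.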