Computer Organization and Architecture
Let us examine the flow of program instructions and data between the memory and
the processor. At the start of execution, all program instructions and the required data are
stored in the main memory. As execution proceeds, instructions are fetched one by one
over the bus into the processor, and a copy is placed in the cache. Later, if the same instruction
or data item is needed a second time, it is read directly from the cache.
The processor and a relatively small cache memory can be fabricated on a single IC
chip. The internal speed of performing the basic steps of instruction processing on the chip is
very high and is considerably faster than the speed at which instructions and data can be
fetched from the main memory. A program will be executed faster if the movement of
instructions and data between the main memory and the processor is minimized, which is
achieved by using the cache.
For example, suppose a number of instructions are executed repeatedly over a short period
of time, as happens in a program loop. If these instructions are available in the cache, they can
be fetched quickly during the period of repeated use. The same applies to the data that are
used repeatedly.
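The effect of the cache on a program loop can be sketched with a small simulation. This is only an illustration under stated assumptions, not a model of any real processor: the function name `count_fetches`, the cache that never evicts anything, and the four-instruction loop body are all made up for the example.

```python
def count_fetches(addresses):
    """Count main-memory fetches and cache hits for a sequence of addresses.

    Assumes a hypothetical cache large enough to keep every address it
    has seen (no evictions), which is a deliberate simplification.
    """
    cache = set()
    memory_fetches = 0
    cache_hits = 0
    for addr in addresses:
        if addr in cache:
            cache_hits += 1        # copy already in the cache: fast access
        else:
            memory_fetches += 1    # first use: fetched over the bus
            cache.add(addr)        # keep a copy for later reuse
    return memory_fetches, cache_hits


# A loop body of 4 instructions (addresses 100..103) executed 10 times.
loop_trace = [addr for _ in range(10) for addr in range(100, 104)]
memory_fetches, cache_hits = count_fetches(loop_trace)
print(memory_fetches, cache_hits)  # prints: 4 36
```

Only the first pass through the loop goes to main memory. The remaining 36 of the 40 fetches are served from the cache.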
Processor clock
Processor circuits are controlled by a timing signal called the clock. The clock defines
regular time intervals called clock cycles. To execute a machine instruction, the processor
divides the action to be performed into a sequence of basic steps such that each step can be
completed in one clock cycle. The length P of one clock cycle is an important parameter that
affects processor performance.
Processors used in today’s personal computers and workstations have clock rates that
range from a few hundred million to over a billion cycles per second.
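Since the clock rate R is the number of cycles per second, the clock period is its reciprocal, P = 1/R. The short sketch below converts two sample rates into periods. The function name `clock_period_ns` and the two rates (chosen to fall inside the range quoted above) are assumptions for illustration.

```python
def clock_period_ns(rate_hz):
    """Return the clock period P in nanoseconds for a clock rate R in hertz."""
    return 1e9 / rate_hz


# Two illustrative clock rates: 500 million and 1.25 billion cycles per second.
for rate_hz in (500e6, 1.25e9):
    print(f"R = {rate_hz / 1e6:.0f} MHz -> P = {clock_period_ns(rate_hz):.2f} ns")
# prints:
# R = 500 MHz -> P = 2.00 ns
# R = 1250 MHz -> P = 0.80 ns
```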
Basic performance equation
We now focus our attention on the processor time component of the total elapsed
time. Let ‘T’ be the processor time required to execute a program that has been prepared in
some high-level language. The compiler generates a machine language object program that
corresponds to the source program. Assume that complete execution of the program requires
the execution of N machine language instructions. The number N is the actual number
of instruction executions and is not necessarily equal to the number of machine
instructions in the object program. Some instructions may be executed more than once, which
is the case for instructions inside a program loop. Others may not be executed at all, depending
on the input data used.
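The difference between the number of instructions in the object program and the number N actually executed can be sketched as follows. The seven-instruction program, its mnemonics, and the execution counts are hypothetical values chosen for the example.

```python
# Hypothetical object program: (instruction, times executed for one input).
program = [
    ("Load R1 COUNT", 1),
    ("Clear R2", 1),
    ("Add R2 R1 R2", 100),    # loop body
    ("Decrement R1", 100),    # loop body
    ("Branch>0 LOOP", 100),   # loop body
    ("Call ERROR", 0),        # not reached for this input
    ("Store R2 SUM", 1),
]

static_count = len(program)                          # instructions in the object program
dynamic_count = sum(times for _, times in program)   # N, the executions actually performed
print(static_count, dynamic_count)  # prints: 7 303
```

The three loop instructions account for 300 of the 303 executions, while one instruction in the object program contributes nothing to N.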
Suppose that the average number of basic steps needed to execute one machine
instruction is S, where each basic step is completed in one clock cycle. If the clock rate is ‘R’
cycles per second, the program execution time is given by
T = (N × S) / R
This is often referred to as the basic performance equation.
We must emphasize that N, S and R are not independent parameters; changing one may
affect another. Introducing a new feature in the design of a processor will lead to improved
performance only if the overall result is to reduce the value of T.
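The basic performance equation can be tried out with a few numbers. The values of N, S and R below are assumptions picked for easy arithmetic, and `execution_time` is a hypothetical helper name. The second call illustrates the point about dependent parameters: doubling R while S grows from 4 to 6 reduces T, but by less than a factor of two.

```python
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Basic performance equation T = (N * S) / R, with T in seconds."""
    return n_instructions * steps_per_instruction / clock_rate_hz


# Assumed baseline: N = 2 million instructions, S = 4 steps, R = 500 MHz.
t_base = execution_time(2_000_000, 4, 500e6)
print(f"T = {t_base * 1e3:.1f} ms")  # prints: T = 16.0 ms

# Assumed redesign: R doubles to 1 GHz, but S rises from 4 to 6.
t_new = execution_time(2_000_000, 6, 1e9)
print(f"T = {t_new * 1e3:.1f} ms")   # prints: T = 12.0 ms
```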
Pipelining and superscalar operation
So far we have assumed that instructions are executed one after the other. Hence the value of S is
the total number of basic steps, or clock cycles, required to execute one instruction. A
substantial improvement in performance can be achieved by overlapping the execution of
successive instructions using a technique called pipelining.
Consider the instruction Add R1 R2 R3
This adds the contents of R1 and R2 and places the sum into R3.
The contents of R1 and R2 are first transferred to the inputs of the ALU. After the addition
operation is performed, the sum is transferred to R3. The processor can read the next
instruction from the memory while the addition operation is being performed. Then, if that
instruction also uses the ALU, its operands can be transferred to the ALU inputs at the same
time that the result of the Add instruction is being transferred to R3.
In the ideal case, if all instructions are overlapped to the maximum degree possible,
execution proceeds at the rate of one instruction completed in each clock cycle.
Individual instructions still require several clock cycles to complete. But for the purpose of
computing T, the effective value of S is 1.
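Why the effective value of S approaches 1 can be seen by counting clock cycles. The sketch below assumes an ideal pipeline in which each basic step is one pipeline stage and nothing ever stalls. The function names and the figures (1000 instructions, 4 steps each) are illustrative assumptions.

```python
def sequential_cycles(n, steps):
    """Cycles when each instruction finishes before the next one starts."""
    return n * steps


def pipelined_cycles(n, steps):
    """Cycles in an ideal pipeline: the first instruction takes `steps`
    cycles, then one further instruction completes in every cycle."""
    if n == 0:
        return 0
    return steps + (n - 1)


n, steps = 1000, 4
print(sequential_cycles(n, steps))       # prints: 4000
print(pipelined_cycles(n, steps))        # prints: 1003
print(pipelined_cycles(n, steps) / n)    # prints: 1.003
```

Each instruction still takes 4 cycles from start to finish, but averaged over the whole program the cost per instruction is close to 1 cycle.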
A higher degree of concurrency can be achieved if multiple instruction pipelines are
implemented in the processor. This means that multiple functional units are used, creating
parallel paths through which different instructions can be executed in parallel. With such an
arrangement, it becomes possible to start the execution of several instructions in every clock
cycle. This mode of operation is called superscalar execution. If it can be sustained for a long
time during program execution, the effective value of S can be reduced to less than one. But
the parallel execution must preserve the logical correctness of programs; that is, the results
produced must be the same as those produced by the serial execution of program instructions.
Nowadays many processors are designed in this manner.
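The claim that the effective value of S can fall below one can be checked with a rough cycle count. The sketch assumes the best case: all instructions are independent, so a fixed number of them (called `issue_width` here, a name introduced for the example) can start in every clock cycle. Real programs contain dependencies that prevent this from being sustained.

```python
import math


def superscalar_cycles(n, steps, issue_width):
    """Ideal cycle count when `issue_width` independent instructions
    start in every clock cycle and each takes `steps` cycles."""
    if n == 0:
        return 0
    return steps + math.ceil(n / issue_width) - 1


n, steps = 1000, 4
for width in (1, 2, 4):
    cycles = superscalar_cycles(n, steps, width)
    print(width, cycles, cycles / n)
# prints:
# 1 1003 1.003
# 2 503 0.503
# 4 253 0.253
```

With an issue width of 1 the count matches ordinary pipelining. With wider issue, the average number of cycles per instruction drops below 1.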
Clock rate
There are two possibilities for increasing the clock rate ‘R’.
1. Improving the IC technology makes logic circuits faster, which reduces the time needed
to execute the basic steps. This allows the clock period P to be reduced and the clock
rate R to be increased.
2. Reducing the amount of processing done in one basic step also makes it possible to
reduce the clock period P. However, if the actions that have to be performed by an
instruction remain the same, the number of basic steps needed may increase.
Increases in the value of ‘R’ that are entirely caused by improvements in IC technology
affect all aspects of the processor’s operation equally, with the exception of the time it takes
to access the main memory. In the presence of a cache, the percentage of accesses to the main
memory is small. Hence much of the performance gain expected from the use of faster
technology can be realized.
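How much of the gain survives depends on how often main memory is accessed. The sketch below is a deliberately simplified model that looks only at average access time: on-chip (cache) time shrinks with faster technology while main-memory time stays fixed. The function names and the timing figures (2 ns cache, 50 ns memory, a technology factor of 2) are assumptions for illustration, not measurements.

```python
def average_access_time(hit_rate, cache_time_ns, memory_time_ns):
    """Average time per access for a given fraction of cache hits."""
    return hit_rate * cache_time_ns + (1 - hit_rate) * memory_time_ns


def speedup_from_faster_chip(hit_rate, cache_time_ns, memory_time_ns, factor):
    """Speedup when on-chip time shrinks by `factor` but memory time does not."""
    before = average_access_time(hit_rate, cache_time_ns, memory_time_ns)
    after = average_access_time(hit_rate, cache_time_ns / factor, memory_time_ns)
    return before / after


for hit_rate in (0.0, 0.90, 0.99, 1.0):
    s = speedup_from_faster_chip(hit_rate, 2.0, 50.0, 2.0)
    print(f"hit rate {hit_rate:.2f}: speedup {s:.2f}x")
```

In this model the speedup rises toward the full factor of 2 as the share of accesses served by the cache approaches 100 percent, and there is no gain at all when every access goes to main memory.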
Instruction set: CISC and RISC
Simple instructions require a small number of basic steps to execute. Complex
instructions involve a large number of steps. For a processor that has only simple instructions,
a large number of instructions may be needed to perform a given programming task. This
could lead to a large value of ‘N’ and a small value of ‘S’. On the other hand, if individual
instructions perform more complex operations, fewer instructions will be needed, leading
to a lower value of N and a larger value of S. It is not obvious if one choice is better than the
other.
But complex instructions combined with pipelining (effective value of S close to 1) would
achieve the best performance. However, it is much easier to implement efficient pipelining in
processors with simple instruction sets.
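The trade-off between N and S can be explored with the basic performance equation T = (N × S) / R. In the sketch below, the function `faster_design` and every number are hypothetical. The two cases differ only in the S assumed for the complex instruction set, which is enough to flip the outcome.

```python
def execution_time(n, s, r):
    """Basic performance equation T = (N * S) / R, in seconds."""
    return n * s / r


def faster_design(n_simple, s_simple, n_complex, s_complex, rate_hz):
    """Return which hypothetical design has the smaller T at the same clock rate."""
    t_simple = execution_time(n_simple, s_simple, rate_hz)
    t_complex = execution_time(n_complex, s_complex, rate_hz)
    if t_simple == t_complex:
        return "tie"
    return "simple" if t_simple < t_complex else "complex"


R = 1e9  # same assumed clock rate for both designs

# Simple instructions: N = 3 million, S = 2. Complex: N = 1 million, S = 5 or 8.
print(faster_design(3_000_000, 2, 1_000_000, 5, R))  # prints: complex
print(faster_design(3_000_000, 2, 1_000_000, 8, R))  # prints: simple
```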
RISC and CISC are two approaches to instruction set design. An instruction set, or
instruction set architecture, is the part of a computer design that defines the commands
the processor can execute to process and manipulate data. An instruction set consists of
instructions, addressing modes, native data types, registers, interrupt and exception handling, and
memory architecture. An instruction set can be emulated in software by using an interpreter or
built into the hardware of the processor. The instruction set architecture can be considered the
boundary between software and hardware. Microcontrollers and
microprocessors can be classified according to whether they use a RISC or a CISC instruction set architecture.
Comparison between RISC and CISC:
Feature         | RISC | CISC
----------------------------------------------------------------------
Acronym         | It stands for ‘Reduced Instruction Set Computer’. | It stands for ‘Complex Instruction Set Computer’.
Definition      | RISC processors have a smaller set of instructions with few addressing modes. | CISC processors have a larger set of instructions with many addressing modes.
Memory unit     | It has no memory unit and uses separate hardware to implement instructions. | It has a memory unit to implement complex instructions.
Program         | It has a hard-wired programming unit. | It has a microprogramming unit.
Design          | Compiler design is complex. | Compiler design is easy.
Calculations    | Calculations are faster and precise. | Calculations are slower but precise.
Decoding        | Decoding of instructions is simple. | Decoding of instructions is complex.
Time            | Execution time is very short. | Execution time is very long.
External memory | It does not require external memory for calculations. | It requires external memory for calculations.
Pipelining      | Pipelining functions well. | Pipelining is harder to implement efficiently.
Stalling        | Stalling is mostly reduced. | The processor often stalls.
Code expansion  | Code expansion can be a problem. | Code expansion is not a problem.
Disc space      | The space is saved. | The space is wasted.
Applications    | Used in high-end applications such as video processing, telecommunications and image processing. | Used in low-end applications such as security systems, home automation, etc.