3.3.5 Reduced Instruction Set Computing Processors (RISC)
3 HARDWARE
Years of development have gone into improving the architecture of the central processing
unit, with the main aim of improving performance. Two competing architectures were
developed for this purpose, and different processors conformed to each one. Both had their
strengths and weaknesses, and as such each also had its supporters and detractors.
In CISC (Complex Instruction Set Computing) designs, performance was improved by allowing the
simplification of program compilers: the range of more advanced instructions available meant
fewer refinements had to be made during compilation. However, the complexity of the resulting
processor hardware and architecture can make such chips difficult to understand and program for,
and also makes them expensive to produce.
In RISC designs, changing the architecture to this extent means that fewer transistors are used to
produce the processors, so RISC chips are much cheaper to produce than their CISC counterparts.
The reduced instruction set also means that the processor can execute each instruction more quickly,
potentially allowing for greater speeds. However, only allowing such simple instructions places a
greater burden on the software itself: fewer instructions in the instruction set means a greater
emphasis on writing efficient software with the instructions that are available.
Supporters of the CISC architecture will point out that their processors are of good enough
performance and cost to make such efforts not worth the trouble.
Feature                 CISC                  RISC
Instruction set         Large (100 to 300)    Small (100 or fewer)
Addressing modes        Complex (8 to 20)     Simple (4 or fewer)
Instruction format      Specialized           Simple
Code lengths            Variable              Fixed
Execution cycles        Variable              Standard for most
Cost / CPU complexity   Higher                Lower
Simplifies              Compilation           Processor design
Complicates             Processor design      Software
Summary of the main differences between the two competing architectures
Looking at the most modern processors, it becomes evident that the whole rivalry between CISC
and RISC is now not of great importance. This is because the two architectures are converging closer
to each other, with CPUs from each side incorporating ideas from the other. CISC processors now
use many of the same techniques as RISC ones, while the reduced instruction sets of RISC processors
contain similar numbers of instructions to those found in certain CISC chips. However, it is still
important that you understand the ideas behind these two differing architectures, and why each
design path was chosen.
BRIEF NOTES
The instruction set specifies the processor's functionality, including the operations the processor
supports, its storage mechanisms, and the way programs are compiled for it.
A processor architecture that uses a small, highly optimized set of instructions is termed a
Reduced Instruction Set Computer (RISC). It is also called a load/store
architecture.
In the late 1970s and early 1980s, RISC projects were developed primarily at Stanford, UC Berkeley
and IBM. John Cocke of the IBM research team developed RISC by reducing the number of
instructions required, allowing computations to be processed faster than on a CISC. The RISC
architecture is faster, and the chips required to manufacture it are also less expensive than those
for the CISC architecture.
Drawbacks of RISC
As the length of a program increases, so does the work a RISC processor must do to execute it:
because of its one-cycle-per-instruction character, each instruction performs only a small step
of the overall task.
The performance of a RISC processor depends mostly on the compiler or programmer, as the
compiler plays a major role when converting CISC code to RISC code;
hence the quality of the generated code depends on the compiler.
Rescheduling CISC code into RISC code, termed code expansion, increases the code size.
The quality of this code expansion again depends on the compiler, and also on
the machine's instruction set.
The first-level cache is also a disadvantage of RISC: these processors require large memory
caches on the chip itself, and very fast memory systems to feed them instructions.
Drawbacks of CISC
Different instructions take different amounts of clock time, which slows down the
performance of the machine.
The complexity of the instruction set and of the chip hardware increases with every new
version of the processor, since each new version must include the instructions of earlier generations.
Only about 20% of the available instructions are used in a typical program, even though
many specialized instructions exist; most of these are rarely used.
CISC instructions set condition codes as a side effect of each instruction, which takes time;
and because each subsequent instruction changes the condition-code bits, the compiler has to
examine them before they are overwritten.
In RISC, the programmer can prevent wasted cycles by removing unnecessary code, whereas
CISC code wastes cycles because of the inefficiency of the CISC instructions.
In RISC, each instruction is intended to perform a small task, so a complex task is carried out
with multiple small instructions together, whereas only a few instructions are needed to do the
same task using CISC, since its instructions are similar to high-level-language code.
CISC is typically used for desktop computers, while RISC is used for smartphones, tablets and
other electronic devices.
The following figure shows more differences between RISC and CISC
DETAILED NOTES
Various suggestions have been made regarding a precise definition of RISC, but the general
concept is that of a system that uses a small, highly optimized set of instructions, rather than a more
versatile set of instructions often found in other types of architectures. Another common trait is that
RISC systems use the load/store architecture, where memory is normally accessed only through
specific instructions, rather than accessed as part of other instructions like add.
Hardware utilization
For any given level of general performance, a RISC chip will typically have far fewer transistors
dedicated to the core logic, which originally allowed designers to increase the size of the
register set and increase internal parallelism.
Other features typically found in RISC architectures include:
Identical general-purpose registers, allowing any register to be used in any context and
simplifying compiler design (although there are normally separate floating-point registers);
Simple addressing modes, with complex addressing performed via sequences of arithmetic
operations, load/store operations, or both;
Few data types in hardware: some CISCs have byte-string instructions or support complex
numbers, features so far unlikely to be found on a RISC;
Processor throughput of one instruction per cycle on average.
Well-known RISC families, and the systems that use them, include the following.
The ARM architecture dominates the market for low-power and low-cost embedded systems
(typically 200–1800 MHz in 2014). It is used in a number of systems such as most Android-based
systems, the Apple iPhone and iPad, Microsoft Windows Phone (formerly Windows Mobile), RIM
devices, the Nintendo Game Boy Advance and Nintendo DS, etc.
The MIPS line (at one point used in many SGI computers), now found in the PlayStation,
PlayStation 2, Nintendo 64 and PlayStation Portable game consoles, and in residential gateways
like the Linksys WRT54G series.
Hitachi's SuperH, originally in wide use in the Sega Super 32X, Saturn and Dreamcast, now
developed and sold by Renesas as the SH4.
Atmel AVR used in a variety of products ranging from Xbox handheld controllers to BMW cars.
RISC-V, the open-source fifth Berkeley RISC ISA, with a 32-bit address space, a small core
integer instruction set, an experimental "Compressed" ISA for code density, and a design
allowing standard and special-purpose extensions.
Page 5
3.3 HARDWARE
A modern RISC processor can therefore be much more complex than, say, a modern microcontroller
using a CISC-labeled instruction set, especially in terms of implementation (electronic circuit
complexity), but also in terms of the number of instructions or the complexity of their encoding
patterns. The only differentiating characteristic (nearly) "guaranteed" is the fact that most RISC
designs use uniform instruction length for (almost) all instructions and employ strictly separate
load/store-instructions.
RISC vs CISC
The simplest way to examine the advantages and disadvantages of RISC architecture is by
contrasting it with its predecessor: CISC (Complex Instruction Set Computers) architecture.
Consider multiplying two numbers held in memory. In the CISC approach, a single instruction does
the whole job: for example, MULT 2:3, 5:2 multiplies the values stored at memory locations 2:3
and 5:2. MULT is what is known as a "complex instruction." It operates directly on the computer's
memory banks and does not require the programmer to explicitly call any loading or storing
functions. It closely resembles a command in a higher-level language: if we let "a" represent the
value at 2:3 and "b" represent the value at 5:2, then this command is identical to the C statement
"a = a * b."
One of the primary advantages of this system is that the compiler has to do very little work to translate
a high-level language statement into assembly. Because the length of the code is relatively short,
very little RAM is required to store instructions. The emphasis is put on building complex instructions
directly into the hardware.
In the RISC approach, the same multiplication is divided into simple instructions that each
execute in a single clock cycle:
LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
At first, this may seem like a much less efficient way of completing the operation. Because there are
more lines of code, more RAM is needed to store the assembly level instructions. The compiler must
also perform more work to convert a high-level language statement into code of this form.
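To make the contrast concrete, here is a small Python sketch (a toy model, not real hardware) that runs both versions of the example above: the single memory-to-memory MULT and the four-instruction load/store sequence. The mnemonics are borrowed from the text; everything else is invented for illustration.

```python
# Toy model: "2:3" and "5:2" stand for memory locations, as in the example above.
memory = {"2:3": 2, "5:2": 5}   # a = 2, b = 5
registers = {}

# CISC style: one complex instruction works directly on memory.
def mult(dst, src):
    memory[dst] = memory[dst] * memory[src]

# RISC style: memory is touched only by explicit loads and stores.
def load(reg, addr):
    registers[reg] = memory[addr]

def prod(reg_a, reg_b):
    registers[reg_a] = registers[reg_a] * registers[reg_b]

def store(addr, reg):
    memory[addr] = registers[reg]

# CISC: MULT 2:3, 5:2
mult("2:3", "5:2")
print(memory["2:3"])  # 10

# Reset and repeat with the RISC sequence from the text.
memory["2:3"] = 2
load("A", "2:3")
load("B", "5:2")
prod("A", "B")
store("2:3", "A")
print(memory["2:3"])  # 10 -- same result, four simple instructions
```

Both routes compute a = a * b; the difference is purely in where the work is done, in hardware (one complex instruction) or in software (several simple ones).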
CISC                                         RISC
Emphasis on hardware                         Emphasis on software
Includes multi-clock complex instructions    Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE"         Register-to-register: "LOAD" and "STORE"
incorporated in instructions                 are independent instructions
Small code sizes, high cycles per second     Low cycles per second, large code sizes
Transistors used for storing complex         Spends more transistors on
instructions                                 memory registers
The CISC approach attempts to minimize the number of instructions per program, sacrificing the
number of cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the
cost of the number of instructions per program.
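This trade-off can be made concrete with the standard performance equation, time = instruction count x cycles per instruction x clock period. The numbers below are purely illustrative, not measurements of any real chip.

```python
# Illustrative arithmetic only: the performance equation shows how the
# two designs trade instruction count against cycles per instruction (CPI).
def exec_time(instructions, cpi, clock_ns):
    # time = instruction count * CPI * clock period
    return instructions * cpi * clock_ns

cisc = exec_time(instructions=10, cpi=6, clock_ns=1.0)   # few, multi-cycle instructions
risc = exec_time(instructions=40, cpi=1, clock_ns=1.0)   # many, single-cycle instructions

print(cisc, risc)  # 60.0 40.0 (nanoseconds, for these made-up counts)
```

With these made-up counts the RISC program wins despite executing four times as many instructions; with different counts the balance can tip the other way, which is why neither approach wins universally.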
RISC Roadblocks
Despite the advantages of RISC based processing, RISC chips took over a decade to gain a foothold
in the commercial world. This was largely due to a lack of software support.
Although Apple's Power Macintosh line featured RISC-based chips and Windows NT was RISC
compatible, Windows 3.1 and Windows 95 were designed with CISC processors in mind. Many
companies were unwilling to take a chance with the emerging RISC technology. Without
commercial interest, processor developers were unable to manufacture RISC chips in large enough
volumes to make their price competitive.
Another major setback was the presence of Intel. Although their CISC chips were becoming
increasingly unwieldy and difficult to develop, Intel had the resources to plow through development
and produce powerful processors. Although RISC chips might surpass Intel's efforts in specific areas,
the differences were not great enough to persuade buyers to change technologies.
SUMMARY
CISC stands for Complex Instruction Set Computer and RISC stands for Reduced Instruction Set
Computer; they represent two lines of thought when designing a new computer chip.
Question: Is it better to make more complicated instructions available that take many cycles to
complete, or is it better to restrict yourself to a smaller, simpler instruction set whose
instructions each take only a single cycle to complete?
Answer: It depends.
And this is the question behind whether CISC or RISC is the better approach.
Up until recently, the major chip makers preferred the CISC approach: each generation of their
chips offered a larger and richer instruction set than the one before. But now the RISC approach
seems to be the favoured one.
In a CISC chip a single instruction such as MULT a,b is available. The chip-maker adds more and more
complex hardware circuits within the CPU to carry out these instructions. So the trade-off is more
complex hardware to support simpler software coding.
The compiler, when seeing a multiply command written in high level language source code can
generate a single machine code instruction to carry out the task - job done.
In a RISC chip it is the other way around: keep the hardware simple and let the software be more
complicated. There may be no single multiply instruction available, so the compiler has to
generate more lines of code, such as the LOAD, PROD and STORE sequence shown earlier.
But each of those instructions can be carried out in a single cycle. You can also use the pipeline
method to speed things up even more (since 'a' and 'b' do not depend on each other). So overall
the RISC approach may be faster.
COMPARISON
CISC                                      RISC
Has more complex hardware                 Has simpler hardware
More compact software code                More complicated software code
Takes more cycles per instruction         Takes one cycle per instruction
Can use less RAM as there is no need      Can use more RAM to handle
to store intermediate results             intermediate results
Pipeline
In computing, a pipeline is a set of data processing elements connected in series, where the output
of one element is the input of the next one. The elements of a pipeline are often executed in parallel
or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between
elements.
Computer-related pipelines include:
Graphics pipelines, found in most graphics processing units (GPUs), which consist of multiple
arithmetic units, or complete CPUs, that implement the various stages of common rendering
operations (perspective projection, window clipping, color and light calculation, rendering,
etc.).
Software pipelines, where commands can be written so that the output of one operation is
automatically fed to the next, following operation. The Unix system call pipe is a classic example
of this concept, although other operating systems support pipes as well.
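The software-pipeline idea can be sketched in Python with generators, where each stage consumes the previous stage's output, much as Unix shell commands are chained with |. The stage names here are invented for illustration.

```python
# A software pipeline: the output of each stage is the input of the next.
def numbers():
    for n in range(10):
        yield n

def only_even(items):
    for n in items:
        if n % 2 == 0:
            yield n

def squared(items):
    for n in items:
        yield n * n

# Equivalent in spirit to the shell pipeline: numbers | only_even | squared
pipeline = squared(only_even(numbers()))
print(list(pipeline))  # [0, 4, 16, 36, 64]
```

Because generators are lazy, the stages run interleaved rather than one after another, which is exactly the pipeline property the text describes.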
Instruction Pipelining
Instruction pipelining is a technique that implements a form of parallelism called instruction-level
parallelism within a single processor. It therefore allows faster CPU throughput (the number of
instructions that can be executed in a unit of time) than would otherwise be possible at a given clock
rate. The basic instruction cycle is broken up into a series called a pipeline. Rather than processing
each instruction sequentially (finishing one instruction before starting the next), each instruction is
split up into a sequence of steps so different steps can be executed in parallel and instructions can
be processed concurrently (starting one instruction before finishing the previous one).
Pipelining increases instruction throughput by performing multiple operations at the same time, but
does not reduce instruction latency, which is the time to complete a single instruction from start to
finish, as it still must go through all steps. Indeed, it may increase latency due to additional overhead
from breaking the computation into separate steps and worse, the pipeline may stall (or even need
to be flushed), further increasing the latency. Thus, pipelining increases throughput at the cost of
latency, and is frequently used in CPUs but avoided in real-time systems, in which latency is a hard
constraint.
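The throughput-versus-latency point can be checked with simple cycle arithmetic for an ideal pipeline with no stalls; the function names and numbers below are illustrative.

```python
# Cycle counts for an ideal k-stage pipeline with no stalls: the first
# instruction takes k cycles (latency), and each further instruction
# completes one cycle later, so throughput approaches one per cycle.
def sequential_cycles(n_instructions, stages):
    # Without pipelining, every instruction pays the full k cycles.
    return n_instructions * stages

def pipelined_cycles(n_instructions, stages):
    # Fill the pipeline once (k cycles), then one completion per cycle.
    return stages + (n_instructions - 1)

n, k = 100, 5
print(sequential_cycles(n, k))  # 500 cycles without pipelining
print(pipelined_cycles(n, k))   # 104 cycles with pipelining
# Per-instruction latency is still k cycles; only throughput improved.
```

Note that a single instruction gains nothing (it still takes k cycles); the speed-up comes entirely from overlapping many instructions, which is why stalls and flushes are so costly.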
Each instruction is split into a sequence of dependent steps. The first step is always to fetch the
instruction from memory; the final step is usually writing the results of the instruction to processor
registers or to memory. Pipelining seeks to let the processor work on as many instructions as there are
dependent steps, just as an assembly line builds many vehicles at once, rather than waiting until one
vehicle has passed through the line before admitting the next one. Just as the goal of the assembly
line is to keep each assembler productive at all times, pipelining seeks to keep every portion of the
processor busy with some instruction. Pipelining lets the computer's cycle time be the time of the
slowest step, and ideally lets one instruction complete in every cycle.
The term pipeline is an analogy to the fact that there is fluid in each link of a pipeline, as each part
of the processor is occupied with work.
Pipelining
A form of computer organization in which successive steps of an instruction sequence are executed
in turn by a sequence of modules able to operate concurrently, so that another instruction can be
begun before the previous one is finished.
Pipelining in RISC:
Break the instruction cycle into n phases (one stage per phase), e.g. Fetch, Decode, ReadOps,
Execute1, Execute2, WriteBack
Fetch a new instruction each phase
Maximum speed gain is n
Hazards reduce the ability to achieve a gain of n
Types of Hazards
o Resource
Hazard occurs when instruction needs a resource being used by another
instruction
o Data
RAW (hazard if read can occur before write has finished)
WAR (hazard if write can occur before read is finished)
WAW (hazard if writes occur in the unintended order)
o Control
Hazard occurs when a wrong fetch decision at a branch results in an extra
instruction fetch and a pipeline flush
Stalling can always “fix” a hazard
Hardware contribution
o Have more registers
Thus more variables will be in registers
Register uses
o Store local scalar variables in registers
Reduces memory accesses
o Every procedure (function) call changes locality (typically lots of procedure calls are
encountered)
Parameters must be passed (a partial context switch)
Results must be returned
Variables from the calling program must be restored (a partial context switch)
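As a sketch of the RAW case listed above, the following Python function flags a hazard whenever an instruction reads a register written by the instruction immediately before it. The (destination, sources) instruction format is invented for illustration; real pipelines compare across all in-flight instructions, not just adjacent pairs.

```python
# Read-after-write (RAW) hazard sketch: a hazard exists when an instruction
# reads a register that the immediately preceding instruction writes.
def raw_hazards(program):
    hazards = []
    for prev, cur in zip(program, program[1:]):
        prev_dest, _ = prev          # register written by the earlier instruction
        _, cur_srcs = cur            # registers read by the later instruction
        if prev_dest in cur_srcs:
            hazards.append((prev, cur))
    return hazards

program = [
    ("r1", ["r2", "r3"]),  # r1 = r2 + r3
    ("r4", ["r1", "r5"]),  # r4 = r1 + r5  <- reads r1 before it is written back
    ("r6", ["r2", "r3"]),  # independent, no hazard
]
print(len(raw_hazards(program)))  # 1
```

A pipeline facing such a dependence must stall (or forward the result) until the write completes, which is how hazards eat into the ideal gain of n.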
When a hardware device sends a signal to the ARM processor indicating that it needs attention, this
is called an interrupt. The process of sending an interrupt is known as an interrupt request. Devices
handled by interrupts include the keyboard, printer, mouse, serial port, disc drives and expansion
cards, as well as built-in timers.
When an interrupt is received, RISC OS temporarily halts the active task, and enters an interrupt
routine. The routine deals with the interrupting device very quickly so it can continue with the previous
task as quickly as possible. Often, interrupts are handled so quickly that users never realise their task
was temporarily halted.
Interrupts are an efficient method of dealing with hardware devices, as the devices inform the
system that they need attention, rather than the system regularly checking all the devices.
Because external hardware, such as expansion cards, can generate new interrupts, it is possible
to install new routines to deal with them.
Each device that can generate interrupts has a device number. There are corresponding device
vectors, similar to Software Vectors and Hardware Vectors. Installed on each vector is a default
device driver that receives only interrupts from that device.
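A toy model of the device-vector idea, with invented names (this does not use the actual RISC OS API): each device number maps to an installed handler routine, and an incoming interrupt dispatches to the routine on that device's vector.

```python
# Toy interrupt dispatch by device number, loosely mirroring the device
# vectors described above. All names here are invented for illustration.
handlers = {}
serviced = []

def register_handler(device_number, routine):
    # Install a default driver routine on this device's vector.
    handlers[device_number] = routine

def interrupt(device_number):
    # Look up and run the routine installed for this device.
    handlers[device_number](device_number)

register_handler(3, lambda dev: serviced.append(("keyboard", dev)))
register_handler(5, lambda dev: serviced.append(("timer", dev)))

interrupt(3)
interrupt(5)
print(serviced)  # [('keyboard', 3), ('timer', 5)]
```

Each handler receives only the interrupts for its own device number, which is the property the text attributes to the default device drivers installed on each vector.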
Interrupt types
There are two different types of interrupt requests: IRQs (interrupt requests) and FIQs (fast interrupt
requests). As the name suggests, FIQs are generated by devices that require their request be
serviced more quickly than the normal IRQ request. The ARM processor has a separate mode, its
own hardware vector and range of device numbers for dealing with FIQs.
Error Handling
Routines that handle interrupts must only call the error-returning SWI calls (that is, SWI calls that
have their X-bit set). If an error is returned to the routine, appropriate action must be taken within
the routine. It may be useful to store an error indicator within the routine, so that the next call to an
appropriate SWI (one in the module that provides the routine) will generate an error.