
Chapter 5

A Closer Look at Instruction Set Architectures
5.2 Instruction Formats

In designing an instruction set, consideration is given to:
1. Instruction length.
– Whether short, long, or variable.
2. Number of operands.
3. Number of addressable registers.
4. Addressing modes.
– Choose any or all: direct, indirect, or indexed.
5. Memory organization.
– Whether byte- or word-addressable.

5.2 Instruction Formats

• Byte ordering (endianness) is another major architectural consideration.
• If we have a two-byte integer, the integer may be
stored so that the least significant byte is followed
by the most significant byte or vice versa.

Big Endian

– Big endian machines store the most significant byte first (at the lowest address).

Little Endian

Little endian machines store the least significant byte first (at the lowest address), followed by the most significant byte.

Example

• Show the big endian and little endian arrangements of the bytes in 12345678₁₆.
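From the lowest memory address to the highest, the byte arrangements are:
Big endian: 12 34 56 78
Little endian: 78 56 34 12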

Internal Storage in the CPU

• Stack architecture: instructions and operands are implicitly taken from the stack.
– A stack cannot be accessed randomly.
• Accumulator architecture
– One operand of a binary operation is implicitly in the accumulator.
– The other operand is in memory, creating lots of bus traffic.
• General-purpose register (GPR) architecture: registers can be used instead of memory.
– Faster than accumulator architecture.
– Efficient implementation for compilers.
– Results in longer instructions.

In choosing one architecture over another, the tradeoffs are simplicity and cost of hardware design versus execution speed and ease of use.
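For example (using generic, illustrative mnemonics), the statement Z = X + Y on an accumulator machine might be coded as:
  LOAD X     ; accumulator gets the value at address X
  ADD Y      ; accumulator gets accumulator + value at address Y (the accumulator is the implicit operand)
  STORE Z    ; the result in the accumulator is written to address Z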
5.2 Instruction Formats

• Most systems today are GPR systems.
• There are three types:
– Memory-memory, where two or three operands may be in memory.
– Register-memory, where at least one operand must be in a register.
– Load-store, where no operands may be in memory.
• The number of operands and the number of available registers have a direct effect on instruction length, as the examples below illustrate.
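For example (generic mnemonics, not from any particular ISA), the statement Z = X + Y could be coded under each type as follows:
Memory-memory:
  ADD Z, X, Y        ; all three operands are memory addresses
Register-memory:
  LOAD R1, X
  ADD R1, Y          ; one operand in a register, one in memory
  STORE Z, R1
Load-store:
  LOAD R1, X
  LOAD R2, Y
  ADD R3, R1, R2     ; arithmetic uses registers only
  STORE Z, R3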
Stack architecture
• Stack machines use one - and zero-operand instructions.
• LOAD and STORE instructions require a single memory
address operand.
• Other instructions use operands from the stack implicitly.
• PUSH and POP operations involve only the stack’s top element.
• Binary instructions (e.g., ADD, MULT) use the top two items
on the stack.
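For example (generic mnemonics), Z = X + Y on a stack machine could be:
  PUSH X    ; push the value at address X onto the stack
  PUSH Y    ; push the value at address Y
  ADD       ; pop the top two items and push their sum (a zero-operand instruction)
  POP Z     ; pop the result and store it at address Z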

5.2 Instruction Formats

• We have seen how instruction length is affected by the number of operands supported by the ISA.
• In any instruction set, not all instructions require the same
number of operands.
• Operations that require no operands, such as HALT,
necessarily waste some space when fixed-length instructions
are used.
• One way to recover some of this space is to use expanding
opcodes.

5.2 Instruction Formats

• A system has 16 registers and 4K of memory.
• We need 4 bits to access one of the registers.
• We also need 12 bits for a memory address.
• If the system is to have 16-bit instructions, we have two choices for our instructions:
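The two choices can be pictured as instruction layouts (field order shown here is illustrative):
1. A 4-bit opcode followed by three 4-bit register operands (4 + 4 + 4 + 4 = 16 bits).
2. A 4-bit opcode followed by a 12-bit memory address (4 + 12 = 16 bits).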

5.2 Instruction Formats

• If we allow the length of the opcode to vary, we could create a very rich instruction set:

Is there something missing from this instruction set?


5.3 Instruction Types

Instructions fall into several broad categories that you should be familiar with:
• Data movement.
• Arithmetic.
• Boolean.
• Bit manipulation.
• I/O.
• Control transfer.

5.4 Addressing

• Addressing modes specify where an operand is located.
• They can specify a
– Constant
– Register
– Memory location.
• The actual location of an operand is its effective address.
• Certain addressing modes allow us to determine the address of an operand dynamically.
5.4 Addressing

• Immediate addressing
– is where the data is part of the instruction.
• Direct addressing
– is where the address of the data is given in the instruction.
• Register addressing
– is where the data is located in a register.
• Indirect addressing
– gives the address of the address of the data in the instruction.
• Register indirect addressing
– uses a register to store the address of the operand.

5.4 Addressing
• Indexed addressing (EA = R + address in operand)
– uses a register as an offset, which is added to the address in the
operand to determine the effective address of the data.
• Based addressing
– is similar, except that a base register is used instead of an index register.
• The difference between these two is that an
– index register holds an offset relative to the address given in the
instruction
– a base register holds a base address where the address field
represents a displacement from this base.
• Stack addressing
– the operand is assumed to be on top of the stack.
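As an illustration (the register and memory contents here are assumed values, chosen only for this example): suppose the instruction LOAD 800 is executed, register R1 holds 200, and memory holds M[200] = 300, M[800] = 900, M[900] = 700, and M[1000] = 500. Then:
– Immediate: the accumulator gets 800, the operand field itself.
– Direct: the accumulator gets M[800] = 900.
– Register (LOAD R1): the accumulator gets the contents of R1, i.e., 200.
– Register indirect (address in R1): the accumulator gets M[200] = 300.
– Indirect: the accumulator gets M[M[800]] = M[900] = 700.
– Indexed or based (offset or base in R1): the effective address is 800 + 200 = 1000, so the accumulator gets M[1000] = 500.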
5.4 Addressing

• For the instruction shown, what value is loaded into the accumulator for each addressing mode?

5.4 Addressing

• These are the values loaded into the accumulator for each addressing mode.

5.5 Instruction-Level Pipelining

• Some CPUs divide the fetch-decode-execute cycle into smaller steps.
• These smaller steps can often be executed in parallel to
increase throughput.
• Such parallel execution is called instruction-level pipelining.
• This term is sometimes abbreviated ILP in the literature.

5.5 Instruction-Level Pipelining

• Suppose a fetch-decode-execute cycle were broken into the following smaller steps:
1. Fetch instruction.
2. Decode opcode.
3. Calculate effective address of operands.
4. Fetch operands.
5. Execute instruction.
6. Store result.

• Suppose we have a six-stage pipeline. S1 fetches the instruction, S2 decodes it, S3 determines the address of the operands, S4 fetches them, S5 executes the instruction, and S6 stores the result.

5.5 Instruction-Level Pipelining

• For every clock cycle, one small step is carried out, and the stages are overlapped.

S1. Fetch instruction.
S2. Decode opcode.
S3. Calculate effective address of operands.
S4. Fetch operands.
S5. Execute.
S6. Store result.
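As an illustration, four instructions flowing through the six stages overlap like this (one stage per clock cycle):
Cycle:    1   2   3   4   5   6   7   8   9
Instr 1: S1  S2  S3  S4  S5  S6
Instr 2:     S1  S2  S3  S4  S5  S6
Instr 3:         S1  S2  S3  S4  S5  S6
Instr 4:             S1  S2  S3  S4  S5  S6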
Theoretical speedup offered by a pipeline
k = number of pipeline stages per instruction
tp = time per stage
Each instruction represents a task in the pipeline.
n = number of tasks (instructions)
• The 1st task (instruction) requires k × tp to complete in a k-stage pipeline.
• The remaining (n - 1) tasks emerge from the pipeline at a rate of one per cycle, so the total time to complete them is (n - 1) × tp.
Thus, to complete n tasks using a k-stage pipeline requires
(k × tp) + (n - 1) × tp = (k + n - 1) × tp.

5.5 Instruction-Level Pipelining

• If we take the time required to complete n tasks without a pipeline and divide it by the time it takes to complete n tasks using a pipeline, we find:

Speedup = (n × tn) / ((k + n - 1) × tp), where tn is the time to complete one task without pipelining.

• If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of:

Speedup = tn / tp, which equals k if each non-pipelined task takes tn = k × tp.

5.5 Instruction-Level Pipelining
• Our equations take a number of things for granted:
– First, we have to assume that the architecture supports fetching instructions and data in parallel.
– Second, we assume that the pipeline can be kept filled at all times. This is not always the case. Pipeline hazards arise that cause pipeline conflicts and stalls:
– Resource conflicts.
– Data dependencies.
– Conditional branching.
Example
Q1) A nonpipelined system takes 40 ns to execute instructions. The instructions can be processed by a 4-segment pipeline with a clock cycle of 10 ns. Determine the speedup of the pipeline for 22 instructions.
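Worked solution (using the speedup formula above): Speedup = (n × tn) / ((k + n - 1) × tp) = (22 × 40) / ((4 + 22 - 1) × 10) = 880 / 250 = 3.52.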
Q2) A nonpipelined system takes 100 ns to execute instructions. The instructions can be processed by a 5-segment pipeline with a clock cycle of 20 ns. Determine the speedup of the pipeline for 31 instructions.
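Worked solution: Speedup = (31 × 100) / ((5 + 31 - 1) × 20) = 3100 / 700 ≈ 4.43.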
Complex Instruction Set Computers

• The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible.
• This is achieved by building processor hardware
that is capable of understanding and executing a
series of operations.

Complex Instruction Set Computers
• For a task such as multiplying two values stored in memory, a CISC processor would come prepared with a specific instruction (we'll call it "MULT").
• When executed, this instruction
– loads the two values into separate registers
– multiplies the operands in the execution unit
– then stores the product in the appropriate location or
register.
– Thus, the entire task of multiplying two numbers can be
completed with one instruction:
– MULT 300, 200
Reduced Instruction Set Computers
• RISC processors only use simple instructions that
can be executed within one clock cycle.
• Thus, the "MULT" command described above could
be divided into three separate commands:
– LOAD, which moves data from the memory bank to a
register,
– MULT, which finds the product of two operands located
within the registers,
– STORE, which moves data from a register to the
memory banks.
Complex Instruction Set Computers
• The "MULT" command described above operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions.
• The advantages of this system are that
– the compiler has to do very little work to translate a high-
level language statement into assembly.
– Because the length of the code is relatively short, very
little RAM is required to store instructions. The emphasis
is put on building complex instructions directly into the
hardware.
Reduced Instruction Set Computers

• In order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly:
LOAD R1, 200
LOAD R2, 300
MULT R2, R1
STORE 200, R2

Reduced Instruction Set Computers

• At first, this may seem like a much less efficient way of completing the operation.
– Because there are more lines of code, more RAM is needed to store the assembly-level instructions.
– The compiler must also perform more work to convert a
high-level language statement into code of this form.

Reduced Instruction Set Computers
• However, the RISC strategy brings some very
important advantages.
– Because each instruction requires only one clock cycle to
execute, the entire program will execute in approximately
the same amount of time as the multi-cycle "MULT"
command.
– Because all of the instructions execute in a uniform amount
of time (i.e. one clock), pipelining is possible.
– Reduced instructions require fewer transistors of hardware space than complex instructions, leaving more room for general-purpose registers.