COA Final Merged

Uploaded by

speedkilla2002
The Digital Logic Level

Chapter 3
Structured Computer Organization
(6th_Edition)
TANENBAUM
Memory: Latches
S, for Setting the latch, and R, for Resetting (i.e., clearing) it
Setting S (i.e., making it 1) switches the state from 0 to 1

SR latch
When R = S = 0, the latch has two stable states, which we will refer to as 0 and 1, depending on Q
S = R = 1 is considered an invalid state

Figure 3-21. (a) NOR latch in state 0. (b) NOR latch in state 1.
(c) Truth table for NOR.
Memory: Latches
(Two NAND gates)
Memory: Latches

A. Setting S (i.e., making it 1) switches the state from 0 to 1


B. When the SET input returns to LOW, however, the
output remains HIGH. The output of the active-high latch
stays HIGH until the RESET input goes HIGH
C. Setting R to 1 when the latch is in state 0 has no effect
because the output of the lower NOR gate is 0 for inputs
of 1 0 and inputs of 1 1
D. When S is set to 1 momentarily, the latch ends up in state
Q = 1, regardless of what state it was previously in
E. Setting R to 1 momentarily forces the latch to state Q = 0
F. The circuit ‘‘remembers’’ whether S or R was last on
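
The behaviour in points A to F can be checked with a small simulation. This is only a sketch: the function names `nor` and `sr_latch_step` are made up for illustration, and the wiring (Q driven by the NOR gate fed by R, Q' by the NOR gate fed by S) is assumed to match the NOR latch of Figure 3-21.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1


def sr_latch_step(s: int, r: int, q: int, q_bar: int) -> tuple[int, int]:
    """Re-evaluate the two cross-coupled NOR gates until the outputs settle.

    Assumed wiring: Q = NOR(R, Q'), Q' = NOR(S, Q).
    """
    for _ in range(4):  # a few passes suffice for the valid input combinations
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


state = (0, 1)                          # latch starts in state 0
after_set = sr_latch_step(1, 0, *state)        # pulse S
after_hold = sr_latch_step(0, 0, *after_set)   # S returns to 0
after_reset = sr_latch_step(0, 1, *after_hold) # pulse R
reset_in_state_0 = sr_latch_step(0, 1, 0, 1)   # R = 1 while already in state 0
```

The invalid combination S = R = 1 is deliberately not exercised here.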
Clocked SR Latches
With the clock 0, both AND gates output 0,
independent of S and R, and the latch does not
change state

When the clock input is 1, the circuit becomes


sensitive to the state of S and R

Figure 3-22. A clocked SR latch.


Clocked D Latches
A clocked D latch is a true 1-bit memory
(used for permanent storage)

Figure 3-23. A clocked D latch (requires 11 transistors).


A good way to resolve the SR latch’s instability (caused when S = R =
1).
When the clock is 1, the current value of D is sampled and stored in
the latch
Flip-Flops (1)
A flip-flop is edge triggered,
whereas a latch is level triggered
- The state transition occurs not when the
clock is 1 but during the clock transition
from 0 to 1 (rising edge) or from 1 to 0
(falling edge) instead

clock pulse

- The inverter has a small, but nonzero, propagation delay through it,
and that delay is what makes the circuit work

Figure 3-24. (a) A pulse generator.


(b) Timing at four points in the circuit.
Flip-Flops (2)

This time shifting just means that the D latch will be activated at a fixed delay
after the rising edge of the clock, but it has no effect on the pulse width

Figure 3-25. A D flip-flop.


Flip-Flops (3)

(a) A latch whose state is loaded when the clock, CK, is 1
(b) State from D is loaded when the clock drops to 0
(c) Flip-flop that changes state on the rising edge of the clock pulse
(d) Changes state on the falling edge
Figure 3-26. D latches and flip-flops.
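
The difference between level triggering and edge triggering can be sketched in a few lines. The classes `DLatch` and `DFlipFlop` are hypothetical behavioural models, not gate-level circuits, and only the rising-edge flip-flop variant is modelled.

```python
class DLatch:
    """Level-triggered: Q follows D for as long as the clock is 1."""

    def __init__(self) -> None:
        self.q = 0

    def tick(self, clock: int, d: int) -> int:
        if clock == 1:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: Q takes the value of D only on a 0 -> 1 clock transition."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_clock = 0

    def tick(self, clock: int, d: int) -> int:
        if self._prev_clock == 0 and clock == 1:
            self.q = d
        self._prev_clock = clock
        return self.q


# (clock, D) samples: D drops to 0 while the clock is still high.
samples = [(0, 1), (1, 1), (1, 0), (0, 0)]
latch, flip_flop = DLatch(), DFlipFlop()
latch_trace = [latch.tick(c, d) for c, d in samples]
flip_flop_trace = [flip_flop.tick(c, d) for c, d in samples]
```

In this trace the latch follows the change in D during the high phase, while the flip-flop keeps the value captured at the rising edge.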
Memory Organization (1)
Flip-flops are loaded on the rising transition of the clock

Used as an amplifier
When the clear signal CLR goes to 0, all the flip-flops are forced to their 0 state

The register accepts an 8-bit input value (I0 to I7) when the clock CK
transitions
Figure 3-27. An 8-bit storage register constructed from
eight single-bit flip-flops.
Memory Organization (2a)

Write OP: CK is enabled, loading the input data into flip-flops
Four word-select AND gates form a decoder

Figure 3-28. Logic diagram for a 4 x 3 memory (easily extensible).


Each row is one of the four 3-bit words. A read or write operation
always reads or writes a complete word.
Memory Organization (2b)

Read OP: CK is disabled and none of the flip-flops is modified
Read OP: AND gates tied to the Q bits of the selected word are enabled

Figure 3-28. Logic diagram for a 4 x 3 memory. Each row is one of


the four 3-bit words. A read or write operation always reads or
writes a complete word.
Memory Organization (3)
• Three inputs are data: I0, I1, and I2;
• Two are for the address: A0 and A1; and
• Three are for control: CS for Chip Select, RD for distinguishing between
read (logical 1) and write, and OE for Output Enable
• The three outputs are for data: O0, O1, and O2.
• The values of the address lines determine which of the 4 memory bits is
allowed to input or output a value
• Memory bits share an output signal (each 4 memory bits share one output
signal)
• The 8-bit register requires 20 signals, including power and ground, while the
12-bit memory requires only 13 signals
• For a read operation, the data input lines are not used, but the word selected
is placed on the data output lines
• For a write operation, the bits present on the data input lines are loaded into
the selected memory word
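
A behavioural sketch of the 4 x 3 memory may help tie the control signals together. The class `Memory4x3` and its `cycle` method are illustrative names, the address is passed as an integer 0 to 3 rather than as separate A0/A1 lines, and the model assumes that a write happens when CS = 1 and RD = 0, while the outputs are driven only when CS, RD, and OE are all 1.

```python
class Memory4x3:
    """Behavioural model of a memory with four 3-bit words."""

    def __init__(self) -> None:
        self.words = [[0, 0, 0] for _ in range(4)]

    def cycle(self, cs: int, rd: int, oe: int, address: int,
              data_in: tuple = (0, 0, 0)):
        """Return the three output bits, or None when the outputs are not driven."""
        if not cs:
            return None                      # chip not selected: nothing happens
        if rd == 0:
            self.words[address] = list(data_in)  # write the complete word
            return None
        if oe:
            return tuple(self.words[address])    # read the complete word
        return None                          # read selected but outputs disabled


mem = Memory4x3()
write_result = mem.cycle(cs=1, rd=0, oe=0, address=2, data_in=(1, 0, 1))
read_result = mem.cycle(cs=1, rd=1, oe=1, address=2)
deselected = mem.cycle(cs=0, rd=1, oe=1, address=2)
```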
Exercise

Assume that Q=1 for all flip flops in the following 4x3 memory. In the
diagrams below, mark all gates that will be outputting a 1 given the
following sets of inputs:

      CS  OE  RD  A0  A1
(a)   1   1   0   1   0
(b)   1   1   1   1   0
Memory Organization (4)

An electronic switch that can make or break a connection

Figure 3-29. (a) A noninverting buffer. (b) Effect of


(a) when control is high. (c) Effect of (a) when control is low.
(d) An inverting buffer.
Nonvolatile Memory Chips(2)

Figure 3-32. A comparison of various memory types.


Revision: Chapter 3 - Exercise 12

What does this circuit do?


Revision: Chapter 3 - Exercise 12

Option 1

Option 2

A’B + AB’
Revision: Chapter 3 - Exercise 14
• An n-bit adder can be constructed by cascading n full
adders in series, with the carry into stage i, Ci , coming
from the output of stage i − 1. The carry into stage 0,
C0, is 0.

• If each stage takes T nsec to produce its sum and carry,


the carry into stage i will not be valid until iT nsec after
the start of the addition. For large n the time required for
the carry to ripple through to the high-order stage may
be unacceptably long. Design an adder that works
faster.

• Hint: Each Ci can be expressed in terms of the operand


bits Ai − 1 and Bi − 1 as well as the carry Ci − 1. Using this
relation it is possible to express Ci as a function of the
inputs to stages 0 to i − 1, so all the carries can be
generated simultaneously.
Revision: Chapter 3 - Exercise 14
The carry into stage i can be written as C_i = P_{i-1} + S_{i-1}C_{i-1}, where P_{i-1} is
the product term A_{i-1}B_{i-1} and S_{i-1} is the sum term A_{i-1} xor B_{i-1}. This result
follows directly from the fact that a carry is generated from a stage if both
operands are 1, or if one operand and the carry in are both 1. (Carry out is 1 if
either A and B are both 1, or exactly one of them is 1 and the carry-in bit is also 1.)
For example,
C0 = 0
C1 = P0 + S0C0 = P0
C2 = P1 + S1C1 = P1 + P0S1
C3 = P2 + S2(P1 + P0S1) = P2 + P1S2 + P0S1S2
C4 = P3 + S3C3 = P3 + P2S3 + P1S2S3 + P0S1S2S3
As soon as the inputs, A and B, are available, all the P and S terms can be
generated
simultaneously in one gate delay time. Then the various AND terms
such as P0S1S2 can be generated in a second gate delay. Finally, all the
carries can be produced in a third gate delay. Thus, all the carries are
available after three gate delays, no matter how many stages the adder
has. The price paid for this speedup is a considerable number of additional
gates, of course.
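
The expansion can be checked against an ordinary ripple-carry computation. The sketch below is a software model, not a gate design: `lookahead_carries` and `ripple_carries` are illustrative names, and bit lists are assumed to be least significant bit first.

```python
def lookahead_carries(a_bits, b_bits):
    """Return [C0, ..., Cn], each written as a sum of products of the inputs only."""
    n = len(a_bits)
    p = [a & b for a, b in zip(a_bits, b_bits)]  # product terms P_i = A_i B_i
    s = [a ^ b for a, b in zip(a_bits, b_bits)]  # sum terms S_i = A_i xor B_i
    carries = [0]                                # C0 = 0
    for i in range(1, n + 1):
        c = 0
        for j in range(i):                       # term P_j S_{j+1} ... S_{i-1}
            term = p[j]
            for k in range(j + 1, i):
                term &= s[k]
            c |= term
        carries.append(c)
    return carries


def ripple_carries(a_bits, b_bits):
    """Reference model: each carry depends on the previous stage's carry."""
    carries = [0]
    for a, b in zip(a_bits, b_bits):
        carries.append((a & b) | ((a ^ b) & carries[-1]))
    return carries


def to_bits(value: int, width: int):
    return [(value >> i) & 1 for i in range(width)]


example = lookahead_carries(to_bits(0b1011, 4), to_bits(0b0110, 4))
```

Both functions give the same carries; the hardware gain comes from the fact that no term in `lookahead_carries` depends on another carry.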
Revision: Chapter 3 - Exercise 20
The circuit of Fig. 3-25 is a flip-flop that is triggered on the rising edge of the
clock. Modify this circuit to produce a flip-flop that is triggered on the falling
edge of the clock.

Use the same circuit, but replace the AND gate in the pulse generator by a
NOR gate. The only time both inputs will be low is just after the falling edge.
Revision: Chapter 3 - Exercise 21
The 4 × 3 memory of Fig. 3-28 uses 22 AND gates and three OR gates. If
the circuit were to be expanded to 256 × 8, how many of each would be
needed?

The design uses two AND gates for chip enable logic plus two AND gates
per word select line plus one AND gate per data bit. For a 256 × 8 memory
this comes to 2 + 512 + 2048 = 2562 AND gates. The circuit also uses one
OR gate for each bit in the word; hence eight of them would be needed.
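
The counting rule in this answer can be written as a short function. `gate_counts` is an illustrative name; the formula is just the one stated above (2 AND gates for chip enable logic, 2 per word select line, 1 per data bit, and 1 OR gate per bit of the word).

```python
def gate_counts(words: int, bits_per_word: int):
    """Return (AND gates, OR gates) for a memory built like Fig. 3-28."""
    and_gates = 2 + 2 * words + words * bits_per_word
    or_gates = bits_per_word
    return and_gates, or_gates


small = gate_counts(4, 3)      # the 4 x 3 memory of the figure
large = gate_counts(256, 8)    # the expanded 256 x 8 memory
```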
CPU Chip Control Pins

• Bus control
• Interrupts
• Bus arbitration
• Coprocessor signaling
• Status
• Miscellaneous
CPU Chips

Figure 3-34. The logical pinout of a generic CPU. The arrows


indicate input signals and output signals. The short diagonal lines
indicate that multiple pins are used. For a specific CPU, a number
will be given to tell how many.
Computer Buses (1)

Figure 3-35. A computer system with multiple buses.


Computer Buses (2)

Figure 3-36. Examples of bus masters and slaves.


Bus Width

Figure 3-37. Growth of an address bus over time.


Synchronous Buses (1)

Reading from memory takes 15 nsec from the time the address is stable
Figure 3-38. (a) Read timing on a synchronous bus
(100MHz --> 10 nsec bus cycle).
Synchronous Buses (2)

MREQ’ indicates that memory (as opposed to an I/O device) is being accessed
RD’ is asserted for reads and negated for writes
Figure 3-38. (b) Specification of some critical times.
Asynchronous Buses

Figure 3-39. Operation of an asynchronous bus.


Four Events of Full-Handshake

Full handshakes are timing independent


Bus Arbitration (1)
daisy chaining

What happens if two or more devices all want to become bus master at the same time?
Figure 3-40. (a) A centralized one-level bus arbiter using daisy
chaining. (b) The same arbiter, but with two levels.
Bus Arbitration (2)

1) Wired-OR line
2) Asserted by the current bus master

To acquire the bus, a device first checks to see if the bus is idle and the
arbitration signal it is receiving, IN, is asserted
Only one device will have IN asserted and OUT negated. This device
becomes bus master, asserts BUSY and OUT, and begins its transfer.

Figure 3-41. Decentralized bus arbitration


Bus Operations (1)

In this example, a block read of 4 words takes 6 cycles instead of 12

Figure 3-42. A block transfer.


The memory outputs one word during each cycle until the count has been exhausted
Bus Operations (2)

Asserting IRx causes INTerrupt to be asserted

INTerrupt Acknowledge

Up to eight I/O controllers can be directly connected to the eight IRx
(Interrupt Request) inputs

Figure 3-43. Use of the 8259A interrupt controller


The Intel Core i7
1.16 billion transistors and running at speeds up to 3.5 GHz with line
widths of 32 nanometers (2-6 processors)
Figure 3-44. The Core i7 physical pinout.

The Core i7 processor can carry out up to four instructions at once, making it a 4-
wide superscalar machine.
Each processor has a 32-KB level 1 (L1) data cache and a 32-KB level 1
instruction cache. Each core also has its own 256-KB level 2 (L2) cache. All
cores share a single level 3 (L3) unified cache, the size of which varies from 4 to
15 MB
The Core i7’s Logical Pinout
Allow 1333 million transactions per second

Bus Signals

Connects peripherals to the CPU

Figure 3-45. Logical pinout of the Core i7 (1155 pins).


Pipelining on Core i7’s DDR3 Memory Bus
(keep the CPU from starving for lack of data)

The Core i7 DDR3 memory bus can be operated in a


pipelined manner. Three steps of memory requests

• Activate phase – opens DRAM memory row, ready for


access

• Read or Write phase – multiple accesses can be made to


individual words

• Precharge phase – closes DRAM memory row, prepare for


next activate
Pipelining

A typical DDR3 DRAM chip will have as many as 8 banks of DRAM


The DDR3 interface specification allows only up to four concurrent accesses on
a single DDR3 channel
Figure 3-46. Pipelining memory requests on the
Core i7’s DDR3 interface.
The PCI Bus (Peripheral Component
Interconnect bus)

Figure 3-51. Architecture of an early Pentium system. The new


PCI bus runs at up to 66 MHz and can handle 64-bit transfers,
for a total bandwidth of 528 MB/sec
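
The bandwidth figure follows from clock rate times bus width. A quick check, assuming one transfer per clock cycle and 1 MB = 10^6 bytes (`peak_bandwidth_mb_per_sec` is an illustrative name):

```python
def peak_bandwidth_mb_per_sec(clock_mhz: int, bus_width_bits: int) -> int:
    """Peak transfer rate in MB/sec, assuming one transfer per clock cycle."""
    bytes_per_transfer = bus_width_bits // 8
    return clock_mhz * bytes_per_transfer


pci_64bit_66mhz = peak_bandwidth_mb_per_sec(66, 64)
```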
The PCI Bus (2a)

Figure 3-52. The bus structure of a modern Core i7 system.


The PCI Bus (2b)

Figure 3-52. The bus structure of a modern Core i7 system.


PCI Bus Arbitration

Figure 3-53. The PCI bus uses a centralized bus arbiter.


The PCI Express Architecture

A general-purpose
switch for
connecting chips
using serial links

Figure 3-56. A typical PCI Express system.


PCI Express vs. Old PCI

I. A centralized switch vs. a multidrop bus


II. The use of narrow serial point-to-point connections vs. a wide
parallel bus
III. The PCI Express model introduced the concept of packet,
which consists of a header and a payload (taken from the
networking world)
IV. an error-detecting code is used on the packets, providing a
higher degree of reliability than on the PCI bus (called a CRC
(Cyclic Redundancy Check))
Exercises: Chapter 3 - Exercise 16
A 16-bit ALU is built up of 16 1-bit ALUs, each one having an add
time of 10 nsec. If there is an additional 1-nsec delay for
propagation from one ALU to the next, how long does it take for the
result of a 16-bit add to appear?
Exercises: Chapter 3 - Exercise 17
Sometimes it is useful for an 8-bit ALU such as Fig. 3-19 (below) to
generate the constant −1 as output. Give two different ways this can
be done. For each way, specify the values of the six control signals.
Exercises: Chapter 3 - Exercise 17
Exercises: Chapter 3 - Exercise 25
Referring to the timing diagram of Fig. 3-38, suppose that you
slowed the clock down to a period of 20 nsec instead of 10 nsec as
shown but the timing constraints remained unchanged. How much
time would the memory have to get the data onto the bus during T3
after MREQ’ was asserted, in the worst case?
Solution: Chapter 3 - Exercise 25
With a 20-nsec clock period, MREQ might be asserted as late as 13
nsec into T1.

The data are required 2 nsec before the high-to-low transition in T3,
which occurs 10 nsec after the start of the cycle. From the midpoint
of T1 to the midpoint of T3 is 40 nsec. Since the memory cannot
start until 3 nsec after the transition in the first cycle and has to be
done 2 nsec before the transition in the third cycle, in the worst case
the memory has only 35 nsec in which to respond.
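
The arithmetic in this solution can be restated as a small function. The parameter names are hypothetical: `start_delay_ns` stands for the 3 nsec the memory must wait after the transition in the first cycle, and `setup_ns` for the 2 nsec by which the data must precede the transition in the third cycle.

```python
def worst_case_response_time(period_ns: int, start_delay_ns: int, setup_ns: int) -> int:
    """Time available to the memory between the midpoints of T1 and T3."""
    midpoint_t1_to_midpoint_t3 = 2 * period_ns
    return midpoint_t1_to_midpoint_t3 - start_delay_ns - setup_ns


slow_clock = worst_case_response_time(period_ns=20, start_delay_ns=3, setup_ns=2)
```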
Exercises: Chapter 3 - Exercise 30
Multicore chips, with multiple CPUs on the same die, are
becoming popular. What advantages do they have over a
system consisting of multiple PCs connected by Ethernet?

In a multicore system, the cores can share


caches and primary memory easily.
Having a shared memory multiprocessor makes
programming many applications easier. There
is often also a high-bandwidth interconnection
between the cores for signaling, etc.
End

Chapter 3
The Microarchitecture Level

Chapter 4
Reference from TANENBAUM’s book
Structured Computer Organization (6th
Edition)
Contemporary Multilevel Machines (2)

...
The set of instructions carried out
interpretively by the microprogram
or hardware execution circuits

A collection of (8 to 32) registers


connected to an ‘ALU’ Arithmetic Logic
Unit to form a ‘data path’

Machines’ true hardware


and electronic circuits
(i.e., ‘gates’ that are built up of
a handful of transistors)

Figure 1-2. A six-level computer. The support method for each


level is indicated below it (along
with the name of the supporting program).
Contemporary Multilevel Machines (1)
Programs written in Java for instance
are first translated to an ISA-like
language called Java byte code,
which is then interpreted

This set of instructions comprises


most of the ISA instructions, plus
other advanced features such as
a different memory organization,
the ability to run programs concurrently
...

Figure 1-2. A six-level computer. The support method for each


level is indicated below it (along
with the name of the supporting program).
Microarchitecture Level

• Many modern ISAs, particularly RISC designs, have simple


instructions that can usually be executed in a single clock cycle. More
complex ISAs, such as the Core i7 instruction set, may require many
cycles to execute a single instruction.

• Every ISA is a special case. Consequently, we will discuss a detailed


example. For our example ISA, we have chosen a subset of the Java
Virtual Machine (IJVM), dealing only with integers.

• Our microarchitecture will contain a microprogram (in ROM), whose


job is to fetch, decode, and execute IJVM instructions

• We need a tiny microprogram that drives the individual gates in the


actual hardware efficiently.
Microarchitecture Level

[Figure labels: Microprogram = set of microinstructions (read only memory); Stack = working space; Macroinstr. = compiled program; Data]

A computer system with multiple buses

[Figure: the CPU (PC, IR, MAR, MBR, I/O AR, I/O BR, execution unit) connected by the system bus to main memory (instructions and data at addresses 0 to n–1) and to an I/O module with buffers]

PC = Program counter
IR = Instruction register
MAR = Memory address register
MBR = Memory buffer register
I/O AR = Input/output address register
I/O BR = Input/output buffer register

Figure 3.2 Computer Components: Top-Level View


Mic-1
• MicroProgram Counter (address of the next microinstruction)
• Set of microinstr.
• Current microinst. in MIR register
• Read/Write from/to Memory
• Addr[8]
• Data Path and Control Section
Microinstructions (1)
Functional Signal Groups:
9 Signals to keep track of the next address (8 for data + 1 bit for jump).
9 Signals to control writing data from C bus into registers.
4 Signals to enable registers onto B bus for ALU input.
8 Signals to control ALU and shifter functions.
3 Signals to indicate memory read/write/fetch via MAR/MDR.
3 Signals to indicate JAM options for branching.
Microinstructions (3)

Groups of signals:

Addr – Contains address of potential next microinstruction.


JAM – Determines how the next microinstruction selected.
ALU – ALU and shifter functions.
C – Selects which registers written from C bus.
Mem – Memory functions.
B – Selects B bus source; encoded as shown.
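
The field widths listed on the previous slide (9 + 3 + 8 + 9 + 3 + 4) add up to a 36-bit microinstruction. The sketch below packs and unpacks such a word. The left-to-right order Addr, JAM, ALU, C, Mem, B follows the list above, but placing Addr in the most significant bits is an assumption, and `pack` and `unpack` are illustrative helpers rather than part of any Mic-1 tool.

```python
# (field name, width in bits), most significant field first (assumed layout)
FIELDS = [("Addr", 9), ("JAM", 3), ("ALU", 8), ("C", 9), ("Mem", 3), ("B", 4)]
TOTAL_BITS = sum(width for _, width in FIELDS)


def pack(values: dict) -> int:
    """Combine the field values into one microinstruction word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Split a microinstruction word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


fields = {"Addr": 0x92, "JAM": 0b001, "ALU": 0x3C, "C": 0x100, "Mem": 0b010, "B": 7}
word = pack(fields)
```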
Miscellaneous
Mic-1 has a so called microprogrammed architecture:
Each macroinstruction (also called ISA instruction, or IJVM instruction in the case of Mic-1) is
divided into one or more microinstructions.
Each microinstruction is executed in exactly one cycle.

...but why are all these jumps required to determine the next microinstruction ?

In case of conditional jumps (if..then..else) we normally need two jump addresses as parameter.

To keep the microinstruction format uniform, all microinstructions must have the same length:
either we make all microinstructions contain two addresses (-> waste of space) or
(better solution) we specify only one address and compute the second one as
Addr + Constant Value (in Mic-1 we have: Constant Value = 0x100)

JMPC is used to jump to the address specified by MBR, which, as we will see, contains
the opcode of the macroinstruction. Note that the microinstructions for each
macroinstruction M are stored starting from the position determined by the
opcode of M.

Example. The opcode of the macroinstruction BIPUSH is 0x10. This means that the
corresponding microinstructions start at address 0x10 in the control store.
Microinstruction Control: The Mic-1 (1)

The sequencer must produce two kinds of information each


cycle:

• The state of every control signal in the system


• The address of the next microinstruction to be executed
• Each microinstruction explicitly specifies its successor because
instructions in the microprogram are not necessarily executed in
order (as they are in main memory)
Microinstruction Format
Addr is the address of the next microinstruction
JAM controls how the next address is selected: JMPC (jump using MBR), JAMN (jump on N, negative) and JAMZ (jump on Z, zero)
ALU controls the ALU and Shifter operations
C enables writing from C bus to the selected registers
Mem controls memory read/write/fetch operations
B controls the register which can write to the B bus

ALU, Registers, Buses and Control
Signals
• Control Signals
9 for reading and 9 for writing the registers
and 8 for ALU/Shifter and 3 for read/write/fetch (to and from main memory)
• Nine 32-bit registers
MAR: Memory Address Register
MDR: Memory Data Register
PC: Program Counter
SP: Stack Pointer
• One 8-bit register
MBR: Memory Buffer Register
• A bus: drives data from register H to the ALU
• B bus: drives data from one register to the ALU
• C bus: drives data from the ALU to registers
• ALU
with 6 control signals, two additional outputs:
N tests for negative numbers and
Z tests for zero;
and a shifter:
SLL8 shifts the content left by 8 bits (logical shift)
SRA1 shifts the content right by one bit (arith. shift)
Microprogram Registers

Microprogram Counter (MPC)


This register stores the address (in the control store) of the next
microinstruction.
Microinstruction Register (MIR)
This register stores the current microinstruction whose bits drive the control
lines of the CPU. In the simulator, the contents of the MIR are displayed
simultaneously in two formats: binary and micro assembly language (mal).
Negative (N)
This one-bit register stores a copy of the sign bit (the high-order bit) of the value
generated by the ALU. If the JAMN bit in a microinstruction is set, the N
register is used to determine the address of the next microinstruction.
Zero (Z)
This one-bit register is set to 1 if the value generated by the ALU is 0.
Otherwise, this register is set to 0. If the JAMZ bit in a microinstruction is set,
the Z register is used to determine the address of the next microinstruction.
Data Path Registers

Memory Address Register (MAR)


Stores the address of the 32-bit word that is being written to or read from
memory.
Memory Data Register (MDR) - Register 0
The MDR is used to hold the 32-bit word that is being written to or read from
memory. When no memory access is being performed, this register can be
used as a general register.
Program Counter (PC) - Register 1
When interpreting machine language programs, this register serves as the
program counter and holds the address of the next byte in the instruction
stream. Otherwise, this register can be used as a general register.
Data Path Registers

Memory Buffer Register (MBR) - Register 2/3


This register stores a single byte read from the memory cell addressed by PC.
When read as register 2, the byte data is treated as a signed integer in the
range -128 to 127 (the sign bit of the byte data is extended into the leading
24 bits creating a 32-bit value). When read as register 3, the byte data is
treated as an unsigned value in the range 0 to 255 (the leading 24 bits are
set to zeroes). This register is a read-only register. When interpreting
machine language programs, this register holds the current byte of the
instruction stream (opcode).
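
The two ways of reading MBR can be illustrated as follows. The function names are hypothetical; `as_32bit_pattern` only shows the bit pattern that would appear on the 32-bit bus.

```python
def mbr_signed(byte: int) -> int:
    """Register 2 view: the byte is sign-extended (range -128 to 127)."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def mbr_unsigned(byte: int) -> int:
    """Register 3 view: the leading 24 bits are zero (range 0 to 255)."""
    return byte & 0xFF


def as_32bit_pattern(value: int) -> int:
    """Two's-complement bit pattern of a value on a 32-bit bus."""
    return value & 0xFFFFFFFF


signed_view = as_32bit_pattern(mbr_signed(0xF0))
unsigned_view = as_32bit_pattern(mbr_unsigned(0xF0))
```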
Stack Pointer (SP) - Register 4
When interpreting machine language programs, this register serves as the
stack pointer. Otherwise, this register can be used as a general register.
Local Variables (LV) - Register 5
When interpreting machine language programs, this register points to the local
variables of the currently executing method. Otherwise, this register can be
used as a general register.
Data Path Registers

Constant Pool Pointer (CPP) - Register 6


When interpreting machine language programs, this register points to the
constant pool. Otherwise, this register can be used as a general register.
Top of Stack (TOS) - Register 7
When interpreting machine language programs, this register always contains a
copy of the value at the top of the stack. Otherwise, this register can be used
as a general register.
Old Program Counter (OPC) - Register 8
When interpreting machine language programs, this register is used to store the
old program counter and also as a scratchpad register. Otherwise, this
register can be used as a general register.
Hold (H)
This register holds data that is to be supplied to the left (A) input of the
arithmetic logic unit. This is the only register that can perform this function.
Data Path Registers

Shifter (SHFT)
There are two possible shift operations. The first shifts the value
from the ALU 8 bits to the left with zero fill (SLL shift Left
Logical). The second shifts the value in the ALU one bit to the
right with sign extension (Shift Right Arithmetic). Sign extension
means that the high order bit prior to the shift will be copied into
the vacated high order bit position following the shift right. The
value displayed in the shifter is the value after the specified shift
operation is performed. If there is no shift operation, the value
in the shifter will be the same as the value displayed in the
ALU.
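
A minimal sketch of the two shift operations on 32-bit values (`sll8` and `sra1` are illustrative function names modelled on the signal names):

```python
MASK32 = 0xFFFFFFFF


def sll8(value: int) -> int:
    """Shift left logical by 8 bits with zero fill; bits shifted out are lost."""
    return (value << 8) & MASK32


def sra1(value: int) -> int:
    """Shift right arithmetic by 1 bit; the old sign bit is copied into bit 31."""
    value &= MASK32
    sign = value & 0x80000000
    return (value >> 1) | sign


shifted_left = sll8(0x00FF00FF)
shifted_right_negative = sra1(0x80000000)
shifted_right_positive = sra1(0x00000006)
```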
Mic-1 Memory Operations
•Two ports into memory
• 32 bit port
• 8 bit port
•32-bit port: (Data Cache interface)
• MAR - Memory Address register specifies the memory
address to use for a memory operation (LOAD or STORE)
• MDR - Memory Data Register serves as destination or source
for LOAD and STORE operations
• note that you cannot put the MAR contents on the B bus
•8-bit port: (Instruction Cache interface)
• PC - Program Counter serves as the address pointer into
memory
• MBR - Memory Buffer Register serves as destination for
memory contents (i.e., opcodes and their other fields)
Data Path Synchronization

Each microinstruction is executed in one cycle!

0. Falling edge: MIR is updated with the current MPC
1. ∆w: All control signals stabilize
2. ∆x: The value of one register is put on the B bus
3. ∆y: ALU and shifter operate
4. ∆z: The result propagates on the C bus
5. Rising edge: The result is written in the registers
6. Clock high: MPC is computed
7. Falling edge: a new cycle is started: goto 0 !

Clock low: data path is computed
Clock high: MPC value is computed
MAR & MDR, PC & MBR Registers

MDR has two memory operations: read and write
MBR has one memory operation: fetch
MBR has two control signals for the B bus:
one for signed (all higher bits are filled with MBR[7])
and one unsigned (all higher bits are filled with 0)

A read from main memory takes two cycles:
One for putting the address in MAR
One for getting the data in MDR
(assuming the main memory works fast enough)

N consecutive reads can be pipelined:
1st read is available at beginning of cycle 3
2nd read is available at beginning of cycle 4
Nth read is available at beginning of cycle N+2

MAR addressing trick


 Memory is byte addressed (8 bit)
 Data is word addressed (4 byte = 32 bit)
=> MAR addresses are shifted 2 bits left ( = * 4)
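
As a small illustration of the addressing trick (`word_to_byte_address` and `bytes_of_word` are made-up names):

```python
def word_to_byte_address(mar: int) -> int:
    """A word address in MAR maps to byte address MAR * 4 (a 2-bit left shift)."""
    return (mar << 2) & 0xFFFFFFFF


def bytes_of_word(mar: int) -> range:
    """Byte addresses occupied by the 32-bit word that MAR selects."""
    start = word_to_byte_address(mar)
    return range(start, start + 4)


third_word = list(bytes_of_word(2))
```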
MIR Register and MPC

0. Falling edge: MIR is updated with the current MPC
5. Rising edge: The result of the ALU is written in one or
more of the 9 registers MAR, …, H and the two 1-bit flip-flops
6. Clock high: MPC is computed:
F = (JAMN and N) or (JAMZ and Z) or Addr[8]
G = (MBR and JMPC) or Addr[0..7]
Note that if JMPC=1 then Addr=0x00 or Addr=0x100
7. Falling edge: a new cycle is started: goto 0 !

There are two synchronization points:
At the beginning of the cycle (step 0) via the register MIR
At the end of the cycle (step 5) via the 9 registers of the
ALU and the two flip-flops
Microinstruction Control: The Mic-1 (4)
o In all cases, MPC can take on only one of two possible values:
o The NEXT ADDRESS
o The NEXT ADDRESS with the high-order bit ORed with 1
o When either JAMN or JAMZ is 1, there are two potential
successors: NEXT ADDRESS and NEXT ADDRESS ORed with
0x100 (assuming that NEXT ADDRESS ≤ 0xFF: 8-bits)

o If JMPC is set, the 8 MBR bits are bitwise ORed with the 8 low-
order bits of the NEXT ADDRESS field coming from the current
microinstruction. The result is sent to MPC
o MPC is not loaded until the registers it depends on (MBR, N, and
Z) are ready.

JAMN  JAMZ  High Bit Output
0     0     NEXT_ADDRESS[8]
0     1     NEXT_ADDRESS[8] + Z
1     0     NEXT_ADDRESS[8] + N
1     1     NEXT_ADDRESS[8] + Z + N
Microinstruction Control: The Mic-1 (5)

The next address of the microinstruction depends on the Z bit stored on


the previous ALU operation. If the Z bit is 0, the next microinstruction
comes from 0x92. If the Z bit is 1, the next microinstruction comes from
0x192.
Figure 4-7. A microinstruction with JAMZ set to 1
has two potential successors.
Microinstruction Control: The Mic-1 (JMPC)
o The box with the label ‘‘O’’ does an OR of MBR with NEXT
ADDRESS if JMPC is 1 but just passes NEXT ADDRESS
through to MPC if JMPC is 0.
o When JMPC is 1, the low-order 8 bits of NEXT ADDRESS are
normally zero. The high-order bit can be 0 or 1, so the NEXT
ADDRESS value used with JMPC is normally 0x000 or 0x100.
o This gives us the equation:

MPC[i] = NEXT_ADDRESS[i] + (JMPC · MBR[i])
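
Putting the high-bit rule and the JMPC rule together, the next MPC value can be modelled as below. `next_mpc` is an illustrative function; `+` in the slides is read as logical OR and `·` as AND.

```python
def next_mpc(addr: int, jmpc: int, jamn: int, jamz: int,
             n: int, z: int, mbr: int) -> int:
    """Compute the 9-bit MPC from the NEXT ADDRESS field, JAM bits, N, Z and MBR."""
    # High bit: NEXT_ADDRESS[8] OR (JAMN AND N) OR (JAMZ AND Z)
    high = ((addr >> 8) & 1) | (jamn & n) | (jamz & z)
    # Low 8 bits: NEXT_ADDRESS[0..7], ORed with MBR when JMPC is set
    low = addr & 0xFF
    if jmpc:
        low |= mbr & 0xFF
    return (high << 8) | low


z_clear = next_mpc(0x92, jmpc=0, jamn=0, jamz=1, n=0, z=0, mbr=0)
z_set = next_mpc(0x92, jmpc=0, jamn=0, jamz=1, n=0, z=1, mbr=0)
dispatch = next_mpc(0x000, jmpc=1, jamn=0, jamz=0, n=0, z=0, mbr=0x10)
```

The first two calls mirror the JAMZ example with NEXT ADDRESS 0x92; the third mirrors the BIPUSH dispatch, where MBR holds opcode 0x10.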


The IJVM Memory Model (1)
o IJVM memory consists of an array of 4,294,967,296 bytes (4
GB), or an array of 1,073,741,824 words, each consisting of 4
bytes
Defined areas of memory
• The constant pool
• consists of constants, strings, and pointers to other areas of memory
• The Local variable frame
• storing local variables during the lifetime of the method invocation
• The operand stack
• allocated directly above the local variable frame
• The method area
• a region of memory containing the program
The IJVM Memory Model (2)

• Constant pool: constants, strings, and pointers to other areas of memory (names of functions)
• Local variable frame: stores variables during the lifetime of the invocation
• CPP, LV, and SP registers are all pointers to words, not bytes
• LV, LV + 1, and LV + 2 refer to the first three words
• The method area is treated as a byte array
• PC = PC+1 results in a fetch of the next byte (address of next instruction)
• CPP: register that contains the address of the first word of the constant pool
• PC: contains the address of the instruction to be fetched next
Figure 4-10. The various parts of the IJVM memory.
Stacks (1)
LV points to the base of the stack frame for
the currently active procedure

SP points to the highest word of A’s local variables
LV points to the base of the local variables

Panels: (a) A is active, (b) A calls B, (c) B calls C, (d) C and B return and A calls D

Local variables do not have absolute memory addresses. Variables are referred
to by giving their offset (distance) from LV
Figure 4-8. Use of a stack for storing local variables. (a) While A
is active. (b) After A calls B. (c) After B calls C. (d) After C and B
return and A calls D (A,B,C,D are considered as procedures)
Stacks (2)
Suppose, for example, that before calling B, A has to do the
computation:

a1 = a2 + a3;
Pop two words off the stack to execute the instruction
Result pushed back onto the stack

Figure 4-9. Use of an operand stack for doing


an arithmetic computation
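
The computation a1 = a2 + a3 can be traced with a toy operand stack. This is a sketch, not the IJVM itself: ILOAD, IADD, and ISTORE are IJVM mnemonics, but the `run` helper, the use of variable names instead of numeric offsets from LV, and the sample values 5 and 7 are assumptions made for illustration, and 32-bit overflow is ignored.

```python
def run(program, local_vars):
    """Execute a tiny subset of IJVM-style instructions on an operand stack."""
    stack = []
    for op, *args in program:
        if op == "ILOAD":            # push a local variable
            stack.append(local_vars[args[0]])
        elif op == "IADD":           # pop two words, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "ISTORE":         # pop the top word into a local variable
            local_vars[args[0]] = stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


local_vars = {"a1": 0, "a2": 5, "a3": 7}
program = [("ILOAD", "a2"), ("ILOAD", "a3"), ("IADD",), ("ISTORE", "a1")]
remaining_stack = run(program, local_vars)
```

After the final store the operand stack is empty again, which is the situation the figure ends with.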
The IJVM Instruction Set (1)

(from the instruction in the method


area)
(Duplicate)
(Conditional Branch)

(from local variable frame)

All the branch instructions, if taken, adjust the value of PC by the size of their (16-bit signed)
offset. This offset is added to the address of the opcode.

Figure 4-11. The IJVM instruction set. The operands


byte, const, and varnum are 1 byte. The operands
disp, index, and offset are 2 bytes.
The IJVM Instruction Set (2)

Figure 4-11. The IJVM instruction set. The operands


byte, const, and varnum are 1 byte. The operands
disp, index, and offset are 2 bytes.
IJVM Assembly Mnemonics
• IAND
–Pops two values off the stack
–Performs logical AND of the values
–Pushes the result onto the stack
• IOR
–Pops two values off the stack
–Performs logical OR of the values
–Pushes the result onto the stack
• NOP
–No operation; used as a “spacer” or “delay”
• POP
–Deletes the word on the top of the stack
• SWAP
–Exchanges the top two words on the stack
Compiling Java to IJVM (1)
(a) Java code
(b) Assembly program produced by the Java compiler
(c) IJVM binary program (in hexadecimal) translated by the Java assembler

If they are equal, a branch is taken to L1, where k is set to 0. Otherwise, execution continues with the next instruction.

It is assumed that i is local variable 1 (0x01), j is local variable 2 (0x02), and k is local variable 3 (0x03).

Figure 4-14. (a) A Java fragment. (b) The corresponding Java assembly language. (c) The IJVM program in hexadecimal.
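The step from assembly to hexadecimal can be sketched for the statement i = j + k, using the variable numbering stated above. The opcode values here are recalled from the textbook's instruction table and should be treated as assumptions to check against Figure 4-11; `encode` is a hypothetical helper, not the real assembler.

```python
# Opcode values as recalled from the textbook's IJVM instruction table;
# treat them as assumptions and verify them against Figure 4-11.
OPCODES = {"ILOAD": 0x15, "ISTORE": 0x36, "IADD": 0x60}

# Local variable numbers as stated in the text: i = 1, j = 2, k = 3.
VARNUM = {"i": 1, "j": 2, "k": 3}

def encode(program):
    """Translate (mnemonic, optional variable) tuples into a list of bytes."""
    out = []
    for mnemonic, *operand in program:
        out.append(OPCODES[mnemonic])
        if operand:
            out.append(VARNUM[operand[0]])   # varnum is a 1-byte operand
    return out

code = encode([("ILOAD", "j"), ("ILOAD", "k"), ("IADD",), ("ISTORE", "i")])
print(" ".join(f"0x{b:02X}" for b in code))
```

Each instruction contributes one opcode byte plus its operand bytes, which is why ILOAD and ISTORE take two bytes here and IADD takes one.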
Compiling Java to IJVM (2)

Result is stored in i, so the stack is empty again.
IF_ICMPEQ pops the top two words and compares them.

Figure 4-15. The stack after each instruction of Fig. 4-14(b).


The IJVM Instruction Set (3)
Beginning of the stack for the new method.
Set PC to point to the fifth byte in the method code space.
Address of old PC and old LV of the caller method.
A reference (pointer) to the object to be called.
Address of old PC and old LV of the caller method, if it exists.

Figure 4-12. (a) Memory before executing INVOKEVIRTUAL. (b) After executing it.
INVOKEVIRTUAL
• INVOKEVIRTUAL methodname
–Calls a method
–Load the OBJREF constant (located in the constant pool) on the stack before the call
–Load any parameters that need to be passed to the method
–Finally, execute the INVOKEVIRTUAL instruction
• Remember that methodname is the disp operand (2 bytes)
• disp is added to CPP to locate the constant pool word that holds the starting address (new PC) of the method being invoked
INVOKEVIRTUAL (cont’d)
Method code for methodname
• Starts at the PC retrieved from the constant pool
• First 4 bytes contain special data
–2 bytes form a 16-bit integer for the number of parameters, including OBJREF (parameter 0)
–2 bytes form a 16-bit integer indicating the size of the local variable area for the invoked method
• Fifth byte contains the first opcode to be executed
• The last operation needed to carry out INVOKEVIRTUAL is to set PC to point to the fifth byte in the method code space
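The layout of the method header can be illustrated with a small Python sketch. The helper `parse_method_header`, the sample `method_area` bytes, and the start address 8 are all hypothetical; the two 16-bit fields are assumed to be stored high byte first.

```python
def parse_method_header(method_area, method_start):
    """Read the 4-byte header and return (nparams, nlocals, first opcode PC)."""
    nparams = int.from_bytes(method_area[method_start:method_start + 2], "big")
    nlocals = int.from_bytes(method_area[method_start + 2:method_start + 4], "big")
    first_opcode_pc = method_start + 4     # the fifth byte of the method
    return nparams, nlocals, first_opcode_pc

# Made-up method area: 8 filler bytes, then a method with
# 3 parameters (including OBJREF), 2 local variables, and some code.
method_area = bytes([0x00] * 8 + [0x00, 0x03, 0x00, 0x02, 0x15, 0x01])
header = parse_method_header(method_area, 8)
print(header)
```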
Executing INVOKEVIRTUAL
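• OBJREF is overwritten with the link pointer, the address of the stack word where the old PC is saved
• Old LV is stored immediately above the old PC
• SP points to the saved old LV, just below the first free word of the new method’s still “empty” stack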

IRETURN
• Used to return from a method
• The value placed on the top of stack (TOS) is the value returned
• Only TOS is returned, nothing below
• Stack is in same state as before call except for return value on
TOS

Executing IRETURN
• Deallocates space used by returning method
• Restores stack to former state except
–OBJREF (Link ptr) and parameters are popped off the stack
–Returned value placed on top of stack
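The bookkeeping done by INVOKEVIRTUAL and IRETURN can be followed in a small Python model. This is a sketch under stated assumptions, not the Mic-1 microcode: the `Machine` class, the word indices, and the addresses 50, 53, and 100 are invented for the example, and the method header fields are passed in as plain arguments.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    stack: list   # word-addressed stack memory
    sp: int       # index of the top-of-stack word
    lv: int       # index of the base of the current frame
    pc: int       # byte address of the next instruction

def invokevirtual(m, method_start, nparams, nlocals, return_pc):
    objref = m.sp - nparams + 1      # word holding OBJREF (parameter 0)
    link = m.sp + nlocals + 1        # word just above the callee's locals
    m.stack[objref] = link           # OBJREF is overwritten by the link pointer
    m.stack[link] = return_pc        # old PC
    m.stack[link + 1] = m.lv         # old LV, immediately above
    m.sp, m.lv, m.pc = link + 1, objref, method_start + 4

def ireturn(m):
    result = m.stack[m.sp]           # only the top of stack is returned
    link = m.stack[m.lv]             # link pointer stored in place of OBJREF
    m.sp = m.lv                      # drop locals, parameters, and OBJREF
    m.pc, m.lv = m.stack[link], m.stack[link + 1]
    m.stack[m.sp] = result           # return value on top of the caller's stack

# Caller frame at word 0; OBJREF and two parameters pushed at words 2..4.
m = Machine(stack=[0] * 16, sp=4, lv=0, pc=50)
m.stack[2:5] = [0, 11, 22]
invokevirtual(m, method_start=100, nparams=3, nlocals=2, return_pc=53)
after_call = (m.sp, m.lv, m.pc)

m.sp += 1
m.stack[m.sp] = 33                   # the callee pushes its return value
ireturn(m)
after_return = (m.sp, m.lv, m.pc, m.stack[m.sp])
print(after_call, after_return)
```

After the return, SP points at the word that used to hold OBJREF, which now contains the returned value; the parameters and the callee's frame are gone.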
The IJVM Instruction Set (4)

Figure 4-13. (a) Memory before executing IRETURN. (b) After executing it.
Ex: Exercise – IJVM Code Size
Revision Exercises: Ex 1
Give two different IJVM translations for the following Java
statement:
i = k + n + 5;

IJVM translations for given Java statement:
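Two possible translations differ only in when the constant 5 is pushed. The Python sketch below encodes both as lists and runs them through a hypothetical mini-interpreter (`run`, covering only the four instructions needed) to check that they store the same value in i. These are one plausible pair of answers, offered as an illustration rather than as the definitive solution; the values of k and n are arbitrary.

```python
def run(program, local_vars):
    """Execute a tiny subset of IJVM: ILOAD, BIPUSH, IADD, ISTORE."""
    stack = []
    for op, *arg in program:
        if op == "ILOAD":
            stack.append(local_vars[arg[0]])
        elif op == "BIPUSH":
            stack.append(arg[0])
        elif op == "IADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "ISTORE":
            local_vars[arg[0]] = stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return local_vars

# Version 1: add k and n first, then add 5.
version_1 = [("ILOAD", "k"), ("ILOAD", "n"), ("IADD",),
             ("BIPUSH", 5), ("IADD",), ("ISTORE", "i")]

# Version 2: push all three operands, then add twice.
version_2 = [("ILOAD", "k"), ("ILOAD", "n"), ("BIPUSH", 5),
             ("IADD",), ("IADD",), ("ISTORE", "i")]

result_1 = run(version_1, {"i": 0, "k": 4, "n": 6})["i"]
result_2 = run(version_2, {"i": 0, "k": 4, "n": 6})["i"]
print(result_1, result_2)
```

Version 2 needs a deeper operand stack (three words instead of two) but has the same code size.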


Revision Exercises: Ex 2
Give the Java statement that produced the following IJVM
code:
ILOAD j
ILOAD n
ISUB
BIPUSH 7
ISUB
DUP
IADD
ISTORE i

Java statement for the code:
i = (j - n - 7) + (j - n - 7);
Revision Exercises: Ex 3
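Give an IJVM translation for the following Java statement:
for (int i = 3; i == 0; i--)

IJVM translation for the given Java statement:

BIPUSH 3
ISTORE i
L1: ILOAD i
IFEQ L2
IINC i -1
GOTO L1
L2: HALT

In this listing IFEQ L2 is the exit test: the loop ends when i reaches 0, so i is decremented while it is nonzero.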
Ex: Code Segment using Standard Output
Ex: Constants
Syntax:
.constant
constant1 value1
constant2 value2
.end-constant

Notes: Global constants are declared in the .constant section at the beginning of the file.
The value of a constant can be given as a hexadecimal number (must be prefixed with
"0x"), an octal number (must be prefixed with "0"), or a decimal number (no prefix).
Declared constants may then be referred to by name by any instruction expecting a
constant as a parameter (e.g. LDC_W constant_name). For example:
http://www.ontko.com/mic1/jas.html
// this program displays all the printable ASCII values 32..126
.constant
one 1
start 32
stop 126
.end-constant

.main
LDC_W start
next: DUP
OUT // output the current character
DUP
LDC_W stop
ISUB
IFEQ done // exit if we've reached the end
LDC_W one
IADD
GOTO next // increment and do the next one
done: POP
HALT
.end-main
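To check the control flow of this program, the loop can be replayed step by step in Python. This is a hand-written trace of the listing above, with each statement annotated by the instruction it stands for; `run_ascii_program` is a hypothetical name, and OUT is modeled as popping a word and appending its character to a list.

```python
ONE, START, STOP = 1, 32, 126   # the values declared in .constant

def run_ascii_program():
    stack, output = [], []
    stack.append(START)                            # LDC_W start
    while True:
        stack.append(stack[-1])                    # next: DUP
        output.append(chr(stack.pop()))            # OUT
        stack.append(stack[-1])                    # DUP
        stack.append(STOP)                         # LDC_W stop
        b, a = stack.pop(), stack.pop()
        stack.append(a - b)                        # ISUB
        if stack.pop() == 0:                       # IFEQ done
            break
        stack.append(ONE)                          # LDC_W one
        stack.append(stack.pop() + stack.pop())    # IADD
                                                   # GOTO next
    stack.pop()                                    # done: POP
    return "".join(output), stack                  # HALT

text, final_stack = run_ascii_program()
print(text)
```

The two DUP instructions are needed because both OUT and the comparison consume their operand; the running character code stays on the stack underneath.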
Ex: ADDING AN INSTRUCTION TO THE IJVM ISA BY
MODIFYING THE MIC-1 MICROPROGRAM.
(If you understand the problem statement, you’re halfway there.)

In this exercise, we will extend the capabilities of the Mic-1 machine by modifying the
definition of its control store to include a new instruction. Then we will add the details for
this instruction into the definition file for the ijvmasm assembler. Finally we’ll write some
.jas code that exercises the new instruction, and run it in the simulator to make sure it
works.

The instruction we will add is COM, for complement. The instruction will calculate the
one’s complement of the word on top of the stack and push the result onto the stack.

The microcode for this is fairly straightforward. The point of this exercise is to walk
through all the steps, to set you up for doing some more interesting microprograms.

Here’s the code for COM:

com1 MDR = TOS = NOT TOS // calc 1s compl, save in TOS and MDR
com2 MAR = SP = SP+1; wr; goto Main1 // set MAR from SP, write, recycle
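The stack-level effect of these two microinstructions can be sketched in Python. The sketch follows the microcode as written: com2 increments SP before the write, so the complement lands above the original word. The function name `com` and the example value are assumptions; words are taken to be 32 bits wide.

```python
WORD_MASK = 0xFFFFFFFF   # words are assumed to be 32 bits wide

def com(stack):
    """Push the one's complement of the top word, leaving the original below."""
    stack.append(~stack[-1] & WORD_MASK)

stack = [0x0000FFFF]
com(stack)
print([hex(word) for word in stack])
```

The mask is needed only because Python integers are unbounded; in hardware the NOT operation is naturally limited to the word width.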
Revision Exercises
How long does a 200 MHz Mic-1 take to execute the Java
statement i=j+k;? Give your answer in nanoseconds.
IJVM code    Microinstructions
ILOAD j      main1 + iload (5)  = 6
ILOAD k      main1 + iload (5)  = 6
IADD         main1 + iadd (3)   = 4
ISTORE i     main1 + istore (6) = 7

6 + 6 + 4 + 7 = 23 microinstructions. At 200 MHz, each microinstruction
takes 5 nsec, so 23 * 5 = 115 nsec are required to execute this statement.
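The same arithmetic can be written out as a short Python check. The microinstruction counts are the ones given in the table above; the variable names are chosen for the example.

```python
CLOCK_HZ = 200_000_000          # 200 MHz
MAIN1 = 1                       # every IJVM instruction starts with main1

# Microinstructions per IJVM instruction, excluding main1 (from the table).
micro_counts = {"ILOAD j": 5, "ILOAD k": 5, "IADD": 3, "ISTORE i": 6}

total = sum(MAIN1 + count for count in micro_counts.values())
cycle_ns = 1_000_000_000 // CLOCK_HZ    # one microinstruction per clock cycle
elapsed_ns = total * cycle_ns
print(total, cycle_ns, elapsed_ns)
```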
Revision Exercises
Write a program to read in a two-digit number and sum
the even numbers from 2 to that number.
Extra credit if you can print out the answer.
End

Chapter 4
