PowerPC™ 601 RISC Microprocessor Technical Summary
MPR601TSU-02
(IBM Order Number)
MPC601/D
(Motorola Order Number)
11/93
REV 1
Archived by Freescale Semiconductor, Inc.
Advance Information
PowerPC 601 RISC Microprocessor
Technical Summary
Part 2, Levels of the PowerPC Architecture, describes the three levels of the
PowerPC architecture.
In this document, the terms PowerPC 601 RISC Microprocessor and 601 are used to
denote the first microprocessor from the PowerPC architecture family. The PowerPC 601
microprocessors are available from IBM as PPC601 and from Motorola as MPC601.
The 601 integrates three execution units: an integer unit (IU), a branch processing unit (BPU), and a
floating-point unit (FPU). The ability to execute three instructions in parallel and the use of simple
instructions with rapid execution times yield high efficiency and throughput for 601-based systems. Most
integer instructions execute in one clock cycle. The FPU is pipelined so a single-precision multiply-add
instruction can be issued every clock cycle.
The 601 includes an on-chip, 32-Kbyte, eight-way set-associative, physically addressed, unified instruction
and data cache and an on-chip memory management unit (MMU). The MMU contains a 256-entry, two-way
set-associative, unified translation lookaside buffer (UTLB) and provides support for demand paged virtual
memory address translation and variable-sized block translation. Both the UTLB and the cache use least
recently used (LRU) replacement algorithms.
The 601 has a 64-bit data bus and a 32-bit address bus. The 601 interface protocol allows multiple masters
to compete for system resources through a central external arbiter. Additionally, on-chip snooping logic
maintains cache coherency in multiprocessor applications. The 601 supports single-beat and burst data
transfers for memory accesses; it also supports both memory-mapped I/O and I/O controller interface
addressing.
The 601 uses an advanced, 3.6-V CMOS process technology and maintains full interface compatibility with
TTL devices.
Cache write-back or write-through operation programmable on a per page or per block basis
Memory unit with a two-element read queue and a three-element write queue
BPU that performs condition register (CR) look-ahead operations
Address translation facilities for 4-Kbyte page size, variable block size, and 256-Mbyte segment size
A 256-entry, two-way set-associative UTLB
Four-entry BAT array providing 128-Kbyte to 8-Mbyte blocks
Four-entry, first-level ITLB
Hardware table search (caused by UTLB misses) through hashed page tables
52-bit virtual address; 32-bit physical address
Facilities for enhanced system performance
Bus speed defined as selectable division of operating frequency
[Figure 1: block diagram; image not recovered. Labels that survive extraction: RTC (RTCU, RTCL);
INSTRUCTION UNIT (INSTRUCTION QUEUE, 8 WORDS; INSTRUCTION ISSUE LOGIC); IU (GPR FILE, XER);
BPU (CTR, CR, LR); FPU (FPR FILE, FPSCR); MMU (UTLB, ITLB, BAT ARRAY); 32-KBYTE CACHE
(INSTRUCTION AND DATA) with TAGS; MEMORY UNIT (READ QUEUE, WRITE QUEUE, SNOOP); SYSTEM INTERFACE.
Path widths shown: 1 WORD, 2 WORDS, 4 WORDS, 8 WORDS.]
The IU executes all integer instructions and executes floating-point memory accesses in concert with the
FPU. The IU executes one integer instruction at a time, performing computations with its arithmetic logic
unit (ALU), multiplier, divider, integer exception register (XER), and the general-purpose register file. Most
integer instructions are single-cycle instructions.
Load and store instructions are considered to have completed execution with respect to precise exceptions
after the address is translated. If the address for a load or store instruction hits in the UTLB or BAT array
and it is aligned, the instruction execution (that is, calculation of the address) takes one clock cycle, allowing
back-to-back issue of load and store instructions. The time required to perform the actual load or store
operation varies depending on whether the operation involves the cache, system memory, or an I/O device.
1.4.3 Floating-Point Unit (FPU)
The FPU contains a single-precision multiply-add array, the floating-point status and control register
(FPSCR), and thirty-two 64-bit FPRs. The multiply-add array allows the 601 to efficiently implement
floating-point operations such as multiply, add, divide, and multiply-add. The FPU is pipelined so that most
single-precision instructions and many double-precision instructions can be issued back-to-back. The FPU
contains two additional instruction queues. These queues allow floating-point instructions to be issued from
the instruction queue even if the FPU is busy, making instructions available for issue to the other execution
units.
Like the BPU, the FPU can access instructions from the bottom half of the instruction queue (Q3-Q0),
which permits floating-point instructions that do not depend on unexecuted instructions to be issued early
to the FPU.
The 601 supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity)
in hardware, eliminating the latency incurred by software exception routines.
After an address is generated, the upper order bits of the logical (effective) address are translated by the
MMU into physical address bits. Simultaneously, the lower order address bits (that are untranslated and
therefore considered both logical and physical), are directed to the on-chip cache where they form the index
into the eight-way set-associative tag array. After translating the address, the MMU passes the higher-order
bits of the physical address to the cache, and the cache lookup completes. For cache-inhibited accesses or
accesses that miss in the cache, the untranslated lower order address bits are concatenated with the translated
higher-order address bits; the resulting 32-bit physical address is then used by the memory unit and the
system interface, which accesses external memory.
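
The split between translated and untranslated address bits described above can be illustrated with a short Python sketch. It covers page-based translation only and assumes the 4-Kbyte page size and the cache geometry given in this summary (32 Kbytes, eight-way, 64-byte lines, which leaves 64 line positions per way). The function and variable names are hypothetical and chosen only for illustration; they do not correspond to any 601 register or signal.

```python
PAGE_SIZE = 4096          # 4-Kbyte pages: the low 12 bits are not translated
LINE_SIZE = 64            # cache line size in bytes
NUM_WAYS = 8              # eight-way set-associative
CACHE_SIZE = 32 * 1024    # 32 Kbytes
LINES_PER_WAY = CACHE_SIZE // (LINE_SIZE * NUM_WAYS)


def split_effective_address(ea):
    """Split a 32-bit effective address (hypothetical helper).

    Returns (upper_bits, cache_index, byte_in_line). Only upper_bits
    need translation by the MMU; the other two fields come from the
    untranslated low-order bits and can index the tag array at once.
    """
    ea &= 0xFFFFFFFF
    page_offset = ea % PAGE_SIZE
    upper_bits = ea // PAGE_SIZE
    cache_index = page_offset // LINE_SIZE
    byte_in_line = page_offset % LINE_SIZE
    return upper_bits, cache_index, byte_in_line


def physical_address(translated_upper_bits, ea):
    """Concatenate translated upper bits with the untranslated low bits."""
    return ((translated_upper_bits * PAGE_SIZE) | (ea % PAGE_SIZE)) & 0xFFFFFFFF
```

Because the 12 untranslated bits cover exactly the six index bits and six byte-offset bits in this sketch, the tag array can be indexed while the MMU is still translating the upper bits, which matches the parallel lookup described in the text.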
The MMU also directs the address translation and enforces the protection hierarchy programmed by the
operating system in relation to the supervisor/user privilege level of the access and in relation to whether the
access is a load or store.
For instruction accesses, the MMU first performs a lookup in the four entries of the ITLB for both block-
and page-based physical address translation. Instruction accesses that miss in the ITLB and all data accesses
cause a lookup in the UTLB and BAT array for the physical address translation. In most cases, the physical
address translation resides in one of the TLBs and the physical address bits are readily available to the
on-chip cache. In the case where the physical address translation misses in the TLBs, the 601 automatically
performs a search of the translation tables in memory using the information in the table search description
register 1 (SDR1) and the corresponding segment register.
Memory management in the 601 is described in more detail in Section 3.6.2, PowerPC 601 Microprocessor
Memory Management.
1.6 Cache Unit
The PowerPC 601 microprocessor contains a 32-Kbyte, eight-way set associative, unified (instruction and
data) cache. The cache line size is 64 bytes, divided into two eight-word sectors, each of which can be
snooped, loaded, cast-out, or invalidated independently. The cache is designed to adhere to a write-back
policy, but the 601 allows control of cacheability, write policy, and memory coherency at the page and block
level. The cache uses a least recently used (LRU) replacement policy.
As shown in Figure 1, the cache provides an eight-word interface to the instruction fetcher and load/store
unit. The surrounding logic selects, organizes, and forwards the requested information to the requesting unit.
Write operations to the cache can be performed on a byte basis, and a complete read-modify-write operation
to the cache can occur in each cycle.
The instruction unit provides the cache with the address of the next instruction to be fetched. In the case of
a cache hit, the cache returns the instruction and as many of the instructions following it as can be placed in
the eight-word instruction queue up to the cache sector boundary. If the queue is empty, as many as eight
words (an entire sector) can be loaded into the queue in parallel.
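
As a rough illustration of the fetch behavior described above, the following Python sketch computes how many instruction words a cache hit could deliver. It assumes 4-byte instructions, 32-byte (eight-word) sectors, and a word-aligned fetch address. The function name and the free_slots parameter are hypothetical; the sketch models only the sector-boundary and queue-capacity limits named in the text, not any other timing detail.

```python
WORD_BYTES = 4
SECTOR_BYTES = 8 * WORD_BYTES   # one eight-word sector
QUEUE_WORDS = 8                 # instruction queue capacity


def words_delivered(fetch_address, free_slots=QUEUE_WORDS):
    """Words returned on a hit, limited by the sector boundary and queue space."""
    if fetch_address % WORD_BYTES != 0:
        raise ValueError("instruction addresses are word-aligned")
    if not 0 <= free_slots <= QUEUE_WORDS:
        raise ValueError("free_slots must be between 0 and 8")
    # Number of words from the fetch address up to the end of its sector.
    words_to_boundary = (SECTOR_BYTES - fetch_address % SECTOR_BYTES) // WORD_BYTES
    return min(words_to_boundary, free_slots)
```

A fetch at the start of a sector with an empty queue yields a full eight words, while a fetch near the end of a sector yields only the words that remain before the boundary.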
The cache tag directory has one address port dedicated to instruction fetch and load/store accesses and one
dedicated to snooping transactions on the system interface. Therefore, snooping does not require additional
clock cycles unless a snoop hit that requires a cache status update occurs.
The 601's memory unit contains read and write queues that buffer operations between the external interface
and the cache. These operations include those resulting from load and store instructions that miss in the
cache, as well as the read and write operations required to maintain cache coherency, table search, and other
operations. The memory unit also handles address-only operations and cache-inhibited loads and stores. As
shown in Figure 2, the read queue contains two elements and the write queue contains three elements. Each
element of the write queue can contain as many as eight words (one sector) of data. One element of the write
queue, marked snoop in Figure 2, is dedicated to writing cache sectors to system memory after a modified
sector is hit by a snoop from another processor or snooping device on the system bus. The use of this write
queue element guarantees a high-priority operation that ensures deterministic response behavior when
snooping hits a modified sector.
[Figure 2: memory unit diagram; image not recovered. Labels that survive extraction: ADDRESS (from cache);
READ QUEUE; (to cache); DATA QUEUE (four word); DATA (from cache); WRITE QUEUE; SNOOP; ADDRESS; DATA;
SYSTEM INTERFACE.]
The other two elements in the write queue are used for store operations and writing back modified sectors
that have been deallocated by updating the queue; that is, when a cache location is full, the least-recently
used cache sector is deallocated by first being copied into the write queue and from there to system memory.
Note that snooping can occur after a sector has been pushed out into the write queue and before the data has
been written to system memory. Therefore, to maintain a coherent memory, the write queue elements are
compared to snooped addresses in the same way as the cache tags. If a snoop hits a write queue element, the
data is first stored in system memory before it can be loaded into the cache of the snooping bus master.
Coherency checking between the cache and the write queue prevents dependency conflicts. Single-beat
writes in the write queue are not snooped; coherency is ensured through the use of special cache operations
that accompany the single-beat write operation on the bus.
Execution of a load or store instruction is considered complete when the associated address translation
completes, guaranteeing that the instruction has completed to the point where it is known that it will not
generate an internal exception. However, after address translation is complete, a read or write operation can
still generate an external exception.
Load and store instructions are always issued and translated in program order with respect to other load and
store instructions. However, a load or store operation that hits in the cache can complete ahead of those that
miss in the cache; additionally, loads and stores that miss the cache can be reordered as they arbitrate for the
system bus.
Because the cache on the 601 is an on-chip, write-back primary cache, the predominant type of transaction
for most applications is burst-read memory operations, followed by burst-write memory operations, I/O
controller interface operations, and single-beat (noncacheable or write-through) memory read and write
operations. Additionally, there can be address-only operations, variants of the burst and single-beat
operations (global memory operations that are snooped, and atomic memory operations, for example), and
address retry activity (for example, when a snooped read access hits a modified line in the cache).
Access to the system interface is granted through an external arbitration mechanism that allows devices to
compete for bus mastership. This arbitration mechanism is flexible, allowing the 601 to be integrated into
systems that implement various fairness and bus parking procedures to avoid arbitration overhead.
Additional multiprocessor support is provided through coherency mechanisms that provide snooping,
external control of the on-chip cache and TLB, and support for a secondary cache. Multiprocessor software
support is provided through the use of atomic memory operations.
Typically, memory accesses are weakly ordered: sequences of operations, including load/store string and
multiple instructions, do not necessarily complete in the order they begin, which maximizes the efficiency of the
bus without sacrificing coherency of the data. The 601 allows read operations to precede store operations
(except when a dependency exists, of course). In addition, the 601 can be configured to reorder high priority
write operations ahead of lower priority store operations. Because the processor can dynamically optimize
run-time ordering of load/store traffic, overall performance is improved.
PowerPC user instruction set architecture: Defines the base user-level instruction set, user-level
registers, data types, floating-point exception model, memory models for a uniprocessor
environment, and programming model for a uniprocessor environment.
Note that while the 601 is said to adhere to the PowerPC architecture at all three levels, it diverges in aspects
of its implementation to a greater extent than should be expected of subsequent PowerPC processors. Many
of the differences result from the fact that the 601 design provides compatibility with an existing architecture
standard (POWER), while providing a reliable platform for hardware and software development compatible
with subsequent PowerPC processors.
Features: Section 3.1, Features, describes general features that the 601 shares with the PowerPC
family of microprocessors. It does not list PowerPC features not implemented in the 601.
Registers and programming model: Section 3.2, Registers and Programming Model, describes
the registers for the operating environment architecture common among PowerPC processors and
describes the programming model. It also describes differences in how the registers are used in the
601 and describes the additional registers that are unique to the 601.
Instruction set and addressing modes: Section 3.3, Instruction Set and Addressing Modes,
describes the PowerPC instruction set and addressing modes for the PowerPC operating
environment architecture. It defines the PowerPC instructions implemented in the 601 as well as
additional instructions implemented in the 601 but not defined in the PowerPC architecture.
Cache implementation: Section 3.4, Cache Implementation, describes the cache model that is
defined generally for PowerPC processors by the virtual environment architecture. It also provides
specific details about the 601 cache implementation.
Exception model: Section 3.5, Exception Model, describes the exception model of the PowerPC
operating environment architecture and the differences in the 601 exception model.
3.1 Features
The 601 is a high-performance, superscalar PowerPC implementation. The PowerPC architecture allows
optimizing compilers to schedule instructions to maximize performance through efficient use of the
PowerPC instruction set and register model. The multiple, independent execution units allow compilers to
maximize parallelism and instruction throughput. Compilers that take advantage of the flexibility of the
PowerPC architecture can additionally optimize system performance of the PowerPC processors.
The 601 implements the PowerPC architecture, with the extensions and variances listed in Appendix H,
Implementation Summary for Programmers, in the PowerPC 601 RISC Microprocessor User's Manual.
The PowerPC architecture also defines 32 user-level 64-bit floating-point registers (FPRs). The FPRs serve
as the data source or destination for floating-point instructions. These registers can contain data objects of
either single- or double-precision floating-point formats.
The CR is a 32-bit user-level register that consists of eight four-bit fields that reflect the results of certain
operations, such as move, integer and floating-point compare, arithmetic, and logical instructions, and
provide a mechanism for testing and branching.
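
The eight four-bit fields of the CR can be pictured with the following Python sketch. It assumes the PowerPC bit-numbering convention, in which bit 0 is the most significant bit, so field 0 occupies the four high-order bits. It also assumes the usual meaning of the four bits after a compare (less than, greater than, equal, summary overflow), which this summary does not spell out. Both function names are hypothetical.

```python
def cr_field(cr, n):
    """Return four-bit CR field n (0 to 7) from a 32-bit CR value."""
    if not 0 <= n <= 7:
        raise ValueError("the CR has eight fields, numbered 0 to 7")
    shift = 4 * (7 - n)          # field 0 sits in the high-order nibble
    return (cr >> shift) & 0xF


def decode_compare(field):
    """Interpret a field as left by a compare: LT, GT, EQ, SO (high to low)."""
    return {
        "lt": bool(field & 0b1000),
        "gt": bool(field & 0b0100),
        "eq": bool(field & 0b0010),
        "so": bool(field & 0b0001),
    }
```

A conditional branch that tests for equality after a compare into field 0 would, in this model, look at the "eq" entry of decode_compare(cr_field(cr, 0)).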
3.2.1.4 Floating-Point Status and Control Register (FPSCR)
The floating-point status and control register (FPSCR) is a user-level register that contains all exception
signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance
with the IEEE 754 standard.
3.2.1.5 Machine State Register (MSR)
The machine state register (MSR) is a supervisor-level register that defines the state of the processor. The
contents of this register are saved when an exception is taken and restored when the exception handling
completes. The 601 implements the MSR as a 32-bit register; 64-bit PowerPC processors implement a
64-bit MSR.
Link register (LR): The link register can be used to provide the branch target address and to hold
the return address after branch and link instructions. The LR is 32 bits wide in 32-bit
implementations.
Count register (CTR): The CTR is decremented and tested automatically as a result of
branch-and-count instructions. The CTR is 32 bits wide in 32-bit implementations.
Integer exception register (XER): The 32-bit XER contains the integer carry and overflow bits and
two fields for the Load String and Compare Byte Indexed (lscbx) instruction (a POWER instruction
implemented in the 601 but not defined by the PowerPC architecture).
3.2.1.9 Supervisor-Level SPRs
The 601 also contains SPRs that can be accessed only by supervisor-level software. These registers consist
of the following:
The 32-bit data access exception (DAE)/source instruction service register (DSISR) defines the
cause of data access and alignment exceptions.
The data address register (DAR) is a 32-bit register that holds the address of an access after an
alignment or data access exception.
Decrementer register (DEC) is a 32-bit decrementing counter that provides a mechanism for
causing a decrementer exception after a programmable delay. PowerPC architecture defines that the
DEC frequency be provided as a subdivision of the processor clock frequency; however, the 601
implements a separate clock input that serves both the DEC and the RTC facilities.
The 32-bit table search description register 1 (SDR1) specifies the page table format used in
logical-to-physical address translation for pages.
The machine status save/restore register 0 (SRR0) is a 32-bit register that is used by the 601 for
saving the address of the instruction that caused the exception, and the address to return to when a
Return from Interrupt (rfi) instruction is executed.
The machine status save/restore register 1 (SRR1) is a 32-bit register used to save machine status
on exceptions and to restore machine status when an rfi instruction is executed.
General SPRs, SPRG0-SPRG3, are 32-bit registers provided for operating system use.
The external access register (EAR) is a 32-bit register that controls access to the external control
facility through the External Control Input Word Indexed (eciwx) and External Control Output
Word Indexed (ecowx) instructions.
The processor version register (PVR) is a 32-bit, read-only register that identifies the version
(model) and revision level of the PowerPC processor.
Block address translation (BAT) registers: The PowerPC architecture defines 16 BAT registers,
divided into four pairs of data BATs (DBATs) and four pairs of instruction BATs (IBATs). The 601
includes four pairs of unified BATs (BAT0U-BAT3U and BAT0L-BAT3L). See Figure 3 for a list
of the SPR numbers for the BAT registers. Note that the format for the 601's implementation of the
BAT registers differs from the PowerPC architecture definition.
[Figure 3: register diagram; image not recovered. Labels that survive extraction:
USER PROGRAMMING MODEL: GPR0-GPR31 (bits 0-31); FPR0-FPR31 (bits 0-63); CR, Condition Register
(bits 0-31); FPSCR, Floating Point Status and Control Register (bits 0-31).
User-Level SPRs: SPR0 MQ Register (1); SPR8 LR, Link Register; SPR9 CTR, Count Register. SPR1, SPR4,
and SPR5 also appear; their labels were not recovered.
SUPERVISOR PROGRAMMING MODEL: MSR, Machine State Register (2) (bits 0-31); Segment Registers SR0-SR15.
Supervisor-Level SPRs: SPR18; SPR19 DAR, Data Address Register; SPR20 RTCU, RTC Upper Register (for
writing only) (1, 3); SPR21 RTCL, RTC Lower Register (for writing only) (1, 3); SPR22 DEC, Decrementer
Register (4); SPR272 SPRG0, SPR General 0; SPR273 SPRG1, SPR General 1; SPR274 SPRG2, SPR General 2;
SPR275 SPRG3, SPR General 3; SPR282; SPR287; SPR528 IBAT0U, BAT 0 Upper (2); SPR529 IBAT0L, BAT 0
Lower (2); SPR530 IBAT1U, BAT 1 Upper (2); SPR531 IBAT1L, BAT 1 Lower (2); SPR532 IBAT2U, BAT 2
Upper (2); SPR533 IBAT2L, BAT 2 Lower (2); SPR534 IBAT3U, BAT 3 Upper (2); SPR535 IBAT3L, BAT 3
Lower (2); SPR1008 HID0 (1); SPR1009 HID1 (1); SPR1010 HID2 (IABR) (1); SPR1013 HID5 (DABR) (1).]
1 601-only registers. These registers are not necessarily supported by other PowerPC processors.
2 These registers may be implemented differently on other PowerPC processors. The PowerPC architecture
defines two sets of BAT registers: eight IBATs and eight DBATs. The 601 implements the IBATs and treats
them as unified BATs.
3 RTCU and RTCL registers can be written only in supervisor mode, in which case different SPR numbers are used.
4 DEC register can be read by user programs by specifying SPR6 in the mfspr instruction (for POWER compatibility).
Figure 3.
Figure 3 shows all the 601 registers and includes the following registers that are not part of the PowerPC
architecture:
Block-address translation (BAT) registers. The 601 includes eight block-address translation
registers (BATs), consisting of four pairs of BATs (IBAT0U-IBAT3U and IBAT0L-IBAT3L). See
Figure 3 for a list of the SPR numbers for the BAT registers. Note that the PowerPC architecture has
twice as many BAT registers as the 601.
Hardware implementation registers (HID0-HID2, HID5, and HID15). These registers are provided
primarily for debugging. HID15 holds the four-bit processor identification tag (PID) that is useful
for differentiating processors in multiprocessor system designs. Note that while it is not guaranteed
that the implementation of HID registers is consistent among PowerPC processors, other processors
may be designed with similar or identical HID registers.
Load/store instructions: These include integer and floating-point load and store instructions.
Integer load and store instructions
Integer load and store multiple instructions
Floating-point load and store
Floating-point move instructions
Primitives used to construct atomic memory operations (lwarx and stwcx. instructions)
Flow control instructions: These include branching instructions, condition register logical
instructions, trap instructions, and other instructions that affect the instruction flow.
Branch and trap instructions
Condition register logical instructions
Processor control instructions: These instructions are used for synchronizing memory accesses
and management of caches, UTLBs, and the segment registers.
Floating-point status and control instructions
Memory control instructions: These instructions provide control of caches, TLBs, and segment
registers.
Supervisor-level cache management instructions
User-level cache instructions
Segment register manipulation instructions
Translation lookaside buffer management instructions
Note that this grouping of the instructions does not indicate which execution unit executes a particular
instruction or group of instructions. This information, which is useful in taking full advantage of superscalar
parallel instruction execution, is provided in Chapter 7, Instruction Timing, and Chapter 10, Instruction
Set, in the PowerPC 601 RISC Microprocessor User's Manual.
Integer instructions operate on byte, half-word, and word operands. Floating-point instructions operate on
single-precision (one word) and double-precision (one double word) floating-point operands. The PowerPC
architecture uses instructions that are four bytes long and word-aligned. It provides for byte, half-word, and
The effective address (EA) is the 32-bit address computed by the processor when executing a memory
access or branch instruction or when fetching the next sequential instruction.
The PowerPC architecture supports two simple memory addressing modes:
EA = (rA|0) + offset (including offset = 0) (register indirect with immediate index)
EA = (rA|0) + rB (register indirect with index)
These simple addressing modes allow efficient address generation for memory accesses. Calculation of the
effective address for aligned transfers occurs in a single clock cycle.
For a memory access instruction, if the sum of the effective address and the operand length exceeds the
maximum effective address, the storage operand is considered to wrap around from the maximum effective
address to effective address 0.
Effective address computations for both data and instruction accesses use 32-bit unsigned binary arithmetic.
A carry from bit 0 is ignored in 32-bit implementations.
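
The two addressing modes and the wrap-around rule can be sketched in Python as follows. The sketch assumes that the immediate offset is a 16-bit signed displacement that is sign-extended before the addition, and that (rA|0) means the value 0 when the rA field is 0 and the contents of GPR rA otherwise. The register file is modeled as a plain list, and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit immediate to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def ea_immediate(gprs, ra, offset):
    """EA = (rA|0) + offset: register indirect with immediate index."""
    base = 0 if ra == 0 else gprs[ra]
    return (base + sign_extend16(offset)) & MASK32   # carry out is ignored


def ea_indexed(gprs, ra, rb):
    """EA = (rA|0) + rB: register indirect with index."""
    base = 0 if ra == 0 else gprs[ra]
    return (base + gprs[rb]) & MASK32                # carry out is ignored


def operand_addresses(ea, length):
    """Byte addresses of an operand, wrapping from the maximum EA to 0."""
    return [(ea + i) & MASK32 for i in range(length)]
```

Masking with 0xFFFFFFFF after each addition models the statement that a carry out of the most significant bit is discarded, and the same mask produces the wrap from the maximum effective address to address 0.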
The 601 implements the 32-bit PowerPC architecture instructions except as indicated in
Appendix C, PowerPC Instructions Not Implemented, in the PowerPC 601 RISC Microprocessor
User's Manual. Otherwise, all instructions not implemented in the 601 are defined as optional in the
PowerPC architecture.
The 601 supports a number of POWER instructions that are otherwise not implemented in the
PowerPC architecture. These are listed in Appendix B, POWER Architecture Cross Reference,
and individual instructions are described in Chapter 10, Instruction Set, in the PowerPC 601 RISC
Microprocessor User's Manual.
The 601 implements the External Control Input Word Indexed (eciwx) and External Control Output
Word Indexed (ecowx) instructions, which are optional in the PowerPC architecture definition.
Several of the instructions implemented in the 601 function somewhat differently than they are
defined in the PowerPC architecture. These differences typically stem from design differences; for
instance, the PowerPC architecture defines several cache control instructions specific to separate
instruction and data cache designs.
When executed on the 601, such instructions may provide a subset of the functions of the instruction
or they may be no-ops.
The PowerPC architecture does not define hardware aspects of cache implementations. For example, some
PowerPC processors may have separate instruction and data caches (Harvard architecture), while others,
such as the 601, implement a unified cache.
Write-back/write-through mode
Cache-inhibited mode
Memory coherency
The cache is physically addressed and can operate in either write-back or write-through mode as specified
by the PowerPC architecture.
The cache is configured as eight sets of 64 lines. Each line consists of two sectors, four state bits (two per
sector), several replacement control bits, and an address tag. The two state bits implement the four-state
MESI (modified-exclusive-shared-invalid) protocol. Each sector contains eight 32-bit words. Note that the
PowerPC architecture defines the term block as the cacheable unit. For the 601 processor, the block is a
sector. A block diagram of the cache organization is shown in Figure 4.
Each cache line contains 16 contiguous words from memory that are loaded from a 16-word boundary (that
is, bits A26-A31 of the logical addresses are zero); thus, a cache line never crosses a page boundary.
Misaligned accesses across a page boundary can incur a performance penalty.
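
The alignment rules above can be checked with a small Python sketch. It assumes 4-byte words (so a line is 64 bytes and a sector is 32 bytes) and the 4-Kbyte page size given elsewhere in this summary. The function names are hypothetical.

```python
LINE_BYTES = 64      # 16 words of 4 bytes
SECTOR_BYTES = 32    # 8 words of 4 bytes
PAGE_BYTES = 4096    # 4-Kbyte page


def line_base(address):
    """Address of the 16-word boundary containing address (low six bits zero)."""
    return address & ~(LINE_BYTES - 1) & 0xFFFFFFFF


def sector_of(address):
    """Sector (0 or 1) within the cache line that holds address."""
    return (address % LINE_BYTES) // SECTOR_BYTES


def crosses_page(address, length):
    """True if an access of the given length in bytes spans a page boundary."""
    return address // PAGE_BYTES != (address + length - 1) // PAGE_BYTES
```

Because the page size is a multiple of the line size, crosses_page(line_base(a), LINE_BYTES) is false for every address a, which is the property stated in the text. A misaligned four-byte access at 0x1FFE, by contrast, does span two pages.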
Cache reload operations are always performed on a sector basis (that is, the cache is snooped and updated
and coherency is maintained on a per-sector basis). However, if the other sector in the line is marked invalid,
an optional, low-priority update of that sector is attempted after the sector that contained the critical word
is filled. The ability to attempt the other sector update can be disabled by the system software.
External bus transactions that load instructions or data into the cache always transfer the missed quad word
first, regardless of its location in a cache sector; then the rest of the cache sector is filled. As the missed quad
word is loaded into the cache, it is simultaneously forwarded to the appropriate execution unit so instruction
execution resumes as quickly as possible.
To ensure coherency among caches in a multiprocessor (or multiple caching-device) implementation, the
601 implements the MESI protocol. MESI stands for modified/exclusive/shared/invalid. These four states
indicate the state of the cache block as follows:
Modified: The cache block is modified with respect to system memory; that is, data for this address
is valid only in the cache and not in system memory.
Exclusive: This cache block holds valid data that is identical to the data at this address in system
memory. No other cache has this data.
Shared: This cache block holds valid data that is identical to the data at this address in system
memory and in at least one other caching device.
Cache coherency is enforced by on-chip hardware bus snooping logic. Since the cache tag directory has a
separate port dedicated to snooping bus transactions, bus snooping traffic does not interfere with processor
access to the cache unless a snoop hit occurs.
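
The four MESI states and the properties this summary attributes to them can be modeled with a minimal Python sketch. It encodes only what the text states: a modified block is valid only in the cache, a shared block may exist in another caching device, and a snoop hit on a modified sector requires the data to be written to system memory. It does not attempt to reproduce the 601's actual state transitions, and the class and function names are hypothetical.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def holds_valid_data(state):
    """Every state except invalid holds usable data."""
    return state is not Mesi.INVALID


def may_exist_in_other_caches(state):
    """Only a shared block is also held by another caching device."""
    return state is Mesi.SHARED


def snoop_needs_writeback(state):
    """A snoop hit on a modified sector forces the data out to memory."""
    return state is Mesi.MODIFIED
```

In this model, only snoop_needs_writeback(Mesi.MODIFIED) returns True, which corresponds to the case in which the dedicated snoop element of the write queue is used.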
[Figure 4: cache organization diagram; image not recovered. Labels that survive extraction: 8 SETS;
SECTOR 0 (8 WORDS); SECTOR 1 (8 WORDS); 16 WORDS.]
Figure 4.
Unless a catastrophic condition causes a system reset or machine check exception, only one exception is
handled at a time. If, for example, a single instruction encounters multiple exception conditions, those
conditions are encountered sequentially. After the exception handler handles an exception, the instruction
execution continues until the next exception condition is encountered. However, in many cases there is no
attempt to re-execute the instruction. This method of recognizing and handling exception conditions
sequentially guarantees that exceptions are recoverable.
Exception handlers should save the information stored in SRR0 and SRR1 early to prevent the program state
from being lost due to a system reset and machine check exception or to an instruction-caused exception in
the exception handler, and before enabling external interrupts.
The PowerPC architecture supports four types of exceptions:
Table 1. PowerPC 601 Microprocessor Exception Classifications
Synchronous/Asynchronous    Precise/Imprecise    Exception Type
Asynchronous                Imprecise            Machine check
                                                 System reset
Asynchronous                Precise              External interrupt
                                                 Decrementer
Synchronous                 Precise              Instruction-caused exceptions
Although exceptions have other characteristics as well, such as whether they are maskable or nonmaskable,
the distinctions shown in Table 1 define categories of exceptions that the 601 handles uniquely. Note that
Table 1 includes no synchronous imprecise instructions. While the PowerPC architecture supports
imprecise handling of floating-point exceptions, the 601 implements these exception modes as precise
exceptions.
The 601's exceptions, and the conditions that cause them, are listed in Table 2. Exceptions that are specific to
the 601 are indicated.
Table 2. Exceptions and Conditions

Exception Type          Vector Offset (hex)  Causing Conditions
----------------------  -------------------  ------------------------------------------------------------
Reserved                00000
System reset            00100
Machine check           00200                A machine check is caused by the assertion of the TEA signal
                                             during a data bus transaction.
Data access             00300                The cause of a data access exception can be determined by the
                                             bit settings in the DSISR, listed as follows:
                                             1   Set if the translation of an attempted access is not found
                                                 in the primary hash table entry group (HTEG), or in the
                                                 rehashed secondary HTEG, or in the range of a BAT register;
                                                 otherwise cleared.
                                             4   Set if a memory access is not permitted by the page or BAT
                                                 protection mechanism described in Chapter 6, "Memory
                                                 Management Unit," in the PowerPC 601 RISC Microprocessor
                                                 User's Manual; otherwise cleared.
                                             5   Set if the access was to an I/O segment (SR[T] = 1) by an
                                                 eciwx, ecowx, lwarx, stwcx., or lscbx instruction;
                                                 otherwise cleared. Set by an eciwx or ecowx instruction if
                                                 the access is to an address that is marked as write-through.
                                             6   Set for a store operation and cleared for a load operation.
                                             9   Set if an EA matches the address in the DABR while in one
                                                 of the three compare modes.
                                             11  Set if eciwx or ecowx is used and EAR[E] is cleared.
Instruction access      00400                An instruction access exception is caused when an instruction
                                             fetch cannot be performed for any of the following reasons:
                                             • The effective (logical) address cannot be translated. That
                                               is, there is a page fault for this portion of the
                                               translation, so an instruction access exception must be
                                               taken to retrieve the translation from a storage device
                                               such as a hard disk drive.
                                             • The fetch access is to an I/O segment.
                                             • The fetch access violates memory protection. If the key
                                               bits (Ks and Ku) in the segment register and the PP bits in
                                               the PTE or BAT are set to prohibit read access,
                                               instructions cannot be fetched from this location.
External interrupt      00500
Alignment               00600
Program                 00700
Floating-point          00800
unavailable
Decrementer             00900                The decrementer exception occurs when the most significant bit
                                             of the decrementer (DEC) register transitions from 0 to 1. Must
                                             also be enabled with the MSR[EE] bit.
I/O error               00A00
Reserved                00B00
System call             00C00                A system call exception occurs when a System Call (sc)
                                             instruction is executed.
Reserved                00D00                Other PowerPC processors may use this vector for trace
                                             exceptions.
Reserved                00E00                The 601 does not generate an interrupt to this vector. Other
                                             PowerPC processors may use this vector for floating-point
                                             assist exceptions.
Reserved                00E10-00FFF
Reserved                01000-01FFF          Reserved, implementation-specific
Run mode exception      02000                The run mode exception is taken depending on the settings of
                                             the HID1 register and the MSR[SE] bit.
                                             The following modes correspond with bit settings in the HID1
                                             register:
                                             • Normal run mode: No address breakpoints are specified, and
                                               the 601 executes from zero to three instructions per cycle.
                                             • Single instruction step mode: One instruction is processed
                                               at a time. The appropriate break action is taken after an
                                               instruction is executed and the processor quiesces.
                                             • Limited instruction address compare: The 601 runs at full
                                               speed (in parallel) until the EA of the instruction being
                                               decoded matches the EA contained in HID2. Addresses for
                                               branch instructions and floating-point instructions may
                                               never be detected.
                                             • Full instruction address compare mode: Processing proceeds
                                               out of IQ0. When the EA in HID2 matches the EA of the
                                               instruction in IQ0, the appropriate break action is
                                               performed. Unlike the limited instruction address compare
                                               mode, all instructions pass through the IQ0 in this mode.
                                               That is, instructions cannot be folded out of the
                                               instruction stream.
                                             The following mode is taken when the MSR[SE] bit is set:
                                             • MSR[SE] trace mode: Note that in other PowerPC
                                               implementations, the trace exception is a separate
                                               exception with its own vector x'00D00'.
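The DSISR bit settings listed for the data access exception can be decoded as in the following sketch. It assumes the PowerPC bit-numbering convention in which bit 0 is the most significant bit of the 32-bit register. The names DSISR_CAUSES, dsisr_bit, and decode_dsisr are illustrative and do not come from the 601 documentation.

```python
# Bit numbers and meanings follow the data access entry in Table 2.
DSISR_CAUSES = {
    1: "translation not found in primary HTEG, secondary HTEG, or BAT range",
    4: "access not permitted by page or BAT protection",
    5: "eciwx/ecowx/lwarx/stwcx./lscbx to an I/O segment, "
       "or eciwx/ecowx to a write-through address",
    6: "store operation (cleared for a load)",
    9: "EA matched the DABR in one of the compare modes",
    11: "eciwx or ecowx used while EAR[E] is cleared",
}


def dsisr_bit(value, bit):
    """Return bit `bit` of a 32-bit value, counting bit 0 as the MSB (assumed)."""
    return (value >> (31 - bit)) & 1


def decode_dsisr(value):
    """Return the descriptions of all listed DSISR bits that are set."""
    return [text for bit, text in sorted(DSISR_CAUSES.items())
            if dsisr_bit(value, bit)]
```

For example, a value with bits 1 and 6 set (0x42000000 under this numbering) would describe a store whose translation was not found.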
3.6 Memory Management
The following subsections describe the PowerPC memory management architecture, and the specific 601
implementation, respectively.
The 601 MMU provides 4 Gbytes of logical address space accessible to supervisor and user programs with
a 4-Kbyte page size and 256-Mbyte segment size. Block sizes range from 128 Kbyte to 8 Mbyte and are
software selectable. In addition, the 601 uses an interim 52-bit virtual address and hashed page tables in the
generation of 32-bit physical addresses.
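The field widths implied by these sizes can be derived with simple arithmetic, as sketched below. A 4-Kbyte page gives a 12-bit byte offset, a 256-Mbyte segment spans 28 bits, and the remaining bits follow from the 32-bit and 52-bit address widths. The function names and the name segment_id are hypothetical, and the assumption that the upper bits of the virtual address replace the segment-select bits is not spelled out in this summary.

```python
PAGE_OFFSET_BITS = 12   # 4-Kbyte pages: 2**12 bytes
SEGMENT_BITS = 28       # 256-Mbyte segments: 2**28 bytes
EA_BITS = 32            # 4 Gbytes of logical address space
VA_BITS = 52            # interim virtual address width

SEGMENT_SELECT_BITS = EA_BITS - SEGMENT_BITS        # 4: selects 1 of 16 segments
PAGE_INDEX_BITS = SEGMENT_BITS - PAGE_OFFSET_BITS   # 16: page within a segment
SEGMENT_ID_BITS = VA_BITS - SEGMENT_BITS            # 24: assumed segment identifier


def split_effective_address(ea):
    """Return (segment_select, page_index, byte_offset) for a 32-bit address."""
    byte_offset = ea & ((1 << PAGE_OFFSET_BITS) - 1)
    page_index = (ea >> PAGE_OFFSET_BITS) & ((1 << PAGE_INDEX_BITS) - 1)
    segment_select = ea >> SEGMENT_BITS
    return segment_select, page_index, byte_offset


def make_virtual_address(segment_id, page_index, byte_offset):
    """Concatenate a hypothetical 24-bit segment_id with the page index and offset."""
    return (segment_id << SEGMENT_BITS) | (page_index << PAGE_OFFSET_BITS) | byte_offset
```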
A UTLB provides address translation in parallel with the on-chip cache access, incurring no additional time
penalty in the event of a UTLB hit. The UTLB is a cache of the most recently used page table entries.
Software is responsible for maintaining the consistency of the UTLB with memory. The 601's UTLB is a
256-entry, two-way set-associative cache that contains instruction and data address translations. The 601
provides hardware table search capability through the hashed page table on UTLB misses. Supervisor
software can invalidate UTLB entries selectively. In addition, UTLB control instructions can optionally be
broadcast on the external interface for remote invalidations.
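A toy software model may help picture a 256-entry, two-way set-associative translation cache with least recently used replacement. This is only a sketch: the class name, the method names, and the choice of indexing each set by the page number modulo the number of sets are all assumptions made for illustration, not a description of the 601 hardware.

```python
from collections import OrderedDict

UTLB_ENTRIES = 256
UTLB_WAYS = 2
UTLB_SETS = UTLB_ENTRIES // UTLB_WAYS   # 128 sets of two entries


class UTLBModel:
    """Toy two-way set-associative translation cache with LRU replacement."""

    def __init__(self):
        # Each set keeps its entries ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(UTLB_SETS)]

    def _set_for(self, virtual_page):
        # Assumed index function; the real hardware index is not modeled here.
        return self.sets[virtual_page % UTLB_SETS]

    def lookup(self, virtual_page):
        entries = self._set_for(virtual_page)
        if virtual_page in entries:
            entries.move_to_end(virtual_page)   # mark as most recently used
            return entries[virtual_page]
        return None                             # miss: a table search would follow

    def insert(self, virtual_page, physical_page):
        entries = self._set_for(virtual_page)
        if virtual_page in entries:
            entries.move_to_end(virtual_page)
        elif len(entries) >= UTLB_WAYS:
            entries.popitem(last=False)         # evict the least recently used entry
        entries[virtual_page] = physical_page

    def invalidate(self, virtual_page):
        self._set_for(virtual_page).pop(virtual_page, None)
```

In this model, a third translation that maps to a full set displaces whichever of the two resident entries was used least recently.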
The 601 MMU relies on the exception processing mechanism for the implementation of the paged virtual
memory environment and for enforcing protection of designated memory areas. Exception processing is
described in Chapter 5, "Exceptions," in the PowerPC 601 RISC Microprocessor User's Manual. In
addition, the MSR of the 601 controls some of the critical functionality of the MMU.
As specified by the PowerPC architecture, the hashed page table is a variable-sized data structure that
defines the mapping between virtual page numbers and physical page numbers. The page table size is a
power of 2, and its starting address is a multiple of its size.
Also as specified by the PowerPC architecture, the page table contains a number of PTEGs. A PTEG
contains eight page table entries (PTEs) of eight bytes each; therefore, each PTEG is 64 bytes long. PTEG
addresses are entry points for table search operations.
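The size and alignment rules above can be checked mechanically, as in this sketch. The function names are illustrative, and the requirement that a table hold at least one PTEG is an assumption of the sketch rather than a limit stated in this summary.

```python
PTE_BYTES = 8
PTES_PER_PTEG = 8
PTEG_BYTES = PTE_BYTES * PTES_PER_PTEG   # 64 bytes per PTEG


def is_valid_page_table(base, size):
    """Check that size is a power of 2 and base is a multiple of size."""
    power_of_two = size >= PTEG_BYTES and (size & (size - 1)) == 0
    return power_of_two and base % size == 0


def pteg_count(size):
    """Number of PTEGs that fit in a page table of the given size."""
    return size // PTEG_BYTES


def pteg_address(base, size, index):
    """Address of PTEG number `index`, wrapping within the table."""
    return base + (index % pteg_count(size)) * PTEG_BYTES
```

For instance, a 64-Kbyte table placed at address 0x10000 satisfies both rules and contains 1024 PTEGs.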
Each of the stages shown in Figure 5 is described in Chapter 7, "Instruction Timing," in the PowerPC 601
RISC Microprocessor User's Manual.
As shown in Figure 5, integer instructions are dispatched only from IQ0 (where they are also usually
decoded), whereas branch and floating-point instructions can be dispatched from any of the bottom four
elements in the instruction queue (IQ0-IQ3). The dispatch of integer instructions is restricted in this manner
to provide an ordered flow of instructions through the integer pipeline, which in turn provides a mechanism
that ensures that all instructions appear to complete in order. As branch and floating-point instructions are
dispatched, their position in the instruction stream is recorded by means of tags that accompany the previous
integer instruction through the integer pipeline. Note that when a floating-point or branch instruction cannot
be tagged to an integer instruction, it is tagged to a no-op, or bubble, in the integer pipeline.
Logic associated with the integer completion (IC) stage reconstructs the program order, checks for data
dependencies, and schedules the write-back stages of the three pipelines. Note that the write-back stages
need only be serialized if there are data dependencies. For example, instructions that update the condition
register (CR) must perform write-back in strict order.

The tagging mechanism is described in Chapter 7, "Instruction Timing," in the PowerPC 601 RISC
Microprocessor User's Manual.
To minimize latencies due to data dependencies, the IU provides feed-forwarding. For example, if an integer
instruction requires data that is the result of the execution of the previous instruction, that data is made
available to the IU at the same time that the previous instruction's write-back stage updates the GPR. This
eliminates an additional clock cycle that would have been necessary if the IU had to access the GPR.
Feed-forwarding is available between the IU execute and decode stages and between the IU write-back and
decode stages.
Figure 5. [Pipeline diagram not reproduced. Recoverable labels: Fetch Arbitration (FA), CARB, CACC;
Dispatch Unit (instructions in the IQ are said to be in the dispatch stage (DS)); IQ7 through IQ0; BE, MR,
BW; ISB, FPSB; Data Access Queueing Unit; F1, FD, FPM, FPA, FWA, FWL; ID, IE, IC, IWA, IWL;
Floating-Point Unit (FPU); Branch Processing Unit (BPU); a marker for cycle boundaries.]
Note 1: An integer instruction can be passed to the ID stage in the same cycle in which it enters IQ0.
The floating-point pipeline has more stages than the IU pipeline, as shown in Figure 5. The 601 supports
both single- and double-precision floating-point operations, but double-precision instructions generally take
longer to execute, typically by requiring two cycles in each of the FD, FPM, and FPA stages. However, many
of these instructions, such as the double-precision floating-point multiply (fmul) and double-precision
floating-point accumulate instructions (fmadd, fmsub, fnmadd, and fnmsub), allow stages to overlap. For
example, when the second cycle of the FD stage begins, the first cycle of the FPM stage begins. Similarly,
the FPM stage overlaps with the FPA stage, allowing these instructions to complete these stages in four
clock cycles instead of six.
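The four-versus-six cycle count follows from simple scheduling arithmetic, sketched below. The sketch assumes each of the three stages takes exactly two cycles and that, when overlap is allowed, each stage starts one cycle after the previous one. The function names are illustrative only.

```python
STAGES = ("FD", "FPM", "FPA")
CYCLES_PER_STAGE = 2   # double-precision case described in the text


def schedule(overlap):
    """Return {stage: (first_cycle, last_cycle)} with cycles numbered from 0."""
    start = 0
    result = {}
    for stage in STAGES:
        result[stage] = (start, start + CYCLES_PER_STAGE - 1)
        # With overlap, the next stage begins in this stage's second cycle.
        start += 1 if overlap else CYCLES_PER_STAGE
    return result


def total_cycles(overlap):
    """Number of clock cycles needed to pass through all three stages."""
    return max(last for _, last in schedule(overlap).values()) + 1
```

Without overlap the stages occupy cycles 0-1, 2-3, and 4-5 (six cycles); with overlap they occupy cycles 0-1, 1-2, and 2-3 (four cycles).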
Because the PowerPC architecture can be applied to such a wide variety of implementations, instruction
timing among various PowerPC processors varies accordingly.
3.8 System Interface
The system interface is specific for each PowerPC processor implementation.
The 601 provides a versatile system interface that allows for a wide range of implementations. The interface
includes a 32-bit address bus, a 64-bit data bus, and 52 control and information signals (see Figure 6). The
system interface allows for address-only transactions as well as address and data transactions. The 601
control and information signals include the address arbitration, address start, address transfer, transfer
attribute, address termination, data arbitration, data transfer, data termination, and processor state signals.
Test and control signals provide diagnostics for selected internal circuitry.
Figure 6. [Block diagram not reproduced. Signal groups shown around the 601 processor: address, address
arbitration, address start, address transfer, transfer attribute, address termination, clocks, data, data
arbitration, data transfer, data termination, processor state, test and control; supply +3.6 V.]
The system interface supports bus pipelining, which allows the address tenure of one transaction to overlap
the data tenure of another. The extent of the pipelining depends on external arbitration and control circuitry.
Similarly, the 601 supports split-bus transactions for systems with multiple potential bus masters: one
device can have mastership of the address bus while another has mastership of the data bus. Allowing
multiple bus transactions to occur simultaneously increases the available bus bandwidth for other activity
and, as a result, improves performance.
Memory accesses allow transfer sizes of 8, 16, 24, 32, 40, 48, 56, or 64 bits in one bus clock cycle. Data
transfers occur in either single-beat transactions or four-beat burst transactions. A single-beat transaction
transfers as much as 64 bits. Single-beat transactions are caused by noncached accesses that access memory
directly (that is, reads and writes when caching is disabled, cache-inhibited accesses, and stores in
write-through mode). Burst transactions, which always transfer an entire cache sector (32 bytes), are initiated
when a sector in the cache is read from or written to memory. Additionally, the 601 supports address-only
transactions used to invalidate entries in other processors' TLBs and caches.
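The relationship between transfer size and the number of data beats can be expressed in a few lines. This sketch assumes one beat moves at most the full width of the 64-bit data bus, so a 32-byte sector takes four beats. The function name data_beats and its parameters are hypothetical.

```python
DATA_BUS_BYTES = 8                             # 64-bit data bus
SECTOR_BYTES = 32                              # one cache sector
BURST_BEATS = SECTOR_BYTES // DATA_BUS_BYTES   # 4 beats per burst


def data_beats(size_bytes, burst):
    """Return the number of data beats for a memory access.

    A burst always moves one whole sector; a single-beat access moves
    1 to 8 bytes (8 to 64 bits) in one beat.
    """
    if burst:
        if size_bytes != SECTOR_BYTES:
            raise ValueError("a burst always transfers an entire 32-byte sector")
        return BURST_BEATS
    if not 1 <= size_bytes <= DATA_BUS_BYTES:
        raise ValueError("a single-beat transfer moves 1 to 8 bytes")
    return 1
```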
3.8.2 I/O Controller Interface Operations
Both memory and I/O accesses can use the same bus transfer protocols. The 601 also has the ability to define
memory areas as I/O controller interface areas. Accesses to the I/O controller interface redefine the function
of some of the address transfer and transfer attribute signals and add control to facilitate transfers between
the 601 and specific I/O devices that respond to this protocol. I/O controller interface transactions provide
multiple transaction operations for variably-sized data transfers (1 to 128 bytes) and support a split request/
response protocol. The distinction between the two types of transfers is made with separate signals: TS for
memory-mapped accesses and XATS for I/O controller interface accesses. Refer to Chapter 9, "System
Interface Operation," in the PowerPC 601 RISC Microprocessor User's Manual for more information.
• Address arbitration signals: The 601 uses these signals to arbitrate for address bus mastership.
• Address transfer start signals: These signals indicate that a bus master has begun a transaction on
  the address bus.
• Address transfer signals: These signals, which consist of the address bus, address parity, and
  address parity error signals, are used to transfer the address and to ensure the integrity of the
  transfer.
• Transfer attribute signals: These signals provide information about the type of transfer, such as the
  transfer size and whether the transaction is bursted, write-through, or cache-inhibited.
• Address transfer termination signals: These signals are used to acknowledge the end of the address
  phase of the transaction. They also indicate whether a condition exists that requires the address
  phase to be repeated.
• Data arbitration signals: The 601 uses these signals to arbitrate for data bus mastership.
• Data transfer signals: These signals, which consist of the data bus, data parity, and data parity error
  signals, are used to transfer the data and to ensure the integrity of the transfer.
• Data transfer termination signals: Data termination signals are required after each data beat in a
  data transfer. In a single-beat transaction, the data termination signals also indicate the end of the
  tenure, while in burst accesses, the data termination signals apply to individual beats and indicate
  the end of the tenure only after the final data beat. They also indicate whether a condition exists
  that requires the data phase to be repeated.
• System status signals: These signals include the interrupt signal, checkstop signals, and both soft-
  and hard-reset signals. These signals are used to interrupt and, under various conditions, to reset the
  processor.
• Processor state signals: These two signals are used to set the reservation coherency bit and set the
  size of the 601's output buffers.
A bar over a signal name indicates that the signal is active low, for example, ARTRY (address retry) and
TS (transfer start). Active-low signals are referred to as asserted (active) when they are low and negated
when they are high. Signals that are not active-low, such as AP0-AP3 (address bus parity signals) and
TT0-TT4 (transfer type signals), are referred to as asserted when they are high and negated when they are low.
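The asserted/negated convention maps to logic levels as in the following sketch. Only the two signals that the text names as active-low examples are placed in the set; the set name, the function name, and the use of 0 and 1 for low and high are assumptions for illustration.

```python
# Signals the text gives as active-low examples (drawn with an overbar).
ACTIVE_LOW = {"ARTRY", "TS"}


def is_asserted(signal, level):
    """Return True if `signal` is asserted at logic `level` (0 = low, 1 = high)."""
    if signal in ACTIVE_LOW:
        return level == 0   # active-low: asserted when low
    return level == 1       # otherwise: asserted when high
```

With this convention, TS at level 0 is asserted, while AP0 at level 0 is negated.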
3.8.4 Signal Configuration
Figure 7 illustrates the 601 microprocessor's logical pin configuration, showing how the signals are grouped.
Figure 7. [Logical pin configuration diagram not reproduced. Signal group labels: address arbitration,
address transfer start, address transfer, transfer attribute, address termination, clocks, data arbitration,
data transfer, data termination, system status, ESP scan interface, test signals. Signal names shown: TS,
XATS, A0-A31, AP0-AP3, APE, TT0-TT3, TT4, TC0-TC1, TSIZ0-TSIZ2, TBST, CI, WT, GBL, CSE0-CSE2,
HP_SNP_REQ, AACK, ARTRY, SHD, 2X_PCLK, PCLK_EN, BCLK_EN, RTC, DH0-DH31, DL0-DL31, DP0-DP7,
DPE, TA, DRTRY, TEA, DBG, DBWO, DBB, INT, CKSTP_IN, CKSTP_OUT, HRESET, SRESET, RSRV, SC_DRIVE,
SYS_QUIESC, RESUME, QUIESC_REQ. Supply: +3.6 V.]
Information in this document is provided solely to enable system and software implementers to use PowerPC microprocessors. There are no express or implied
copyright licenses granted hereunder to design or fabricate PowerPC integrated circuits or integrated circuits based on the information in this document.

The PowerPC 601 microprocessor embodies the intellectual property of IBM and of Motorola. However, neither party assumes any responsibility or liability as to
any aspects of the performance, operation, or other attributes of the microprocessor as marketed by the other party. Neither party is to be considered an agent or
representative of the other party, and neither has granted any right or authority to the other to assume or create any express or implied obligations on its behalf.
Information such as errata sheets and data sheets, as well as sales terms and conditions such as prices, schedules, and support, for the microprocessor may vary
as between IBM and Motorola. Accordingly, customers wishing to learn more information about the products as marketed by a given party should contact that party.

Both IBM and Motorola reserve the right to modify this manual and/or any of the products as described herein without further notice. Nothing in this manual, nor
in any of the errata sheets, data sheets, and other supporting documentation, shall be interpreted as conveying an express or implied warranty, representation, or
guarantee regarding the suitability of the products for any particular purpose. The parties do not assume any liability or obligation for damages of any kind arising
out of the application or use of these materials. Any warranty or other obligations as to the products described herein shall be undertaken solely by the marketing
party to the customer, under a separate sale agreement between the marketing party and the customer. In the absence of such an agreement, no liability is
assumed by the marketing party for any damages, actual or otherwise.

"Typical" parameters can and do vary in different applications. All operating parameters, including "Typicals," must be validated for each customer application by
customer's technical experts. Neither IBM nor Motorola convey any license under their respective intellectual property rights nor the rights of others. The products
described in this manual are not designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other
applications intended to support or sustain life, or for any other application in which the failure of the product could create a situation where personal injury or death
may occur. Should customer purchase or use the products for any such unintended or unauthorized application, customer shall indemnify and hold IBM and
Motorola and their respective officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and
reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if
such claim alleges that Motorola or IBM was negligent regarding the design or manufacture of the part.
Motorola and [logo not reproduced] are registered trademarks of Motorola, Inc. Motorola, Inc. is an Equal Opportunity/Affirmative Action Employer.