A Study of The Alpha 21364 Processor Arul Prakash CS6810: Advanced Computer Architecture
Table of Contents
1 Introduction
2 History
3 21264 Instruction Set Architecture
3.1 Registers
3.2 Data Types
3.3 Addressing Modes
3.4 Instruction format:
3.5 Instruction Set:
4 Alpha 21364 micro-architecture
4.1 21364 Pipeline
4.1.1 Fetch:
4.1.2 Register Renaming
4.1.3 Instruction Issue
4.1.4 Register Read
4.1.5 Execute
4.1.6 Memory
4.2 21364 Memory System:
4.3 Integrated Memory Controller:
4.4 Integrated Network Interface
5 Memory Management
5.1 Address Translation
6 I/O System
7 Analysis
7.1 Strengths
7.2 Weaknesses
8 Conclusion
9 References

1 Introduction
My interest in the Alpha processor was sparked when I came across the following sentence:
“Where Intel x86 is CISC to the extreme, Alpha is RISC to the extreme.”
Curious about what ‘extreme RISC’ meant, I started collecting information on the Alpha 21364 processor and present my findings in this paper. I was not aware at that time that development of the Alpha processor had ceased, but my feeling is that this was due more to DEC’s lack of the infrastructure and marketing skill needed to compete with Intel than to any architectural flaw.
The paper is organized as follows. In Section 2, I describe a brief history of the processor. Section 3 describes the ISA of the processor. Section 4 talks about the Alpha 21364 processor. Section 5 deals with the memory system and Section 6 describes the I/O system. Finally, in Section 7, I describe the strengths and weaknesses of the architecture.

2 History
Digital Equipment Corporation (DEC) had been dominating the minicomputer market during the 80s with their PDP-11 and VAX. By the late 80s the VAX was becoming untenable and it was very difficult to implement using advanced pipelined or superscalar techniques. In an attempt to regain their dominance, they developed the MicroVAX, in which they included a subset of the VAX instructions and data types. The first Alpha processor (Alpha 21064) was introduced in 1992, and was the most powerful microprocessor available at that time. The Alpha (RISC based) was a significant departure from the VAX architecture (CISC based).
The Alpha processors are named 21x64EVx. The 21 stands for the 21st century. The 64 indicates that it is a 64 bit architecture. EV stands for extended VAX (though it was significantly different from the VAX). The two x's represent the generation of the processor and the bus respectively.

Unfortunately, Alpha did not have the infrastructure to compete with the production cycles that Intel and later AMD had. Samsung, the main Alpha licensee, did not have the resources to mass produce the chips. Hence, the Alpha processor never got very popular and only around 500,000 Alpha based servers were sold till 2000. Due to the low market demand, the prices of Alpha based systems have remained high, which in turn keeps demand low.

DEC was acquired by Compaq in 1998, which was later purchased by HP in 2002. In October 2002, HP announced that in 2004 the line would see its final upgrade to the faster EV79 processor, and then advancements in the platform would cease in favor of Intel Itanium-based servers. Ironically, even though the Alpha processor is about to be phased out, four Alpha servers were ranked 2nd, 12th, 15th and 32nd in the latest Top500 list.

3 21264 Instruction Set Architecture
The Alpha 21264 ISA is a 64 bit load-store RISC architecture. All instructions are 32 bits in length. Memory operations are either loads or stores. All data manipulation is done between registers.

3.1 Registers
The Program Counter (PC) is a 64 bit register that keeps track of the next instruction to be executed. The PC uses bits <63:2>, with bits <1:0> treated as RAZ/IGN (read as zero, ignored on write).
There are 32 64 bit integer registers (R0, …R31) and 32 64 bit floating point registers (F0, …F31). The registers R31 and F31 always contain 0 when specified as a source operand.
There are 2 lock registers associated with the load-locked and store-conditional instructions: the lock flag and the locked_physical_address register.
In addition, the Alpha contains a PCC (Processor Cycle Counter), a few memory prefetch registers and a VAX compatibility register.

3.2 Data Types
The following are the Alpha architecture data types:
Byte: A byte is an 8 bit value. It is supported by the load, store, sign-extend, extract, mask, insert, and zap instructions only.
Word: A word is 2 contiguous bytes starting on an arbitrary byte boundary. The word is only supported in Alpha by the load, store, sign-extend, extract, mask, and insert instructions.
Longword: A longword is 4 contiguous bytes starting on an arbitrary byte boundary. When interpreted arithmetically, a longword is a two’s-complement integer with bits of increasing significance from 0 through 30. Bit 31 is the sign bit. The longword is only supported in Alpha by sign-extended load and store instructions and by longword arithmetic instructions.
Quadword: A quadword is 8 contiguous bytes starting on an arbitrary byte boundary. When interpreted arithmetically, a quadword is either a two’s-complement integer with bits of increasing significance from 0 through 62 and bit 63 as the sign bit, or an unsigned integer with bits of increasing significance from 0 through 63.

The floating point formats are:
VAX F_floating: An F_floating datum is 4 contiguous bytes in memory starting on an arbitrary byte boundary. The F_floating load instruction reorders bits on the way in from memory, expands the exponent from 8 to 11 bits, and sets the low-order fraction bits to zero. This produces in the register an equivalent G_floating number suitable for either F_floating or G_floating operations.
VAX G_floating: A G_floating datum in memory is 8 contiguous bytes starting on an arbitrary byte boundary. The form of a G_floating datum is sign magnitude with bit 15 the sign bit, bits <14:4> an excess-1024 binary exponent, and bits <3:0> and <63:16> a normalized 53-bit fraction with the redundant most significant fraction bit not represented. Within the fraction, bits of increasing significance are from 48 through 63, 32 through 47, 16 through 31, and 0 through 3. The 11-bit exponent field encodes the values 0 through 2047. An exponent value of 0, together with a sign bit of 0, is taken to indicate that the G_floating datum has a value of 0.
VAX D_floating: A D_floating datum in memory is 8 contiguous bytes starting on an arbitrary byte boundary. The memory form of a D_floating datum is identical to an F_floating datum except for 32 additional low significance fraction bits. Within the fraction, bits of increasing significance are from 48 through 63, 32 through 47, 16 through 31, and 0 through 6.
IEEE S_floating: An IEEE single-precision, or S_floating, datum occupies 4 contiguous bytes in memory starting on an arbitrary byte boundary. The S_floating load instruction reorders bits on the way in from memory, expands the exponent from 8 to 11 bits, and sets the low-order fraction bits to zero. This produces in the register an equivalent T_floating number, suitable for either S_floating or T_floating operations.
IEEE T_floating: An IEEE double-precision, or T_floating, datum occupies 8 contiguous bytes in memory starting on an arbitrary byte boundary. The form of a T_floating datum is sign magnitude with bit 63 the sign bit, bits <62:52> an excess-1023 binary exponent, and bits <51:0> a 52-bit fraction.
IEEE X_floating: An IEEE extended-precision, or X_floating, datum occupies 16 contiguous bytes in memory. The form of an X_floating datum is sign magnitude with bit 127 the sign bit, bits <126:112> an excess-16383 binary exponent, and bits <111:0> a 112-bit fraction.

There are a few more data types that are not directly supported by the hardware:
• Octaword
• H_floating
• D_floating
• Variable-Length Bit Field
• Character String
• Trailing Numeric String
Data can be addressed in either little-endian or big-endian byte order, though the former is generally preferred.

3.3 Addressing Modes
The Alpha architecture provides only one addressing mode: displacement. Register deferred addressing is accomplished by using a displacement of 0.

3.4 Instruction format:
There are five basic instruction formats: memory, branch, operate, floating point operate and PALcode. All instruction formats are 32 bits long with a 6-bit major opcode field in bits <31:26> of the instruction.
Memory Instruction Format:
Ra is the destination register (for loads) or the source register (for stores), and Rb supplies the base address, to which the offset in the displacement field is added.
Memory format instructions with a function code replace the memory displacement field in the memory instruction format with a function code that designates a set of miscellaneous
instructions as shown in the figure below:
Branch Instruction Format:
• Logical and shift
• Byte manipulation
• Floating point load and store
• Floating point control
• Floating point branch
• Floating point operate
• Miscellaneous
• VAX compatibility
• Multimedia (graphics and video)
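
The memory instruction format and the displacement addressing mode described in Sections 3.3 and 3.4 can be illustrated with a short sketch. It is a minimal example, not a description of any real Alpha tool: the field positions for Ra (bits <25:21>), Rb (bits <20:16>) and the signed 16-bit displacement (bits <15:0>) are taken from the Alpha Architecture Reference Manual rather than from the text above, and the names MemoryFormat, decode_memory_format and effective_address are hypothetical.

```python
from typing import NamedTuple, Sequence

MASK64 = (1 << 64) - 1  # addresses wrap at 64 bits


class MemoryFormat(NamedTuple):
    """Fields of a memory-format instruction (illustrative names)."""
    opcode: int  # bits <31:26>, 6-bit major opcode
    ra: int      # bits <25:21>, destination (load) or source (store)
    rb: int      # bits <20:16>, base register
    disp: int    # bits <15:0>, sign-extended displacement


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_memory_format(word: int) -> MemoryFormat:
    """Split a 32-bit instruction word into its memory-format fields."""
    if not 0 <= word < (1 << 32):
        raise ValueError("instruction word must fit in 32 bits")
    return MemoryFormat(
        opcode=(word >> 26) & 0x3F,
        ra=(word >> 21) & 0x1F,
        rb=(word >> 16) & 0x1F,
        disp=sign_extend(word & 0xFFFF, 16),
    )


def effective_address(insn: MemoryFormat, regs: Sequence[int]) -> int:
    """Displacement addressing: base register plus signed displacement.

    R31 reads as zero when used as a source operand, so it is special-cased.
    """
    base = 0 if insn.rb == 31 else regs[insn.rb]
    return (base + insn.disp) & MASK64
```

For example, the word 0xA4220008 decodes to opcode 0x29 (a quadword load in the Alpha manual), Ra = 1, Rb = 2 and a displacement of 8, so with R2 holding 0x1000 the effective address is 0x1008. A displacement of 0 gives register deferred addressing, since the address is then simply the contents of Rb.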
Fig 11: Specfp95 benchmark

7.2 Weaknesses
• Very expensive. This is partly because of the huge on-chip L2 cache and partly because of DEC’s inability to mass-produce the processor as Intel could.
• DEC’s first Alpha was a big deviation from their previous VAX architecture. Even though DEC management assured its customers of an upgrade path, it lost a huge customer base because of this radical change.
• Some essential instructions have been left out because they took more than a certain number of clock cycles. Sometimes, the series of simpler instructions that replaces one takes 2x as long to execute as the single complex instruction.
• Larger binary code because of the extreme RISC approach. Some results show that the resulting series of simpler instructions is 50-100% larger than the equivalent x86 code.
• The 64 bit architecture was way ahead of its time. It arrived at a time when there were no 64-bit compilers and other tools. This was one of the reasons why the Alpha architecture did not become very popular. However, the OpenVMS system became very popular later.
• The line and way predictors add to the complexity of the processor. An additional cycle is added to the load latency. However, the overall performance seems to improve because of the highly accurate prediction.
• The Alpha processor has a very aggressive speculation approach because of its highly accurate predictors. The penalty of a wrongly predicted branch is quite high in some cases.
• I do not have the current feature size, but because of DEC’s lack of infrastructure, the Alpha was using a 0.35µm feature size when Intel and AMD were on a 0.25µm feature size.

9 References
1. John L. Hennessy, David A. Patterson, Computer Architecture: A Quantitative Approach.
2. Alpha Architecture Reference Manual, Fourth Edition.
3. 21264/EV68A Microprocessor Hardware Reference Manual.
4. Thomas Daniels, Dharmesh Parikh, Matt Ziegler, The Alpha 21264 ISA.
5. Artur Klauser, Trends in High-Performance Microprocessor Design.
6. Zarka Cvetanovic and R.E. Kessler, Performance Analysis of the Alpha 21264 based Compaq ES40 System.
7. Linley Gwennap, Digital 21264 Sets New Standard, Microprocessor Forum, Vol 10, No. 14.
8. PALcode for Alpha Microprocessors, System Design Guide, May 1996.
9. Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb, The Alpha 21364 Network Architecture.
10. http://www.alphaprocessors.com
11. http://www3.sympatico.ca/n.rieck/links/dec_memorial_site.html
12. http://web.singnet.com.sg/~duane/b1000005.htm
13. http://h18020.www1.hp.com/alphaserver/performance/spec2000.html
14. http://www.macspeedzone.com/archive/4.0/WinvsMacSPECint.html
15. http://www.extremetech.com/article2/0,3973,1158263,00.asp
16. http://www.macinfo.de/hardware/chips-top.html (Translated from German)