A Brief History of Microprogramming
The central processing unit in a computer system is composed of a data path and a control
unit. The data path includes registers, function units such as ALUs (arithmetic and logic units)
and shifters, interface units for main memory and/or I/O busses, and internal processor busses.
The control unit governs the series of steps taken by the data path during the execution of a
user-visible instruction, or macroinstruction (e.g., load, add, store).
Each step in the execution of a macroinstruction is a transfer of information within the data
path, possibly including the transformation of data, address, or instruction bits by the function
units. A step is thus called a register transfer and is accomplished by gating out (sending)
register contents onto internal processor busses, selecting the operation of ALUs, shifters, etc.,
through which that information passes, and gating in (receiving) new values for one or more
registers. Control signals consist of enabling signals to gates that control the sending or
receiving of data at the registers (called control points) and operation selection signals. The
control signals identify the microoperations required for each register transfer, and these
signals are supplied by the control unit. A complete macroinstruction is executed by
generating an appropriately timed sequence of groups of control signals (microoperations).
As an example, consider the simple processing unit in Figure 1. This datapath supports an
accumulator-based macroinstruction set of four instructions (load, add, store, and conditional
branch). The accumulator (ACC) and program counter (PC) are visible to the
macroinstruction-level programmer, but the other registers are not.
Figure 1. Simple data path for a four-instruction computer (the small circles are control
points)
The definitions of the macroinstructions are given in Figure 2, and the definitions of the
control signals are given in Figure 3.
The first group of control signals needed to start an instruction fetch on this datapath would
gate the contents of the program counter out over the internal bus and into the memory
address register. Either in this group (i.e., during this first time step) or the next, the program
counter would be incremented and a signal indicating a memory read sent to the memory
interface. The following time steps would perform the microoperations required to fetch operands,
execute the function specified by the instruction opcode, and store any results. A simple
instruction might require five to ten time steps on this datapath and involve a dozen or more
control signals.
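To make this concrete, here is a minimal sketch in C of such a fetch sequence. The signal names (PC_OUT, MAR_IN, and so on) are invented stand-ins modeled on the kind of definitions Figure 3 would contain, not the figure's exact labels; each time step is represented as a control word whose set bits are the active microoperations.

```c
#include <stdio.h>

/* Hypothetical control signals for the Figure 1 datapath, one bit each. */
enum {
    PC_OUT   = 1 << 0,  /* gate PC contents onto the internal bus          */
    MAR_IN   = 1 << 1,  /* gate the bus into the memory address register   */
    PC_INCR  = 1 << 2,  /* increment the program counter                   */
    MEM_READ = 1 << 3,  /* signal the memory interface to read             */
    MDR_OUT  = 1 << 4,  /* gate the memory data register onto the bus      */
    IR_IN    = 1 << 5   /* gate the bus into the instruction register      */
};

/* Instruction fetch as a timed sequence of control words: each array
   element is one time step; each set bit is one active microoperation. */
static const unsigned fetch_sequence[] = {
    PC_OUT | MAR_IN,     /* T1: PC -> MAR                          */
    PC_INCR | MEM_READ,  /* T2: PC + 1 -> PC; request instruction  */
    MDR_OUT | IR_IN      /* T3: fetched word -> IR                 */
};

int main(void) {
    for (unsigned t = 0; t < sizeof fetch_sequence / sizeof fetch_sequence[0]; t++)
        printf("T%u: control word = 0x%02x\n", t + 1, fetch_sequence[t]);
    return 0;
}
```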
The control unit is responsible for generating the control sequences. As shown in Figure 1, the
control unit inputs consist of (1) a mutually exclusive set of time step signals, (2) a mutually
exclusive set of decoded opcode signals, and (3) condition signals used for implementing the
conditional branch instruction.
The logic expression for an individual control signal can be written in sum-of-products
form, in which each product term is a given time step ANDed with the decoded signal for a
specific instruction opcode. The control signal is then asserted at just those time steps during
instruction fetch and execution where it is needed. The logic expressions for the control
signals of the simple example are given in Figure 5.
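As an illustration of this sum-of-products form, a hardwired evaluation of one such signal might look like the following C sketch; the signal name (acc_in), the step numbers, and the opcode assignments are assumptions for illustration, not Figure 5's actual expressions.

```c
/* Hypothetical sum-of-products evaluation of one control signal:
   each product term is a time step ANDed with a decoded opcode line. */
typedef struct {
    int t[6];                        /* t[1]..t[5]: mutually exclusive time steps */
    int op_load, op_add, op_store;   /* decoded opcode signals                    */
} control_inputs;

/* acc_in: gate a new value into the accumulator, asserted (in this
   sketch) at step T4 of a load and step T5 of an add. */
int acc_in(const control_inputs *in) {
    return (in->t[4] & in->op_load) | (in->t[5] & in->op_add);
}
```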
When the control signal logic expressions are directly implemented with logic gates or in a
programmable logic array (PLA), the control unit is said to be hardwired. Alternatively, in a
microprogrammed control unit, the control signals that are to be generated at a given time step
are stored together in a control word, which is called a microinstruction. The collection of
control words that implement an instruction is called a microprogram, and the microprograms
are stored in a memory element called the control store.
Figure 6. Control store for the four-instruction computer (control bits of zero not shown)
The first fourteen bits in each microinstruction are the control signals used in the Figure 1
datapath. For this example, six additional bits are used in the microinstruction to implement
sequencing. The approach depicted uses completely unencoded storage of control signals.
However, this is inefficient (note all the zero locations), and an actual implementation would
likely encode groups of control signals. For example, all the signals used to gate the contents
of registers out onto the single shared bus are mutually exclusive and thus could be encoded
into one field.
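For instance, the gate-out signals for the single shared bus could be stored as one small encoded field and expanded by a decoder between the control store and the control points; a minimal C sketch with invented names:

```c
/* Sketch of field encoding for mutually exclusive gate-out signals.
   The source names are assumptions; value 0 means "no register
   drives the bus" in this encoding. */
enum bus_src { SRC_NONE = 0, SRC_PC, SRC_ACC, SRC_MDR, SRC_IR_ADDR };

/* Decoder: expand the encoded field back into one-hot gate-out lines. */
unsigned decode_bus_src(enum bus_src field) {
    return field == SRC_NONE ? 0u : 1u << (field - 1);
}
```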
The sequencing part of the microinstruction in Figure 6 includes a four-bit next address field
along with one bit ("branch-via-table") to indicate that the next address should be taken from
a separate decoding table. This table is indexed by the two-bit opcode field from the
instruction register and provides an entry-point address in the control store for the
microprogram to execute a particular instruction. Another bit ("or-address-with-acceq") is
used to handle conditional branching in the following manner. A condition signal indicating
that the accumulator is equal to zero ("acceq0") is provided by the datapath, and this final bit in
the microinstruction allows the condition signal to affect the least significant bit in the next
address field. Thus the next microinstruction will be taken from one of two locations,
depending on the value of the condition signal.
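A small C sketch of this next-address logic follows, using the field widths from the description above (the four-bit next-address field, the "branch-via-table" bit, and the "or-address-with-acceq" bit); the table contents and other details are assumptions.

```c
/* Microprogram sequencer sketch. entry_point[] models the separate
   decoding table indexed by the 2-bit opcode field. */
typedef struct {
    unsigned next_addr        : 4;  /* next control store address         */
    unsigned branch_via_table : 1;  /* dispatch through the decode table  */
    unsigned or_with_acceq    : 1;  /* let acceq0 affect the address LSB  */
} sequencing_fields;

unsigned next_address(sequencing_fields f, unsigned opcode,
                      int acceq0, const unsigned entry_point[4]) {
    if (f.branch_via_table)
        return entry_point[opcode & 3];   /* entry point for this opcode   */
    unsigned addr = f.next_addr;
    if (f.or_with_acceq)
        addr |= (acceq0 != 0);            /* two-way microbranch on ACC == 0 */
    return addr;
}
```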
Since the microprograms stand between the logic design (hardware) and the macroinstruction
program being executed (software), they are sometimes referred to as firmware. This term is
also subject to loose usage. Ascher Opler first defined it in a 1967 Datamation article as the
contents of a writable control store, which could be reloaded as necessary to specialize the
user interface of a computer to a particular programming language or application [1].
However, in later general usage, the term came to signify any type of microcode, whether
resident in read-only or writable control store. Most recently, the term has been widened to
denote anything ROM-resident, including macroinstruction-level routines for BIOS, bootstrap
loaders, or specialized applications.
See chapters 4 and 5 of Hamacher, Vranesic, and Zaky [2] and chapter 5 of Patterson and
Hennessy [3] for overviews of datapath design, control signals, hardwiring, and
microprogramming. Older texts devoted exclusively to microprogramming issues include
Agrawala and Rauscher [4], Andrews [5], Habib [6], and Husson [7]. For almost forty years
the ACM and IEEE have sponsored an Annual Workshop on Microprogramming and
published its proceedings; more recently the conference has been renamed the International
Symposium on Microarchitecture, reflecting the shift of emphasis from microprogramming
alone to the broader field of microarchitecture, i.e., the internal design of the processor,
including pipelining, branch prediction, and multiple instruction execution.
History of Microprogramming
In the late 1940's Maurice Wilkes of Cambridge University started work on a stored-program
computer called the EDSAC (Electronic Delay Storage Automatic Calculator). During this
effort, Wilkes recognized that the sequencing of control signals within the computer was
similar to the sequencing actions required in a regular program and that he could use a stored
program to represent the sequences of control signals. In 1951, he published the first paper on
this technique, which he called microprogramming [8].
In an expanded paper published in 1953, Wilkes and his colleague Stringer further described
the technique [9]. In this visionary paper, they considered issues of design complexity, test
and verification of the control logic, alteration of and later additions to the control logic,
support of different macroinstruction sets (by using different matrices), exploiting parallelism
within the data path, multiway branches, environment substitution, microsubroutines,
variable-length and polyphase timing, and pipelining the access to the control store. The
Cambridge University group went on to implement the first microprogrammed computer, the
EDSAC 2, in 1957 using an 8-by-6 magnetic core matrix.
Due to the difficulty of manufacturing fast control stores in the 1950's, microprogramming did
not immediately become a mainstream technology. However, John Fairclough at IBM's
laboratory in Hursley, England, led a development effort in the late 1950's that explored a
read-only magnetic core matrix for the control unit of a small computer. In 1961, Fairclough's
experience played a key role in IBM's decision to pursue a full range of compatible
computers, which was announced in 1964 as the System/360 [10]. All but two of the initial
360 models (the high-end Models 75 and 95) were microprogrammed [11].
Because of the vital nature of microprogramming to the plan for compatibility, IBM
aggressively pursued three different types of read-only control store technologies [10]. The
first was Balanced Capacitor Read-Only Storage (BCROS), which used two capacitors per bit
position in a storage of 2816 words of 100 bits each. A second technology was Transformer
Read-Only Storage (TROS), which used the magnetic core approach taken by Fairclough at
Hursley in a storage of 8192 words of 54 bits each.
A third technology was developed for the Model 30, Card Capacitor Read-Only Storage
(CCROS). This technology was based on mylar cards the size and shape of standard punch
cards that encased copper tabs and access lines. A standard card punch could be used to
remove some of the tabs from a 12 by 60 bit area in the middle of the card. Removed tabs
were read as zeros by sense lines, while the remaining tabs were read as ones. An assembly of
42 boards with 8 cards per board was built into the Model 30 cabinet.
CCROS cards on the Model 30 could be replaced in the field. This made design modifications
easier to accomplish, as well as providing for field diagnosis. The control store of the Model
30 was divided into a section for resident routines and a section of six CCROS cards that
would hold test-specific routines [14]. The reserved board could be "populated" with one of
several sets of CCROS cards by an engineer running machine diagnostics.
The later System/360 Model 25 and models of the later System/370 series were configured
with at least one portion of the control store being read-write for loading microcode patches
and microdiagnostics. (On the smaller System/370 models, the control store was allocated in
part of main memory). Indeed, IBM invented the eight-inch floppy diskette and its drive to
improve the distribution of microcode patches and diagnostics; the diskette was first used in
1971 on the System/370 Model 145, which had a writable control store (WCS) of up to 16K
words of 32 bits each.
During the change-over of the IBM product line to the new System/360 in the mid-sixties, the
large customer investment in legacy software, especially at the assembly language level, was
not fully recognized [15]. Software conversion through recompilation of high-level-language
programs was planned, but machine-dependent code was pervasive, and customers were
unwilling (and it was economically infeasible) to convert all of it, so simulators of the older
computers had to be provided. Initial studies
of the simulators, however, indicated that, at best, performance would suffer by factors
ranging from two to ten [10]. Competitors were meanwhile seeking to attract IBM customers
by providing less intensive conversion efforts using more compatible hardware and automatic
conversion tools, like the Honeywell "Liberator" program that accepted IBM 1401 programs
and converted them into programs for the 1401-like Honeywell H-200.
IBM was spared mass defection of former customers when engineers on the System/360
Model 30 suggested using an extra control store that could be selected by a manual switch and
would allow the Model 30 to execute IBM 1401 instructions [16]. The simulator and control
store ideas were then combined in a study of casting crucial parts of the simulator programs as
microprograms in the control store. Stuart Tucker and Larry Moss led the effort to develop a
combination of hardware, software, and microprograms to execute legacy software for not
only the IBM 1401 computers but also for the IBM 7000 series [17]. Moss felt their work
went beyond mere imitation and equaled or exceeded the original in performance; thus, he
termed their work emulation [10]. The emulators they designed worked well enough that
many customers never converted legacy software and instead ran it for many years on
System/360 hardware using emulation.
Because of the success of the IBM System/360 product line, by the late 1960's
microprogramming became the implementation technique of choice for most computers
except the very fastest and the very simplest. This situation lasted for about two decades. For
example, all models of the IBM System/370 aside from the Model 195 and all models of the
DEC PDP-11 aside from the PDP-11/20 were microprogrammed.
At perhaps the peak of microprogramming's popularity, the DEC VAX 11/780 was delivered
in 1978 with a 4K word read-only control store of 96 bits per word and an additional 1K
writable region available for diagnostics and microcode patches. An extra-cost option on the
11/780 was 1K of user-writable control store.
Several early microprocessors were hardwired, but some amount of microprogramming soon
became a common control unit design feature. For example, among the major eight-bit
microprocessors produced in the 1974 to 1976 time frame, the MC6800 was hardwired while
the Intel 8080 and Zilog Z80 were microprogrammed [18]. An interesting comparison
between 1978-era 16-bit microprocessors is the hardwired Z8000 [19] and the microcoded
Intel 8086 [20]. The 8086 used a control store of 504 entries, each containing a rather generic
21-bit microinstruction. Extra decoding logic was used to tailor the microinstruction to the
particular byte-wide or word-wide operation. In 1978 the microprogramming of the Motorola
68000 was described [18,21,22]. This design contained a sophisticated two-level scheme.
Each 17-bit microinstruction could contain either a 10-bit microinstruction jump address or a
9-bit "nanoinstruction" address. The nanoinstructions were separately stored 68-bit words, and
they identified the microoperations active in a given clock cycle. Additional decoding logic
was used along with the nanoinstruction contents to drive the 196 control signals.
An article by Nick Tredennick in 1982 characterized the development of four major uses of
microprogramming, which he termed cultures [23].
An important sub-area within the microprogrammable machine culture was the desire
to produce a universal host machine with a general data path for efficient emulation of
any macroinstruction set architecture or for various high-level-language-directed
machine interfaces. Examples of this include the Nanodata QM-1 (ca. 1970) [4,28,29]
and the Burroughs B1700 (ca. 1972) [28,30,31,32]. ALU operations on the QM-1
were selectable as 18-bit or 16-bit, binary or decimal, and as unsigned or two's
complement or one's complement. However, even with efforts to provide a generalized
set of operations such as this, the unavoidable mismatches between host and target
machine data types and addressing structures never allowed unqualified success for
the universal host concept.
The B1700 is probably the closest commercial machine to Opler's original vision of
firmware. Multiple sets of microprograms could be held in its memory, each one
presenting a different high-level macroinstruction interface. The B1700 operating
system was written in a language called SDL, while application programs were
typically written in Fortran, Cobol, and RPG. Thus, the control store for the B1700
could hold microprograms for an SDL-directed interface as well as microprograms for
a Fortran-directed interface and a Cobol/RPG-directed interface. On an interrupt, the
B1700 would use the SDL-directed microprograms for execution of operating system
code, and then on process switch the operating system could switch the system to the
Fortran-directed microprograms or the Cobol/RPG-directed microprograms. Baron
and Higbie question the B1700's choice of 24-bit word size, lack of two's complement
arithmetic support, address space limitations, and limited interrupt and I/O facilities,
but they mainly attribute the limited commercial success of the B1700 to the lack of
operating system documentation for developers [31].
Starting in the late seventies a trend emerged in the growth of complexity in macroinstruction
sets. The VAX-11/780 superminicomputer and the MC68020 microprocessor are examples of
this trend, which has been labeled CISC for complex instruction set computer. The trend
toward increasing complexity is generally attributed to the success of microprogramming to
that date. Freed from previous constraints arising from implementation issues, the VAX
designers emphasized ease of compilation [27]. For example, one design criterion was the
generality of operand specification. However, the result was that the VAX had complex,
variable-length instructions that made high-performance implementations difficult [33].
Compared to the original MC68000 (ca. 1979), the MC68020 (ca. 1984) added virtual
memory support, unaligned data access, additional stack pointers and status register bits, six
additional stack frame formats, approximately two dozen new instructions, two new
addressing modes (for a total of 14), and 18 new formats for offsets within addressing modes
(for a total of 25). To support this, microcode storage increased from 36 KB to 85 KB [22].
The complex memory indirect addressing modes on the MC68020 are representative of the
design style engendered by a control store with access time only a fraction of the main
memory access time. This style dictates that relatively few, complex macroinstructions be
fetched from main memory, with each one then generating a multitude of register transfers.
Added to the trend of growing complexity within "normal" macroinstruction set architectures
was an emphatic call to "bridge the semantic gap", under the assumption that higher-level
machine interfaces would simplify software construction and compilation. See chapter 1 of
Myers [30] for an overview of this argument.
However, starting in the 1980's, there was a reaction to the trend of growing complexity.
Several technological developments drove this reaction. One development was that VLSI
technology was making on-chip or near-chip cache memories attractive and cost effective;
indeed, dual 256-byte instruction and data caches were included on the MC68030 (ca. 1987),
and dual 4K-byte instruction and data caches were part of the MC68040 (ca. 1989). Thus
effective memory access time was now closer to the processor clock cycle time.
A second development was the ability of VLSI designers to avoid the haphazard layouts and
chip area requirements of random logic. Instead, designers could commit sum-of-products
logic expressions to a programmable logic array (PLA). This structure consists of an and-gate
array followed by an or-gate array, and it provides a straightforward implementation of sum-
of-products expressions for control signals.
The change in the logic/memory technological ratio and the availability of a compact
implementation for a hardwired control unit set the stage for a design style change to RISC,
standing for reduced instruction set computer and named so as to heighten the contrast with
CISC designs. The RISC design philosophy argues for simplified instruction sets for ease of
hardwired, high-performance, pipelined implementations. In fact, one could view the RISC
instruction sets as quite similar to highly-encoded microinstructions and the accompanying
instruction cache as a replacement for a writable control store. Thus, instead of using a fixed
set of microprograms for interpretation of a higher-level instruction set, RISC could be
viewed as compiling directly to microcode. A counterargument to this view, given by some
RISC proponents, is that RISC has less to do with exposing the microprogram level to the
compiler and more to do with (1) a rediscovery of Seymour Cray's supercomputer design style
as exhibited in the CDC 6600 design of the mid-sixties, and (2) an agreement with John
Cocke's set of hardware/software tradeoffs as exhibited in the IBM 801 design of the early
eighties.
The 1980's proved to be a crucial turning point for traditional microprogramming. Without
exception, modern-day RISC microprocessors are hardwired. The MC68000 macroinstruction
set has also been downsized, so that an implementation of the remaining core macroinstruction
set can be hardwired [34]. Even recent processors for IBM System/390 mainframes are hardwired
[35]. Application-specific tailoring of hardware no longer depends on microprogramming
but is typically approached as an ASIC or FPGA design problem.
Microcode is still used, however, in the numerous Intel x86-compatible microprocessors such as
the Pentium 4 and AMD Athlon. Within these designs, simple macroinstructions
have one to four microinstructions (called uops and Rops, respectively) immediately generated
by decoders without control store fetches. This suffices for all but the most complex
macroinstructions, which still require a stream of microinstructions to be fetched from an on-
chip ROM by a microcode sequencer. See Shriver and Smith [36] for a detailed exposition of
the internal design of an x86-compatible microprocessor, the AMD K6-2.
Flavors of Microprogramming
The level of abstraction of a microprogram can vary according to the amount of control signal
encoding and the amount of explicit parallelism found in the microinstruction format. At one
end of the spectrum, a vertical microinstruction is highly encoded and looks like a simple
macroinstruction; it might contain a single opcode field and one or two operand specifiers.
For example, see Figure 7, which shows an example of vertical microcode for a Microdata
machine [37].
Each vertical microinstruction specifies a single datapath operation and, when decoded,
activates multiple control signals. Branches within vertical microprograms are typically
handled as separate microinstructions using a "branch" or "jump" opcode. This style of
microprogramming is the most natural to someone experienced in regular assembly language
programming, and is similar to programming in a RISC instruction set.
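A vertical microinstruction and its decoder might be sketched as follows; the micro-opcodes, field widths, and control-line bit positions are all invented for illustration.

```c
#include <stdint.h>

/* Hypothetical vertical microinstruction: one opcode field and two
   operand specifiers, like a simple macroinstruction. */
enum uop { U_MOVE, U_ADD, U_BRANCH };

typedef struct {
    enum uop op;
    uint8_t  src, dst;   /* register-number operand specifiers */
} vertical_uinst;

/* Decoding one vertical word activates several control lines at once.
   Invented bit layout: bits 0-7 are gate-out lines, 8-15 gate-in lines,
   16 the ALU add select, 17 a sequencer branch line. */
uint32_t decode(vertical_uinst u) {
    switch (u.op) {
    case U_MOVE:   return (1u << (u.src & 7)) | (1u << (8 + (u.dst & 7)));
    case U_ADD:    return (1u << (u.src & 7)) | (1u << 16)
                        | (1u << (8 + (u.dst & 7)));
    case U_BRANCH: return 1u << 17;
    }
    return 0;
}
```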
At the other end of the spectrum, a horizontal microinstruction is minimally encoded, with a
wide format in which individual control signals, or small groups of them, are specified by
separate fields, so that many microoperations can be expressed in parallel. Branching in
horizontal microprograms is also more complicated than in the vertical case; each horizontal
microinstruction is likely to have at least one branch condition and associated target address,
and perhaps an explicit next-address field. The programming effect is more like developing a
directed graph (with various cycles in the graph) of groups of control signals.
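A horizontal format with explicit sequencing fields might be sketched as follows; the field sizes are assumptions.

```c
#include <stdint.h>

/* Hypothetical horizontal microinstruction: a wide word of unencoded
   control lines plus explicit branch and next-address fields. */
typedef struct {
    uint32_t control_lines;  /* one bit per datapath control signal     */
    uint8_t  cond_select;    /* condition to test (0 = never branch)    */
    uint16_t branch_addr;    /* control store target if condition true  */
    uint16_t next_addr;      /* explicit fall-through address           */
} horizontal_uinst;

/* Choose the successor microinstruction address. */
uint16_t successor(const horizontal_uinst *u, int cond_true) {
    return (u->cond_select != 0 && cond_true) ? u->branch_addr
                                              : u->next_addr;
}
```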
Figure 8 shows an example of horizontal microcode: a nanoinstruction for the Nanodata QM-1.
The first field (called K and denoted in Figure 8 by the four periods) holds a 10-bit branch
address ("FETCH" in the example in Figure 8), various condition select subfields, and some
control subfields. The remaining four fields (called T1-T4 and appearing on separate lines in
the figure) specify the particular microoperations to be performed. Each T field contains 41
subfields.
A nanoinstruction was executed on the QM-1 in four phases: the K field was continuously
active, and the T fields were executed in sequence, one per phase (note the staggered S and X
characters in Figure 8 for the T1-T4 fields). This approach allows one 360-bit nanoinstruction
to specify the equivalent of four 144-bit nanoinstructions (i.e., the K field appended with each
particular T field in turn). Sequencing subfields within the T fields provide for repetitive
execution of the same nanoinstruction until certain conditions become true, and other
subfields provide for the conditional skipping of the next T field.
Several further techniques have been used within microinstruction formats, including the
following (two of them are sketched in code after this list):
• bit steering, in which one field's value determines the interpretation of other field(s) in
the microinstruction;
• environment substitution, in which fields from user-level registers are used as counts
or operand specifiers for the current microinstruction;
• residual control, in which one microinstruction deposits control information in a
special setup or configuration register that governs the actions and interpretation of
subsequent microinstructions;
• polyphase specification, in which different fields in a microinstruction are active in
different clock phases or clock cycles (e.g., Nanodata QM-1 as described above); and,
• multiphase specification, in which the time duration (the number or length of clock
phases or cycles) for an ALU activity would be explicitly lengthened to accommodate
the time required (e.g., such as lengthening the time to account for carry propagation
during an addition, also a feature of the QM-1).
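Two of these techniques, bit steering and residual control, are sketched below in C; all field layouts and names are invented for illustration.

```c
#include <stdint.h>

/* Bit steering: the "steer" bit selects how the operand field is
   interpreted (layout invented for illustration). */
typedef struct {
    unsigned steer   : 1;   /* 0: operand is a register number; 1: a literal */
    unsigned operand : 8;
} steered_uinst;

uint32_t read_operand(steered_uinst u, const uint32_t reg[16]) {
    return u.steer ? u.operand              /* field used as an immediate    */
                   : reg[u.operand & 15];   /* field used as a register spec */
}

/* Residual control: an earlier microinstruction deposits setup
   information that governs later ones (here, a shift count). */
static uint8_t config_reg;

void set_residual(uint8_t count) { config_reg = count; }
uint32_t shifted(uint32_t x)     { return x << (config_reg & 31); }
```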
Subroutine calls may also be available at the microprogram level; however, the nesting depth
is typically restricted by the size of a dedicated control store return-address stack, as sketched
below.
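A minimal C sketch of such a return-address stack, assuming a depth of four entries:

```c
#include <stdint.h>

/* Microsequencer return-address stack: microsubroutine nesting is
   bounded by the hardware stack depth (four entries assumed here). */
#define USTACK_DEPTH 4
static uint16_t ustack[USTACK_DEPTH];
static int      usp;   /* number of live entries */

int ucall(uint16_t return_addr) {       /* returns 0 if nesting too deep */
    if (usp == USTACK_DEPTH)
        return 0;
    ustack[usp++] = return_addr;
    return 1;
}

uint16_t uret(void) {                   /* caller must not underflow */
    return ustack[--usp];
}
```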
For decoding macroinstructions, the initial mapping to a particular microprogram can be
accomplished in one of several ways, such as the opcode-indexed decoding table used in the
Figure 6 example.
A major design question for any high-level microprogramming language (HLML) is whether
(and, if so, how) to express parallelism and timing constraints. On one hand, a language like
Dasgupta's S* (ca. 1978) uses a Pascal-like
syntax but includes two timing control structures: cocycle and coend, which identify parallel
operations active in the same clock cycle; and, stcycle and stend, which identify parallel
operations that start together in a new clock cycle (but do not necessarily have the same
duration). On the other hand, Preston Gurd wrote a microcode compiler that used a subset of
the C programming language as its source (Micro-C, ca. 1983) for his master's thesis at the
University of Waterloo. Gurd found that microprograms written without explicit constraints
were easier to understand and debug. In fact, debugging could take place on any system
having a C compiler and make use of standard debuggers. Moreover, providing a microcode
compiler for an existing HLL allows preexisting codes to be converted to microcode without
further programming effort.
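As a hypothetical illustration of the Micro-C idea (not code from Gurd's compiler), a microroutine can be written as a plain C function of sequential register transfers, with no explicit parallelism or timing, and compiled, executed, and debugged with any ordinary C toolchain before being retargeted to a control store; all names here are invented.

```c
/* Hypothetical Micro-C-style microroutine: register transfers written
   as sequential C assignments. */
unsigned pc, mar, mdr, acc;
unsigned memory[256];

void fetch_and_add(void) {
    mar = pc;            /* PC -> MAR                 */
    pc  = pc + 1;        /* increment program counter */
    mdr = memory[mar];   /* memory read into MDR      */
    acc = acc + mdr;     /* ALU add into ACC          */
}
```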
References
[2] V.C. Hamacher, Z.G. Vranesic, and S.G. Zaky. Computer Organization (3rd ed.). New
York: McGraw-Hill, 1990.
[3] D.A. Patterson and J.L. Hennessy. Computer Organization and Design: The Hardware /
Software Interface (2nd ed.). San Mateo, CA: Morgan Kaufmann, 1998.
[4] A.K. Agrawala and T.G. Rauscher. Foundations of Microprogramming. New York:
Academic Press, 1976.
[6] S. Habib (ed.). Microprogramming and Firmware Engineering Methods. New York: Van
Nostrand, 1988.
[7] S.H. Husson. Microprogramming: Principles and Practice. Englewood Cliffs, NJ: Prentice
Hall, 1970.
[8] M.V. Wilkes, "The Best Way to Design an Automated Calculating Machine," Manchester
University Computer Inaugural Conf., 1951, pp. 16-18.
- Reprinted in: M.V. Wilkes, "The Genesis of Microprogramming," IEEE Annals of the History
of Computing, v. 8, n. 3, 1986, pp. 116-126.
[9] M.V. Wilkes and J.B. Stringer, "Microprogramming and the Design of the Control Circuits
in an Electronic Digital Computer," Proceedings of the Cambridge Philosophical Society, v.
49, 1953, pp. 230-238.
- Reprinted as chapter 11 in: D.P. Siewiorek, C.G. Bell, and A. Newell. Computer Structures:
Principles and Examples. New York: McGraw-Hill, 1982.
- Also reprinted in: M.V. Wilkes, "The Genesis of Microprogramming," IEEE Annals of the
History of Computing, v. 8, n. 3, 1986, pp. 116-126.
[10] E.W. Pugh, L.R. Johnson, and J.H. Palmer. IBM's 360 and Early 370 Systems.
Cambridge, MA: MIT Press, 1991.
[11] S.G. Tucker, "Microprogram Control for System/360," IBM Systems Journal, v. 6, n. 4,
1967, pp. 222-241.
[12] A. Padegs, "System/360 and Beyond," IBM Journal of Research and Development, v. 25,
n. 5, 1981, pp. 377-390.
[13] M. Phister, Jr. Data Processing Technology and Economics (2nd ed.). Bedford, MA:
Digital Press, 1979.
[14] A.M. Johnson, "The Microdiagnostics for the IBM System 360 Model 30," IEEE
Transactions on Computers. v. C-20, n. 7, 1971, pp. 798-803.
[16] M.A. McCormack, T.T. Schansman, and K.K. Womack, "1401 Compatibility Feature on
the IBM System/360 Model 30," Communications of the ACM, v. 8, n. 12, 1965, pp. 773-776.
[17] S.G. Tucker, "Emulation of Large Systems," Communications of the ACM, v. 8, n. 12,
1965, pp. 753-761.
[19] M. Shima, "Demystifying Microprocessor Design," IEEE Spectrum, v. 16, n. 7, 1979, pp.
22-30.
[20] J. McKevitt and J. Bayliss, "New Options from Big Chips," IEEE Spectrum, v. 16, n. 3,
1979, pp. 28-34.
[25] J. Mick and J. Brick. Bit-Slice Microprocessor Design. New York: McGraw-Hill, 1980.
[26] J.R. Larus, "A Comparison of Microcode, Assembly Code, and High-Level Languages
on the VAX-11 and RISC I," ACM Computer Architecture News, v. 10, n. 5, 1982, pp. 10-15.
[27] C.G. Bell, J.C. Mudge, and J.E. McNamara. Computer Engineering: A DEC View of
Hardware Systems Design. Bedford, MA: Digital Press, 1978.
[28] A.B. Salisbury. Microprogrammable Computer Architectures. New York: Elsevier, 1976.
[30] G.J. Myers. Advances in Computer Architecture. 2nd ed. New York: Wiley, 1978.
[31] R.J. Baron and L. Higbie. Computer Architecture: Case Studies. Reading, MA: Addison-
Wesley, 1992.
[33] D. Bhandarkar and D.W. Clark, "Performance from Architecture: Comparing a RISC and
a CISC with Similar Hardware Organization," Proceedings of the 4th International Conference
on Architectural Support for Programming Languages and Operating Systems [ASPLOS],
1991, pp. 310-319.
[35] C.F. Webb and J.S. Liptay, "A High-Frequency Custom CMOS S/390 Microprocessor,"
IBM Journal of Research and Development, v. 41, n. 4/5, 1997, pp. 463-473.
[40] B.R. Rau and J. Fisher, "Instruction-Level Parallel Processing: History, Overview, and
Perspective," Journal of Supercomputing, v. 7, 1993, pp. 9-50.