Processor Architecture Design Practices Survey Iss
Abstract
This paper reviews recent processor architecture evaluations and related issues, and compares the features of NISC
(No Instruction Set Computer) with those of CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction
Set Computer) processors. The complexity of embedded systems has increased manifold, and the design community has
been searching for a suitable method that can handle such complexity with the dual aims of (i) increased efficiency
and (ii) reduced time to market, so that designer productivity increases without sacrificing design quality. The
paper presents a review of different processors and compares the variation in their utility and design.
During the late 1980s, RISC became popular; its core concept was to eliminate the complex instructions and the
microcode memory (mPM). In RISC, all instructions are simple and execute in one clock cycle, which allows the
datapath to be efficiently pipelined in 4-8 stages. The mPM was replaced with a decode stage that follows the
instruction fetch from the program memory (PM). Because the instructions are simpler, a RISC needs approximately
two instructions for each complex instruction and, consequently, the size of the PM is doubled. Nevertheless, the
Fetch-Decode-Execute-Store pipeline of the whole processor improved the execution speed several times in comparison
to its predecessor [14] [19], as shown in figure 2. The RISC design attempts to reduce execution time by simplifying
the instruction set of the computer. The major characteristics of a RISC processor are (i) relatively few
instructions, (ii) relatively few addressing modes, (iii) memory access limited to load and store instructions,
(iv) all operations done within the registers of the CPU, (v) a fixed-length, easily decoded instruction format,
(vi) single-cycle instruction execution, and (vii) hardwired rather than microprogrammed control [3][11][12][13].
Finally, the concept of NISC offers an entirely new approach to the design of custom processors and IPs, as shown in
figure 3. It completely removes the decode stage and stores the control words in the PM. Because control words are
2-3 times wider than instructions, the PM increases in width by 2-3 times. Fortunately, each control word can
execute 2-3 RISC instructions, so the NISC PM is roughly the same size as the RISC PM. Additionally, each NISC is
parametrizable and reconfigurable, which allows very fine tuning to any application and performance target. It
eliminates instructions to allow faster execution and better customization of the processor. Without instructions,
the NISC compiler has full control of all the components and connections in the datapath, which lets it achieve
better resource utilization, and one NISC toolset is adequate for all possible datapaths [4],[10],[11].
Figure 3: NISC block diagram [10, 11]
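The program-memory argument above can be checked with simple arithmetic. The sketch below is illustrative only: the program length, the 32-bit instruction width, and the factor of 2.5 (one value picked from the 2-3 range quoted in the text) are assumptions, not measurements from the paper.

```python
def pm_bits(num_words: int, word_width_bits: int) -> int:
    """Program-memory size in bits: number of stored words times word width."""
    return num_words * word_width_bits


# Assumed, illustrative figures (not taken from the paper).
RISC_INSTRUCTIONS = 1000      # hypothetical program length in RISC instructions
RISC_WIDTH_BITS = 32          # hypothetical RISC instruction width
WIDTH_FACTOR = 2.5            # a control word is 2-3x wider than an instruction
OPS_PER_CONTROL_WORD = 2.5    # each control word does the work of 2-3 instructions

risc_pm = pm_bits(RISC_INSTRUCTIONS, RISC_WIDTH_BITS)

# NISC stores fewer but wider words.
nisc_words = round(RISC_INSTRUCTIONS / OPS_PER_CONTROL_WORD)
nisc_width = round(RISC_WIDTH_BITS * WIDTH_FACTOR)
nisc_pm = pm_bits(nisc_words, nisc_width)

print(f"RISC PM: {risc_pm} bits, NISC PM: {nisc_pm} bits")
```

When the width factor and the number of operations per control word are equal, the two sizes match exactly; when they differ within the 2-3 range, the sizes are only approximately equal.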
This review paper is organized as follows: section 2 concentrates on the benefits and limitations of processors;
section 3 describes the controllers of processors; section 4 focuses on the working of processors and their
inherent methodology; section 5 covers the NISC processor and its purposes; and finally section 6 summarizes the
conclusions of the paper.
State Logic computes the next state to be loaded into the SR, while Output Logic produces the control signals and
the control outputs. In addition, the SR, the Next-State Logic, and the Output Logic can be redefined and
reconfigured if the controller is implemented on an FPGA [12], as shown in figure 4.
Figure 4: NISC Controller HW [12]
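The controller structure described above (a state register, next-state logic, and output logic) can be sketched as a small table-driven model. This is a minimal illustration, assuming a Moore-style controller whose outputs depend only on the current state; the class name, state names, and the `alu_en` signal are hypothetical and do not come from the paper.

```python
class FsmController:
    """Table-driven controller: state register (SR) plus two logic blocks."""

    def __init__(self, next_state_logic, output_logic, reset_state):
        # next_state_logic maps (current_state, status) -> next state.
        self.next_state_logic = dict(next_state_logic)
        # output_logic maps current_state -> control signals (Moore-style assumption).
        self.output_logic = dict(output_logic)
        self.sr = reset_state  # the state register

    def step(self, status):
        """One clock cycle: emit the control signals, then load the next state into SR."""
        outputs = self.output_logic[self.sr]
        self.sr = self.next_state_logic[(self.sr, status)]
        return outputs


# Hypothetical two-state controller.
next_state_logic = {
    ("IDLE", 0): "IDLE", ("IDLE", 1): "RUN",
    ("RUN", 0): "RUN", ("RUN", 1): "IDLE",
}
output_logic = {"IDLE": {"alu_en": 0}, "RUN": {"alu_en": 1}}

ctrl = FsmController(next_state_logic, output_logic, reset_state="IDLE")
trace = [ctrl.step(status) for status in (1, 0, 1)]
print(trace, ctrl.sr)
```

Replacing the two tables changes the controller's behaviour without touching the `step` logic, which loosely mirrors the reconfiguration that an FPGA implementation permits.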
In the programmable version, the State Register is essentially a Program Counter (PC), whereas the Output Logic is
implemented by a Program Memory (PM). The PM is writable if a RAM is used, or fixed if a ROM is used. The
next-state logic is replaced by an Address Generator [12]. The main characteristic of the programmable controller
is that a new program can be loaded dynamically and executed, as shown in figure 5.
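The programmable controller can be sketched in the same spirit: the PC plays the role of the state register, the PM supplies the outputs, and an address generator picks the next PC. The word layout used here (`control`, `jump_if`, `target`) is a hypothetical format chosen for illustration, not the format used in the cited work.

```python
def address_generator(pc, word, status):
    """Next PC: jump to the word's target when its condition matches, else PC + 1."""
    if "jump_if" in word and word["jump_if"] == status:
        return word["target"]
    return pc + 1


class ProgrammableController:
    def __init__(self, program):
        self.load_program(program)

    def load_program(self, program):
        """Loading a new program models a writable (RAM-based) PM."""
        self.pm = list(program)
        self.pc = 0

    def step(self, status):
        """One clock cycle: read PM at PC, emit its control field, update PC."""
        word = self.pm[self.pc]
        self.pc = address_generator(self.pc, word, status)
        return word["control"]


# Hypothetical three-word program with one conditional backward jump.
program = [
    {"control": "load"},
    {"control": "add", "jump_if": 1, "target": 0},
    {"control": "store"},
]
ctrl = ProgrammableController(program)
trace = [ctrl.step(status) for status in (0, 1, 0, 0, 0)]
print(trace, ctrl.pc)

# A different program can be loaded at run time.
ctrl.load_program([{"control": "nop"}])
reloaded = ctrl.step(0)
```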
4. Process Methodology
4.1 CISC Methodology
This block diagram is an example of the Pentium II processor with cache and memory interfaces. It explains how the
P6 family microarchitecture implements dynamic execution.
1. FETCH/DECODE stage - a unit that takes as input the user program instruction stream from the instruction cache
and decodes it into a series of μoperations (μops) that represent the dataflow of that instruction stream.
2. DISPATCH/EXECUTE stage - an out-of-order unit that accepts the dataflow stream, schedules execution of the μops
subject to data dependencies and resource availability, and temporarily stores the results of these speculative
executions.
3. RETIRE stage - an in-order unit that knows how and when to retire the temporary, speculative results to
permanent architectural state.
4. BUS INTERFACE stage - a partially ordered unit responsible for connecting the three internal units to the real
world. It communicates directly with the L2 cache, supporting up to four concurrent cache accesses [16][17].
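The split between out-of-order execution and in-order retirement in the stages above can be illustrated with a toy model. The sketch below is not a model of the P6 hardware: it assumes unlimited execution units (so only data dependencies delay a μop), and the μop names and latencies are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    name: str
    deps: tuple = ()   # names of uops whose results this uop needs
    latency: int = 1   # cycles spent in an execution unit


def simulate(uops):
    """Return (completion_order, retire_order) for a list of uops in program order."""
    never = float("inf")
    finish_cycle = {}          # uop name -> cycle in which its result is produced
    completed, retired = [], []
    cycle = 0
    while len(retired) < len(uops):
        cycle += 1
        # DISPATCH/EXECUTE: start any uop whose inputs were produced in an earlier cycle.
        for u in uops:
            if u.name not in finish_cycle and all(
                finish_cycle.get(d, never) < cycle for d in u.deps
            ):
                finish_cycle[u.name] = cycle + u.latency - 1
        # Results produced this cycle are held as temporary, speculative state.
        completed += [u.name for u in uops if finish_cycle.get(u.name) == cycle]
        # RETIRE: commit strictly in program order.
        while len(retired) < len(uops) and (
            finish_cycle.get(uops[len(retired)].name, never) <= cycle
        ):
            retired.append(uops[len(retired)].name)
    return completed, retired


program = [
    Uop("load", latency=3),
    Uop("add", deps=("load",)),
    Uop("mul"),
    Uop("sub", deps=("mul",)),
]
completed, retired = simulate(program)
print(completed, retired)
```

In this example the independent `mul` and `sub` finish before the slow `load`, yet every result is committed in the original program order.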
These two states do not affect the processor mode or the contents of the registers during transitions between
them. The processor switches from one state to the other as follows.
Entry into THUMB state: (i) on execution of a BX instruction with the state bit (bit 0) set in the operand
register; (ii) automatically on return from an exception (IRQ, FIQ, UNDEF, ABORT, SWI, etc.), if the exception was
entered with the processor in THUMB state.
Entry into ARM state: (i) on execution of the BX instruction with the state bit clear in the operand register;
(ii) on the processor taking an exception (IRQ, FIQ, RESET, UNDEF, ABORT, SWI, etc.). In this case, the PC is
placed in the exception mode’s link register, and execution commences at the exception’s vector address [8] [9].
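The state-switching rules above can be sketched as a few small functions. This is a simplified illustration: the function names are hypothetical, alignment checks are omitted, and the link register simply receives the PC value as the text describes, without the mode-specific return-address offsets of the real architecture.

```python
ARM, THUMB = "ARM", "THUMB"


def bx(operand_register: int):
    """BX: bit 0 of the operand selects the state; the remaining bits give the target."""
    state = THUMB if operand_register & 1 else ARM
    target = operand_register & 0xFFFFFFFE  # clear the state bit
    return state, target


def enter_exception(state, pc, vector_address):
    """Taking an exception always enters ARM state; the old PC goes to the link register."""
    link_register = pc
    saved_state = state  # remembered so that the return can restore it
    return ARM, vector_address, link_register, saved_state


def return_from_exception(link_register, saved_state):
    """Returning restores the state in which the exception was entered."""
    return saved_state, link_register


state, pc = bx(0x8001)                       # state bit set: enter THUMB
state, pc, lr, saved = enter_exception(state, pc, vector_address=0x18)  # 0x18: IRQ vector
in_handler = (state, pc)
state, pc = return_from_exception(lr, saved)  # back to THUMB automatically
print(in_handler, (state, pc))
```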
set of register and bus transfers as well as the operations that are executed in one clock cycle. The compiler
produces this information by translating each operation of the application into a set of RTL operations and then
scheduling them to meet the resource and timing constraints [4][10].
In the final implementation stage, the FSMD is converted to an FSM (Moore machine) in which each state represents
the control bit values in each clock cycle, and the FSM is implemented in the controller of the NISC processor.
Depending on the size and complexity of the FSM and on further design constraints, such as area and timing, the
controller can be implemented in one of two ways:
(i) For a simple FSM, the controller can be synthesized using standard cell libraries; a commercial synthesis
tool, such as Design Compiler, can do the synthesis from the description of the FSM.
(ii) For a complex FSM, the controller can be implemented using a memory and a program counter (PC); the control
words are stored in the memory and selected by the PC.
Further, the FSMD is also used to produce a cycle-accurate simulation model of the architecture; the simulator
takes the sequence of control words and simulates them on the model of the target NISC processor. Because NISC does
not have an instruction set, there is no functional simulator in the traditional sense, and the functionality of
the application is validated by compiling and running the application itself or the equivalent 3-address
operations. The cycle-accurate simulator can be used both for validating the timing and functional correctness of
the compiler’s output and for providing performance metrics, such as speed and energy consumption, to the Model
Generator. The performance results of the simulator can be analyzed to fine-tune the structure of the customized
NISC processor [4][10]. The entire working of NISC is presented in figure 7. In general, the NISC processor model
plays the pivotal role in this methodology: its structure determines the flexibility of the analyzer or designer in
suggesting more optimized processors, and it also influences the quality and complexity of the simulator and
compiler in the system.
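The idea of simulating a sequence of control words on a datapath model can be sketched with a very small example. The control-word fields, the four-register file, and the single-cycle ALU are all hypothetical simplifications; a real NISC simulator models the actual datapath description and also reports metrics such as energy, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlWord:
    alu_op: str              # "add", "sub", or "pass" (hypothetical encoding)
    src_a: int               # register index feeding ALU input A
    src_b: int               # register index feeding ALU input B
    dst: int                 # register index written with the ALU result
    const: int = 0           # small constant field
    use_const: bool = False  # when True, ALU input B takes the constant instead


def run_control_words(control_words, num_regs=4):
    """Apply one control word per clock cycle; return final registers and cycle count."""
    regs = [0] * num_regs
    cycles = 0
    for cw in control_words:
        a = regs[cw.src_a]
        b = cw.const if cw.use_const else regs[cw.src_b]
        if cw.alu_op == "add":
            result = a + b
        elif cw.alu_op == "sub":
            result = a - b
        elif cw.alu_op == "pass":
            result = b
        else:
            raise ValueError(f"unknown ALU operation: {cw.alu_op}")
        regs[cw.dst] = result
        cycles += 1
    return regs, cycles


# r0 = 5; r1 = 7; r2 = r0 + r1; r3 = r2 - r0
words = [
    ControlWord("pass", 0, 0, 0, const=5, use_const=True),
    ControlWord("pass", 0, 0, 1, const=7, use_const=True),
    ControlWord("add", 0, 1, 2),
    ControlWord("sub", 2, 0, 3),
]
regs, cycles = run_control_words(words)
print(regs, cycles)
```

Comparing the final register values against the expected results checks functionality, while the cycle count stands in for the speed metric mentioned in the text.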
Datapath can be extended or reduced by adding or omitting some components, and reconfigurable, meaning the Datapath
can be reconnected using the same components [10], as shown in figure 8.
Figure 8: The NISC Processor [1, 6, 11]
Figure 8 illustrates a generic NISC architecture. A NISC architecture may have (i) control pipelining, i.e. the CW
and Status registers, (ii) datapath pipelining, i.e. pipelined components or registers at the inputs and outputs of
components, and (iii) data forwarding, i.e. the dotted connection lines from the outputs of some components to the
inputs of others. The control word register (CW) controls both the datapath and the address generator (AG) of the
controller. The datapath section of CW contains the control values of all datapath components as well as a small
constant field [1][2][10]. The controller section of CW determines how the next PC address is calculated; it
provides a condition, a jump type (either direct or indirect), and an offset to the AG. For indirect jumps, the AG
calculates the target address by adding the offset to the current value of PC, while for direct jumps the AG uses
the value on its address port as the target address. If the condition in CW and the status input of the AG are
equal, the calculated target address is loaded into PC; otherwise, PC is incremented. Further, the controller of
the NISC processor contains a link register (LR), which stores the return address of a function call. The return
address is usually the current value of PC plus one. In addition to standard components, the datapath can have
pipelined and multi-cycle components; in figure 8, ALU, MUL, and Mem are single-cycle, pipelined, and multi-cycle
components, respectively. There is no limitation on the connections of components in the datapath. If the input of
a component comes from multiple sources, a bus or a multiplexer is used to select the actual input. The buses are
explicitly modeled, and we assume one control bit for each writer to the bus. The multiplexers are implicit, and we
assume log2(n) control bits for n writers [6] [7] [10].
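The next-PC rule and the control-bit counts described above can be written down directly. The sketch follows the wording of the text; the function and parameter names are hypothetical, and rounding log2(n) up to a whole number of bits for writer counts that are not powers of two is an assumption the text does not state.

```python
def next_pc(pc, cw_condition, status, jump_type, offset, address_port):
    """Address generator: pick the next PC from the controller section of CW."""
    if cw_condition == status:
        if jump_type == "indirect":
            return pc + offset      # target = current PC plus offset
        return address_port         # direct jump: target comes from the address port
    return pc + 1                   # condition not met: PC is incremented


def return_address(pc):
    """Value stored in the link register (LR) on a function call."""
    return pc + 1


def bus_control_bits(num_writers):
    """Explicitly modeled bus: one control bit per writer."""
    return num_writers


def mux_control_bits(num_writers):
    """Implicit multiplexer: log2(n) select bits, rounded up (assumption)."""
    return (num_writers - 1).bit_length()


print(next_pc(10, 1, 1, "indirect", 4, 100))  # condition met, indirect jump
print(next_pc(10, 1, 1, "direct", 4, 100))    # condition met, direct jump
print(next_pc(10, 1, 0, "direct", 4, 100))    # condition not met
print(bus_control_bits(4), mux_control_bits(4))
```

For four writers, the bus needs four control bits and the multiplexer needs two, which shows why the choice between the two affects the width of the control word.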
6. Conclusion
This review paper explores recent design evaluations and related issues and compares the features of NISC (No
Instruction Set Computer), CISC (Complex Instruction Set Computer), and RISC (Reduced Instruction Set Computer)
processors, each of which has its own merits and demerits. In CISC, complex functionality can be executed with
complex instructions, and the program memory (PM) contains complex instructions that are each mapped to a sequence
of microcodes pre-stored in a microcode memory (mPM). With microcode, the datapath needs little or no decoding. One
CISC example is the Motorola 68000, in which microcode instructions are converted to a sequence of nanocode
commands. In CISC machines, the controller controls the datapath resources in every cycle. In RISC, there is no
need for a microcode memory because the instructions are stored in the program memory and are decoded as they are
applied to the datapath. From this discussion we can conclude that CISC and RISC are relatively easier to design
and compile for, but they have larger code sizes. Finally, in NISC there is no instruction set; its architecture
description is very simple and concise. In NISC, the nanocodes are generated directly and compressed, and every
feature of the datapath can be efficiently utilized by the NISC compiler. Comparing RISC and NISC, the instruction
decode stage of RISC is replaced by a decompression stage in NISC, which is generated automatically without any
need for manual specification. In short, NISC is presented as offering the fastest program execution of the three
approaches, with improved efficiency and a reduced time to market.
References
[1]. B. Gorjiara, M. Reshadi and D. Gajski, Merged Dictionary Code Compression for FPGA Implementation of Custom Microcoded PEs, ACM
Transactions on Reconfigurable Technology and Systems, 2008.
[2]. B. Gorjiara and D. Gajski, Automatic Architecture Refinement Techniques for Customizing Processing Elements, Design Automation
Conference (DAC), June 2008.
[3]. M. Reshadi, B. Gorjiara and D. Gajski, C-Based Design Flow: A Case Study on G.729A for Voice over Internet Protocol, Design Automation
Conference (DAC), pp. 72-75, May 2008.
[4]. NISC Technology website: http://www.cecs.uci.edu/~nisc/