Sadhna K. Mishra et al. / International Journal of Engineering Science and Technology
Vol. 2(6), 2010, 1729-1736

Processor Architecture Design Practices: Survey & Issues

SADHNA K. MISHRA, ARVIND RAJAWAT, R.P. SINGH

Abstract

The paper explores recent architecture evaluations and related issues and compares the features of NISC (No
Instruction Set Computer) with those of CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set
Computer) processors. The complexity of embedded systems has increased manifold, and the design community has been
searching for a suitable method that can handle this complexity with the dual aims of (i) increased efficiency and
(ii) reduced time to market, so that designer productivity increases without sacrificing design quality. The paper
presents a review of different processors and compares the variation in their utility and design.

Keywords—CISC, NISC, RISC, ASIP, HLS, Micro Program Memory (mPM).


1. Introduction
In recent years, with the increasing complexity of embedded systems, designers have been searching for an
alternative approach that can handle this complexity and lead to the dual targets of (i) increased productivity and
(ii) reduced time to market. One probable solution is to raise the level of abstraction or, in other words, to
increase the size of the basic building blocks. It is not clear, however, how many of these building blocks are
needed and what they should be; apparently, the necessary building blocks are processors and memories, as shown in
figure 1.
The progression of processor architecture can be broadly divided into three historical phases. During the 1970s,
CISC was the popular choice. Given that the Program Memory (PM) was slow, designers tried to improve performance by
constructing complex instructions. Each complex instruction took several clock cycles, and the datapath control
words for each clock cycle were stored in a much faster Micro Program Memory (mPM) [3][11][12][13].

Figure 1: Layout of CISC, RISC and NISC [4, 10]

During the late 1980s, RISC became popular; the central concept was to eliminate the complex instructions and the
mPM. In RISC, all instructions are simple and execute in one clock cycle, which allows the datapath to be
efficiently pipelined in 4-8 pipeline stages. The mPM was replaced with a decoding stage that follows the
instruction fetch from PM. Given that the instructions are simpler, a RISC needs approximately two instructions for
each complex instruction and, consequently, the size of the PM is doubled. Nevertheless, the
Fetch-Decode-Execute-Store pipeline of the whole processor improved the execution speed several times in comparison
to its predecessor [14][19], as shown in figure 2. The RISC structural design is an attempt to reduce execution
time by simplifying the instruction set of the computer. The major characteristics of a RISC processor are
(i) relatively few instructions, (ii) relatively few addressing modes, (iii) memory access limited to load and
store instructions, (iv) all operations done within the registers of the CPU, (v) fixed-length, easily decoded
instruction format, (vi) single-cycle instruction execution, (vii) hardwired rather than microprogrammed control
[3][11][12][13].

ISSN: 0975-5462

Figure 2: RISC and CISC block diagram [10, 11]

Finally, the concept of NISC offers an entirely new approach to the design of custom processors and IPs, as shown
in figure 3. It completely removes the decode stage and stores the control words in the PM. Because control words
are 2-3 times wider than instructions, the PM increases in width by 2-3 times. Fortunately, each control word can
execute 2-3 RISC instructions, so the NISC PM is about the same size as the RISC PM. Additionally, each NISC is
parametrizable and reconfigurable, which allows very fine tuning to any application and performance target.
Eliminating instructions facilitates faster execution and better customization. Without instructions, the NISC
compiler has full control of all the components and connections in the datapath, which permits it to achieve better
resource utilization, and one NISC toolset is adequate for all possible datapaths [4][10][11].

Figure 3: NISC block diagram [10, 11]
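
The program memory size argument for RISC and NISC can be checked with a little arithmetic. The Python sketch below
is illustrative only: the instruction count, the 32-bit instruction width and the factor of 3 are assumed values
chosen for the example, not figures taken from the cited reports.

```python
def program_memory_bits(num_words: int, word_width_bits: int) -> int:
    """Total program memory (PM) size in bits."""
    return num_words * word_width_bits


# Assumed example values (hypothetical, for illustration only).
RISC_INSTRUCTIONS = 1200     # instructions in the RISC version of a program
RISC_WIDTH_BITS = 32         # width of one RISC instruction
WIDTH_FACTOR = 3             # a control word is assumed to be 3x wider
OPS_PER_CONTROL_WORD = 3     # each control word is assumed to replace 3 instructions

risc_pm = program_memory_bits(RISC_INSTRUCTIONS, RISC_WIDTH_BITS)
nisc_pm = program_memory_bits(RISC_INSTRUCTIONS // OPS_PER_CONTROL_WORD,
                              RISC_WIDTH_BITS * WIDTH_FACTOR)

print(risc_pm, nisc_pm)  # both are 38400 bits under these assumptions
```

The sizes only match when the width factor and the number of instructions covered by one control word are equal; if
the compiler packs fewer operations into a control word than the width factor, the NISC PM grows accordingly.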

The review is organized as follows: section 2 concentrates on the benefits and limitations of the processors;
section 3 describes the controllers of the processors; section 4 focuses on the working of the processors and their
inherent methodology; section 5 covers the NISC processor and its purposes; and section 6 summarizes the
conclusions of the paper.

2. The benefits and limitations


2.1.1 The benefits of CISC
The characteristic benefits of CISC processors are: (i) emphasis on hardware, (ii) multi-clock complex
instructions, (iii) memory-to-memory operation, with "LOAD" and "STORE" incorporated in instructions, (iv) small
code sizes, high cycles per second, (v) transistors used for storing complex instructions [12][14].
2.1.2 The limitations of CISC
There are some limitations: (i) the incorporation of older instruction sets into new generations of processors
tended to force growing complexity, (ii) many specialized CISC instructions were not used frequently enough to
justify their existence, (iii) because each CISC command must be translated by the processor into tens or even
hundreds of lines of microcode, it tends to run slower than an equivalent series of simpler commands that do not
require so much translation [12][13][14].
2.2.1 The benefits of RISC
The major benefits of a RISC processor are: (i) relatively few instructions, (ii) relatively few addressing modes,
(iii) memory access limited to load and store instructions, (iv) all operations done within the registers of the
CPU, (v) fixed-length, easily decoded instruction format, (vi) single-cycle instruction execution, (vii) hardwired
rather than microprogrammed control [4][14][29].
2.2.2 The limitations of RISC
The limitations of a RISC processor are: (i) few instructions are required, (ii) decoding logic is required,
(iii) the hardwired control unit cannot be changed dynamically [4][14][21].
2.3.1 The benefits of NISC
The benefits of NISC can be listed as follows: (i) the distinction between SW and HW implementation disappears: for
a HW implementation the control words are in ROM or gate logic, whereas for a SW implementation they are in RAM,
(ii) the fastest possible execution, since the datapath can be pipelined with any number of stages and can have any
level of parallelism, (iii) as there is no instruction set, NISC removes the last stage of interpretation between C
code and HW; the C code runs directly on the HW, i.e. the datapath, (iv) NISC can emulate any instruction set,
given that a NISC control word can execute any operation as long as the datapath resources are available; thus any
legacy code can be executed on a properly defined NISC by converting legacy instructions into NISC control words
through a table look-up, (v) the NISC compiler uses High-Level Synthesis algorithms for covering the parse tree
with control words, (vi) as NISC is a sufficient component for any computation, only one compiler is needed
worldwide, (vii) likewise, only one NISC processor, albeit in different versions and with different parameters, is
needed worldwide; such uniformity will simplify education, design, trade, maintenance, testing and many other
aspects of system design, in the same way that gate libraries previously led to the standardization of digital
design [4][11][12].
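
Benefit (iv), the execution of legacy code by converting instructions into control words through a table, can be
sketched as follows. This is a minimal illustration, not the translation scheme of the NISC toolset: the
mnemonics, the control-field names and the table contents are all hypothetical.

```python
# Hypothetical translation table: legacy mnemonic -> sequence of control words.
# Each control word is modelled as a dict of datapath control fields
# (the field names are assumptions made for this example).
TRANSLATION_TABLE = {
    "LOAD": [{"mem_read": 1, "rf_write": 1}],
    "ADD":  [{"alu_op": "add", "rf_write": 1}],
    "ADDM": [{"mem_read": 1, "rf_write": 1},      # fetch the memory operand first
             {"alu_op": "add", "rf_write": 1}],   # then add it in the ALU
}


def translate(legacy_program):
    """Expand a list of legacy mnemonics into a flat list of control words."""
    control_words = []
    for mnemonic in legacy_program:
        control_words.extend(TRANSLATION_TABLE[mnemonic])
    return control_words


print(len(translate(["LOAD", "ADDM"])))  # 3 control words
```

A complex legacy instruction such as the assumed `ADDM` simply maps to more than one control word, which mirrors
the way a CISC instruction expands into several cycles of microcode.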
2.3.2 The limitations of NISC
There are a number of limitations: (i) NISC does not support the standard C libraries, (ii) application programming
is limited to ANSI C, (iii) there are a few limitations on the input C program: function pointers and global
pointer initializations are not currently supported, (iv) for maintenance, testing and many other aspects of system
design, in similar fashion as gate libraries led to standardization of digital design previously [4][11][12].
3. HW and SW Implementation of the Controller
3.1 CISC Controller
The CISC controller is based on a microprogrammed controller because the CISC instruction set is complex, and a
hardwired controller would become too complicated. All microroutines corresponding to the machine instructions are
stored in the control store. The control unit generates the sequence of control signals by reading from the control
store the control words (CWs) of the microroutine corresponding to the respective instruction.
In other words, the control unit is implemented just like another very simple CPU inside the CPU, executing
microroutines stored in the control store [5][14].
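
The behaviour of a microprogrammed control unit can be sketched in a few lines of Python. The instruction names and
control-signal names below are hypothetical; the sketch only shows the principle that each machine instruction
selects a microroutine whose control words are issued one per clock cycle.

```python
# Control store: one microroutine (a list of control words) per machine instruction.
# Instruction and signal names are assumptions made for this example.
CONTROL_STORE = {
    "ADD_MEM": [
        {"mar_load", "mem_read"},   # cycle 1: fetch the memory operand
        {"alu_add", "acc_load"},    # cycle 2: add it to the accumulator
    ],
    "NOP": [set()],                 # one cycle with no signal asserted
}


def run_microprogrammed(instructions):
    """Yield the set of control signals asserted in each clock cycle."""
    for instr in instructions:
        for control_word in CONTROL_STORE[instr]:  # step through the microroutine
            yield control_word


trace = list(run_microprogrammed(["ADD_MEM", "NOP"]))
print(len(trace))  # 3 clock cycles
```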
3.2 RISC Controller
The main concept behind the RISC controller is hardwired control, which is faster than a microprogrammed
controller. In this case, the control unit is a combinational circuit. It takes a set of inputs from the
instruction register (IR), flags, clock and system bus and transforms them into a set of control signals [5][14].
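
By contrast with a stored microroutine, a hardwired control unit can be viewed as a pure function of its inputs. The
following sketch assumes a tiny, made-up opcode set and a single zero flag; it is meant only to show that no control
store or internal state is involved.

```python
def hardwired_control(opcode: str, zero_flag: bool) -> dict:
    """Combinational mapping from inputs to control signals (no stored state).

    The opcode names and signal names are assumptions made for this example.
    """
    return {
        "alu_add": opcode == "ADD",
        "reg_write": opcode in ("ADD", "LOAD"),
        "mem_read": opcode == "LOAD",
        "pc_branch": opcode == "BEQ" and zero_flag,
    }


print(hardwired_control("BEQ", zero_flag=True)["pc_branch"])  # True
```

Changing the behaviour of such a controller means changing the function itself, which corresponds to the limitation
that a hardwired control unit cannot be changed dynamically.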
3.3 NISC Controller
3.3.1 NISC Controller HW (Fixed implementation)
In brief, the NISC controller produces a sequence of control words in order to execute the computation specified by
the C program. If the sequence is short and will not change over time, the controller can be implemented with gates
and a state register (SR). The controller then consists of the state register (SR), the Next-State Logic and the
Output Logic. The SR stores the present state of the processor, the Next-State Logic computes the next state to be
loaded into the SR, and the Output Logic produces the control signals and the control outputs. The SR, Next-State
Logic and Output Logic can also be redefined and reconfigured if the controller is implemented on an FPGA [12], as
shown in figure 4.
Figure 4: NISC Controller HW [12]
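
The fixed controller described above can be modelled as a small state machine with three parts: a state register, a
next-state table and an output table. The three states, the single status bit and the control-word fields in this
sketch are hypothetical.

```python
# Next-State Logic: (present state, status input) -> next state.
NEXT_STATE = {
    ("S0", 0): "S1", ("S0", 1): "S1",
    ("S1", 0): "S2", ("S1", 1): "S0",
    ("S2", 0): "S0", ("S2", 1): "S0",
}

# Output Logic: present state -> control word (field names are assumed).
OUTPUT = {
    "S0": {"load_a": 1},
    "S1": {"alu_add": 1},
    "S2": {"store": 1},
}


def run_fixed_controller(status_inputs, start="S0"):
    """Return the control words issued per cycle and the final SR value."""
    state_register = start
    control_words = []
    for status in status_inputs:
        control_words.append(OUTPUT[state_register])           # output depends on state only
        state_register = NEXT_STATE[(state_register, status)]  # load the next state into SR
    return control_words, state_register


words, final_state = run_fixed_controller([0, 0, 0])
print(words, final_state)
```

In a gate-level implementation the two tables become combinational logic; on an FPGA they can be reloaded, which is
the reconfigurability mentioned in the text.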

3.3.2 NISC Controller SW (Programmable implementation)


Figure 5: NISC Controller SW [12]

In the programmable version, the Program Counter (PC) plays the role of the State Register, whereas the Output
Logic is implemented by a Program Memory (PM). The PM is writable if a RAM is used, or fixed if a ROM is used. The
Next-State Logic is replaced by an Address Generator [12]. The main characteristic of the programmable controller
is that a new program can be loaded dynamically and executed, as shown in figure 5.
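
A minimal sketch of the programmable controller follows. The encoding of a program memory entry as a pair of control
word and optional jump target is an assumption made for this example; it is not the control-word format of an
actual NISC processor.

```python
def run_programmable_controller(program_memory, max_cycles=100):
    """The PC acts as the state register; the PM supplies the control words.

    Each PM entry is (control_word, jump_target), where jump_target is None
    for sequential execution or an absolute PM address otherwise.
    """
    pc = 0
    issued = []
    for _ in range(max_cycles):
        if pc >= len(program_memory):
            break                                   # ran off the end of the program
        control_word, jump_target = program_memory[pc]
        issued.append(control_word)
        # Address generator: PC + 1 by default, otherwise take the jump.
        pc = pc + 1 if jump_target is None else jump_target
    return issued


pm = [("load", None), ("add", 3), ("skipped", None), ("store", None)]
print(run_programmable_controller(pm))  # ['load', 'add', 'store']
```

Loading a new program amounts to passing a different `program_memory` list, which corresponds to rewriting the RAM
in the programmable implementation.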

4. Processor Methodology
4.1 CISC Methodology
The block diagram in figure 6 shows, as an example, the Pentium II processor with its cache and memory interfaces.
It explains how the P6 family microarchitecture implements dynamic execution.

Figure 6: The Three Core Engines Interface with Memory [17]

1. FETCH/DECODE stage - this unit takes the user program instruction stream from the instruction cache as input
and decodes it into a series of μoperations (μops) that represent the dataflow of that instruction stream.
2. DISPATCH/EXECUTE stage - an out-of-order unit that accepts the dataflow stream, schedules execution of the μops
subject to data dependencies and resource availability, and temporarily stores the results of these speculative
executions.
3. RETIRE stage - an in-order unit that knows how and when to commit the temporary, speculative results to
permanent architectural state.
4. BUS INTERFACE stage - a partially ordered unit responsible for connecting the three internal units to the real
world; it communicates directly with the L2 cache, supporting up to four concurrent cache accesses [16][17].
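
The interplay of out-of-order execution and in-order retirement in the stages listed above can be illustrated with a
deliberately simplified Python model. It assumes unlimited execution units and single-cycle μops, and the μop names
are made up; it is not a model of the actual P6 scheduler.

```python
def dispatch_execute_retire(uops):
    """uops: list of (name, set of names it depends on), in program order.

    Returns (schedule, retire_order). Each schedule entry lists the uops
    executed in one step; a uop executes once all its dependencies are done.
    """
    done = set()
    schedule = []
    pending = list(uops)
    while pending:
        ready = [name for name, deps in pending if deps <= done]
        if not ready:
            raise ValueError("unsatisfiable dependency")
        schedule.append(ready)                  # out-of-order: any ready uop may run
        done.update(ready)
        pending = [(n, d) for n, d in pending if n not in done]
    retire_order = [name for name, _ in uops]   # in-order: program order is restored
    return schedule, retire_order


uops = [("load", set()), ("add", {"load"}), ("mul", set())]
schedule, retired = dispatch_execute_retire(uops)
print(schedule)  # [['load', 'mul'], ['add']]
print(retired)   # ['load', 'add', 'mul']
```

Here `mul` executes before `add` although it comes later in the program, while the retirement order still follows
the program order.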

4.2 RISC (ARM) Methodology


From the programmer’s point of view, the ARM7TDMI® can be in one of two operating states:
(i) ARM® state, which executes 32-bit, word-aligned ARM instructions.
(ii) THUMB® state, which operates with 16-bit, halfword-aligned THUMB instructions. In this state, the PC uses bit
1 to select between alternate halfwords.

Transitions between these two states do not affect the processor mode or the contents of the registers.
Entry into THUMB state can be achieved by executing a BX instruction with the state bit (bit 0) set in the operand
register. Transition to THUMB state also occurs automatically on return from an exception (IRQ, FIQ, UNDEF, ABORT,
SWI etc.) if the exception was entered with the processor in THUMB state.
Entry into ARM state happens (i) on execution of the BX instruction with the state bit clear in the operand
register, or (ii) on the processor taking an exception (IRQ, FIQ, RESET, UNDEF, ABORT, SWI etc.). In the latter
case, the PC is placed in the exception mode’s link register, and execution commences at the exception’s vector
address [8][9].
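
The state selection performed by BX can be written down as a short function. This is a simplified sketch of the
rule stated above (bit 0 of the operand register selects the state); register width, exceptions and alignment
checks are left out, and the function name is hypothetical.

```python
def bx(operand_register: int):
    """Model of BX: bit 0 of the operand selects the state at the branch target.

    Returns (state, pc).
    """
    if operand_register & 1:
        # State bit set: enter THUMB state; bit 0 is not part of the address.
        return "THUMB", operand_register & ~1
    # State bit clear: enter ARM state; the target is assumed to be word-aligned.
    return "ARM", operand_register


print(bx(0x8001))  # ('THUMB', 32768)
print(bx(0x8000))  # ('ARM', 32768)
```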

4.3 NISC Methodology


The first and foremost feature of the NISC methodology is that the application is first translated to low-level
three-address operations and is then profiled and analyzed to extract the important characteristics that can be
used for generating a customized datapath. This step can be carried out automatically or by the designer and should
provide information such as (i) depth and structure of the pipeline, (ii) level of parallelism, (iii) type and
configuration of components, etc. This information is captured in the NISC model of the architecture and is used to
drive the NISC compiler and simulator. The NISC compiler takes the application and the NISC model of the
architecture as input and generates the corresponding Finite State Machine with Data (FSMD). Each state of the FSMD
indicates the set of register and bus transfers as well as the operations that are executed in one clock cycle. The
compiler produces this information by translating each operation of the application into a set of register
transfers (RTL) and then scheduling them so as to meet the resource and timing constraints [4][10].

Figure 7: The methodology of NISC [10]
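
The step from three-address operations to FSMD states can be illustrated with a simple greedy scheduler. It is a
sketch under strong assumptions: every operation takes one cycle, each temporary is assigned only once, and the
resource model is just a count of functional units per operation type. The actual NISC compiler is not claimed to
work this way.

```python
def schedule_fsmd(operations, units_per_state):
    """Greedy scheduling of three-address operations into FSMD states.

    operations: list of (dest, op, src1, src2) in program order.
    units_per_state: dict mapping op -> functional units available per cycle.
    Returns a list of states; each state lists the operations of one clock cycle.
    """
    ready_at = {}   # variable -> first state in which its value can be read
    states = []
    for dest, op, src1, src2 in operations:
        s = max(ready_at.get(src1, 0), ready_at.get(src2, 0))  # data dependencies
        while True:
            while len(states) <= s:
                states.append([])
            used = sum(1 for o in states[s] if o[1] == op)
            if used < units_per_state[op]:                      # resource constraint
                break
            s += 1
        states[s].append((dest, op, src1, src2))
        ready_at[dest] = s + 1
    return states


ops = [("t1", "mul", "a", "b"), ("t2", "mul", "c", "d"), ("t3", "add", "t1", "t2")]
print(len(schedule_fsmd(ops, {"mul": 1, "add": 1})))  # 3 states with one multiplier
print(len(schedule_fsmd(ops, {"mul": 2, "add": 1})))  # 2 states with two multipliers
```

Adding a second multiplier shortens the schedule from three states to two, which is the kind of trade-off the
profiling and analysis step is meant to expose when a customized datapath is generated.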

In the final implementation stage, the FSMD is converted to an FSM (Moore machine) in which each state represents
the control bit values in one clock cycle, and the FSM is implemented in the controller of the NISC processor.
Depending on the size and complexity of the FSM and on further design constraints such as area and timing, the
controller can be implemented in one of two ways:
(i) For a simple FSM, the controller can be synthesized using standard cell libraries; a commercial synthesis tool,
such as Design Compiler, can do the synthesis from the description of the FSM;
(ii) For a complex FSM, the controller can be implemented using a memory and a program counter (PC); the control
words are stored in the memory and selected by the PC.

Further, the FSMD is also used to produce a cycle-accurate simulation model of the architecture; the simulator
takes the sequence of control words and simulates them on the model of the target NISC processor. Given that NISC
does not have an instruction set, there is no functional simulator in the traditional sense, and the functionality
of the application is validated by compiling and running the application itself or the equivalent three-address
operations. The cycle-accurate simulator can be used both for validating the correctness of the timing and
functionality of the compiler’s output and for providing performance metrics such as speed and energy consumption
to the Model Generator. The performance results of the simulator can be analyzed to fine-tune the structure of the
customized NISC processor [4][10]. The entire NISC flow is presented in figure 7. In general, the NISC processor
model plays the pivotal role in this methodology: its structure determines the flexibility of the analyzer or
designer in suggesting more optimized processors, and it also influences the quality and complexity of the
simulator and compiler.

5. The NISC Processor


The three prime uses of the NISC processor model are (i) translating application operations to register transfers
by the compiler and generating the corresponding FSMD, (ii) constructing the control words from the set of register
transfers, and (iii) decoding the control words back to their corresponding register transfers by the simulator.
Basically, the NISC processor model is a combination of (i) the clock frequency, (ii) a model of the controller,
and (iii) a model of the datapath. The controller can be either fixed or programmable. Further, the datapath can be
both re-programmable (it can be extended or reduced by adding or omitting components) and re-configurable (it can
be reconnected with the same components) [10], as shown in figure 8.

Figure 8: The NISC Processor [1, 6, 11]
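
Uses (ii) and (iii), constructing a control word and decoding it back, can be illustrated by packing named fields
into an integer and unpacking them again. The field names and widths below are hypothetical and far smaller than a
real control word.

```python
# Hypothetical control-word layout: (field name, width in bits), most significant first.
FIELDS = [("alu_op", 3), ("rf_write", 1), ("mem_read", 1), ("const", 8)]


def encode(values: dict) -> int:
    """Pack the field values into a single control word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word: int) -> dict:
    """Recover the field values from a control word."""
    values = {}
    for name, width in reversed(FIELDS):   # least significant field first
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


cw = {"alu_op": 5, "rf_write": 1, "mem_read": 0, "const": 42}
print(encode(cw))               # 5674
print(decode(encode(cw)) == cw)  # True
```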

Figure 8 illustrates a generic NISC architecture. A NISC architecture may have (i) control pipelining, i.e. the CW
and status registers, (ii) datapath pipelining, i.e. pipelined components or registers at the inputs/outputs of
components, and (iii) data forwarding, i.e. the dotted connection lines from the outputs of some components to the
inputs of others. The control word register (CW) controls both the datapath and the address generator (AG) of the
controller. The datapath section of the CW contains the control values of all datapath components as well as a
small constant field [1][2][10]. The controller section of the CW determines how the next PC address is calculated:
it provides a condition, a jump type (either direct or indirect) and an offset to the AG. For indirect jumps, the
AG calculates the target address by adding the offset to the current value of the PC, while for direct jumps the AG
uses the value on its address port as the target address. If the condition in the CW and the status input of the AG
are equal, the calculated target address is loaded into the PC; otherwise the PC is incremented. The controller
also contains a link register (LR), which stores the return address of a function call; the return address is
usually the current PC value plus one. In addition to standard components, the datapath can contain pipelined and
multi-cycle components: in figure 8, the ALU, MUL and Mem are single-cycle, pipelined and multi-cycle components,
respectively. There is no limitation on the connections between components in the datapath. If the input of a
component comes from multiple sources, a bus or a multiplexer is used to select the actual input. Buses are modeled
explicitly, with one control bit assumed for each writer to the bus. Multiplexers are implicit, with log2 n control
bits assumed for n writers [6][7][10].
6. Conclusion
The review explores recent design evaluations and related issues and compares the features of NISC (No Instruction
Set Computer), CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer) processors; each
has its own merits and demerits. In CISC, complex functionality can be executed with complex instructions: the
program memory (PM) contains complex instructions that are mapped to sequences of microcode pre-stored in a
microcode memory (mPM). The datapath may contain little or no decoding of the microcode. One CISC example is the
Motorola 68000, in which microcode instructions are converted to sequences of nanocode commands. In CISC machines,
the control unit controls the datapath resources in every cycle. In RISC, there is no need for a microcode memory
because the instructions are stored in the program memory and are decoded as they are applied to the datapath. From
this discussion we can conclude that CISC and RISC are relatively easy to design and compile for, but they have
larger code sizes. Finally, NISC has no instruction set; its architecture description is very simple and concise.
In NISC, the nanocodes are generated directly and compressed, and every feature of the datapath can be utilized
efficiently by the NISC compiler. Comparing RISC and NISC, the instruction decode stage of RISC is replaced by a
decompression stage in NISC, which is generated automatically without any need for manual specification. In short,
NISC offers the fastest execution of a program, increases efficiency and reduces the time to market.
References
[1]. B.Gorjiara, M.Reshadi and D. Gajski, Merged Dictionary Code Compression for FPGA Implementation of Custom Microcoded PEs, ACM
Transactions on Reconfigurable Technology and Systems, 2008.
[2]. B. Gorjiara and D. Gajski, Automatic Architecture Refinement Techniques for Customizing Processing Elements, Design Automation
Conference (DAC), June 2008.
[3]. M. Reshadi, B. Gorjiara and D. Gajski, C-Based Design Flow: A Case Study on G.729A for Voice over Internet Protocol, Design Automation
Conference (DAC), pp. 72-75, May 2008.
[4]. NISC Technology website: http://www.cecs.uci.edu/~nisc/
[5]. The Control Unit, Petru Eles, IDA, LiTH, www.ida.Liu.se/~TDTS01/lectures/08/lec11.pdf


[6]. B. Gorjiara and D. Gajski, FPGA-friendly Code Compression for Horizontal Microcoded Custom IPs, FPGA , pp. 108-115, February
2007.
[7]. M. Reshadi and D.Gajski, Interrupt and Low-level Programming Support for Expanding the Application Domain of Statically-scheduled
Horizontally-microcoded Architectures in Embedded Systems, Design Automation and Test in Europe (DATE), pp. 1337-1342, April 2007.
[8]. ARMDDI01001 ARM Architecture Reference Manual, ARM Limited, ch.1, pp 1-2, 2005. http://www.arm.com/miscpdfs/14128.pdf
[9]. ARM7TDMI Technical Reference Manual, ARM Limited, ch.1, pp 1-26, 2004.
http://infocenter.arm.com/help/topic/com.arm.doc.ddio210c/DDIO210B.pdf
[10]. M. Reshadi and D. Gajski, NISC Modeling and Compilation, Center for Embedded Computer Systems, TR 04-33, pp 2-7, December 2004.
[11]. M. Reshadi and D. Gajski, NISC Modeling and Simulation, Center for Embedded Computer Systems, TR 04-08, pp. 2-5, March 2004.
[12]. D. Gajski, NISC: The Ultimate Reconfigurable Component, Center for Embedded Computer Systems, TR 03-28, pp. 2-8, October 2003.
[13]. Raj Kamal, Embedded Systems, Tata McGraw Hill, 2003, pp. 564-567.
[14]. M. Morris Mano, Computer System Architecture, Prentice Hall, India, 2003, ch. 8, pp. 282-285.
[15]. ARM7TDMI Data Sheet, Advanced RISC Machines Ltd. (ARM), ch. 1, August 1995. http://www.eecs.umich.edu/~panalyzer/pdfs/ARM-doc.pdf
[16]. Intel 386™ EX Embedded Microprocessor, User’s Manual, ch. 3, 1996. http://www.intel.com/design/intarch/manuals/27248502.pdf
[17]. Pentium® II Processor, Developer’s Manual, ch. 2, October 1997. http://www-rocq.inria.fr/syndex/doc/386/vol3ref24319202.pdf
[18]. Pentium® Processor Family, Developer’s Manual, ch. 1, 1997. http://webster.cs.ucr.edu/page-TechDocs/pentium4.pdf
[19]. PA-RISC 1.1 Architecture and Instruction Set, Reference Manual, ch. 1, February 1994. http://ftp.parisc-linux.org/docs/arch/pa11-acd.pdf
[20]. D. E. O’Brien, B. M. Hahne, and J. Peter Krusius, Physical design alternatives for RISC workstation packaging, volume 16, Issue 8 , pp.
996 – 1005, December 1993.
[21]. Paul Chow, RISC-(reduced instruction set computers), IEEE Potentials, volume 10, Issue 3, pp.28-31, October 1991.
[22]. Ymg-Yuch Chcn and T. C. Yang, Modeling and performance evaluation of RISC/B processor, First International Conference on Systems
Integration, pp. 63 – 73, April 1990.
[23]. R. Gupta, M. Epstein and M.Whelan, The design of a RISC based multiprocessor chip, Proceedings of Supercomputing 90, pp.920 – 929,
November 1990.
[24]. M. L. Simmons and H. J. Wasserman, Performance evaluation of the IBM RISC system/6000: comparison of an optimized scalar processor
with two vector processors, Proceedings of Supercomputing 90, pp. 132 – 141, 12-16 November 1990.
