Computer Architecture


Hitachi's SuperH Architecture

PART I
INTRODUCTION

From the earliest technology discussions that led to the creation of the SuperH®
RISC engine architecture nearly a decade ago, to the development efforts now under way
or planned, there has been one basic engineering and marketing goal for the product line.
The essential parts of that common goal are
• to provide an extended series of upward-compatible microcontroller (MCU) and
microprocessor (MPU) devices
• to offer optimized balances of performance, power consumption, integration and die
size
• to allow customers to take full advantage of windows of
market opportunity
• to deliver economical devices that customers can use to build systems that offer the
price/performance levels needed to achieve high sales volumes.
The four generations of SuperH Cool Engine™ RISC processors currently in
production conform to an aggressive, periodically updated technology roadmap.
Enthusiastic customer response has earned the architecture a leadership
position worldwide in the 32-bit embedded RISC market.
To supply customers with advanced processors for the products and systems of
the next decade, the SuperH roadmap specifies a fifth-generation architecture
(and, beyond that, a sixth). Development of the fifth-generation architecture was guided
by the overall SuperH series engineering and marketing goal described previously. To
fulfill that goal, given today’s evolving, escalating market requirements, the development
team had to overcome many design challenges. Specifically, they had to create a
microprocessor core that enables next-generation system-on-a-chip (SOC) consumer
products, provides enhanced performance for multimedia applications, and reduces
customers’ time to market. The Hitachi and STMicroelectronics (ST™) design team
accomplished this and more.
Windows CE supported the Hitachi SuperH-3 and SuperH-4 processors. These
were commonly abbreviated SH-3 and SH-4, or just SH3 and SH4, and the architecture
series was known as SHx. I’ll cover the SH-3 processor in this series, with some nods to
the SH-4 as they arise. But the only binaries I have available for reverse-engineering are
SH-3 binaries, so that’s where my focus will be. The SH-3 is the next step in the processor
series that started with the SH-1 and SH-2. It was succeeded by the SH-4 as well as the
offshoots SH-3e and SH-3-DSP. The SH-4 is probably most famous for being the processor
behind the Sega Dreamcast. As with all the processor retrospective series, I’m going to
focus on how Windows CE used the processor in user mode, with particular focus on the
instructions you will see in compiled code.

The SH-3 is a 32-bit RISC-style (load/store) processor with fixed-length 16-bit
instructions. The small instruction size permits higher code density than its
contemporaries, with Hitachi claiming a code size reduction of a third to a half compared
to processors with 32-bit instructions. The design was apparently so successful that ARM
licensed it for their Thumb instruction set.

The SH-3 can operate in either big-endian or little-endian mode. Windows CE uses
it in little-endian mode.
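
The difference between the two byte orders can be seen with a short Python sketch. It is only an illustration run on a host machine, not SH-3 code; the value 0x12345678 is an arbitrary example.

```python
import struct

value = 0x12345678  # an arbitrary 32-bit longword chosen for illustration

# "<I" packs an unsigned 32-bit integer in little-endian order,
# the mode Windows CE uses on the SH-3.
little = struct.pack("<I", value)

# ">I" packs the same integer in big-endian order, the other mode
# the SH-3 can operate in.
big = struct.pack(">I", value)

print(little.hex())  # least significant byte comes first in memory
print(big.hex())     # most significant byte comes first in memory
```

In little-endian mode the byte at the lowest address is the least significant one, which is why a memory dump of a longword appears reversed when read left to right.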

The SH-3 has sixteen general-purpose integer registers, each 32 bits wide, and
formally named r0 through r15. They are conventionally used as follows:

Register       Meaning          Preserved?
r0             return value     No
r1                              No
r2                              No
r3                              No
r4             argument 1       No
r5             argument 2       No
r6             argument 3       No
r7             argument 4       No
r8                              Yes
r9                              Yes
r10                             Yes
r11                             Yes
r12                             Yes
r13                             Yes
r14, aka fp    frame pointer    Yes
r15, aka sp    stack pointer    Yes

We’ll learn more about the conventions when we study calling conventions.

There are actually two sets (banks) of the first eight registers (r0 through r7). User-
mode code uses only bank 0, but kernel mode can choose whether it uses bank 0 or bank
1. (And when it’s using one bank, kernel mode has special instructions available to access
the registers from the other bank.)

The SH-3 does not support floating point operations, but the SH-4 does. There are
sixteen single-precision floating point registers which are architecturally
named fpr0 through fpr15, but which the Microsoft assembler calls fr0 through fr15. They
can be paired up to produce eight double-precision floating point registers:

Double-precision register    Single-precision register pair
dr0                          fr0, fr1
dr2                          fr2, fr3
dr4                          fr4, fr5
dr6                          fr6, fr7
dr8                          fr8, fr9
dr10                         fr10, fr11
dr12                         fr12, fr13
dr14                         fr14, fr15
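
The pairing rule in the table (drN is backed by frN and frN+1, for even N) can be written as a small helper. This is a minimal sketch; the function name single_pair is hypothetical and only illustrates the naming scheme.

```python
def single_pair(double_reg):
    """Return the two single-precision register names behind a drN register.

    Hypothetical helper for illustration: it encodes only the naming rule
    from the table (even N from 0 through 14), nothing about the hardware.
    """
    if not double_reg.startswith("dr"):
        raise ValueError("expected a name such as 'dr4'")
    n = int(double_reg[2:])
    if n % 2 != 0 or not 0 <= n <= 14:
        raise ValueError("double-precision registers are dr0, dr2, ..., dr14")
    return ("fr%d" % n, "fr%d" % (n + 1))


print(single_pair("dr6"))
```

Odd numbers such as dr3 are rejected because each double-precision register starts on an even-numbered single-precision register.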

If you try to perform a floating point operation on an SH-3, it will trap, and the kernel
will emulate the instruction. As a result, floating point on an SH-3 is very slow.

Windows CE requires that the stack be kept on a 4-byte boundary. I did not observe any
red zone.
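
As a rough illustration of the 4-byte rule, the following Python sketch checks and enforces the alignment on a stack pointer value. The helper names and the sample address are made up for the example, and rounding down assumes a stack that grows toward lower addresses.

```python
STACK_ALIGNMENT = 4  # bytes, per the requirement described above


def is_stack_aligned(sp):
    """True if the stack pointer value sits on a 4-byte boundary."""
    return sp % STACK_ALIGNMENT == 0


def align_stack_down(sp):
    """Round a stack pointer down to the nearest 4-byte boundary.

    Rounding down (rather than up) is an assumption that the stack
    grows toward lower addresses.
    """
    return sp & ~(STACK_ALIGNMENT - 1)


sp = 0x10000FFE  # hypothetical, misaligned value
print(is_stack_aligned(sp), hex(align_stack_down(sp)))
```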

There are also some special registers:

Register    Meaning                         Preserved?    Notes
pc          program counter                 duh           instruction pointer, must be even
gbr         global base register            No            bonus pointer register
sr          status register                 No            flags
mach        multiply and accumulate high    No            for multiply-add operations
macl        multiply and accumulate low     No            for multiply-add operations
pr          procedure register              Yes           return address

Some calling conventions for the SH-3 say that mach and macl are preserved, or that gbr is
reserved, but in Windows CE, they are all scratch.

The architectural names for data sizes are as follows:

• byte: 8-bit value
• word: 16-bit value
• longword: 32-bit value
• quadword: 64-bit value

The SH-3 has branch delay slots. Ugh, branch delay slots. What’s worse is that some
branch instructions have branch delay slots and some don’t.

Instructions on the SH-3 are generally written with source on the left and destination on
the right. For example,

MOV r1, r2 ; move r1 to r2
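
To make the source-then-destination order concrete, here is a toy Python model of that single instruction. It is a sketch only: execute_mov is a hypothetical helper that understands nothing but register-to-register MOV lines written in the style shown above.

```python
def execute_mov(line, regs):
    """Apply one 'MOV rS, rD' line to a dict of register values.

    Toy model for illustration: the first operand is the source and
    the second operand is the destination.
    """
    text = line.split(";", 1)[0]              # drop the trailing comment
    mnemonic, operands = text.split(None, 1)  # "MOV" and "r1, r2"
    if mnemonic.upper() != "MOV":
        raise ValueError("this sketch only models MOV")
    src, dst = [name.strip() for name in operands.split(",")]
    regs[dst] = regs[src]                     # the value flows left to right
    return regs


regs = {"r1": 0x1234, "r2": 0}
execute_mov("MOV r1, r2 ; move r1 to r2", regs)
print(regs)
```

After the call, r2 holds the value that was in r1, and r1 itself is unchanged.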


The SH-3 can potentially retire two instructions per cycle, although internal
resource conflicts may prevent that. For example, an ADD can execute in parallel with a
comparison instruction, but it cannot execute in parallel with a SUB instruction. In the
case of a resource conflict, only one instruction is retired during that cycle.

After an instruction that modifies flags, the new flags are not available for a cycle,
and after a load instruction, the result is not available for two cycles. There are other
pipeline hazards, but those are the ones you are likely to encounter. If you try to use the
results of a prior instruction too soon, the processor will stall. (Don’t forget that the SH-3
is dual-issue, so two cycles can mean up to four instructions.)

HISTORY

The SuperH processor core family was first developed by Hitachi in the early
1990s. Hitachi developed a complete family of CPU cores with upward-compatible
instruction sets. The SH-1 and the SH-2 were used in the Sega Saturn, Sega
32X and Capcom CPS-3. These cores use 16-bit instructions, which give better code
density than 32-bit instructions, a great benefit at the time because of the high cost
of main memory.

A few years later the SH-3 core was added to the SH CPU family; new features
included a different interrupt concept, a memory management unit (MMU) and a modified
cache concept. The SH-3 core also received a DSP extension, called the SH-3-DSP. With
extended data paths for efficient DSP processing, special accumulators and a
dedicated MAC-type DSP engine, this core unified the DSP and RISC processor
worlds. A derivative of this extension was also used with the original SH-2 core.

Between 1994 and 1996, 35.1 million SuperH devices were shipped worldwide.

For the Dreamcast, Hitachi developed the SH-4 architecture. Superscalar (2-way)
instruction execution and a vector floating point unit (particularly suited to 3D graphics)
were the highlights of this architecture. SH-4 based standard chips were introduced
around 1998.

The SH-3 and SH-4 architectures support both big-endian and little-endian byte
ordering (they are bi-endian).

Hitachi and STMicroelectronics started collaborating as early as 1997 on the
design of the SH-4. In early 2001, they formed the IP company SuperH, Inc., which was
to license the SH-4 core to other companies and was developing the SH-5
architecture, SuperH's first move into the 64-bit area. In 2003, Hitachi and Mitsubishi
Electric formed a joint venture called Renesas Technology, with Hitachi controlling 55%
of it. In 2004, Renesas Technology bought STMicroelectronics's share of ownership in
SuperH, Inc. and with it the licence to the SH cores. Renesas Technology later became
Renesas Electronics, following its merger with NEC Electronics.

The SH-5 design supported two modes of operation. SHcompact mode is
equivalent to the user-mode instructions of the SH-4 instruction set. SHmedia mode is
very different, using 32-bit instructions with sixty-four 64-bit integer registers
and SIMD instructions. In SHmedia mode the destination of a branch (jump) is loaded
into a branch register separately from the actual branch instruction. This allows the
processor to prefetch instructions for a branch without having to snoop the instruction
stream. The combination of a compact 16-bit instruction encoding with a more powerful
32-bit instruction encoding is not unique to SH-5; ARM processors have a 16-bit
Thumb mode (ARM licensed several patents from SuperH for Thumb)
and MIPS processors have a MIPS-16 mode. However, SH-5 differs because its backward
compatibility mode is the 16-bit encoding rather than the 32-bit encoding.

The evolution of the SuperH architecture continues. The latest evolutionary
step came around 2003, when the cores from SH-2 up to SH-4 were unified
into a superscalar SH-X core whose instruction set forms a superset of the
previous architectures.

Today, the SuperH CPU cores, architecture and products are with Renesas
Electronics, the successor to the merged Hitachi and Mitsubishi semiconductor groups,
and the architecture is consolidated around the SH-2, SH-2A, SH-3, SH-4 and SH-4A
platforms, giving a scalable family.

PART II

HITACHI’S SUPERH FEATURES

The SH-5 architecture is designed for efficient execution of applications written in
C/C++ and Java. It has the features that are needed to work with the latest embedded
operating system kernels, including the Windows CE, JavaOS, pSOS, VxWorks, Linux
and OS-9 products. The architecture includes a memory management unit (MMU) and
has both user and privileged modes. There are three programmable vector base registers
for reset, interrupt handling and trap functions. A separate debug vector enables the
nonintrusive debug capability. To implement sophisticated control systems, the CPU
supports 16 levels of interrupt priority and provides a nonmaskable interrupt (NMI). For
improved performance, the SH-5 architecture uses separate offsets for interrupts and TLB
misses. Various CPU mechanisms are provided to improve the
performance of exception handling, interrupt handling and context switching:
• Two 64-bit control registers are provided for the exclusive
use of the operating system. Typically they are used to improve
the performance of entry and exit code sequences for exception and interrupt handlers.
Additionally, software conventions may be used to reserve general-purpose registers
for use by the kernel.
• The SH-5’s Application Binary Interface (ABI) provides one 64-bit control register that
can be used by the kernel to hold a temporary value.
• The floating point unit can be disabled. This allows a kernel to optimize context
switches for threads of execution that do not require floating point operations. In
particular, if either zero threads or exactly one thread uses floating point operations, then
no context saving is needed for the floating point state.
• The CPU maintains “dirty” bits for the general-purpose and
floating-point register sets. One dirty bit is used for each group of 8 consecutive registers,
so there are 8 dirty bits for the general-purpose registers and another 8 dirty bits for the
floating point registers. A dirty bit is set when there is a write to one of the registers in its
group. An operating system can use this information to optimize context switches.
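
The dirty-bit scheme in the last bullet can be modelled in a few lines of Python. This is a sketch under stated assumptions: the text only says there is one bit per group of 8 consecutive registers, so the layout used here (bit g of the mask covers registers 8g through 8g+7, 64 registers per set) and the function names are illustrative choices, not the documented hardware format.

```python
GROUP_SIZE = 8       # registers covered by one dirty bit (from the text)
NUM_REGISTERS = 64   # 8 dirty bits times 8 registers per group


def dirty_mask(written_registers):
    """Build an 8-bit mask with one bit set per group that was written.

    Assumed layout: bit g corresponds to registers 8*g .. 8*g + 7.
    """
    mask = 0
    for reg in written_registers:
        if not 0 <= reg < NUM_REGISTERS:
            raise ValueError("register index out of range")
        mask |= 1 << (reg // GROUP_SIZE)
    return mask


def groups_to_save(mask):
    """List the register ranges a context switch would need to save."""
    return [
        range(g * GROUP_SIZE, (g + 1) * GROUP_SIZE)
        for g in range(NUM_REGISTERS // GROUP_SIZE)
        if mask & (1 << g)
    ]


mask = dirty_mask([0, 9, 63])  # a thread that wrote three registers
print(bin(mask), groups_to_save(mask))
```

In this model a thread that touched only three registers causes 24 registers (three groups) to be saved instead of all 64, which is the kind of saving the dirty bits are meant to enable.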

The family of SuperH CPU cores includes:

• SH-1 - used in microcontrollers for deeply embedded applications (CD-ROM
drives, major appliances, etc.)
• SH-2 - used in microcontrollers with higher performance requirements, in
automotive applications such as engine control units (including those of Subaru,
Mitsubishi and Mazda), in networking applications, and in video game consoles
such as the Sega Saturn.
• SH-2A - an extension of the SH-2 core that adds a few extra instructions but, most
importantly, moves to a superscalar architecture (it is capable of executing more than
one instruction in a single cycle) with two five-stage pipelines. It also incorporates
15 register banks to achieve an interrupt latency of 6 clock cycles. It is strong in
motor control applications as well as in multimedia, car audio, powertrain,
automotive body control, and office and building automation.
• SH-DSP - initially developed for the mobile phone market, used later in many
consumer applications requiring DSP performance for JPEG compression etc.
• SH-3 - used for mobile and handheld applications such as the Jornada; strong
in Windows CE applications and, for many years, in the car navigation market.
The Cave CV1000, whose CPU is similar to the one in the Sega NAOMI hardware,
also made use of the SH-3, as do the Korg Electribe EMX and ESX music production
units.
• SH-3-DSP - used mainly in multimedia terminals and networking applications, also
in printers and fax machines
• SH-4 - used whenever high performance is required, such as in car multimedia
terminals, video game consoles, or set-top boxes
• SH-5 - used in high-end 64-bit multimedia applications
• SH-X - mainstream core used in various flavours (with or without DSP or FPU) in
engine control units, car multimedia equipment, set-top boxes or mobile phones
• SH-Mobile - SuperH Mobile Application Processor; designed to offload application
processing from the baseband LSI

SH-2

Hitachi SH-2 CPU

The SH-2 is a 32-bit RISC architecture with a fixed 16-bit instruction length for high
code density. It features a hardware multiply–accumulate (MAC) block for DSP
algorithms and has a five-stage pipeline.

The SH-2 has a cache on all ROM-less devices.

It provides 16 general-purpose registers, a vector base register, a global base
register, and a procedure register.

Today the SH-2 family stretches from 32 KB of on-board flash up to ROM-less devices. It
is used in a variety of different devices with differing peripherals such as CAN, Ethernet,
motor-control timer unit, fast ADC and others.

SH-2A

The SH-2A is an upgrade to the SH-2 core. It was announced in early 2006.

New features on the SH-2A core include:

• Superscalar architecture: execution of 2 instructions simultaneously
• Harvard architecture
• Two 5-stage pipelines
• 15 register banks for interrupt response in 6 cycles
• Optional FPU

The SH-2A family today spans a wide range of on-chip memory sizes, starting at 16 KB,
and includes many ROM-less variations. The devices feature standard peripherals such
as CAN, Ethernet and USB, as well as more application-specific peripherals such
as motor control timers, TFT controllers and peripherals dedicated to automotive
powertrain applications.

SH-4

Hitachi SH-4 CPU

The SH-4 is a 32-bit RISC CPU and was developed for primary use in multimedia
applications, such as Sega's Dreamcast and NAOMI game systems. It includes a much
more powerful floating point unit and additional built-in functions, along with the
standard 32-bit integer processing and 16-bit instruction size.

SH-4 features include:

• FPU with four floating point multipliers, supporting 32-bit single-precision and
64-bit double-precision floats
• 4D floating point dot-product operation
• 128-bit floating point bus allowing a 3.2 GB/sec transfer rate from the data cache
• 64-bit external data bus with 32-bit memory addressing, allowing a maximum of 4 GB
of addressable memory with a transfer rate of 800 MB/sec
• Built-in interrupt, DMA, and power management controllers
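
The bandwidth and address-space figures in this list follow from simple arithmetic, sketched below in Python. The clock rates are assumptions that do not appear in the text above: a 200 MHz clock on the 128-bit internal bus and a 100 MHz clock on the 64-bit external bus, with one transfer per clock, are the values that reproduce the quoted numbers.

```python
def peak_bytes_per_second(bus_width_bits, clock_hz):
    """Peak rate of a bus that moves one full-width word per clock."""
    return (bus_width_bits // 8) * clock_hz


# Assumed clock rates (not stated in the feature list itself).
INTERNAL_CLOCK_HZ = 200_000_000
EXTERNAL_CLOCK_HZ = 100_000_000

fp_bus_rate = peak_bytes_per_second(128, INTERNAL_CLOCK_HZ)
external_bus_rate = peak_bytes_per_second(64, EXTERNAL_CLOCK_HZ)

# A 32-bit address can name 2**32 distinct bytes.
addressable_bytes = 2 ** 32

print(fp_bus_rate / 1e9, "GB/sec")
print(external_bus_rate / 1e6, "MB/sec")
print(addressable_bytes // 2 ** 30, "GB (binary)")
```

These are peak figures; sustained rates depend on the memory and access pattern.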

There is no FPU in the custom SH4 made for Casio, the SH7305.

SH-5

The SH-5 is a 64-bit RISC CPU.

Almost no non-simulated SH-5 hardware was ever released, and unlike the still-live
SH-4, support for SH-5 was dropped from gcc.

Updated SuperH roadmap


The latest revision of the technology roadmap for the SuperH architecture puts the
key features of the SH-1, SH-2, SH-3 and SH-4 RISC series into perspective. It also shows
the performance targets for the SH-5 RISC engine architecture.

Cooperative development effort


Hitachi developed four generations of the SuperH architecture and the dozens of
MPU/MCU devices in the SH-4, SH-3, SH-2 and SH-1 series. For the fifth-generation
architecture, Hitachi formed a strategic alliance with STMicroelectronics (ST) in
December 1997, a true technology and marketing partnership. The agreement initiated
an in-depth collaboration to develop (using a common design methodology) 64-bit,
700- to 1000-MIPS SuperH MPUs for applications such as interactive set-top boxes (STBs),
telecom/datacom networks, digital video products, and automotive multimedia systems.
As part of the agreement, ST licensed the SH-4 core from Hitachi to manufacture and
market the ST40-series CPUs. Other current licensees of SuperH technology include
Seiko Epson, NEL and Sony.

The technology roadmap for the SuperH architecture extends through five
generations of products; a sixth generation is now being planned. The upward
compatibility gives system engineers considerable design flexibility. Systems can be
upgraded for higher performance and greater functionality, while investments in
hardware and software development are preserved.

Both companies have leadership positions in key markets. Hitachi is #1 worldwide
in embedded RISC. ST is #1 in digital consumer set-top box CPUs. Both companies expect
strong positions in future embedded computing markets such as HDTV, digital
imaging, multimedia, broadband networks, cable systems, VoIP equipment, monitors
and displays, and wireless products. Together, the two companies shipped 33 million
32-bit RISC processors in 1998 (Hitachi shipped 26 million; ST shipped 7 million). Total
shipments of SuperH devices are expected to exceed 100 million by the end of 1999. The
technology/marketing partnership between Hitachi and ST is creating an architectural
standard for embedded systems at the 64-bit level. Design teams are developing the
fifth-generation architecture in San Jose, CA, with support provided by worldwide
resources of both companies. Other distinguishing features of the partnership include the
• co-development of an advanced 0.15-μm process technology, necessary to meet
aggressive chip speed, power and cost objectives for the fifth-generation architecture and
future SH-4/ST40 products
• pooling of the companies’ intellectual property, both hardware
and software
• sharing of development/integration expertise and product
support resources
• guarantee of full compatibility between the CPUs produced
by both companies.

Additional benefits of the Hitachi–ST alliance


In addition, by combining their extensive expertise in systems software, and by
leveraging their relationships with third-party suppliers, Hitachi and ST will be able to
• provide on-chip debugging capabilities that are powerful,
non-intrusive and cost-effective
• give customers access to a comprehensive span of effective,
time-saving software development tools
• offer a wide range of software drivers and middleware that
customers can use for product differentiation
• support an exceptionally broad range of operating systems
and third-party application software packages.

The SH-5’s dual-mode instruction set architecture (ISA) gives system engineers
the flexibility to achieve a wide span of design objectives. For example, dynamic mode
switching allows a compiler to optimize both code density and performance. SHmedia
and SHcompact code can be mixed, with the switch between modes made at branch
instructions. The SHmedia mode includes in its complete set of 32-bit instructions a set
of SIMD instructions for multimedia applications, including compare, addition,
subtraction and shifts (with and without saturation); fractional multiplication and
multiply-accumulate; absolute value and sum of absolute differences (for motion
estimation); and conditional move, data conversion and re-arrangement.

PART III

HITACHI’S SUPERH ARCHITECTURES

PART IV

HITACHI’S SUPERH ORGANIZATION

PART V

HITACHI’S SUPERH INSTRUCTION SET ARCHITECTURE


PART VI

HITACHI’S SUPERH SYSTEM DESIGN

REFERENCES

https://en.wikipedia.org/wiki/SuperH

http://segatech.com/technical/cpu/tech_sh4.html
