
TMS320C6713 - Digital Signal Processor

This document provides an overview of the TMS320C6000 digital signal processor (DSP) platform. It discusses the three DSP generations that comprise the C6000 platform - the C62x, C64x, and C67x generations. The C67x DSP is highlighted as featuring floating-point capabilities and an enhanced instruction set. Key features of the C67x DSP that allow for high performance include its advanced VLIW architecture with eight functional units, instruction packing, and conditional execution of instructions.

Uploaded by

santhoshponmathi

Chapter 1

Introduction

The TMS320C6000 digital signal processor (DSP) platform is part of the


TMS320 DSP family. The TMS320C62x DSP generation and the
TMS320C64x DSP generation comprise fixed-point devices in the C6000
DSP platform, and the TMS320C67x DSP generation comprises floating-
point devices in the C6000 DSP platform. All three DSP generations use the
VelociTI architecture, a high-performance, advanced very long instruction
word (VLIW) architecture, making these DSPs excellent choices for multi-
channel and multifunction applications.

The TMS320C67x+ DSP is an enhancement of the C67x DSP with added
functionality and an expanded instruction set.

Any reference to the C67x DSP or C67x CPU also applies, unless otherwise
noted, to the C67x+ DSP and C67x+ CPU, respectively.

Topic

1.1 TMS320 DSP Family Overview
1.2 TMS320C6000 DSP Family Overview
1.3 TMS320C67x DSP Features and Options
1.4 TMS320C67x DSP Architecture

SPRU733A Introduction 1-1



1.1 TMS320 DSP Family Overview


The TMS320 DSP family consists of fixed-point, floating-point, and multipro-
cessor digital signal processors (DSPs). TMS320 DSPs have an architec-
ture designed specifically for real-time signal processing.
Table 1−1 lists some typical applications for the TMS320 family of DSPs. The
TMS320 DSPs offer adaptable approaches to traditional signal-processing
problems. They also support complex applications that often require multiple
operations to be performed simultaneously.

1.2 TMS320C6000 DSP Family Overview


With a performance of up to 6000 million instructions per second (MIPS) and
an efficient C compiler, the TMS320C6000 DSPs give system architects
unlimited possibilities to differentiate their products. High performance, ease
of use, and affordable pricing make the C6000 generation the ideal solution
for multichannel, multifunction applications, such as:

• Pooled modems
• Wireless local loop base stations
• Remote access servers (RAS)
• Digital subscriber loop (DSL) systems
• Cable modems
• Multichannel telephony systems

The C6000 generation is also an ideal solution for exciting new applications;
for example:

• Personalized home security with face and hand/fingerprint recognition
• Advanced cruise control with global positioning systems (GPS) navigation
  and accident avoidance
• Remote medical diagnostics
• Beam-forming base stations
• Virtual reality 3-D graphics
• Speech recognition
• Audio
• Radar
• Atmospheric modeling
• Finite element analysis
• Imaging (examples: fingerprint recognition, ultrasound, and MRI)


Table 1−1. Typical Applications for the TMS320 DSPs

Automotive               Consumer                          Control
-----------------------  --------------------------------  ----------------------
Adaptive ride control    Digital radios/TVs                Disk drive control
Antiskid brakes          Educational toys                  Engine control
Cellular telephones      Music synthesizers                Laser printer control
Digital radios           Pagers                            Motor control
Engine control           Power tools                       Robotics control
Global positioning       Radar detectors                   Servo control
Navigation               Solid-state answering machines
Vibration analysis
Voice commands

General-Purpose          Graphics/Imaging                  Industrial
-----------------------  --------------------------------  ----------------------
Adaptive filtering       3-D transformations               Numeric control
Convolution              Animation/digital maps            Power-line monitoring
Correlation              Homomorphic processing            Robotics
Digital filtering        Image compression/transmission    Security access
Fast Fourier transforms  Image enhancement
Hilbert transforms       Pattern recognition
Waveform generation      Robot vision
Windowing                Workstations

Instrumentation          Medical                           Military
-----------------------  --------------------------------  ----------------------
Digital filtering        Diagnostic equipment              Image processing
Function generation      Fetal monitoring                  Missile guidance
Pattern matching         Hearing aids                      Navigation
Phase-locked loops       Patient monitoring                Radar processing
Seismic processing       Prosthetics                       Radio frequency modems
Spectrum analysis        Ultrasound equipment              Secure communications
Transient analysis                                         Sonar processing

Telecommunications                                                         Voice/Speech
-------------------------------------------------------------------------  --------------------
1200- to 56 600-bps modems          Faxing                                 Speaker verification
Adaptive equalizers                 Future terminals                       Speech enhancement
ADPCM transcoders                   Line repeaters                         Speech recognition
Base stations                       Personal communications systems (PCS)  Speech synthesis
Cellular telephones                 Personal digital assistants (PDA)      Speech vocoding
Channel multiplexing                Speaker phones                         Text-to-speech
Data encryption                     Spread spectrum communications         Voice mail
Digital PBXs                        Digital subscriber loop (xDSL)
Digital speech interpolation (DSI)  Video conferencing
DTMF encoding/decoding              X.25 packet switching
Echo cancellation


1.3 TMS320C67x DSP Features and Options


The C6000 devices execute up to eight 32-bit instructions per cycle. The C67x
CPU consists of 32 general-purpose 32-bit registers and eight functional units.
These eight functional units contain:

• Two multipliers
• Six ALUs

The C6000 generation has a complete set of optimized development tools,
including an efficient C compiler, an assembly optimizer for simplified
assembly-language programming and scheduling, and a Windows-based
debugger interface for visibility into source code execution characteristics. A
hardware emulation board, compatible with the TI XDS510 and XDS560
emulator interface, is also available. This tool complies with IEEE Standard
1149.1−1990, IEEE Standard Test Access Port and Boundary-Scan
Architecture.

Features of the C6000 devices include:

• Advanced VLIW CPU with eight functional units, including two multipliers
  and six arithmetic units
  - Executes up to eight instructions per cycle for up to ten times the
    performance of typical DSPs
  - Allows designers to develop highly effective RISC-like code for fast
    development time
• Instruction packing
  - Gives code size equivalence for eight instructions executed serially or
    in parallel
  - Reduces code size, program fetches, and power consumption
• Conditional execution of all instructions
  - Reduces costly branching
  - Increases parallelism for higher sustained performance
• Efficient code execution on independent functional units
  - Industry’s most efficient C compiler on DSP benchmark suite
  - Industry’s first assembly optimizer for fast development and improved
    parallelization
• 8/16/32-bit data support, providing efficient memory support for a variety
  of applications
• 40-bit arithmetic options add extra precision for vocoders and other
  computationally intensive applications
• Saturation and normalization provide support for key arithmetic
  operations
• Field manipulation and instruction extract, set, clear, and bit counting
  support common operations found in control and data manipulation
  applications.

The C67x devices include these additional features:

• Hardware support for single-precision (32-bit) and double-precision
  (64-bit) IEEE floating-point operations.
• 32 × 32-bit integer multiply with 32-bit or 64-bit result.

In addition to the features of the C67x device, the C67x+ device is enhanced
for code size improvement and floating-point performance. These additional
features include:

• Execute packets can span fetch packets.
• Register file size is increased to 64 registers (32 in each datapath).
• Floating-point addition and subtraction capability in the .S unit.
• Mixed-precision multiply instructions.
• 32-KByte instruction cache that supports execution from both on-chip
  RAM and ROM as well as from external memory through a VBUSP-based
  external memory interface (EMIF).
• Unified memory controller features support for flat on-chip data RAM and
  ROM organizations for zero wait-state accesses from both load store units
  of the CPU. The memory controller supports different banking
  organizations for RAM and ROM arrays. The memory controller also supports
  VBUSP interfaces (two master and one slave) for transfer of data from the
  system peripherals to and from the CPU and internal memory. A
  VBUSP-based DMA controller can interface to the CPU for programmable bulk
  transfers through the VBUSP slave port.


The VelociTI architecture of the C6000 platform of devices makes them the first
off-the-shelf DSPs to use advanced VLIW to achieve high performance
through increased instruction-level parallelism. A traditional VLIW architecture
consists of multiple execution units running in parallel, performing multiple
instructions during a single clock cycle. Parallelism is the key to extremely high
performance, taking these DSPs well beyond the performance capabilities of
traditional superscalar designs. VelociTI is a highly deterministic architecture,
having few restrictions on how or when instructions are fetched, executed, or
stored. It is this architectural flexibility that is key to the breakthrough efficiency
levels of the TMS320C6000 Optimizing C compiler. VelociTI’s advanced
features include:

• Instruction packing: reduced code size
• All instructions can operate conditionally: flexibility of code
• Variable-width instructions: flexibility of data types
• Fully pipelined branches: zero-overhead branching.


1.4 TMS320C67x DSP Architecture


Figure 1−1 is the block diagram for the C67x DSP. The C6000 devices come
with program memory, which, on some devices, can be used as a program
cache. The devices also have varying sizes of data memory. Peripherals such
as a direct memory access (DMA) controller, power-down logic, and external
memory interface (EMIF) usually come with the CPU, while peripherals such
as serial ports and host ports are on only certain devices. Check the data sheet
for your device to determine the specific peripheral configurations you have.

Figure 1−1. TMS320C67x DSP Block Diagram

[Block diagram not reproduced. Blocks shown: program cache/program memory
(32-bit address, 256-bit data); the C6000 CPU, containing program fetch,
instruction dispatch, instruction decode, data path A (register file A; units
.L1, .S1, .M1, .D1), data path B (register file B; units .D2, .M2, .S2, .L2),
control registers, control logic, and test, emulation, and interrupt logic;
data cache/data memory (32-bit address; 8-, 16-, 32-bit data); power down;
DMA and EMIF; additional peripherals (timers, serial ports, etc.).]


1.4.1 Central Processing Unit (CPU)


The C67x CPU, in Figure 1−1, is common to all the C62x/C64x/C67x devices.
The CPU contains:

• Program fetch unit
• Instruction dispatch unit
• Instruction decode unit
• Two data paths, each with four functional units
• 32 32-bit registers
• Control registers
• Control logic
• Test, emulation, and interrupt logic

The program fetch, instruction dispatch, and instruction decode units can
deliver up to eight 32-bit instructions to the functional units every CPU clock
cycle. The processing of instructions occurs in each of the two data paths (A
and B), each of which contains four functional units (.L, .S, .M, and .D) and 16
32-bit general-purpose registers. The data paths are described in more detail
in Chapter 2. A control register file provides the means to configure and control
various processor operations. To understand how instructions are fetched,
dispatched, decoded, and executed in the data path, see Chapter 4.

1.4.2 Internal Memory


The C67x DSP has a 32-bit, byte-addressable address space. Internal
(on-chip) memory is organized in separate data and program spaces. When
off-chip memory is used, these spaces are unified on most devices to a single
memory space via the external memory interface (EMIF).
The C67x DSP has two 32-bit internal ports to access internal data memory.
The C67x DSP has a single internal port to access internal program memory,
with an instruction-fetch width of 256 bits.

1.4.3 Memory and Peripheral Options


A variety of memory and peripheral options are available for the C6000
platform:

• Large on-chip RAM, up to 7M bits
• Program cache
• 2-level caches
• 32-bit external memory interface supports SDRAM, SBSRAM, SRAM,
  and other asynchronous memories for a broad range of external memory
  requirements and maximum system performance.
• DMA Controller (C6701 DSP only) transfers data between address ranges
  in the memory map without intervention by the CPU. The DMA controller
  has four programmable channels and a fifth auxiliary channel.
• EDMA Controller performs the same functions as the DMA controller. The
  EDMA has 16 programmable channels, as well as a RAM space to hold
  multiple configurations for future transfers.
• HPI is a parallel port through which a host processor can directly access
  the CPU’s memory space. The host device has ease of access because
  it is the master of the interface. The host and the CPU can exchange
  information via internal or external memory. In addition, the host has direct
  access to memory-mapped peripherals.
• Expansion bus is a replacement for the HPI, as well as an expansion of
  the EMIF. The expansion provides two distinct areas of functionality (host
  port and I/O port) which can co-exist in a system. The host port of the
  expansion bus can operate in either asynchronous slave mode, similar to
  the HPI, or in synchronous master/slave mode. This allows the device to
  interface to a variety of host bus protocols. Synchronous FIFOs and
  asynchronous peripheral I/O devices may interface to the expansion bus.
• McBSP (multichannel buffered serial port) is based on the standard serial
  port interface found on the TMS320C2000 and TMS320C5000
  devices. In addition, the port can buffer serial samples in memory
  automatically with the aid of the DMA/EDMA controller. It also has multichannel
  capability compatible with the T1, E1, SCSA, and MVIP networking
  standards.
• Timers in the C6000 devices are two 32-bit general-purpose timers used
  for these functions:
  - Time events
  - Count events
  - Generate pulses
  - Interrupt the CPU
  - Send synchronization events to the DMA/EDMA controller.
• Power-down logic allows reduced clocking to reduce power consumption.
  Most of the operating power of CMOS logic dissipates during circuit
  switching from one logic state to another. By preventing some or all of the
  chip’s logic from switching, you can realize significant power savings
  without losing any data or operational context.

For an overview of the peripherals available on the C6000 DSP, refer to the
TMS320C6000 DSP Peripherals Overview Reference Guide (SPRU190).



Chapter 2

CPU Data Paths and Control

This chapter focuses on the CPU, providing information about the data paths and
control registers. The two register files and the data cross paths are described.

Topic

2.1 Introduction
2.2 General-Purpose Register Files
2.3 Functional Units
2.4 Register File Cross Paths
2.5 Memory, Load, and Store Paths
2.6 Data Address Paths
2.7 Control Register File
2.8 Control Register File Extensions


2.1 Introduction
The components of the data path for the TMS320C67x CPU are shown in
Figure 2−1. These components consist of:

• Two general-purpose register files (A and B)
• Eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)
• Two load-from-memory data paths (LD1 and LD2)
• Two store-to-memory data paths (ST1 and ST2)
• Two data address paths (DA1 and DA2)
• Two register file data cross paths (1X and 2X)

2.2 General-Purpose Register Files


There are two general-purpose register files (A and B) in the C6000 data paths.
For the C67x DSP, each of these files contains 16 32-bit registers (A0–A15 for
file A and B0–B15 for file B). For the C67x+ DSP, the register file size is
doubled to 32 32-bit registers (A0–A31 for file A and B0–B31 for file B). The
register pairs for both devices are shown in Table 2−1. The general-purpose
registers can be used for data, data address pointers, or condition registers.

The C67x DSP general-purpose register files support data ranging in size from
packed 16-bit data through 40-bit fixed-point and 64-bit floating point data.
Values larger than 32 bits, such as 40-bit long and 64-bit float quantities, are
stored in register pairs. In these the 32 LSBs of data are placed in an
even-numbered register and the remaining 8 or 32 MSBs in the next upper register
(that is always an odd-numbered register). Packed data types store either four
8-bit values or two 16-bit values in a single 32-bit register, or four 16-bit values
in a 64-bit register pair.

There are 16 valid register pairs for 40-bit and 64-bit data in the C67x DSP
cores. In assembly language syntax, a colon between the register names
denotes the register pairs, and the odd-numbered register is specified first.

The additional registers are addressed by using the previously unused fifth
(msb) bit of the source and register specifiers. All 64-bit register writes and
reads are performed over 2 cycles as per the current C67x devices.

Figure 2−2 shows the register storage scheme for 40-bit long data. Operations
requiring a long input ignore the 24 MSBs of the odd-numbered register.
Operations producing a long result zero-fill the 24 MSBs of the odd-numbered
register. The even-numbered register is encoded in the opcode.
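
The register-pair rules above can be sketched in a few lines of Python. This is only an illustrative model, not TI code: the function names (split_pair, read_long, write_long, pair_name) are hypothetical, and registers are represented as plain unsigned integers.

```python
MASK32 = 0xFFFFFFFF

def split_pair(value64):
    """Split a 64-bit quantity into (odd, even) register contents."""
    even = value64 & MASK32          # 32 LSBs go to the even-numbered register
    odd = (value64 >> 32) & MASK32   # 32 MSBs go to the next (odd) register
    return odd, even

def read_long(odd, even):
    """Read a 40-bit long: the 24 MSBs of the odd register are ignored."""
    return ((odd & 0xFF) << 32) | (even & MASK32)

def write_long(value40):
    """Write a 40-bit long: the 24 MSBs of the odd register are zero-filled."""
    value40 &= (1 << 40) - 1
    return (value40 >> 32) & 0xFF, value40 & MASK32

def pair_name(file, even_index):
    """Assembly-style pair name with the odd register first, e.g. 'A1:A0'."""
    if even_index % 2:
        raise ValueError("the low half of a pair is an even-numbered register")
    return f"{file}{even_index + 1}:{file}{even_index}"
```

For example, pair_name("A", 4) gives the name of the pair whose low half is A4, and read_long discards whatever happens to be in the upper 24 bits of the odd register.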

Figure 2−1. TMS320C67x CPU Data Paths

[Data path diagram not reproduced. It shows data path A (register file A,
A0−A15, with units .L1, .S1, .M1, .D1) and data path B (register file B,
B0−B15, with units .D2, .M2, .S2, .L2). Each unit has src1, src2, and dst
connections to its register file; 8-bit long src and long dst connections
are also drawn. Also shown: load paths LD1 and LD2 (each with a 32 MSB and
a 32 LSB half), store paths ST1 and ST2, data address paths DA1 and DA2,
cross paths 1X and 2X, and the control register file.]


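Table 2−1. 40-Bit/64-Bit Register Pairs

Register File A  Register File B  Devices
A1:A0            B1:B0            C67x DSP
A3:A2            B3:B2            C67x DSP
A5:A4            B5:B4            C67x DSP
A7:A6            B7:B6            C67x DSP
A9:A8            B9:B8            C67x DSP
A11:A10          B11:B10          C67x DSP
A13:A12          B13:B12          C67x DSP
A15:A14          B15:B14          C67x DSP
A17:A16          B17:B16          C67x+ DSP only
A19:A18          B19:B18          C67x+ DSP only
A21:A20          B21:B20          C67x+ DSP only
A23:A22          B23:B22          C67x+ DSP only
A25:A24          B25:B24          C67x+ DSP only
A27:A26          B27:B26          C67x+ DSP only
A29:A28          B29:B28          C67x+ DSP only
A31:A30          B31:B30          C67x+ DSP only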

Figure 2−2. Storage Scheme for 40-Bit Data in a Register Pair

Read from registers:
  Odd register, bits 31−8:   ignored
  Odd register, bits 7−0:    bits 39−32 of the 40-bit data
  Even register, bits 31−0:  bits 31−0 of the 40-bit data

Write to registers:
  Odd register, bits 31−8:   zero-filled
  Odd register, bits 7−0:    bits 39−32 of the 40-bit data
  Even register, bits 31−0:  bits 31−0 of the 40-bit data


2.3 Functional Units


The eight functional units in the C6000 data paths can be divided into two
groups of four; each functional unit in one data path is almost identical to the
corresponding unit in the other data path. The functional units are described
in Table 2−2.
Most data lines in the CPU support 32-bit operands, and some support long
(40-bit) and double word (64-bit) operands. Each functional unit has its own
32-bit write port into a general-purpose register file (Refer to Figure 2−1). All
units ending in 1 (for example, .L1) write to register file A, and all units ending
in 2 write to register file B. Each functional unit has two 32-bit read ports for
source operands src1 and src2. Four units (.L1, .L2, .S1, and .S2) have an
extra 8-bit-wide port for 40-bit long writes, as well as an 8-bit input for 40-bit
long reads. Because each unit has its own 32-bit write port, when performing
32-bit operations all eight units can be used in parallel every cycle.
See Appendix B for a list of the instructions that execute on each functional
unit.

Table 2−2. Functional Units and Operations Performed


Functional Unit     Fixed-Point Operations                                         Floating-Point Operations

.L unit (.L1, .L2)  32/40-bit arithmetic and compare operations                    Arithmetic operations
                    32-bit logical operations                                      DP → SP, INT → DP, INT → SP conversion operations
                    Leftmost 1 or 0 counting for 32 bits
                    Normalization count for 32 and 40 bits

.S unit (.S1, .S2)  32-bit arithmetic operations                                   Compare
                    32/40-bit shifts and 32-bit bit-field operations               Reciprocal and reciprocal square-root operations
                    32-bit logical operations                                      Absolute value operations
                    Branches                                                       SP → DP conversion operations
                    Constant generation                                            SP and DP adds and subtracts
                    Register transfers to/from control register file (.S2 only)    SP and DP reverse subtracts (src2 − src1)

.M unit (.M1, .M2)  16 × 16-bit multiply operations                                Floating-point multiply operations
                    32 × 32-bit multiply operations                                Mixed-precision multiply operations

.D unit (.D1, .D2)  32-bit add, subtract, linear and circular address calculation  Load doubleword with 5-bit constant offset
                    Loads and stores with 5-bit constant offset
                    Loads and stores with 15-bit constant offset (.D2 only)


2.4 Register File Cross Paths


Each functional unit reads directly from and writes directly to the register file
within its own data path. That is, the .L1, .S1, .D1, and .M1 units write to register
file A and the .L2, .S2, .D2, and .M2 units write to register file B. The register
files are connected to the opposite-side register file’s functional units via the
1X and 2X cross paths. These cross paths allow functional units from one data
path to access a 32-bit operand from the opposite side register file. The 1X
cross path allows the functional units of data path A to read their source from
register file B, and the 2X cross path allows the functional units of data path
B to read their source from register file A.

On the C67x DSP, six of the eight functional units have access to the register
file on the opposite side, via a cross path. The src2 inputs of the .M1, .M2,
.S1, and .S2 units are selectable between the cross path and the same-side
register file. In the case of the .L1 and .L2 units, both the src1 and src2
inputs are selectable between the cross path and the same-side register file.

Only two cross paths, 1X and 2X, exist in the C6000 architecture. Thus, the
limit is one source read from each data path’s opposite register file per cycle,
or a total of two cross path source reads per cycle. In the C67x DSP, only one
functional unit per data path, per execute packet, can get an operand from the
opposite register file.
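
The cross-path limit described above can be expressed as a small check on an execute packet. The following Python sketch is illustrative only: the packet representation (a list of unit names with the register files their operands come from) and the function names are assumptions made for this example, and it does not model which input (src1 or src2) of each unit can select the cross path.

```python
from collections import Counter

def cross_path_reads(packet):
    """Count the functional units that use each cross path in one execute packet.

    packet: list of (unit, source_files) tuples, e.g. (".L1", ["A", "B"]).
    A unit ending in 1 belongs to data path A, a unit ending in 2 to data path B.
    """
    counts = Counter()
    for unit, sources in packet:
        home = "A" if unit.endswith("1") else "B"
        if any(src != home for src in sources):
            if unit.startswith(".D"):
                # Only six of the eight units have cross-path access on the C67x DSP.
                raise ValueError(f"{unit} has no cross-path access on the C67x DSP")
            # Data path A units read file B over 1X; data path B units use 2X.
            counts["1X" if home == "A" else "2X"] += 1
    return counts

def packet_is_legal(packet):
    """True if at most one unit per data path reads from the opposite file."""
    return all(n <= 1 for n in cross_path_reads(packet).values())
```

A packet in which .L1 and .M1 both take an operand from register file B would be rejected, because both reads would need the single 1X path.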

2.5 Memory, Load, and Store Paths


The C67x DSP has two 32-bit paths for loading data from memory to the
register file: LD1 for register file A, and LD2 for register file B. The C67x DSP also
has a second 32-bit load path for both register files A and B. This allows the
LDDW instruction to simultaneously load two 32-bit values into register file A
and two 32-bit values into register file B. For side A, LD1a is the load path for
the 32 LSBs and LD1b is the load path for the 32 MSBs. For side B, LD2a is
the load path for the 32 LSBs and LD2b is the load path for the 32 MSBs. There
are also two 32-bit paths, ST1 and ST2, for storing register values to memory
from each register file.

On the C6000 architecture, some of the ports for long and doubleword
operands are shared between functional units. This places a constraint on which
long or doubleword operations can be scheduled on a data path in the same
execute packet. See section 3.7.5.


2.6 Data Address Paths


The data address paths (DA1 and DA2) are each connected to the .D units in
both data paths. This allows data addresses generated by any one path to
access data to or from any register.

The DA1 and DA2 resources and their associated data paths are specified as
T1 and T2, respectively. T1 consists of the DA1 address path and the LD1 and
ST1 data paths. For the C67x DSP, LD1 is comprised of LD1a and LD1b to
support 64-bit loads. Similarly, T2 consists of the DA2 address path and the
LD2 and ST2 data paths. For the C67x DSP, LD2 is comprised of LD2a and
LD2b to support 64-bit loads.

The T1 and T2 designations appear in the functional unit fields for load and
store instructions. For example, the following load instruction uses the .D1 unit
to generate the address but is using the LD2 path resource from DA2 to place
the data in the B register file. The use of the DA2 resource is indicated with the
T2 designation.
LDW .D1T2 *A0[3],B1
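
As a rough illustration of what this instruction does, the Python sketch below models the address calculation and the load. It assumes the usual C6000 assembly convention that an offset in square brackets is scaled by the access size (4 bytes for LDW), and it assumes little-endian byte order; the memory and register dictionaries and the function names are hypothetical, and load latency is not modeled.

```python
WORD_BYTES = 4  # LDW transfers one 32-bit word

def ldw_address(base, index):
    """Byte address for a bracketed offset such as *A0[3]: base + index * 4."""
    return (base + index * WORD_BYTES) & 0xFFFFFFFF

def ldw(memory, regs, base_reg, index, dst_reg):
    """Model LDW *base_reg[index],dst_reg.

    memory maps byte addresses to byte values (little-endian assumed);
    regs maps register names to 32-bit values. Returns the address used.
    """
    addr = ldw_address(regs[base_reg], index)
    data = bytes(memory[addr + i] for i in range(WORD_BYTES))
    regs[dst_reg] = int.from_bytes(data, "little")
    return addr
```

In this model the address comes from the A-side register A0 (the .D1 unit), while the loaded word lands in the B-side register B1, which is what the T2 designation expresses.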

2.7 Control Register File


Table 2−3 lists the control registers contained in the control register file.

Table 2−3. Control Registers

Acronym  Register Name                                  Section
AMR      Addressing mode register                       2.7.3
CSR      Control status register                        2.7.4
ICR      Interrupt clear register                       2.7.5
IER      Interrupt enable register                      2.7.6
IFR      Interrupt flag register                        2.7.7
IRP      Interrupt return pointer register              2.7.8
ISR      Interrupt set register                         2.7.9
ISTP     Interrupt service table pointer register       2.7.10
NRP      Nonmaskable interrupt return pointer register  2.7.11
PCE1     Program counter, E1 phase                      2.7.12


2.7.1 Register Addresses for Accessing the Control Registers


Table 2−4 lists the register addresses for accessing the control register file.
One unit (.S2) can read from and write to the control register file. Each control
register is accessed by the MVC instruction. See the MVC instruction
description, page 3-179, for information on how to use this instruction.

Additionally, some of the control register bits are specially accessed in other
ways. For example, arrival of a maskable interrupt on an external interrupt pin,
INTm, triggers the setting of flag bit IFRm. Subsequently, when that interrupt
is processed, this triggers the clearing of IFRm and the clearing of the global
interrupt enable bit, GIE. Finally, when that interrupt processing is complete,
the B IRP instruction in the interrupt service routine restores the pre-interrupt
value of the GIE. Similarly, saturating instructions like SADD set the SAT
(saturation) bit in the control status register (CSR).

Table 2−4. Register Addresses for Accessing the Control Registers

Acronym  Register Name                           Address  Read/Write
AMR      Addressing mode register                00000    R, W
CSR      Control status register                 00001    R, W
FADCR    Floating-point adder configuration      10010    R, W
FAUCR    Floating-point auxiliary configuration  10011    R, W
FMCR     Floating-point multiplier configuration 10100    R, W
ICR      Interrupt clear register                00011    W
IER      Interrupt enable register               00100    R, W
IFR      Interrupt flag register                 00010    R
IRP      Interrupt return pointer                00110    R, W
ISR      Interrupt set register                  00010    W
ISTP     Interrupt service table pointer         00101    R, W
NRP      Nonmaskable interrupt return pointer    00111    R, W
PCE1     Program counter, E1 phase               10000    R

Legend: R = Readable by the MVC instruction; W = Writeable by the MVC instruction
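
The address map in Table 2−4 can be captured as a lookup table, which makes it easy to see that IFR and ISR share one address and differ only in access direction. The Python sketch below copies the addresses and access flags from the table; the dictionary layout and helper names are hypothetical.

```python
# name: (5-bit register address, access permitted to the MVC instruction)
CONTROL_REGISTERS = {
    "AMR":   (0b00000, "RW"),
    "CSR":   (0b00001, "RW"),
    "IFR":   (0b00010, "R"),
    "ISR":   (0b00010, "W"),
    "ICR":   (0b00011, "W"),
    "IER":   (0b00100, "RW"),
    "ISTP":  (0b00101, "RW"),
    "IRP":   (0b00110, "RW"),
    "NRP":   (0b00111, "RW"),
    "PCE1":  (0b10000, "R"),
    "FADCR": (0b10010, "RW"),
    "FAUCR": (0b10011, "RW"),
    "FMCR":  (0b10100, "RW"),
}

def mvc_allowed(name, access):
    """True if MVC may read ('R') or write ('W') the named control register."""
    return access in CONTROL_REGISTERS[name][1]

def registers_at(address):
    """Names of all control registers decoded at a given 5-bit address."""
    return sorted(n for n, (a, _) in CONTROL_REGISTERS.items() if a == address)
```

Looking up address 00010 returns both IFR and ISR: a read at that address returns the flag register, while a write goes to the set register.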


2.7.2 Pipeline/Timing of Control Register Accesses


All MVC instructions are single-cycle instructions that complete their access
of the explicitly named registers in the E1 pipeline phase. This is true whether
MVC is moving a general register to a control register, or conversely. In all
cases, the source register content is read, moved through the .S2 unit, and
written to the destination register in the E1 pipeline phase.

Pipeline Stage  E1
Read            src2
Written         dst
Unit in use     .S2

Even though MVC modifies the particular target control register in a single
cycle, it can take extra clocks to complete modification of the non-explicitly
named register. For example, the MVC cannot modify bits in the IFR directly.
Instead, MVC can only write 1’s into the ISR or the ICR to specify setting or
clearing, respectively, of the IFR bits. MVC completes this ISR/ICR write in a
single (E1) cycle but the modification of the IFR bits occurs one clock later. For
more information on the manipulation of ISR, ICR, and IFR, see section 2.7.9,
section 2.7.5, and section 2.7.7.
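
The indirect update of IFR through ISR and ICR can be modeled as a write that takes effect on the following clock. The Python class below is a simplified sketch under that reading: the class and method names are hypothetical, and it does not model which IFR bits are actually writable or how hardware interrupt events interact with these writes.

```python
class InterruptFlagModel:
    """Toy model: ISR/ICR writes complete in E1, IFR changes one clock later."""

    def __init__(self):
        self.ifr = 0
        self._pending = []  # writes accepted this clock, applied on the next

    def mvc_to_isr(self, mask):
        """Writing 1s to ISR requests that the matching IFR bits be set."""
        self._pending.append(("set", mask))

    def mvc_to_icr(self, mask):
        """Writing 1s to ICR requests that the matching IFR bits be cleared."""
        self._pending.append(("clear", mask))

    def clock(self):
        """Advance one clock and apply the pending IFR modifications."""
        for op, mask in self._pending:
            if op == "set":
                self.ifr |= mask
            else:
                self.ifr &= ~mask & 0xFFFFFFFF
        self._pending.clear()
```

After mvc_to_isr the ifr attribute is unchanged until clock() is called, mirroring the one-clock delay described above.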

Saturating instructions, such as SADD, set the saturation flag bit (SAT) in CSR
indirectly. As a result, several of these instructions update the SAT bit one full
clock cycle after their primary results are written to the register file. For
example, the SMPY instruction writes its result at the end of pipeline stage E2; its
primary result is available after one delay slot. In contrast, the SAT bit in CSR
is updated one cycle later than the result is written; this update occurs after two
delay slots. (For the specific behavior of an instruction, refer to the description
of that individual instruction.)
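
To make the role of the SAT bit concrete, here is a minimal Python sketch of a 32-bit signed saturating add that reports whether saturation occurred. It is an illustration rather than a description of the SADD hardware: the function names are hypothetical, the one-cycle delay of the SAT update is not modeled, and treating SAT as a sticky flag that stays set once raised is an assumption of this example.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

def saturating_add(a, b):
    """Return (result, saturated) for a 32-bit signed saturating add."""
    total = a + b
    if total > INT32_MAX:
        return INT32_MAX, True
    if total < INT32_MIN:
        return INT32_MIN, True
    return total, False

def accumulate(values):
    """Sum values with saturation; sat stays set once any add saturates."""
    acc, sat = 0, False
    for v in values:
        acc, saturated = saturating_add(acc, v)
        sat = sat or saturated  # assumption: flag is sticky until cleared
    return acc, sat
```

A caller that only checks the flag at the end of a loop can still tell that some intermediate sum clipped, which is the usual reason for keeping such a flag in a status register.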

The B IRP and B NRP instructions directly update the GIE and NMIE,
respectively. Because these branches directly modify CSR and IER,
respectively, there are no delay slots between when the branch is issued and
when the control register updates take effect.

SPRU733A CPU Data Paths and Control 2-9



2.7.3 Addressing Mode Register (AMR)


For each of the eight registers (A4–A7, B4–B7) that can perform linear or
circular addressing, the addressing mode register (AMR) specifies the
addressing mode. A 2-bit field for each register selects the address
modification mode: linear (the default) or circular mode. With circular
addressing, the field also specifies which BK (block size) field to use for a
circular buffer. In addition, the buffer must be aligned on a byte boundary
equal to the block size. The mode select fields and block size fields are
shown in Figure 2−3 and described in Table 2−5.

Figure 2−3. Addressing Mode Register (AMR)


Bits     31−26      25−21    20−16
Field    Reserved   BK1      BK0
Access   R-0        R/W-0    R/W-0

Bits     15−14    13−12    11−10    9−8      7−6      5−4      3−2      1−0
Field    B7 MODE  B6 MODE  B5 MODE  B4 MODE  A7 MODE  A6 MODE  A5 MODE  A4 MODE
Access   R/W-0    R/W-0    R/W-0    R/W-0    R/W-0    R/W-0    R/W-0    R/W-0
Legend: R = Readable by the MVC instruction; W = Writeable by the MVC instruction; -n = value after reset

Table 2−5. Addressing Mode Register (AMR) Field Descriptions


Bit Field Value Description
31−26 Reserved 0 Reserved. The reserved bit location is always read as 0. A value written to
this field has no effect.

25−21 BK1 0−1Fh Block size field 1. A 5-bit value used in calculating block sizes for circular
addressing. Table 2−6 shows block size calculations for all 32 possibilities.
Block size (in bytes) = 2^(N+1), where N is the 5-bit value in BK1

20−16 BK0 0−1Fh Block size field 0. A 5-bit value used in calculating block sizes for circular
addressing. Table 2−6 shows block size calculations for all 32 possibilities.
Block size (in bytes) = 2^(N+1), where N is the 5-bit value in BK0

15−14 B7 MODE 0−3h Address mode selection for register file B7.

0 Linear modification (default at reset)

1h Circular addressing using the BK0 field

2h Circular addressing using the BK1 field

3h Reserved

13−12 B6 MODE 0−3h Address mode selection for register file B6.

0 Linear modification (default at reset)

1h Circular addressing using the BK0 field

2h Circular addressing using the BK1 field

3h Reserved

11−10 B5 MODE 0−3h Address mode selection for register file B5.

0 Linear modification (default at reset)

1h Circular addressing using the BK0 field

2h Circular addressing using the BK1 field

3h Reserved

9−8 B4 MODE 0−3h Address mode selection for register file B4.

0 Linear modification (default at reset)

1h Circular addressing using the BK0 field

2h Circular addressing using the BK1 field

3h Reserved

7−6 A7 MODE 0−3h Address mode selection for register file A7.

0 Linear modification (default at reset)

1h Circular addressing using the BK0 field

2h Circular addressing using the BK1 field

3h Reserved

5−4 A6 MODE 0−3h Address mode selection for register file A6.

0 Linear modification (default at reset)

1h Circular addressing using the BK0 field

2h Circular addressing using the BK1 field

3h Reserved

3−2 A5 MODE 0−3h Address mode selection for register file A5.

0 Linear modification (default at reset)

1h Circular addressing using the BK0 field

2h Circular addressing using the BK1 field

3h Reserved

1−0 A4 MODE 0−3h Address mode selection for register file A4.

0 Linear modification (default at reset)

1h Circular addressing using the BK0 field

2h Circular addressing using the BK1 field

3h Reserved

Table 2−6. Block Size Calculations

BKn Value Block Size BKn Value Block Size


00000 2 10000 131 072
00001 4 10001 262 144
00010 8 10010 524 288
00011 16 10011 1 048 576
00100 32 10100 2 097 152
00101 64 10101 4 194 304
00110 128 10110 8 388 608
00111 256 10111 16 777 216
01000 512 11000 33 554 432
01001 1 024 11001 67 108 864
01010 2 048 11010 134 217 728
01011 4 096 11011 268 435 456
01100 8 192 11100 536 870 912
01101 16 384 11101 1 073 741 824
01110 32 768 11110 2 147 483 648
01111 65 536 11111 4 294 967 296

Note: When n is 11111, the behavior is identical to linear addressing.
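
The field layout in Figure 2−3 and the block size formula in Table 2−5 can be checked with a short sketch. The Python below only models the arithmetic; it is not DSP code, and the names `make_amr`, `block_size_bytes`, and `MODE_SHIFT` are hypothetical helpers introduced for illustration. The bit positions and mode encodings are taken from Table 2−5.

```python
# Mode encodings from Table 2-5 (3h is reserved).
LINEAR, CIRC_BK0, CIRC_BK1 = 0, 1, 2

# Bit offset of each register's 2-bit mode field within AMR (Figure 2-3).
MODE_SHIFT = {"A4": 0, "A5": 2, "A6": 4, "A7": 6,
              "B4": 8, "B5": 10, "B6": 12, "B7": 14}


def block_size_bytes(n):
    """Block size selected by a 5-bit BK0/BK1 value N: 2**(N+1) bytes."""
    if not 0 <= n <= 0x1F:
        raise ValueError("BK value must fit in 5 bits")
    return 1 << (n + 1)


def make_amr(modes, bk0=0, bk1=0):
    """Pack mode selections and block size fields into a 32-bit AMR value."""
    block_size_bytes(bk0)  # range checks only
    block_size_bytes(bk1)
    value = (bk1 << 21) | (bk0 << 16)
    for reg, mode in modes.items():
        if mode not in (LINEAR, CIRC_BK0, CIRC_BK1):
            raise ValueError("mode 3h is reserved")
        value |= mode << MODE_SHIFT[reg]
    return value


# A4 in circular mode using BK0, with BK0 = 00100b (a 32-byte block).
amr = make_amr({"A4": CIRC_BK0}, bk0=0b00100)
print(hex(amr), block_size_bytes(0b00100))
```

Registers left out of `modes` keep mode 0, which matches the linear default at reset.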


2.7.4 Control Status Register (CSR)


The control status register (CSR) contains control and status bits. The CSR
is shown in Figure 2−4 and described in Table 2−7. For the PWRD, EN, PCC,
and DCC fields, see the device-specific data manual to determine whether your
device supports the options that these fields control.

The power-down modes and their wake-up methods are programmed by the
PWRD field (bits 15−10) of CSR. The PWRD field of CSR is shown in
Figure 2−5. When writing to CSR, all bits of the PWRD field should be
configured at the same time. A logic 0 should be used when writing to the
reserved bit (bit 15) of the PWRD field.

Figure 2−4. Control Status Register (CSR)


Bits     31−24    23−16
Field    CPU ID   REVISION ID
Access   R-0      R-x†

Bits     15−10   9        8     7−5     4−2     1       0
Field    PWRD    SAT      EN    PCC     DCC     PGIE    GIE
Access   R/W-0   R/WC-0   R-x   R/W-0   R/W-0   R/W-0   R/W-0
Legend: R = Readable by the MVC instruction; W = Writeable by the MVC instruction; WC = Bit is cleared on write; -n = value
after reset; -x = value is indeterminate after reset
† See the device-specific data manual for the default value of this field.

Figure 2−5. PWRD Field of Control Status Register (CSR)


Bit   Field                                   Access
15    Reserved                                R/W-0
14    Enabled or nonenabled interrupt wake    R/W-0
13    Enabled interrupt wake                  R/W-0
12    PD3                                     R/W-0
11    PD2                                     R/W-0
10    PD1                                     R/W-0
Legend: R = Readable by the MVC instruction; W = Writeable by the MVC instruction; -n = value after reset
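
As a cross-check, the PWRD codes listed in Table 2−7 (9h, 11h, 1Ah, 1Ch) decompose into the individual bits of Figure 2−5. The Python sketch below models only that arithmetic. The constant names and the `set_pwrd` helper are hypothetical, and the fact that the PD2 and PD3 codes have both wake bits set is an observation from the table values, not a statement about how the hardware interprets them.

```python
# Bit positions within the 6-bit PWRD field (CSR bits 15-10 -> field bits 5-0).
PD1, PD2, PD3 = 1 << 0, 1 << 1, 1 << 2   # CSR bits 10, 11, 12
WAKE_ENABLED = 1 << 3                    # CSR bit 13: enabled interrupt wake
WAKE_ANY = 1 << 4                        # CSR bit 14: enabled or nonenabled wake

PWRD_SHIFT, PWRD_MASK = 10, 0x3F


def set_pwrd(csr, pwrd):
    """Return csr with the whole PWRD field replaced in a single write."""
    if pwrd & ~0x1F:
        raise ValueError("reserved bit 15 must be written as 0")
    return (csr & ~(PWRD_MASK << PWRD_SHIFT)) | (pwrd << PWRD_SHIFT)


# The four documented codes, rebuilt from the individual bits.
pd1_enabled_wake = WAKE_ENABLED | PD1        # 9h
pd1_any_wake = WAKE_ANY | PD1                # 11h
pd2 = WAKE_ANY | WAKE_ENABLED | PD2          # 1Ah
pd3 = WAKE_ANY | WAKE_ENABLED | PD3          # 1Ch
```

Replacing the field in one step mirrors the guidance to configure all PWRD bits at the same time.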


Table 2−7. Control Status Register (CSR) Field Descriptions


Bit Field Value Description
31−24 CPU ID 0−FFh Identifies the CPU of the device. Not writable by the MVC instruction.

0−1h Reserved

2h C67x CPU

3h C67x+ CPU

4h−FFh Reserved

23−16 REVISION ID 0−FFh Identifies silicon revision of the CPU. For the most current silicon
revision information, see the device-specific data manual. Not writable
by the MVC instruction.

15−10 PWRD 0−3Fh Power-down mode field. See Figure 2−5. Writable by the MVC instruction.

0 No power-down.

1h−8h Reserved

9h Power-down mode PD1; wake by an enabled interrupt.

Ah−10h Reserved

11h Power-down mode PD1; wake by an enabled or nonenabled interrupt.

12h−19h Reserved

1Ah Power-down mode PD2; wake by a device reset.

1Bh Reserved

1Ch Power-down mode PD3; wake by a device reset.

1Dh−3Fh Reserved

9 SAT Saturate bit. Can be cleared only by the MVC instruction and can be set
only by a functional unit. The set by a functional unit has priority over a
clear (by the MVC instruction), if they occur on the same cycle. The SAT
bit is set one full cycle (one delay slot) after a saturate occurs. The SAT
bit will not be modified by a conditional instruction whose condition is false.

0 No functional unit has performed a saturate.

1 A functional unit has performed a saturate.

8 EN Endian mode. Not writable by the MVC instruction.

0 Big endian

1 Little endian

7−5 PCC 0−7h Program cache control mode. Writable by the MVC instruction. See the
TMS320C621x/C671x DSP Two-Level Internal Memory Reference Guide
(SPRU609).

0 Direct-mapped cache enabled

1h Reserved

2h Direct-mapped cache enabled

3h−7h Reserved

4−2 DCC 0−7h Data cache control mode. Writable by the MVC instruction. See the
TMS320C621x/C671x DSP Two-Level Internal Memory Reference Guide
(SPRU609).

0 2-way cache enabled

1h Reserved

2h 2-way cache enabled

3h−7h Reserved

1 PGIE Previous GIE (global interrupt enable). Copy of GIE bit at point when
interrupt is taken. Physically the same bit as SGIE bit in the interrupt task
state register (ITSR). Writeable by the MVC instruction.

0 Disables saving GIE bit when an interrupt is taken.

1 Enables saving GIE bit when an interrupt is taken.

0 GIE Global interrupt enable. Physically the same bit as GIE bit in the task state
register (TSR). Writable by the MVC instruction.

0 Disables all interrupts, except the reset interrupt and NMI (nonmaskable
interrupt).

1 Enables all interrupts.
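
The field boundaries in Figure 2−4 translate directly into shifts and masks. The Python sketch below is a host-side illustration, not DSP code; `decode_csr` is a hypothetical helper, and the sample value 0x02000103 is made up for the example.

```python
def decode_csr(csr):
    """Split a 32-bit CSR value into the fields of Figure 2-4."""
    return {
        "CPU_ID": (csr >> 24) & 0xFF,       # bits 31-24
        "REVISION_ID": (csr >> 16) & 0xFF,  # bits 23-16
        "PWRD": (csr >> 10) & 0x3F,         # bits 15-10
        "SAT": (csr >> 9) & 1,              # bit 9
        "EN": (csr >> 8) & 1,               # bit 8 (1 = little endian)
        "PCC": (csr >> 5) & 0x7,            # bits 7-5
        "DCC": (csr >> 2) & 0x7,            # bits 4-2
        "PGIE": (csr >> 1) & 1,             # bit 1
        "GIE": csr & 1,                     # bit 0
    }


# Made-up sample: CPU ID 2h (C67x CPU), little endian, PGIE and GIE set.
fields = decode_csr(0x02000103)
print(fields)
```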


2.7.5 Interrupt Clear Register (ICR)


The interrupt clear register (ICR) allows you to manually clear the maskable
interrupts (INT15−INT4) in the interrupt flag register (IFR). Writing a 1 to any
of the bits in ICR causes the corresponding interrupt flag (IFn) to be cleared
in IFR. Writing a 0 to any bit in ICR has no effect. Incoming interrupts have
priority and override any write to ICR. You cannot set any bit in ICR to affect
NMI or reset. The ICR is shown in Figure 2−6 and described in Table 2−8.

Note:
Any write to ICR (by the MVC instruction) effectively has one delay slot
because the results cannot be read (by the MVC instruction) in IFR until two
cycles after the write to ICR.
A write to a bit in ICR is ignored if the same bit in the interrupt set
register (ISR) is written simultaneously.

Figure 2−6. Interrupt Clear Register (ICR)


Bits     31−16
Field    Reserved
Access   R-0

Bits     15−4 (one bit each)   3−0
Field    IC15−IC4              Reserved
Access   W-0                   R-0
Legend: R = Read only; W = Writeable by the MVC instruction; -n = value after reset

Table 2−8. Interrupt Clear Register (ICR) Field Descriptions

Bit Field Value Description


31−16 Reserved 0 Reserved. The reserved bit location is always read as 0. A value written to this
field has no effect.

15−4 ICn Interrupt clear.

0 Corresponding interrupt flag (IFn) in IFR is not cleared.

1 Corresponding interrupt flag (IFn) in IFR is cleared.

3−0 Reserved 0 Reserved. The reserved bit location is always read as 0. A value written to this
field has no effect.


2.7.6 Interrupt Enable Register (IER)


The interrupt enable register (IER) enables and disables individual interrupts.
The IER is shown in Figure 2−7 and described in Table 2−9.

Figure 2−7. Interrupt Enable Register (IER)


Bits     31−16
Field    Reserved
Access   R-0

Bits     15−4 (one bit each)   3−2        1       0
Field    IE15−IE4              Reserved   NMIE    1
Access   R/W-0                 R-0        R/W-0   R-1
Legend: R = Readable by the MVC instruction; W = Writeable by the MVC instruction; -n = value after reset

Table 2−9. Interrupt Enable Register (IER) Field Descriptions

Bit Field Value Description


31−16 Reserved 0 Reserved. The reserved bit location is always read as 0. A value written to this
field has no effect.

15−4 IEn Interrupt enable. An interrupt triggers interrupt processing only if the
corresponding bit is set to 1.

0 Interrupt is disabled.

1 Interrupt is enabled.

3−2 Reserved 0 Reserved. The reserved bit location is always read as 0. A value written to this
field has no effect.

1 NMIE Nonmaskable interrupt enable. An interrupt triggers interrupt processing only if
the bit is set to 1.
The NMIE bit is cleared at reset. After reset, you must set the NMIE bit to
enable the NMI and to allow INT15−INT4 to be enabled by the GIE bit in CSR
and the corresponding IER bit. You cannot manually clear the NMIE bit; a write
of 0 has no effect. The NMIE bit is also cleared by the occurrence of an NMI.

0 All nonreset interrupts are disabled.

1 All nonreset interrupts are enabled. The NMIE bit is set only by completing a
B NRP instruction or by a write of 1 to the NMIE bit.

0 1 1 Reset interrupt enable. You cannot disable the reset interrupt.
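
The write rules in Table 2−9 can be summarized as a small model. The Python sketch below is illustrative only: `write_ier` and `take_nmi` are hypothetical names, and the model covers only MVC writes and the clearing of NMIE by an NMI. It does not model the setting of NMIE by a completed B NRP instruction.

```python
IER_RESET = 0x0001  # bit 0 reads as 1; NMIE and IE15-IE4 are 0 after reset


def write_ier(ier, value):
    """Model an MVC write of `value` to IER, given the current `ier`."""
    ie_bits = value & 0xFFF0        # IE15-IE4 take the written value
    nmie = (ier | value) & 0x0002   # a write of 0 cannot clear NMIE
    # Bit 0 always reads 1; reserved bits 3-2 and 31-16 always read 0.
    return ie_bits | nmie | 0x0001


def take_nmi(ier):
    """NMIE is cleared by the occurrence of an NMI."""
    return ier & ~0x0002


ier = write_ier(IER_RESET, 0x0012)  # set NMIE and enable INT4
print(hex(ier))
```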


2.7.7 Interrupt Flag Register (IFR)


The interrupt flag register (IFR) contains the status of INT4−INT15 and the
NMI. Each corresponding bit in the IFR is set to 1 when that interrupt
occurs; otherwise, the bits are cleared to 0. If you want to check the status of
interrupts, use the MVC instruction to read the IFR. (See the MVC instruction
description, page 3-179, for information on how to use this instruction.) The
IFR is shown in Figure 2−8 and described in Table 2−10.

Figure 2−8. Interrupt Flag Register (IFR)


Bits     31−16
Field    Reserved
Access   R-0

Bits     15−4 (one bit each)   3−2        1      0
Field    IF15−IF4              Reserved   NMIF   0
Access   R-0                   R-0        R-0    R-0
Legend: R = Readable by the MVC instruction; -n = value after reset

Table 2−10. Interrupt Flag Register (IFR) Field Descriptions

Bit Field Value Description


31−16 Reserved 0 Reserved. The reserved bit location is always read as 0. A value written to this
field has no effect.

15−4 IFn Interrupt flag. Indicates the status of the corresponding maskable interrupt. An
interrupt flag may be manually set by setting the corresponding bit (ISn) in the
interrupt set register (ISR) or manually cleared by setting the corresponding bit
(ICn) in the interrupt clear register (ICR).

0 Interrupt has not occurred.

1 Interrupt has occurred.

3−2 Reserved 0 Reserved. The reserved bit location is always read as 0. A value written to this
field has no effect.

1 NMIF Nonmaskable interrupt flag.

0 Interrupt has not occurred.

1 Interrupt has occurred.

0 0 0 Reset interrupt flag.


2.7.8 Interrupt Return Pointer Register (IRP)


The interrupt return pointer register (IRP) contains the return pointer that
directs the CPU to the proper location to continue program execution after
processing a maskable interrupt. A branch using the address in IRP (B IRP)
in your interrupt service routine returns to the program flow when interrupt
servicing is complete. The IRP is shown in Figure 2−9.

The IRP contains the 32-bit address of the first execute packet in the program
flow that was not executed because of a maskable interrupt. Although you can
write a value to IRP, any subsequent interrupt processing may overwrite that
value.

Figure 2−9. Interrupt Return Pointer Register (IRP)


31 0
IRP
R/W-x
Legend: R = Readable by the MVC instruction; W = Writeable by the MVC instruction; -x = value is indeterminate after reset


2.7.9 Interrupt Set Register (ISR)

The interrupt set register (ISR) allows you to manually set the maskable
interrupts (INT15−INT4) in the interrupt flag register (IFR). Writing a 1 to
any of the bits in ISR causes the corresponding interrupt flag (IFn) to be set
in IFR. Writing a 0 to any bit in ISR has no effect. You cannot set any bit in
ISR to affect NMI or reset. The ISR is shown in Figure 2−10 and described in
Table 2−11.

Note:
Any write to ISR (by the MVC instruction) effectively has one delay slot
because the results cannot be read (by the MVC instruction) in IFR until two
cycles after the write to ISR.
A write to a bit in the interrupt clear register (ICR) is ignored if the same
bit in ISR is written simultaneously.

Figure 2−10. Interrupt Set Register (ISR)

Bits     31−16
Field    Reserved
Access   R-0

Bits     15−4 (one bit each)   3−0
Field    IS15−IS4              Reserved
Access   W-0                   R-0
Legend: R = Read only; W = Writeable by the MVC instruction; -n = value after reset

Table 2−11. Interrupt Set Register (ISR) Field Descriptions

Bit Field Value Description


31−16 Reserved 0 Reserved. The reserved bit location is always read as 0. A value written to this
field has no effect.

15−4 ISn Interrupt set.

0 Corresponding interrupt flag (IFn) in IFR is not set.

1 Corresponding interrupt flag (IFn) in IFR is set.

3−0 Reserved 0 Reserved. The reserved bit location is always read as 0. A value written to this
field has no effect.
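
The interaction between ISR writes, ICR writes, and incoming interrupts can be sketched as a single update function. The Python below is a behavioral illustration under stated assumptions: `next_ifr` is a hypothetical name, all three inputs are treated as arriving together, and the one-delay-slot timing described in the notes is not modeled.

```python
MASKABLE = 0xFFF0  # IF15-IF4; ISR and ICR cannot affect NMI or reset


def next_ifr(ifr, isr_write=0, icr_write=0, incoming=0):
    """Return the IFR value after simultaneous ISR/ICR writes and new interrupts."""
    set_bits = isr_write & MASKABLE
    # A clear is ignored for any bit that is set through ISR at the same time.
    clear_bits = icr_write & MASKABLE & ~set_bits
    ifr = (ifr & ~clear_bits) | set_bits
    # Incoming interrupts have priority and override a write to ICR.
    return ifr | (incoming & 0xFFF2)


print(hex(next_ifr(0x0030, icr_write=0x0010)))  # clears IF4, leaves IF5 set
```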


2.7.10 Interrupt Service Table Pointer Register (ISTP)


The interrupt service table pointer register (ISTP) is used to locate the interrupt
service routine (ISR). The ISTB field identifies the base portion of the address
of the interrupt service table (IST) and the HPEINT field identifies the specific
interrupt and locates the specific fetch packet within the IST. The ISTP is
shown in Figure 2−11 and described in Table 2−12. See section 5.1.2.2 on
page 5-9 for a discussion of the use of the ISTP.

Figure 2−11. Interrupt Service Table Pointer Register (ISTP)


Bits     31−10   9−5      4−0
Field    ISTB    HPEINT   0
Access   R/W-0   R-0      R-0
Legend: R = Readable by the MVC instruction; W = Writeable by the MVC instruction; -n = value after reset

Table 2−12. Interrupt Service Table Pointer Register (ISTP) Field Descriptions

Bit Field Value Description


31−10 ISTB 0−3F FFFFh Interrupt service table base portion of the IST address. This field is cleared
to 0 on reset; therefore, upon startup the IST must reside at address 0. After
reset, you can relocate the IST by writing a new value to ISTB. If relocated,
the first ISFP (corresponding to RESET) is never executed via interrupt
processing, because reset clears the ISTB to 0. See Example 5−1.

9−5 HPEINT 0−1Fh Highest priority enabled interrupt that is currently pending. This field indicates
the number (related bit position in the IFR) of the highest priority interrupt (as
defined in Table 5−1 on page 5-3) that is enabled by its bit in the IER. Thus,
the ISTP can be used for manual branches to the highest priority enabled
interrupt. If no interrupt is pending and enabled, HPEINT contains the value 0.
The corresponding interrupt need not be enabled by NMIE (unless it is NMI)
or by GIE.

4−0 − 0 Cleared to 0 (fetch packets must be aligned on 8-word (32-byte) boundaries).
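
Because bits 4−0 are always 0 and HPEINT occupies bits 9−5, the ISTP value is the address of a 32-byte-aligned fetch packet inside the IST. The Python sketch below illustrates that composition. The names `hpeint` and `istp_value` are hypothetical, the priority order (lower IFR bit position first) follows Table 5−1, and the sample register values are made up.

```python
def hpeint(ifr, ier):
    """Number of the highest priority pending interrupt enabled in IER, or 0."""
    pending = ifr & ier & 0xFFF2          # NMI (bit 1) and INT4-INT15
    for n in range(1, 16):                # lower bit position = higher priority
        if pending & (1 << n):
            return n
    return 0


def istp_value(istb, ifr, ier):
    """Assemble ISTP: ISTB in bits 31-10, HPEINT in bits 9-5, bits 4-0 zero."""
    return ((istb & 0x3FFFFF) << 10) | (hpeint(ifr, ier) << 5)


# IST relocated to byte address 0x800 (ISTB = 2); INT4 and INT7 pending,
# but only INT7 is enabled in IER.
addr = istp_value(2, ifr=0x0090, ier=0x0083)
print(hex(addr))
```

In this example the result is the table base plus 7 × 32 bytes, the fetch packet for INT7.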


2.7.11 Nonmaskable Interrupt (NMI) Return Pointer Register (NRP)


The NMI return pointer register (NRP) contains the return pointer that directs
the CPU to the proper location to continue program execution after NMI
processing. A branch using the address in NRP (B NRP) in your interrupt
service routine returns to the program flow when NMI servicing is complete.
The NRP is shown in Figure 2−12.

The NRP contains the 32-bit address of the first execute packet in the program
flow that was not executed because of a nonmaskable interrupt. Although you
can write a value to NRP, any subsequent interrupt processing may overwrite
that value.

Figure 2−12. NMI Return Pointer Register (NRP)


31 0
NRP
R/W-x
Legend: R = Readable by the MVC instruction; W = Writeable by the MVC instruction; -x = value is indeterminate after reset

2.7.12 E1 Phase Program Counter (PCE1)


The E1 phase program counter (PCE1), shown in Figure 2−13, contains the
32-bit address of the fetch packet in the E1 pipeline phase.

Figure 2−13. E1 Phase Program Counter (PCE1)


31 0
PCE1
R-x
Legend: R = Readable by the MVC instruction; -x = value is indeterminate after reset



Chapter 4

Pipeline



The C67x DSP pipeline provides flexibility to simplify programming and
improve performance. Two factors provide this flexibility:

- Control of the pipeline is simplified by eliminating pipeline interlocks.
- Increased pipelining eliminates traditional architectural bottlenecks in
  program fetch, data access, and multiply operations. This provides
  single-cycle throughput.

This chapter starts with a description of the pipeline flow. Highlights are:

- The pipeline can dispatch eight parallel instructions every cycle.
- Parallel instructions proceed simultaneously through each pipeline phase.
- Serial instructions proceed through the pipeline with a fixed relative phase
  difference between instructions.
- Load and store addresses appear on the CPU boundary during the same
  pipeline phase, eliminating read-after-write memory conflicts.

All instructions require the same number of pipeline phases for fetch and
decode, but require a varying number of execute phases. This chapter
contains a description of the number of execution phases for each type of
instruction.

Finally, the chapter contains performance considerations for the pipeline.
These considerations include the occurrence of fetch packets that contain
multiple execute packets, execute packets that contain multicycle NOPs, and
memory considerations for the pipeline. For more information about fully
optimizing a program and taking full advantage of the pipeline, see the
TMS320C6000 Programmer’s Guide (SPRU198).

Topic
4.1 Pipeline Operation Overview
4.2 Pipeline Execution of Instruction Types
4.3 Functional Unit Constraints
4.4 Performance Considerations


4.1 Pipeline Operation Overview


The pipeline phases are divided into three stages:

- Fetch
- Decode
- Execute

All instructions in the C67x DSP instruction set flow through the fetch, decode,
and execute stages of the pipeline. The fetch stage of the pipeline has four
phases for all instructions, and the decode stage has two phases for all
instructions. The execute stage of the pipeline requires a varying number of
phases, depending on the type of instruction. The stages of the C67x DSP
pipeline are shown in Figure 4−1.

Figure 4−1. Pipeline Stages

Fetch Decode Execute

4.1.1 Fetch

The fetch phases of the pipeline are:

- PG: Program address generate
- PS: Program address send
- PW: Program access ready wait
- PR: Program fetch packet receive

The C67x DSP uses a fetch packet (FP) of eight instructions. All eight of the
instructions proceed through fetch processing together, through the PG, PS,
PW, and PR phases. Figure 4−2(a) shows the fetch phases in sequential order
from left to right. Figure 4−2(b) is a functional diagram of the flow of
instructions through the fetch phases. During the PG phase, the program
address is generated in the CPU. In the PS phase, the program address is sent
to memory. In the PW phase, a memory read occurs. Finally, in the PR phase,
the fetch packet is received at the CPU. Figure 4−2(c) shows fetch packets
flowing through the phases of the fetch stage of the pipeline. In
Figure 4−2(c), the first fetch packet (in PR) is made up of four execute
packets, and the second and third fetch packets (in PW and PS) contain two
execute packets each. The last fetch packet (in PG) contains a single execute
packet of eight single-cycle instructions.


Figure 4−2. Fetch Phases of the Pipeline

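(a) Phase order: PG, PS, PW, PR

(b) [Functional diagram: the CPU (functional units and registers) and memory,
    with the PG, PS, PW, and PR phases marked on the path between them]

(c) Fetch packets (256 bits each) in the fetch phases:

    Phase   Instructions in the fetch packet
    PG      LDW  LDW  SHR    SHR    SMPYH  SMPYH  MV   NOP
    PS      LDW  LDW  SMPYH  SMPY   SADD   SADD   B    MVK
    PW      LDW  LDW  MVKLH  MV     SMPYH  SMPY   B    MVK
    PR      LDW  LDW  MVK    ADD    SHL    LDW    LDW  MVK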

4.1.2 Decode

The decode phases of the pipeline are:

- DP: Instruction dispatch
- DC: Instruction decode

In the DP phase of the pipeline, the fetch packets are split into execute
packets. Execute packets consist of one instruction or from two to eight
parallel instructions. During the DP phase, the instructions in an execute
packet are assigned to the appropriate functional units. In the DC phase, the
source registers, destination registers, and associated paths are decoded for
the execution of the instructions in the functional units.


Figure 4−3(a) shows the decode phases in sequential order from left to right.
Figure 4−3(b) shows a fetch packet that contains two execute packets as they
are processed through the decode stage of the pipeline. The last six
instructions of the fetch packet (FP) are parallel and form an execute packet
(EP). This EP is in the dispatch phase (DP) of the decode stage. The arrows
indicate each instruction’s assigned functional unit for execution during the
same cycle. The NOP instruction in the eighth slot of the FP is not dispatched
to a functional unit because there is no execution associated with it.

The first two slots of the fetch packet (the MPYH instructions in
Figure 4−3(b)) represent an execute packet of two parallel instructions that
were dispatched on the previous cycle. This execute packet contains two
multiply instructions that are now in decode (DC) one cycle before execution.
There are no instructions decoded for the .L, .S, and .D functional units for
the situation illustrated.

Figure 4−3. Decode Phases of the Pipeline


(a) Phase order: DP, DC

(b) Decode of one fetch packet (eight 32-bit instruction slots):

    Phase   Instructions
    DP      ADD  ADD  STW  STW  ADDK  NOP†
    DC      MPYH  MPYH

    Functional units: .L1 .S1 .M1 .D1 .D2 .M2 .S2 .L2

† NOP is not dispatched to a functional unit.


4.1.3 Execute

The execute portion of the pipeline is subdivided into ten phases (E1−E10),
as compared to the five phases in a fixed-point pipeline. Different types of
instructions require different numbers of these phases to complete their
execution. These phases of the pipeline play an important role in
understanding the device state at CPU cycle boundaries. The execution of
different types of instructions in the pipeline is described in section 4.2,
Pipeline Execution of Instruction Types. Figure 4−4(a) shows the execute
phases of the pipeline in sequential order from left to right. Figure 4−4(b)
shows the portion of the functional block diagram in which execution occurs.

Figure 4−4. Execute Phases of the Pipeline

(a) Phase order: E1, E2, E3, E4, E5, E6, E7, E8, E9, E10

(b) [Functional block diagram: an execute packet in E1, one instruction per
    functional unit]

    Unit          .L1   .S1  .M1   .D1  .D2  .M2    .S2  .L2
    Instruction   SADD  B    SMPY  STH  STH  SMPYH  SUB  SADD

    [The functional units connect to register file A and register file B
    (registers 15−0 in each), which connect through the data memory interface
    control (data address 1, data address 2) to the byte-addressable internal
    data memory.]


4.1.4 Pipeline Operation Summary

Figure 4−5 shows all the phases in each stage of the C67x DSP pipeline in
sequential order, from left to right.

Figure 4−5. Pipeline Phases


Fetch Decode Execute

PG PS PW PR DP DC E1 E2 E3 E4 E5 E6 E7 E8 E9 E10

Figure 4−6 shows an example of the pipeline flow of consecutive fetch packets
that contain eight parallel instructions. In this case, where the pipeline is full,
all instructions in a fetch packet are in parallel and split into one execute packet
per fetch packet. The fetch packets flow in lockstep fashion through each
phase of the pipeline.

For example, examine cycle 7 in Figure 4−6. When the instructions from FP n
reach E1, the instructions in the execute packet from FP n+1 are being
decoded. FP n+2 is in dispatch while FPs n+3, n+4, n+5, and n+6 are
each in one of four phases of program fetch. See section 4.4, page 4-56, for
additional detail on code flowing through the pipeline. Table 4−1 summarizes
the pipeline phases and what happens in each phase.

Figure 4−6. Pipeline Operation: One Execute Packet per Fetch Packet

Fetch   Clock cycle
packet  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17
n       PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5  E6  E7  E8  E9  E10
n+1         PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5  E6  E7  E8  E9  E10
n+2             PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5  E6  E7  E8  E9
n+3                 PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5  E6  E7  E8
n+4                     PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5  E6  E7
n+5                         PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5  E6
n+6                             PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5
n+7                                 PG  PS  PW  PR  DP  DC  E1  E2  E3  E4
n+8                                     PG  PS  PW  PR  DP  DC  E1  E2  E3
n+9                                         PG  PS  PW  PR  DP  DC  E1  E2
n+10                                            PG  PS  PW  PR  DP  DC  E1
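
The lockstep flow in Figure 4−6 reduces to a simple index calculation. The Python sketch below assumes the idealized case shown in the figure (one execute packet per fetch packet, no stalls, FP n entering PG at cycle 1); `phase_at` is a hypothetical helper name.

```python
# Pipeline phases in order: four fetch, two decode, ten execute.
PHASES = ["PG", "PS", "PW", "PR", "DP", "DC"] + [f"E{i}" for i in range(1, 11)]


def phase_at(cycle, k):
    """Phase occupied at `cycle` (1-based) by fetch packet n+k, or None."""
    index = cycle - 1 - k  # FP n+k enters PG at cycle k+1
    return PHASES[index] if 0 <= index < len(PHASES) else None


# Cycle 7: FP n is in E1, FP n+1 in DC, FP n+2 in DP, and so on.
cycle7 = [phase_at(7, k) for k in range(7)]
print(cycle7)
```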



Chapter 5

Interrupts

This chapter describes CPU interrupts, including reset and the nonmaskable
interrupt (NMI). It details the related CPU control registers and their functions
in controlling interrupts. It also describes interrupt processing, the method the
CPU uses to detect automatically the presence of interrupts and divert
program execution flow to your interrupt service code. Finally, the chapter
describes the programming implications of interrupts.

Topic
5.1 Overview
5.2 Globally Enabling and Disabling Interrupts
5.3 Individual Interrupt Control
5.4 Interrupt Detection and Processing
5.5 Performance Considerations
5.6 Programming Considerations


5.1 Overview
Typically, DSPs work in an environment that contains multiple external
asynchronous events. These events require tasks to be performed by the DSP
when they occur. An interrupt is an event that stops the current process in the
CPU so that the CPU can attend to the task needing completion because of
the event. These interrupt sources can be on chip or off chip, such as timers,
analog-to-digital converters, or other peripherals.

Servicing an interrupt involves saving the context of the current process,
completing the interrupt task, restoring the registers and the process
context, and resuming the original process. There are eight registers that
control servicing interrupts.

An appropriate transition on an interrupt pin sets the pending status of the
interrupt within the interrupt flag register (IFR). If the interrupt is properly
enabled, the CPU begins processing the interrupt and redirecting program
flow to the interrupt service routine.

5.1.1 Types of Interrupts and Signals Used


There are three types of interrupts on the CPUs of the TMS320C6000 DSPs:

- Reset
- Maskable
- Nonmaskable

These three types are differentiated by their priorities, as shown in Table 5−1.
The reset interrupt has the highest priority and corresponds to the RESET signal.
The nonmaskable interrupt has the second highest priority and corresponds
to the NMI signal. The lowest priority interrupts are interrupts 4−15
corresponding to the INT4−INT15 signals. RESET, NMI, and some of the
INT4−INT15 signals are mapped to pins on C6000 devices. Some of the
INT4−INT15 interrupt signals are used by internal peripherals and some may
be unavailable or can be used under software control. Check your device-
specific data manual to see your interrupt specifications.


Table 5−1. Interrupt Priorities

Priority   Interrupt Name   Interrupt Type
Highest    Reset            Reset
           NMI              Nonmaskable
           INT4             Maskable
           INT5             Maskable
           INT6             Maskable
           INT7             Maskable
           INT8             Maskable
           INT9             Maskable
           INT10            Maskable
           INT11            Maskable
           INT12            Maskable
           INT13            Maskable
           INT14            Maskable
Lowest     INT15            Maskable

5.1.1.1 Reset (RESET)

Reset is the highest priority interrupt and is used to halt the CPU and return
it to a known state. The reset interrupt is unique in a number of ways:

- RESET is an active-low signal. All other interrupts are active-high signals.
- RESET must be held low for 10 clock cycles before it goes high again to
  reinitialize the CPU properly.
- The instruction execution in progress is aborted and all registers are
  returned to their default states.
- The reset interrupt service fetch packet must be located at address 0.
- RESET is not affected by branches.


5.1.1.2 Nonmaskable Interrupt (NMI)

NMI is the second-highest priority interrupt and is generally used to alert the
CPU of a serious hardware problem such as imminent power failure.

For NMI processing to occur, the nonmaskable interrupt enable (NMIE) bit in
the interrupt enable register must be set to 1. If NMIE is set to 1, the only
condition that can prevent NMI processing is if the NMI occurs during the
delay slots of a branch (whether the branch is taken or not).

NMIE is cleared to 0 at reset to prevent interruption of the reset. It is cleared
at the occurrence of an NMI to prevent another NMI from being processed. You
cannot manually clear NMIE, but you can set NMIE to allow nested NMIs.
While NMIE is cleared, all maskable interrupts (INT4−INT15) are disabled.

5.1.1.3 Maskable Interrupts (INT4−INT15)

The CPUs of the C6000 DSPs have 12 interrupts that are maskable. These
have lower priority than the NMI and reset interrupts. These interrupts can be
associated with external devices, on-chip peripherals, software control, or not
be available.

Assuming that a maskable interrupt does not occur during the delay slots of
a branch (this includes conditional branches that do not complete execution
due to a false condition), the following conditions must be met to process a
maskable interrupt:

• The global interrupt enable (GIE) bit in the control status register (CSR) is
set to 1.

• The NMIE bit in the interrupt enable register (IER) is set to 1.

• The corresponding interrupt enable (IE) bit in the IER is set to 1.

• The corresponding interrupt occurs, which sets the corresponding bit in
the interrupt flags register (IFR) to 1, and there are no higher priority
interrupt flag (IF) bits set in the IFR.
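
The four conditions above can be collapsed into a single predicate. The following is a minimal Python sketch for illustration only, not device code. The function name is hypothetical, and the bit positions (GIE as bit 0 of the CSR, NMIE as bit 1 of the IER, INTn at bit n of the IER and IFR) are assumptions that should be checked against the register descriptions for your device. The branch delay-slot restriction is not modeled.

```python
GIE_BIT = 0   # assumed position of GIE in the CSR
NMIE_BIT = 1  # assumed position of NMIE in the IER


def can_process_maskable(n, csr, ier, ifr):
    """Return True if maskable interrupt INTn (4..15) meets all four conditions."""
    if not 4 <= n <= 15:
        raise ValueError("maskable interrupts are INT4..INT15")
    gie_set = bool((csr >> GIE_BIT) & 1)
    nmie_set = bool((ier >> NMIE_BIT) & 1)
    enabled = bool((ier >> n) & 1)
    flagged = bool((ifr >> n) & 1)
    # Lower bit numbers have higher priority; none of them may be flagged.
    higher_pending = bool(ifr & ((1 << n) - 1))
    return gie_set and nmie_set and enabled and flagged and not higher_pending


ier = (1 << NMIE_BIT) | (1 << 7)                      # NMIE and IE7 set
print(can_process_maskable(7, csr=1, ier=ier, ifr=1 << 7))
```

With GIE, NMIE, and IE7 set and only IF7 flagged, the predicate holds for INT7. Flagging IF5 as well makes it fail, because a higher priority flag is then set in the IFR.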

5.1.1.4 Interrupt Acknowledgment (IACK) and Interrupt Number (INUMn)

The IACK and INUMn signals alert hardware external to the C6000 that an
interrupt has occurred and is being processed. The IACK signal indicates that
the CPU has begun processing an interrupt. The INUMn signal (INUM3−
INUM0) indicates the number of the interrupt (bit position in the IFR) that is
being processed. For example:

INUM3 = 0 (MSB)
INUM2 = 1
INUM1 = 1
INUM0 = 1 (LSB)

Together, these signals provide the 4-bit value 0111, indicating INT7 is being
processed.
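
The same decoding can be written as a short calculation. This is an illustrative Python sketch; the function name is hypothetical and the arguments are simply the logic levels (0 or 1) of the four INUM signals.

```python
def interrupt_number(inum3, inum2, inum1, inum0):
    """Combine the four INUM signal levels (0 or 1) into an interrupt number."""
    return (inum3 << 3) | (inum2 << 2) | (inum1 << 1) | inum0


print(interrupt_number(0, 1, 1, 1))  # 7, that is INT7
```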

Chapter 3

Architecture and
Peripherals of TMS320C67x

3.1 Introduction
In the previous chapter we had a glimpse of the general features of different
generations of processors and their architectures. The TMS320C6x processors are the
first to use the VelociTI architecture, an implementation of the VLIW architecture. The
TMS320C62x is a 16-bit fixed-point processor and the ‘67x is a floating-point processor
with 32-bit integer support. The discussion in this chapter focuses on the
TMS320C67x processor, including its architecture and associated peripherals.
In general, the TMS320C6x devices execute up to eight 32-bit instructions per cycle.
The core of the ‘67x devices is the ‘C6x CPU, which has the following features:

• Program fetch unit
• Instruction dispatch unit
• Instruction decode unit
• Two data paths, each 32 bits wide and with four functional units
• The functional units consist of two multipliers and six ALUs
• Thirty-two 32-bit registers
• Control registers
• Control logic
• Test, emulation, and interrupt logic
• Parallel execution of eight instructions
• 8/16/32-bit data support, providing efficient memory support for a variety of
applications
• 40-bit arithmetic options, which add extra precision for computationally intensive
applications

3.2 Architecture of TMS320C67xx
The simplified architecture of the TMS320C6713 is shown in Figure 3.1. The
processor consists of three main parts: CPU, peripherals, and memory.

Figure 3.1: Simplified block diagram of TMS320C67xx family

3.2.1 Central Processing Unit

The CPU contains a program fetch unit, an instruction dispatch unit, and an instruction
decode unit. The CPU fetches advanced very-long instruction words (VLIW) (256 bits
wide) to supply up to eight 32-bit instructions to the eight functional units during every
clock cycle. The VLIW architecture features controls by which all eight units do not have
to be supplied with instructions if they are not ready to execute. The first bit of every
32-bit instruction (the p-bit) determines whether the next instruction belongs to the same
execute packet as the current instruction or should be executed in the following clock
cycle as part of the next execute packet. Fetch packets are always 256 bits wide; however,
execute packets can vary in size. The variable-length execute packets are a key
memory-saving feature, distinguishing the C67x CPU from other VLIW architectures. The CPU
also contains two data paths (containing register files A and B, respectively) in which the
processing takes place. Each data path has four functional units (.L, .M, .S, and .D). The
functional units execute logic, multiply, shift, and data-address operations. Figure 3.2
shows the simplified block diagram of the two data paths.

Figure 3.2: TMS320C67X data path
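
The way one 256-bit fetch packet breaks into variable-length execute packets can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name is hypothetical, the p-bit is taken to be bit 0 of each 32-bit word, and the sample words are made-up values rather than real opcodes.

```python
def split_execute_packets(fetch_packet):
    """Group the eight 32-bit words of a fetch packet into execute packets."""
    if len(fetch_packet) != 8:
        raise ValueError("a fetch packet holds eight 32-bit instructions")
    packets, current = [], []
    for word in fetch_packet:
        current.append(word)
        if word & 1 == 0:      # p-bit clear: the next instruction starts a new packet
            packets.append(current)
            current = []
    if current:                # trailing group whose last p-bit was set
        packets.append(current)
    return packets


# Made-up words whose p-bits (bit 0) are 1, 1, 0, 0, 1, 0, 0, 0.
words = [0x11, 0x21, 0x30, 0x40, 0x51, 0x60, 0x70, 0x80]
print([len(p) for p in split_execute_packets(words)])  # [3, 1, 2, 1, 1]
```

Here the eight instructions issue over five cycles: three in parallel, then one, then two, then one, then one.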

All instructions except loads and stores operate on the registers. All data transfers
between the register files and memory take place only through the two data-addressing
units (.D1 and .D2). The CPU also has various control registers, control logic, and test,
emulation, and interrupt logic. Access to control registers is provided from data path B.

3.2.2 General Purpose Register Files

The CPU contains two general-purpose register files, A and B. These can be used
for data or as data address pointers. Each file contains sixteen 32-bit registers (A0-A15
for file A and B0-B15 for file B). The registers A1, A2, B0, B1, and B2 can also be used as
condition registers. The registers A4-A7 and B4-B7 can be used for circular addressing.
These registers support 32-bit and 40-bit fixed-point data. The 32-bit data can be stored
in any register. For 40-bit data, the processor stores the least significant 32 bits in an
even register and the remaining 8 bits in the next upper (odd) register.
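
The split of a 40-bit value across a register pair can be illustrated as follows. This is a Python sketch with hypothetical function names. Placing the upper 8 bits in the low byte of the odd register is an assumption; the text only states which register holds them.

```python
MASK32 = 0xFFFFFFFF


def split_40bit(value):
    """Split a 40-bit value into (odd_register, even_register) contents."""
    if not 0 <= value < (1 << 40):
        raise ValueError("value must fit in 40 bits")
    even = value & MASK32        # least significant 32 bits
    odd = (value >> 32) & 0xFF   # remaining 8 bits (assumed to sit in the low byte)
    return odd, even


def join_40bit(odd, even):
    """Rebuild the 40-bit value from the register pair."""
    return ((odd & 0xFF) << 32) | (even & MASK32)


odd, even = split_40bit(0x123456789A)
print(hex(odd), hex(even))  # 0x12 0x3456789a
```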

3.2.3 Functional Units

The CPU features two sets of functional units. Each set contains four units and a
register file. One set contains functional units .L1, .S1, .M1, and .D1; the other set
contains units .D2, .M2, .S2, and .L2. The two register files each contain sixteen 32-bit
registers for a total of 32 general-purpose registers. The two sets of functional units,
along with the two register files, compose sides A and B of the CPU. Each functional unit
has two 32-bit read ports for source operands and one 32-bit write port into a
general-purpose register file. The functional units .L1, .S1, .M1, and .D1 write to register
file A and the functional units .L2, .S2, .M2, and .D2 write to register file B. As each unit
has its own 32-bit write port, all eight ports can be used in parallel in every cycle. The .L
and .S functional units are ALUs that perform 32-bit/40-bit arithmetic and logical
operations, and the .M units are multipliers. The .S units also perform branching
operations, and the .D units perform linear and circular address calculations. Only the .S2
unit can access the control register file. Table 3.1 lists the functional units and the
operations each performs.

3.2.4 Memory System

The memory system of the TMS320C671x series processor implements a
modified Harvard architecture, providing separate address spaces for instruction and data
memory.
The processor uses a two-level cache-based architecture and has a powerful and
diverse set of peripherals. The Level 1 program cache (L1P) is a 4K-byte direct-mapped
cache and the Level 1 data cache (L1D) is a 4K-byte 2-way set-associative cache. The
Level 2 memory/cache (L2) consists of a 256K-byte memory space that is shared
between program and data space. 64K bytes of the 256K bytes in L2 memory can be
configured as mapped memory, cache, or combinations of the two. The remaining 192K
bytes in L2 serve as mapped SRAM.

Functional Unit       Description

.L unit (.L1, .L2)    32/40-bit arithmetic and compare operations
                      Leftmost 1 or 0 bit counting for 32 bits
                      Normalization count for 32 and 40 bits
                      32-bit logical operations
                      32/64-bit IEEE floating-point arithmetic
                      Floating-point/fixed-point conversions

.S unit (.S1, .S2)    32-bit arithmetic operations
                      32/40-bit shifts and 32-bit bit-field operations
                      32-bit logical operations
                      Branching
                      Constant generation
                      Register transfers to/from the control register file
                      32/64-bit IEEE floating-point compare operations
                      32/64-bit IEEE floating-point reciprocal and square root
                      reciprocal approximation

.M unit (.M1, .M2)    16 x 16-bit multiplies
                      32 x 32-bit multiplies
                      Single-precision (32-bit) floating-point IEEE multiplies
                      Double-precision (64-bit) floating-point IEEE multiplies

.D unit (.D1, .D2)    32-bit add, subtract, linear and circular address calculation

Table 3.1: Functional Units and Descriptions

Figure 3.3 shows the memory structure in the CPU of the TMS320C67x.
The external memory interface (EMIF) connects the CPU and external memory. This is
discussed in Section 3.3.

Figure 3.3: Memory structure in CPU of TMS320C67x

3.3 Peripherals of TMS320C6713
The TMS320C67x devices contain peripherals for communication with off-chip
memory, co-processors, host processors and serial devices. The following subsections
discuss the peripherals of ‘C6713 processor.

3.3.1 Enhanced DMA

The enhanced direct memory access (EDMA) controller transfers data between
regions in the memory map without intervention by the CPU. The EDMA provides
transfers of data to and from internal memory, internal peripherals, or external devices in
the background of CPU operation. The EDMA has sixteen independently programmable
channels, allowing sixteen different contexts for operation.
The EDMA can read a data element from a source location in memory or write it to a
destination location. The EDMA also provides combined transfers of data elements, such
as frame transfers and block transfers. Each EDMA channel has an independently
programmable number of data elements per frame and number of frames per block.

The EDMA has the following features:

• Background operation: The EDMA operates independently of the CPU.

• High throughput: Elements can be transferred at the CPU clock rate.

• Sixteen channels: The EDMA can keep track of the contexts of sixteen
independent transfers.

• Split operation: A single channel may be used simultaneously to perform both
receive and transmit element transfers to or from two peripherals and memory.

• Programmable priority: Each channel has independently programmable priorities
versus the CPU.

• Programmable address generation: Each channel’s source and destination address
registers can have configurable indexes for each read and write transfer. The address
may remain constant, increment, decrement, or be adjusted by a programmable value.

• Programmable-width transfers: Each channel can be independently configured to
transfer bytes, 16-bit half words, or 32-bit words.

• Autoinitialization: Once a block transfer is complete, an EDMA channel may
automatically reinitialize itself for the next block transfer.

• Linking: Each EDMA channel can be linked to a subsequent transfer to perform
after completion.

• Event synchronization: Each channel is initiated by a specific event. Transfers
may be synchronized either by element or by frame.
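
To make the element, frame, and block hierarchy concrete, the sketch below lists the addresses a channel would visit during one block transfer. It is a simplified Python model, not the EDMA programming interface: the function and parameter names are hypothetical, and only the constant, increment, and decrement address updates are modeled (adjustment by a programmable index is left out).

```python
def transfer_addresses(start, elements_per_frame, frames_per_block,
                       element_size, step=1):
    """Yield the address of every element in one block, frame by frame.

    step = +1 increments, -1 decrements, and 0 holds the address constant.
    """
    address = start
    for _frame in range(frames_per_block):
        for _element in range(elements_per_frame):
            yield address
            address += step * element_size


# One block of 2 frames, 2 elements per frame, 32-bit (4-byte) elements.
print([hex(a) for a in transfer_addresses(0x1000, 2, 2, 4)])
```

A constant address (step=0) is the typical choice when one end of the transfer is a peripheral data register, while the memory end increments.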

3.3.2 Host-Port Interface

The Host-Port Interface (HPI) is a 16-bit-wide parallel port through which a host
processor can directly access the CPU's memory space. The host device functions as a
master to the interface, which increases ease of access. The host and CPU can exchange
information via internal or external memory. The host also has direct access to memory-
mapped peripherals.
The HPI is connected to the internal memory via a set of registers. Either the host
or the CPU may use the HPI Control register (HPIC) to configure the interface. The host
can access the host address register (HPIA) and the host data register (HPID) to access
the internal memory space of the device. The host accesses these registers using external
data and interface control signals. The HPIC is a memory-mapped register, which allows
the CPU access.
The data transactions are performed within the EDMA, and are invisible to the
user.

3.3.3 External Memory Interface (EMIF)

The external memory interface (EMIF) supports an interface to several external
devices, allowing additional data and program memory space beyond that which is
included on-chip.
The types of memories supported include:

• Synchronous burst SRAM (SBSRAM)

• Synchronous DRAM (SDRAM)

• Asynchronous devices, including asynchronous SRAM, ROM, and FIFOs. The
EMIF provides highly programmable timings to these interfaces.

• External shared-memory devices

Two data ordering standards exist in byte-addressable microcontrollers:

- Little-endian ordering, in which bytes are ordered from right to left, the most
significant byte having the highest address.

- Big-endian ordering, in which bytes are ordered from left to right, the most
significant byte having the lowest address.
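
The two orderings can be seen by laying out one 32-bit value as four bytes at increasing addresses. The snippet below is a small Python illustration that uses only the built-in integer conversion methods; the value is arbitrary.

```python
value = 0x12345678

little = value.to_bytes(4, "little")  # byte at the lowest address comes first
big = value.to_bytes(4, "big")

print(little.hex())  # 78563412: most significant byte 0x12 at the highest address
print(big.hex())     # 12345678: most significant byte 0x12 at the lowest address
```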

The EMIF reads and writes both big- and little-endian devices. There is no distinction
between ROM and asynchronous interface. For all memory types, the address is
internally shifted to compensate for memory widths of less than 32 bits.

3.3.4 Multichannel Buffered Serial Port (McBSP)

The C62x/C67x multichannel buffered serial port (McBSP) is based on the standard
serial port interface found on the TMS320C2000 and C5000 platforms. The standard
serial port interface provides:

• Full-duplex communication

• Double-buffered data registers, which allow a continuous data stream

• Independent framing and clocking for reception and transmission

• Direct interface to industry-standard codecs, analog interface chips (AICs), and
other serially connected A/D and D/A devices

• External shift clock generation or an internal programmable frequency shift clock

• Multichannel transmission and reception of up to 128 channels.

• Element sizes of 8, 12, 16, 20, 24, or 32 bits.

• μ-Law and A-Law companding.

• 8-bit data transfers with LSB or MSB first.

• Programmable polarity for both frame synchronization and data clocks.

• Highly programmable internal clock and frame generation.

Figure 3.4 shows the basic block diagram of the McBSP unit.
Data communication between the McBSP and the devices interfaced to it takes place via
two separate pins for transmission and reception: data transmit (DX) and data receive
(DR), respectively. Control information in the form of clocking and frame
synchronization is communicated via CLKX, CLKR, FSX, and FSR. The device
communicates with the McBSP through 32-bit-wide control registers accessible via the
internal peripheral bus. The CPU or DMA controller writes the data to be transmitted to
the data transmit register (DXR), from which it is shifted out to DX via the transmit shift
register (XSR). Similarly, receive data on the DR pin is shifted into the receive shift
register (RSR) and copied into the receive buffer register (RBR). RBR is then copied to
the data receive register (DRR), which can be read by the CPU or the DMA controller.
This allows internal data movement and external data communications simultaneously.
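
The register chains described above can be mimicked with a toy model. The Python sketch below is purely illustrative: it moves whole words rather than individual bits, ignores clocks and frame synchronization, and uses hypothetical class and method names. It is not a model of the real peripheral's timing.

```python
class McbspDataPath:
    """Toy word-level model of the transmit and receive register chains."""

    def __init__(self):
        self.dxr = self.xsr = None             # transmit side
        self.rsr = self.rbr = self.drr = None  # receive side

    def cpu_write(self, word):
        """The CPU or DMA controller writes the next word to DXR."""
        self.dxr = word

    def shift_out(self):
        """Copy DXR to XSR and return the word sent on the DX pin."""
        self.xsr, self.dxr = self.dxr, None    # DXR is free for the next word
        return self.xsr

    def shift_in(self, word):
        """Accept a word from the DR pin: RSR, then RBR, then DRR."""
        self.rsr = word
        self.rbr = self.rsr
        self.drr = self.rbr

    def cpu_read(self):
        """The CPU or DMA controller reads the received word from DRR."""
        return self.drr


port = McbspDataPath()
port.cpu_write(0xAB)
print(hex(port.shift_out()))  # 0xab leaves on DX
port.shift_in(0xCD)
print(hex(port.cpu_read()))   # 0xcd arrives in DRR
```

Because DXR is emptied as soon as its word moves to XSR, the CPU can queue the next word while the current one is still being shifted out, which is what makes a continuous data stream possible.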

Figure 3.4: Multichannel Serial Port unit

3.3.5 Timers

The ’C62x/C67x has two 32-bit general-purpose timers that can be used to:
• Time events

• Count events

• Generate pulses

• Interrupt the CPU

• Send synchronization events to the DMA controller

Each timer has two signaling modes and can be clocked by an internal or an external
source. The timer has an input pin (TINP) and an output pin (TOUT). The TINP pin can
be used as a general-purpose input, and the TOUT pin can be used as a general-purpose
output.
With an internal clock, the timer can generate timing sequences to trigger peripherals
or external devices, such as the DMA controller or an A/D converter, respectively.
With an external clock, the timer can count external events and interrupt the
CPU after a specified number of events.

3.3.6 Multichannel Audio Serial Ports (McASP)

The ‘C6713 processor includes two multichannel audio serial ports (McASP).
Each McASP interface module supports one transmit and one receive clock zone and
has eight serial data pins, which can be individually allocated to either of the two
zones. The serial port supports time-division multiplexing on each pin from 2
to 32 time slots. The C6713B has sufficient bandwidth to support all 16 serial data pins
transmitting a 192 kHz stereo signal. Serial data in each zone may be transmitted and
received on multiple serial data pins simultaneously and formatted in a multitude of
variations on the Philips Inter-IC Sound (I2S) format [10].
In addition, the McASP transmitter may be programmed to output multiple
S/PDIF IEC60958, AES-3, CP-430 encoded data channels simultaneously, with a single
RAM containing the full implementation of user data and channel status fields.
The McASP also provides extensive error-checking and recovery features, such as the
bad clock detection circuit for each high-frequency master clock which verifies that the
master clock is within a programmed frequency range.

3.3.7 Power-Down Logic

Most of the operating power of CMOS logic is dissipated during circuit
switching from one logic state to another. By preventing some or all of the chip’s logic
from switching, significant power savings can be realized without losing any data or
operational context. Power-down mode PD1 blocks the internal clock inputs at the
boundary of the CPU, preventing most of its logic from switching, effectively shutting
down the CPU. Additional power savings are accomplished in power-down mode PD2, in
which the entire on-chip clock structure (including multiple buffers) is halted at the
output of the PLL. Power-down mode PD3 shuts down the entire internal clock tree (like
PD2) and also disconnects the external clock source (CLKIN) from reaching the PLL.
Wake-up from PD3 takes longer than wake-up from PD2 because the PLL needs to be
relocked, just as it does following power up.

Addressing modes
• Determine how memory is accessed
• Addressing refers to the means of specifying the location of operands for
instructions
- the types of addressing are called addressing modes
- operands may be input operands for the operation as well as
results of the operation

• Addressing modes supported by the TMS320C67x include
register-indirect, indexed register-indirect, and modulo addressing
(circular addressing). Immediate data is also supported.
• The TMS320C67x does not support modulo addressing for 64-bit
data.
• Immediate
– The operand is part of the instruction: ADD .L1 -13,A1,A6
• Register
– The operand is specified in a (implied) register: ADD .L1 A7,A6,A7
• Direct
– The address of the operand is part of the instruction (added
to imply memory page): not supported
• Indirect
– The address of the operand is stored in a register: LDW .D1 *A5++[8],A1
Register-Indirect Addressing
• The operand is located at the memory address stored in a register
• A special group of registers can be used to store addresses
(address registers)
• Most important addressing mode in DSPs
• Efficient from the instruction set point of view
• Few bits are needed to indicate the address of the operand
• 32 registers (A0-A15, B0-B15) are used as pointers
• Indirect addressing uses ‘*’ in conjunction with one of the 32
registers
1. *R - register R contains the address of a memory location
where a data value is stored
2. *R++(d) - register R contains the memory address
- after the memory address is used, R is postincremented by d,
so the new address is R+1 if d=1
- a double minus (--) instead postdecrements the address to R-d
3. *++R(d) - the address is preincremented or offset by d
- the current address is R+d, or R-d with a double minus (--)
4. *+R(d) - the address is offset by d, such that the current
address is R+d
- unlike the previous case, R is not updated or modified
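
The difference between the forms lies in which address is used and what is left in R afterwards. The Python sketch below models that distinction only; the function names are hypothetical, and d is treated as an abstract offset (scaling of the offset by the access size on the real device is not modeled). Each function returns the pair (address used, new value of R).

```python
def post_modify(r, d=1, sign=+1):
    """*R++(d) or *R--(d): use R, then update R by +d or -d."""
    return r, r + sign * d


def pre_modify(r, d=1, sign=+1):
    """*++R(d) or *--R(d): update R by +d or -d, then use the new value."""
    r = r + sign * d
    return r, r


def pre_offset(r, d=1, sign=+1):
    """*+R(d) or *-R(d): use R+d or R-d, leaving R unchanged."""
    return r + sign * d, r


print(post_modify(100, 8))  # (100, 108)
print(pre_modify(100, 8))   # (108, 108)
print(pre_offset(100, 8))   # (108, 100)
```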
[Figure: Delay line implemented with shifting of samples]
[Figure: Delay line pointer manipulation using circular addressing]
Circular addressing
• Circular addressing is used to create a circular buffer
• The buffer is created in hardware and is very useful for applications like
digital filtering
• Used with a circular buffer, this addressing mode updates samples by
moving a pointer, without the overhead of shifting the data directly
• When the pointer reaches the bottom location and is incremented, it
automatically wraps around to the top location
• Two independent buffers are available using BK0 and BK1 within the
AMR register
• Registers A4-A7 and B4-B7 in conjunction with the .D units can be used as
pointers
• MVC (move between the control file and the register file) is the only
instruction that can access the AMR and the other control registers
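
The wrap-around behavior can be sketched as a pointer update that changes only the low-order address bits. This Python sketch is a simplified illustration with a hypothetical function name; it assumes a power-of-two block size and a buffer aligned on a boundary equal to that size.

```python
def circular_increment(pointer, step, block_size):
    """Advance pointer by step bytes, wrapping inside its aligned block."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a power of two")
    base = pointer & ~(block_size - 1)            # start of the aligned buffer
    offset = (pointer + step) & (block_size - 1)  # only the low bits change
    return base | offset


# 128-byte buffer at 0x1000; stepping past the last word wraps to the top.
print(hex(circular_increment(0x107C, 4, 128)))  # 0x1000
```

No comparison or branch is needed to detect the end of the buffer, which is why the update costs nothing extra when it is done by the address-generation hardware.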
Circular Buffer

At the beginning of each sample period, a new sample is read into the
circular buffer, overwriting the oldest sample. The newest sample x(n) is
stored at the memory location pointed to by auxiliary register AR(i).
• The need to process digital signals in real time gave rise to
the concept of circular buffering.
• Circular buffers are used to store the most recent values of
a continually updated signal.
• Circular buffering allows processors to access a block of
data sequentially and then automatically wrap around to
the beginning address, which is exactly the pattern used to access
coefficients in an FIR filter.
• Circular buffering is also very helpful in implementing first-in,
first-out buffers, commonly used for I/O and for FIR delay
lines.
• Most DSPs implement circular addressing in hardware in
order to conserve memory and minimize software
overhead.
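
An FIR delay line built on a circular buffer can be sketched as follows. This is a plain Python illustration of the idea in software; the class name is hypothetical and the modulo arithmetic stands in for what the hardware addressing mode does automatically.

```python
class CircularFir:
    """FIR filter whose delay line is a circular buffer."""

    def __init__(self, coefficients):
        self.h = list(coefficients)
        self.delay = [0.0] * len(self.h)
        self.newest = 0                      # index of the most recent sample

    def process(self, x):
        n = len(self.h)
        self.newest = (self.newest + 1) % n  # step onto the oldest sample
        self.delay[self.newest] = x          # overwrite it with the newest
        acc = 0.0
        for k in range(n):                   # h[k] pairs with x(n - k)
            acc += self.h[k] * self.delay[(self.newest - k) % n]
        return acc


fir = CircularFir([1.0, 2.0, 3.0])
print([fir.process(x) for x in (1.0, 0.0, 0.0, 0.0)])  # [1.0, 2.0, 3.0, 0.0]
```

Feeding a unit impulse returns the coefficients in order, and no sample is ever moved once it has been written: only the index changes.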
Addressing Mode Register (AMR)

• For each of the eight registers (A4–A7, B4–B7) that can perform linear
or circular addressing, the addressing mode register (AMR) specifies
the addressing mode.

• A 2-bit field for each register selects the address modification mode:
linear (the default) or circular mode.

• With circular addressing, the field also specifies which BK (block size)
field to use for a circular buffer.
• In addition, the buffer must be aligned on a byte boundary equal to
the block size.
AMR mode and description

Mode    Description
00      Linear addressing
01      Circular addressing using BK0
10      Circular addressing using BK1
11      Reserved

Block size = 2^(N+1) bytes, where N is the value in the selected BK field
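
Building an AMR value from these fields can be sketched as below. This is an illustrative Python calculation with hypothetical names. The field positions (2-bit mode fields for A4 through B7 starting at bit 0, BK0 at bit 16, BK1 at bit 21), the mode codes 01 and 10 for BK0 and BK1, and the block size of 2^(N+1) bytes are assumptions to be checked against the register description for your device.

```python
# Bit offset of each pointer register's 2-bit mode field (assumed AMR layout).
MODE_SHIFT = {"A4": 0, "A5": 2, "A6": 4, "A7": 6,
              "B4": 8, "B5": 10, "B6": 12, "B7": 14}
BK0_SHIFT, BK1_SHIFT = 16, 21   # assumed positions of the 5-bit BK fields

LINEAR, CIRCULAR_BK0, CIRCULAR_BK1 = 0b00, 0b01, 0b10


def block_size_bytes(n):
    """Block size selected by a BK field holding the value n."""
    return 1 << (n + 1)


def amr_value(modes, bk0=0, bk1=0):
    """Pack per-register modes and the two block-size fields into one word."""
    value = ((bk0 & 0x1F) << BK0_SHIFT) | ((bk1 & 0x1F) << BK1_SHIFT)
    for register, mode in modes.items():
        value |= (mode & 0b11) << MODE_SHIFT[register]
    return value


# A5 as a circular pointer using BK0, with BK0 = 5 (a 64-byte buffer).
print(hex(amr_value({"A5": CIRCULAR_BK0}, bk0=5)), block_size_bytes(5))
```

Registers that are not named in the dictionary keep mode 00, so they continue to use linear addressing.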
