
UNIT2 Notes

This document discusses digital signal processors (DSPs), specifically the Texas Instruments TMS320 family of DSP chips. It provides details on the architecture and features of several commercial DSP devices from Texas Instruments, Motorola, and Analog Devices. It then focuses on the architecture of the TMS320C54xx DSP chips, describing their bus structure, central processing unit including the arithmetic logic unit, accumulators, barrel shifter, and multiplier. It also discusses the memory structure, addressing modes, and peripheral interfaces of the TMS320C54xx chips.

Digital Signal Processor and Application

UNIT-II
Programmable Digital Signal Processors
1. Introduction:
 In Unit 1, we studied the basic architecture and algorithms of DSP
processors. Now it is time to dig deeper!
 Leading integrated-circuit manufacturers such as Texas Instruments (TI),
Analog Devices, and Motorola manufacture digital signal processor (DSP)
chips, and each has developed a range of DSP chips of varying complexity.
 This course focuses on TI DSP processors. The other families are similar, so
once we understand one, it becomes easy to relate to and understand the others.
 The TMS320 family consists of two types of single-chip DSPs: 16-bit fixed-point
and 32-bit floating-point.
 These DSPs combine the operational flexibility of high-speed controllers with
the numerical capability of array processors.

2. Commercial DSP devices:


 Since the early eighties, when DSP devices began to appear on the market,
they have been used in numerous applications, such as communication, control,
computers, instrumentation, and consumer electronics.
 The architectural features and processing power of these devices have been
constantly upgraded based on advances in technology and application needs.
 Most of them have a Harvard architecture, a single-cycle hardware multiplier, an
address generation unit with dedicated address registers, special addressing modes,
and on-chip peripheral interfaces.
 Of the various families of programmable DSP devices that are commercially
available, the three most popular are those from Texas Instruments, Motorola
(Freescale), and Analog Devices.
 Texas Instruments was one of the first to come out with a commercial programmable
DSP, introducing the TMS32010 in 1982.
 The TMS320C25, introduced in 1985, became very popular; it can run at twice the
speed of the TMS32010. Around the same time, Analog Devices introduced the ADSP2100
and Motorola introduced the DSP56000.
 The basic architecture remains the same even today, with many enhancements, such
as higher speed and additional features, built into these devices. A comparison is
presented next.

P.E.S.C.E, Mandya Page 1



Comparison:
Summary of the architectural features of three fixed-point DSPs

Architectural feature     TMS320C25               DSP56000                        ADSP2100
Data representation       16-bit fixed point      24-bit fixed point              16-bit fixed point
Hardware multiplier       16 x 16                 24 x 24                         16 x 16
ALU                       32 bits                 56 bits                         40 bits
Internal buses            16-bit program bus      24-bit program bus              24-bit program bus
                          16-bit data bus         2 x 24-bit data buses           16-bit data bus
                                                  24-bit global data bus          16-bit result bus
External bus              16-bit program/data     24-bit program/data             24-bit program bus
                                                                                  16-bit data bus
On-chip memory            544 words RAM           512 words PROM                  -
                          4K words ROM            2 x 256 words data RAM
                                                  2 x 256 words data ROM
Off-chip memory           64K words program       64K words program               16K words program
                          64K words data          2 x 64K words data              16K words data
Cache memory              -                       -                               16 words program
Instruction cycle time    100 ns                  97.5 ns                         125 ns
Special addressing modes  Bit reversed            Modulo, bit reversed            Modulo, bit reversed
Data address generators   1                       2                               2
Interfacing features      Synch serial I/O, DMA   Synch & asynch serial I/O, DMA  DMA

3. TMS320C54xx Architecture:

 TMS320C54xx processors retain the basic Harvard architecture of their
predecessor, the TMS320C25, but have several additional features that improve their
performance.
 They have one program bus and three data buses, which provide simultaneous access
to a program instruction and two data operands and allow a result to be written at
the same time.
 Part of the memory is implemented on-chip and consists of combinations of ROM,
dual-access RAM, and single-access RAM.
 Transfers between the memory spaces are also possible.
 The central processing unit (CPU) of TMS320C54xx processors consists of a 40-bit
arithmetic logic unit (ALU), two 40-bit accumulators, a barrel shifter, a 17x17
multiplier, a 40-bit adder, data address generation logic (DAGEN) with its own
arithmetic unit, and program address generation logic (PAGEN).
 These major functional units are supported by a number of registers and logic in the
architecture.
 The functional block diagram is shown below.

Fig: Functional architecture for TMS320C54xx Processor


 A powerful instruction set makes these devices very efficient for running
high-speed DSP algorithms. It provides:
o hardware-supported single-instruction repeat and block repeat operations,
o block memory move instructions,
o instructions that pack two or three simultaneous reads, and
o arithmetic instructions with parallel store and load.
 Several peripherals, such as a clock generator, a hardware timer, a wait state
generator, parallel I/O ports, and serial I/O ports, are also provided on-chip.
 These peripherals make it convenient to interface the signal processors to the outside
world.
 Let us now review the various architectural features of the TMS320C54xx
processors, such as the bus structure, CPU, registers, addressing modes, memory space,
and program control, in detail.

Bus Structure:

Fig: Bus structure for TMS320C54xx


 The processor supports four pairs of 16-bit address/data buses:
 The program bus pair (PAB, PB), which carries the instruction code from the
program memory.
 Three data bus pairs (CAB, CB; DAB, DB; and EAB, EB), which
interconnect the various units within the CPU. In addition, the pairs CAB, CB
and DAB, DB are used to read from the data memory, while the pair EAB, EB
carries the data to be written to the memory.
 The ’54xx can generate up to two data-memory addresses per cycle using the
two auxiliary register arithmetic units (ARAU0 and ARAU1) in the DAGEN
block. This enables two operands to be accessed simultaneously.

Central Processing Unit (CPU):

The ’54xx CPU is common to all the ’54xx devices. It contains:
 a 40-bit arithmetic logic unit (ALU);
 two 40-bit accumulators (ACCA and ACCB);
 a barrel shifter;
 a 17 x 17-bit multiplier and a 40-bit adder;
 a compare, select, and store unit (CSSU);
 an exponent encoder (EXP);
 a data address generation unit (DAGEN); and
 a program address generation unit (PAGEN).

 The ALU performs 2’s complement arithmetic operations and bit-level Boolean
operations on 16, 32, and 40-bit words.
 It can also function as two separate 16-bit ALUs and perform two 16-bit operations
simultaneously as shown in the functional diagram.

Fig: Functional diagram of the central processing unit of the TMS320C54xx processors

Accumulators:
 The accumulators, ACCA and ACCB, store the output from the ALU or the multiplier
/ adder block.


 The accumulators can also provide a second input to the ALU or the multiplier /
adder.
 The bits in each accumulator are grouped as follows:
o Guard bits (bits 32–39)
o A high-order word (bits 16–31)
o A low-order word (bits 0–15)
Instructions are provided for storing the guard bits and the high-order and low-order
accumulator words in data memory, and for moving 32-bit accumulator words into or out
of data memory. Also, either accumulator can be used as temporary storage for the
other.
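As an illustration of this grouping, the following minimal Python sketch (an illustrative model, not TI code) splits a 40-bit accumulator value into its guard bits, high-order word, and low-order word, and then rebuilds it. The function names are hypothetical; only the bit positions come from the list above.

```python
# Minimal sketch: the three fields of a 40-bit '54xx-style accumulator.
# Helper names are hypothetical; only the bit layout comes from the notes.

ACC_BITS = 40
ACC_MASK = (1 << ACC_BITS) - 1


def split_accumulator(acc: int) -> tuple:
    """Return (guard, high_word, low_word) for a 40-bit accumulator value."""
    acc &= ACC_MASK              # keep only the 40 accumulator bits
    guard = (acc >> 32) & 0xFF   # guard bits 39-32
    high = (acc >> 16) & 0xFFFF  # high-order word, bits 31-16
    low = acc & 0xFFFF           # low-order word, bits 15-0
    return guard, high, low


def join_accumulator(guard: int, high: int, low: int) -> int:
    """Rebuild the 40-bit accumulator value from its three fields."""
    return ((guard & 0xFF) << 32) | ((high & 0xFFFF) << 16) | (low & 0xFFFF)


if __name__ == "__main__":
    acc = 0xFF80001234  # negative value: the guard bits hold the sign extension
    fields = split_accumulator(acc)
    print([hex(f) for f in fields])  # ['0xff', '0x8000', '0x1234']
    assert join_accumulator(*fields) == acc
```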

Barrel Shifter:

Fig: Functional diagram of the barrel shifter of the TMS320C54xx

 The ’54x’s barrel shifter has a 40-bit input connected to the accumulator or data
memory (CB, DB) and a 40-bit output connected to the ALU or data memory (EB).
 The barrel shifter produces a left shift of 0 to 31 bits and a right shift of 0 to 16 bits
on the input data.
 The shift requirements are defined in the shift-count field (ASM) of ST1 or defined in
the temporary register (TREG), which is designated as a shift-count register.
 This shifter and the exponent detector normalize the values in an accumulator in a
single cycle.
 The least significant bits (LSBs) of the output are filled with 0s, and the most
significant bits (MSBs) are either zero-filled or sign-extended, depending on the
state of the sign-extension mode bit (SXM) of ST1.


 Additional shift capabilities enable the processor to perform numerical scaling, bit
extraction, extended arithmetic, and overflow prevention operations.
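The shift behaviour described above can be modeled roughly in Python. This sketch assumes a signed shift count (positive = left, negative = right), applies SXM to the fill of the vacated MSBs on a right shift and to the extension of a 16-bit memory word, and is a behavioural sketch rather than a cycle-accurate model; all names are hypothetical.

```python
# Behavioural sketch of the '54x barrel shifter on 40-bit values.
# Assumptions: shift > 0 means left, shift < 0 means right; names are hypothetical.

MASK40 = (1 << 40) - 1


def to_signed40(x: int) -> int:
    """Interpret the low 40 bits of x as a two's-complement number."""
    x &= MASK40
    return x - (1 << 40) if x & (1 << 39) else x


def extend16(word: int, sxm: int) -> int:
    """Extend a 16-bit data-memory word to 40 bits (sign-extend if SXM=1)."""
    word &= 0xFFFF
    if sxm and word & 0x8000:
        return word | (MASK40 ^ 0xFFFF)  # copy the sign bit into bits 39-16
    return word                          # zero-fill bits 39-16


def barrel_shift(value: int, shift: int, sxm: int) -> int:
    """Shift a 40-bit value left by 0..31 or right by 0..16 bits."""
    if not -16 <= shift <= 31:
        raise ValueError("shift count must lie in -16..31")
    value &= MASK40
    if shift >= 0:
        return (value << shift) & MASK40  # LSBs filled with 0s
    if sxm:
        return (to_signed40(value) >> -shift) & MASK40  # MSBs sign-extended
    return value >> -shift                              # MSBs zero-filled


if __name__ == "__main__":
    # Loading the word 8000h with SXM = 1 and then shifting it left by 16:
    print(hex(barrel_shift(extend16(0x8000, sxm=1), 16, sxm=1)))  # 0xff80000000
```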

Multiplier / Adder:

Fig: Functional diagram of the multiplier /adder unit of TMS320C54xx processor.


 The multiplier / adder performs 17 × 17-bit 2s-complement multiplication with
40-bit accumulation in a single instruction cycle.
 The multiplier / adder block consists of several elements:
o Multiplier
o Adder,
o Signed/Unsigned input control
o Fractional control
o Zero detector
o Rounder (2s-complement)
o Overflow/Saturation logic, and
o TREG.
 The multiplier has two inputs: one input is selected from the TREG, a data-memory
operand, or an accumulator.
 The other is selected from the program memory, the data memory, an accumulator,
or an immediate value.
 The fast on-chip multiplier allows the ’54x to perform DSP operations such as
convolution, correlation, and filtering efficiently.
 In addition, the multiplier and ALU together execute multiply/accumulate (MAC)
computations and ALU operations in parallel in a single instruction cycle.
 This capability is used in computing Euclidean distances and in implementing
symmetric and least mean square (LMS) filters, which are required for complex
DSP algorithms.
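A MAC step, and the FIR-style sum of products built from it, can be sketched as follows. The sketch assumes Q15 (fractional) operands, models fractional mode as a one-bit left shift of the product (FRCT = 1), and lets the 40-bit accumulator wrap instead of saturating; rounding and saturation are left out. Function names are hypothetical.

```python
# Minimal sketch of a fractional multiply/accumulate (MAC) step.
# Assumptions: Q15 operands, FRCT=1 modeled as a 1-bit left shift of the
# product, 40-bit accumulator with wrap-around; rounding and saturation
# are not modeled. Function names are hypothetical.

MASK40 = (1 << 40) - 1


def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mac(acc: int, x: int, y: int, frct: int = 1) -> int:
    """Return acc + x*y as a 40-bit accumulator value."""
    product = to_signed(x, 16) * to_signed(y, 16)
    if frct:
        product <<= 1  # remove the extra sign bit of a Q15 x Q15 product
    return (to_signed(acc, 40) + product) & MASK40


def fir_sum(coeffs, samples, frct: int = 1) -> int:
    """Sum of coeffs[k] * samples[k], one MAC per tap, as in an FIR filter."""
    acc = 0
    for h, x in zip(coeffs, samples):
        acc = mac(acc, h, x, frct)
    return acc


if __name__ == "__main__":
    acc = fir_sum([0x4000, 0x4000], [0x4000, 0x2000])  # 0.5*0.5 + 0.5*0.25
    print(hex((acc >> 16) & 0xFFFF))  # Q15 result in the high word: 0x3000 (= 0.375)
```

Note that 8000h × 8000h (that is, -1 × -1) produces 0080000000h: the result no longer fits in 32 bits, and the guard bits keep it from being lost.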

Compare, Select, and Store Unit (CSSU):


The compare, select, and store unit (CSSU) performs maximum comparisons between the
accumulator’s high and low words, allows the test/control (TC) flag bit of status register 0
(ST0) and the transition (TRN) register to keep their transition histories, and selects the larger
word in the accumulator to be stored in data memory. The CSSU also accelerates Viterbi-
type butterfly computation with optimized on-chip hardware.
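The compare-select part of the CSSU can be illustrated with a small Python model of one CMPS-style step. This is a simplified reading of that instruction's behaviour, not TI's specification: the two accumulator halves are compared as signed 16-bit numbers, the larger is returned for storing, and the decision is shifted into a 16-bit TRN history with TC mirroring it. The tie-breaking rule and all names are assumptions.

```python
# Sketch of one compare-select-store step, modeled on a reading of the
# '54x CMPS instruction (used in Viterbi add-compare-select loops).
# Assumption: the two 16-bit halves are compared as signed numbers and
# ties select the low word. Names are hypothetical.

def to_s16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def cmps(acc: int, trn: int):
    """Return (word_to_store, new_trn, tc) for accumulator value acc."""
    high = (acc >> 16) & 0xFFFF   # e.g. path metric of one candidate
    low = acc & 0xFFFF            # path metric of the other candidate
    trn = (trn << 1) & 0xFFFF     # make room for the new transition bit
    if to_s16(high) > to_s16(low):
        return high, trn, 0       # high word wins: TRN(0) = 0, TC = 0
    return low, trn | 1, 1        # low word wins:  TRN(0) = 1, TC = 1


if __name__ == "__main__":
    trn = 0
    for acc in (0x00050003, 0x00030005):
        word, trn, tc = cmps(acc, trn)
        print(hex(word), bin(trn), tc)
```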

Internal Memory and Memory-Mapped Registers:


 The minimum memory address range for the ’54x devices is 192K words —
composed of 64K words in program space, 64K words in data space, and 64K words
in I/O space. Selected devices also provide extended program memory space of up to
8M words.
 The program memory space contains the instructions to be executed as well as tables
used in execution.
 The data memory space stores data used by the instructions.
 The I/O memory space interfaces to external memory-mapped peripherals and can
also serve as extra data storage space.
 The ’54x DSPs provide both on-chip RAM and ROM to improve system performance
and integration.
 All ‘54xx devices contain both RAM and ROM. RAM can be either dual-access type
(DARAM) or single-access type (SARAM).
 The on-chip RAM for these processors is organized in pages of 128 word
locations each.
 The ’54xx processors have a number of CPU registers to support operand addressing
and computations. The CPU registers and peripheral registers are all located on page
0 of the data memory.
 The processor mode status register (PMST) is used to configure the processor.
It is a memory-mapped register located at address 1Dh on page 0 of the data memory.
 The on-chip ROM is usually used for the bootloader and for coefficient tables of
commonly used functions, to speed up operations.


Fig: Internal memory-mapped registers of TMS320C54xx signal processor.

Registers:

Status Register 0 (ST0):

Bits     15-13   12   11   10    9     8-0
Field    ARP     TC   C    OVA   OVB   DP

 Status registers (ST0,ST1):


 ST0: Contains the status of flags (OVA, OVB, C, TC) produced by arithmetic
operations & bit manipulations.

 ST1: Contains the status of various conditions and modes.
 Bits of the ST0 and ST1 registers can be set or cleared with the SSBX and RSBX instructions.
 PMST: Contains memory-setup status & control information.
 Because these registers are memory-mapped, they can be stored into and
loaded from data memory; the status of the processor can be saved and
restored for subroutines and interrupt service routines (ISRs).
 ARP: Auxiliary register pointer. TC: Test/control flag.
 C: Carry bit. OVA: Overflow flag for accumulator A. OVB: Overflow flag for
accumulator B. DP: Data-memory page pointer.
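Since ST0 is a plain 16-bit word, its fields can be read with shifts and masks. The Python sketch below uses the bit positions from the ST0 diagram above; decode_st0, set_bit, and clear_bit are hypothetical helpers that only mimic the effect of SSBX/RSBX on one bit, not the instructions themselves.

```python
# Sketch: decode ST0 and mimic single-bit SSBX/RSBX effects.
# Field positions are taken from the ST0 diagram; helper names are hypothetical.

ST0_FIELDS = {           # name: (least significant bit, width)
    "ARP": (13, 3),
    "TC": (12, 1),
    "C": (11, 1),
    "OVA": (10, 1),
    "OVB": (9, 1),
    "DP": (0, 9),
}


def decode_st0(st0: int) -> dict:
    """Split a 16-bit ST0 value into its named fields."""
    return {name: (st0 >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in ST0_FIELDS.items()}


def set_bit(reg: int, bit: int) -> int:    # like SSBX: force one bit to 1
    return (reg | (1 << bit)) & 0xFFFF


def clear_bit(reg: int, bit: int) -> int:  # like RSBX: force one bit to 0
    return reg & ~(1 << bit) & 0xFFFF


if __name__ == "__main__":
    st0 = (2 << 13) | (1 << 9) | 6        # ARP=2, OVB=1, DP=6
    st0 = set_bit(st0, 11)                # set the carry bit C
    print(decode_st0(st0))
```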

Status Register 1 (ST1):

Bits     15     14    13   12   11     10   9     8     7     6      5      4-0
Field    BRAF   CPL   XF   HM   INTM   0    OVM   SXM   C16   FRCT   CMPT   ASM

Status Register 1 bits and their functions:

 BRAF: Block repeat active flag. BRAF=0: the block repeat is deactivated. BRAF=1:
the block repeat is active.
 CPL: Compiler mode. CPL=0: relative direct addressing using the data-page
pointer is selected. CPL=1: relative direct addressing using the stack pointer is
selected.
 XF: Indicates the status of the external flag (XF) pin, which is a general-purpose
output pin. The SSBX instruction can set XF and the RSBX instruction can reset XF.
 HM: Hold mode. Indicates whether the processor continues internal execution when
it acknowledges an external hold request.
 INTM: Interrupt mode. It globally masks or enables all maskable interrupts.
INTM=0: all unmasked interrupts are enabled. INTM=1: all maskable interrupts are disabled.
 OVM: Overflow mode. OVM=1: on overflow, the destination accumulator is set to
either the most positive value (007FFFFFFFh) or the most negative value (FF80000000h).
OVM=0: the overflowed result is loaded into the destination accumulator unmodified.
 SXM: Sign extension mode. SXM=0: sign extension is suppressed. SXM=1: data is
sign-extended.
 C16: Dual 16-bit/double-precision arithmetic mode. C16=0: the ALU operates in
double-precision arithmetic mode. C16=1: the ALU operates in dual 16-bit arithmetic mode.
 FRCT: Fractional mode. FRCT=1: the multiplier output is left-shifted by 1 bit to
compensate for an extra sign bit. This bit is 0 at reset.
 CMPT: Compatibility mode. CMPT=0: ARP is not updated in indirect addressing
mode. CMPT=1: ARP is updated in indirect addressing mode.
 ASM: Accumulator shift mode. The 5-bit ASM field specifies a shift value in the
–16 to 15 range, coded as a 2s-complement value. Instructions with a
parallel store, as well as STH, STL, ADD, SUB, and LD, use this shift capability.
ASM can be loaded from data memory or by the LD instruction using a short-immediate
operand.
 Bit 10 (shown as 0) is reserved for future expansion and always reads as 0.
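Two of the ST1 behaviours above lend themselves to a short sketch: OVM saturation when a result is written to an accumulator, and decoding the 5-bit ASM field. The sketch assumes that "overflow" means the exact result falls outside the signed 32-bit range, and takes 007FFFFFFFh and FF80000000h as the "most positive" and "most negative" accumulator values; the function names are hypothetical.

```python
# Sketch of two ST1 behaviours (hypothetical helper names):
#   * OVM: saturating an ALU result when it is written to a 40-bit accumulator
#   * ASM: decoding the 5-bit 2s-complement shift field in bits 4-0
# Assumption: "overflow" means the exact result leaves the signed 32-bit range.

MASK40 = (1 << 40) - 1
MOST_POSITIVE = 0x007FFFFFFF   # 007FFFFFFFh
MOST_NEGATIVE = 0xFF80000000   # FF80000000h


def write_accumulator(exact_result: int, ovm: int) -> int:
    """Return the 40-bit accumulator contents for an exact integer result."""
    if ovm:
        if exact_result > 0x7FFFFFFF:
            return MOST_POSITIVE
        if exact_result < -0x80000000:
            return MOST_NEGATIVE
    return exact_result & MASK40  # OVM=0 or no overflow: keep the result as is


def asm_shift(st1: int) -> int:
    """Decode ASM (bits 4-0 of ST1) into a shift count in -16..15."""
    asm = st1 & 0x1F
    return asm - 32 if asm & 0x10 else asm


if __name__ == "__main__":
    print(hex(write_accumulator(0x7FFFFFFF + 1, ovm=1)))  # saturates: 0x7fffffff
    print(hex(write_accumulator(0x7FFFFFFF + 1, ovm=0)))  # kept as is: 0x80000000
    print(asm_shift(0b11100))                            # -4
```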


PMST Register:

 The PMST register is loaded with memory-mapped register instructions such as STM.
 IPTR: Interrupt vector pointer. It points to the 128-word program page where the
interrupt vectors reside. IPTR is a 9-bit field (0 to 1FFh), so it can select any of 512
pages; each interrupt vector is 4 words long.
 MP/MC: Microprocessor/Microcomputer mode, MP/MC=0, the on chip ROM is
enabled. MP/MC=1, the on chip ROM is NOT enabled.
 OVLY: RAM OVERLAY, OVLY enables on chip dual access data RAM blocks to
be mapped into program space.
 AVIS: It enables/disables the internal program address to be visible at the address
pins.
 DROM: Data ROM, DROM enables on-chip ROM to be mapped into data space.
 CLKOFF: CLOCKOUT off.
 SMUL: Saturation on multiplication.
 SST: Saturation on store.
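The IPTR description translates into simple address arithmetic: the vector table starts at IPTR × 128, and each vector slot is assumed here to be 4 words long. The sketch also decodes the single-bit PMST fields. The bit positions are an assumption based on the commonly documented C54x PMST layout (IPTR in bits 15-7, then MP/MC, OVLY, AVIS, DROM, CLKOFF, SMUL, SST in bits 6-0); check the data sheet for a specific device. Names are hypothetical.

```python
# Sketch: decode PMST and compute an interrupt-vector address.
# Assumption: IPTR in bits 15-7, then MP/MC, OVLY, AVIS, DROM, CLKOFF,
# SMUL, SST in bits 6-0 (SMUL/SST are not present on every device).
# Helper names are hypothetical.

PMST_BITS = {"MP_MC": 6, "OVLY": 5, "AVIS": 4, "DROM": 3,
             "CLKOFF": 2, "SMUL": 1, "SST": 0}


def decode_pmst(pmst: int) -> dict:
    """Split a 16-bit PMST value into IPTR and its single-bit flags."""
    fields = {"IPTR": (pmst >> 7) & 0x1FF}      # 9-bit page number
    for name, bit in PMST_BITS.items():
        fields[name] = (pmst >> bit) & 1
    return fields


def vector_address(iptr: int, k: int) -> int:
    """Program address of interrupt vector k (4 words per vector)."""
    return ((iptr & 0x1FF) << 7) + 4 * k        # start of 128-word page + offset


if __name__ == "__main__":
    print(hex(vector_address(0x1FF, 0)))  # IPTR = 1FFh gives 0xff80
```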

General information:
 In the C54x DSP, the data and program memories are organized in 16-bit words.
Data buses are 16 bits wide.
 Data and instructions are generally N = 16 bits in size.
 Some instructions may occupy multiple 16-bit words.
 Some data operands may be double precision and occupy 2 words.
 Internal buses: 2 data read, 1 data write
 External Buses: Data buses =2 ; Program Bus=1; result bus =1
 Data addressing modes provide various ways to access operands to execute
instructions and place results in the memory or the registers.

Data Addressing modes:


 The 54XX devices offer seven basic addressing modes; we will look into each one of
these addressing modes in detail with examples.
 1. Immediate addressing: instruction contains the operand.
 2. Absolute addressing: instruction contains the specific address
 3. Accumulator addressing: Accumulator content is used as address
 4. Direct addressing: Part of the instruction + base address provide the 16 bit address.
 5. Indirect addressing: Address is contained in the auxiliary register. Circular
addressing and dual operand addressing come under this category.
 6. Memory mapped addressing: Memory mapped registers are used for addressing the
memory locations
 7. Stack addressing: Stack pointer is used for addressing the memory locations

Immediate Addressing:
Instruction contains the value of the operand. Value is preceded by #.
 Example: ADD #4, A Add the value 4 to the content of accumulator A.

 Useful for initializations.
 Immediate values can be long (16 bits) or short (3, 5, 8, or 9 bits in length).
 For long values, the instruction uses 2 words.

Ex: RPT #0FFFFh ; Value 0FFFFh is loaded into the repeat counter (RC)

Examples:

STM #1234h, AR2 ; Load AR2 with the value 1234h.

LD #6, DP ; Load DP with the value 6.

STM: store an immediate value into a memory-mapped register (MMR); LD: load data.

 Absolute Addressing: The instruction contains a specified address in the operand.
The address can be in data, program, or I/O memory and is 16 bits long, so
instructions that encode absolute addresses are always at least two words long.
There are four types of absolute addressing:
 Data-memory address (dmad) addressing: MVDK Smem, dmad; MVDM dmad, MMR;
MVKD dmad, Smem; MVMD MMR, dmad (Smem = single data-memory operand)
• Example: MVKD 1000h, *AR5 ; move data from data-memory address 1000h (source)
to the data-memory location pointed to by AR5.
 Program-memory address (pmad) addressing:
• Example: MVPD 1000h, *AR7 ; move a word from program-memory address 1000h
to the data-memory location pointed to by AR7.
 Port address (PA) addressing:
• Example: PORTR 05h, *AR3 ; read a 16-bit value from the external I/O port at
address 05h into the data-memory location pointed to by AR3.
 *(lk) addressing: specifies a location in data space; it is used with all instructions
that support a single data-memory (Smem) operand.
• Example: LD *(1000h), A ; the exact 16-bit address, 1000h, is given in the instruction.

 Accumulator Addressing: Accumulator addressing uses the value in the
accumulator as an address.
 This addressing mode is used to address program memory as data.
 Two instructions allow you to use the accumulator as an address:
 READA Smem (for example, READA *AR2)
 WRITA Smem
 READA transfers a word from a program-memory location specified by accumulator
A to a data-memory location specified by the single data-memory (Smem) operand of
the instruction.
 WRITA transfers a word from a data-memory location specified by the Smem
operand of the instruction to a program-memory location specified by accumulator A.
 In repeat mode, the program-memory address taken from accumulator A is
automatically incremented after each transfer.

 Direct Addressing: In direct addressing, the instruction contains the lower seven
bits of the data memory address (dma).


 The 7-bit dma is an address offset that is combined with a base address, with the data-
page pointer (DP), or with the stack pointer (SP) to form a 16-bit data-memory
address.
 Using this form of addressing, you can access any of 128 locations in random order
without changing the DP or the SP.
 Direct addressing is not the only method of offset addressing. However, the
advantage of this mode is that it encodes both the instruction and the address in a
single word.
 Either DP or SP can be combined with the dma offset to generate the actual address.
The compiler mode bit (CPL), located in status register ST1, selects which method is
used to generate the address:
 When CPL = 0, the dma field is concatenated with the 9-bit DP field to form
the 16-bit data-memory address.
 When CPL = 1, the dma field is added (positive offset) to SP to form the 16-
bit data-memory address.

 The syntax for direct addressing uses a symbol or a number to specify the offset
value.
 The 16-bit address of the data-memory location is formed by combining the lower
7 bits of the data-memory address contained in the instruction with DP or SP, as
selected by CPL.


 For example, to add the contents of the memory location SAMPLE to accumulator B,
provided that the correct base address is in DP (CPL = 0) or SP (CPL = 1), you would
write:
 ADD SAMPLE, B
 The lower seven bits of the address of SAMPLE are stored in the instruction
word.
 LD #4, DP
 ADD 0, B
 When CPL = 0, this program sequence adds the contents of memory location 0 on
page 4 of the data memory to accumulator B.
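The address formation for both settings of CPL can be sketched as below. The helper is hypothetical; it follows the rule stated above (CPL = 0: 9-bit DP concatenated with the 7-bit dma; CPL = 1: SP plus dma) and reproduces the page-4 example, where location 0 on page 4 is data address 0200h.

```python
# Sketch of direct-address formation (hypothetical helper name).
# CPL=0: 9-bit DP concatenated with the 7-bit dma offset.
# CPL=1: 7-bit dma offset added to SP (positive offset only).

def direct_address(dma: int, cpl: int, dp: int = 0, sp: int = 0) -> int:
    dma &= 0x7F                              # only 7 bits fit in the instruction
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | dma     # DP supplies the 9 MSBs
    return (sp + dma) & 0xFFFF               # SP-relative, wraps at 64K


if __name__ == "__main__":
    # LD #4, DP then ADD 0, B (CPL=0): location 0 of page 4
    print(hex(direct_address(0, cpl=0, dp=4)))  # 0x200
```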

 Indirect Addressing: In indirect addressing, any location in the 64K-word data
space can be accessed using the 16-bit address contained in an auxiliary register.
 The C54xx DSP has eight 16-bit auxiliary registers (AR0–AR7). Indirect addressing
is used mainly when there is a need to step through sequential locations in memory in
fixed-size steps.
 Two auxiliary register arithmetic units (ARAU0 and ARAU1) perform the address
arithmetic, including indexed and bit-reversed addressing.
 For single-operand addressing, the use of ARP depends on the CMPT bit in ST1:
 CMPT = 0: standard mode; ARP is not updated in indirect addressing mode
(ARP must be kept at 0).
 CMPT = 1: compatibility mode; ARP is updated in indirect addressing mode.
 In some cases, two data operands can be fetched at once, using dual-operand
address modifications.
 The block diagram is shown below.

 Two auxiliary register arithmetic units (ARAU0 and ARAU1) operate on the contents
of the auxiliary registers (AR’s). The ARAUs perform unsigned, 16-bit auxiliary
register arithmetic operations.


 As the figure shows, the main components used for address generation in indirect
addressing are the auxiliary register arithmetic units (ARAU0 and ARAU1) and the
auxiliary registers (AR0–AR7).
 You can modify the addresses you use in instructions before or after they are
accessed, or you can leave them unchanged.
 You can modify them by incrementing or decrementing the address by 1, adding a 16-
bit offset (lk), or indexing with the value in AR0.
 The table lists the types of single data-memory operand addressing with explanations.

 While an AR is being used, you can modify it by incrementing or decrementing
its value.


 Offset addressing is a type of indirect addressing in which a predetermined offset, or
step size, is added to the contents of an auxiliary register.
 Instructions using offset addressing cannot be repeated with the repeat-single
instruction (RPT).

 Indexed addressing is a type of indirect addressing in which the contents of AR0
are added to, or subtracted from, any other auxiliary register, ARx.
 Indexed addressing differs from offset addressing in that the index or step size can be
determined during code execution.
 Because the index is determined during code execution, you can easily make
adjustments to the step size.
 Indexed addressing also offers an advantage over offset addressing: it does not require
an additional word for the instruction.
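The difference between post-increment, indexed (AR0), and offset (lk) modification can be shown with a small model of the auxiliary registers. The mode strings follow TI's operand notation (*ARx+, *ARx+0, *ARx(lk), *+ARx(lk)), but the function, the list standing in for AR0-AR7, and the selection of modes are illustrative assumptions; bit-reversed and circular variants are omitted.

```python
# Sketch of a few single-operand indirect modifications (not exhaustive).
# `ar` is a list standing in for AR0-AR7; the helper name is hypothetical.

def indirect(ar: list, x: int, mode: str, lk: int = 0) -> int:
    """Return the data-memory address used and update ar[x] as the mode says."""
    a = ar[x]
    if mode == "*ARx":            # address = ARx, no modification
        return a
    if mode == "*ARx+":           # post-increment by 1
        ar[x] = (a + 1) & 0xFFFF
        return a
    if mode == "*ARx-":           # post-decrement by 1
        ar[x] = (a - 1) & 0xFFFF
        return a
    if mode == "*ARx+0":          # indexed: post-add AR0 (step set at run time)
        ar[x] = (a + ar[0]) & 0xFFFF
        return a
    if mode == "*ARx-0":          # indexed: post-subtract AR0
        ar[x] = (a - ar[0]) & 0xFFFF
        return a
    if mode == "*ARx(lk)":        # offset: address = ARx + lk, ARx unchanged
        return (a + lk) & 0xFFFF
    if mode == "*+ARx(lk)":       # offset with pre-modification: ARx += lk first
        ar[x] = (a + lk) & 0xFFFF
        return ar[x]
    raise ValueError(f"unsupported mode {mode!r}")


if __name__ == "__main__":
    ar = [0] * 8
    ar[0], ar[2] = 4, 0x1000      # AR0 = step, AR2 = pointer
    print([hex(indirect(ar, 2, "*ARx+0")) for _ in range(3)])  # 0x1000, 0x1004, 0x1008
```

Changing AR0 at run time changes the step of every later *ARx+0 access, which is the flexibility the indexed mode offers over a fixed lk offset.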
 Many algorithms, such as convolution, correlation, and FIR filters, require the
implementation of a circular buffer in memory.
 In these algorithms, a circular buffer is a sliding window containing the most recent
data. As new data comes in, the buffer overwrites the oldest data. The key to the
implementation of a circular buffer is the implementation of circular addressing.
 A circular buffer of size R must start on an N-bit boundary (its N LSBs are 0),
where N is the smallest integer satisfying $2^N > R$.
 Circular buffer size register (BK): specifies the size of the circular buffer.
 Effective base address (EFB): obtained by zeroing the N LSBs of a user-selected
auxiliary register (ARx).
 End-of-buffer address (EOB): obtained by replacing the N LSBs of ARx with the N
LSBs of BK.
 The algorithm for circular addressing is as follows:
 If 0≤ index + step < BK: index = index + step.
 Else if index + step ≥ BK: index = index + step – BK.
 Else if index + step < 0: index = index + step + BK
 Circular addressing can be used for single data-memory or dual data-memory
operands. When BK is zero, the circular modifier results in no circular address
modification.
 This is especially useful when a dual operand must perform an address modification
equivalent to ARx+0.
 Figure 1 illustrates the relationships among BK, the auxiliary register (ARx), the
bottom of the circular buffer, the top of the circular buffer, and the index into the
circular buffer.
 Figure 2 shows how the circular buffer is implemented and illustrates the relationship
between the generated values and the elements in the circular buffer.
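The circular addressing rules above (EFB, BK, and the three-way index update) can be sketched in Python as follows. The helper name is hypothetical, N is taken as the smallest integer with 2^N > BK, and BK = 0 is interpreted as ordinary, non-circular modification.

```python
# Sketch of the circular-addressing update (hypothetical helper name).
# Assumptions: the buffer starts on an N-bit boundary, N is the smallest
# integer with 2**N > BK, and BK = 0 means plain linear modification.

def circular_update(arx: int, step: int, bk: int) -> int:
    """Return the new ARx after adding `step` with circular wrap-around."""
    if bk == 0:
        return (arx + step) & 0xFFFF          # BK = 0: no circular modification
    n = bk.bit_length()                       # smallest N with 2**N > BK
    low_mask = (1 << n) - 1
    efb = arx & ~low_mask & 0xFFFF            # effective base address: N LSBs zeroed
    index = arx & low_mask                    # current position inside the buffer
    new = index + step
    if new >= bk:                             # ran past the end: wrap forward
        new -= bk
    elif new < 0:                             # ran before the start: wrap backward
        new += bk
    return efb | new


if __name__ == "__main__":
    ar = 0x1000                               # 5-word buffer at 1000h (3 LSBs are 0)
    visited = []
    for _ in range(7):
        visited.append(hex(ar))
        ar = circular_update(ar, 1, bk=5)     # step of +1 with BK = 5
    print(visited)
```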


 Circular addressing typically uses a decrement or an increment by one (MOD
= 8 and 10) or a decrement or an increment by an index (MOD = 9 and 11).
 Pre-modification by a 16-bit offset (*+ARx(lk)%) requires an extra code
word, so the instruction has two or three words; the last word is the
offset.
 An instruction using indirect-offset addressing cannot be repeated using a
single repeat operation.

 Bit reversed Addressing:


 Bit-reversed addressing improves execution speed and program-memory
usage for FFT algorithms that use a variety of radixes.


 In this addressing mode, AR0 specifies one half of the size of the FFT.
 The value contained in AR0 must be equal to $2^{N-1}$, where N is an
integer, and the FFT size is $2^N$.
 An auxiliary register points to the physical location of a data value.
 When you add AR0 to the auxiliary register using bit-reversed
addressing, the address is generated in a bit-reversed fashion, with the
carry bit propagating from left to right, instead of the normal right to
left.
 Used for FFT algorithms.
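Reverse-carry addition can be modeled by reversing the bit order of both operands, adding normally, and reversing the sum back; this sketch assumes that equivalence and a 16-bit register width. With AR0 set to half the FFT size, stepping through an aligned 8-point buffer produces the bit-reversed order. Names are hypothetical.

```python
# Sketch of bit-reversed (reverse-carry) addition, as used by *ARx+0B.
# Assumption: reverse-carry addition over 16 bits equals reversing both
# operands, adding normally, and reversing back. Names are hypothetical.

def reverse_bits(x: int, width: int = 16) -> int:
    """Reverse the order of the low `width` bits of x."""
    return int(format(x & ((1 << width) - 1), f"0{width}b")[::-1], 2)


def reverse_carry_add(arx: int, ar0: int, width: int = 16) -> int:
    """Add AR0 to ARx with the carry propagating from left to right."""
    mask = (1 << width) - 1
    total = (reverse_bits(arx, width) + reverse_bits(ar0, width)) & mask
    return reverse_bits(total, width)


if __name__ == "__main__":
    fft_size = 8
    ar0 = fft_size // 2          # AR0 = 2**(N-1) = half the FFT size
    arx = 0x0100                 # buffer start, aligned so its 3 LSBs are 0
    order = []
    for _ in range(fft_size):
        order.append(arx - 0x0100)
        arx = reverse_carry_add(arx, ar0)
    print(order)                 # [0, 4, 2, 6, 1, 5, 3, 7]
```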

 Dual-Operand Address Modifications:


 Dual data-memory operand addressing is used for instructions that perform
two reads or a single read and a parallel store (indicated by two vertical bars,
||) at the same time.

 These instructions are all one word long and operate in indirect addressing
mode only.

 Two data-memory operands are represented by Xmem and Ymem:

• Xmem is a read operand with access through the D bus. Store
instructions, for example STH and STL with shift operation, change
Xmem to a write operand.

• Ymem is used as a read operand in instructions with dual reads
(accessed through the C bus) or as a write operand in instructions with
a parallel store (accessed through the E bus).

 If the source operand and the destination operand point to the same location, in
instructions with a parallel store (for example, ST||LD), the source is read
before writing to the destination.

 If a dual-operand instruction (for example, ADD) points to the same auxiliary
register with different addressing modes specified for both operands, the mode
defined by the Xmod field is used for addressing.

 Only 2 bits are available in the instruction code for selecting each auxiliary register in
this mode.

 Thus, only four of the auxiliary registers, AR2–AR5, can be used. The ARAUs,
together with these registers, provide the capability to access two operands in a single
cycle.

Xmod/Ymod   Operand    Function                    Description
field       syntax
00 (0)      *AR2       addr = AR2                  AR2 is the data-memory address.
01 (1)      *AR3-      addr = AR3                  After access, the address in AR3 is
                       AR3 = AR3 - 1               decremented.
10 (2)      *AR4+      addr = AR4                  After access, the address in AR4 is
                       AR4 = AR4 + 1               incremented.
11 (3)      *AR5+0%    addr = AR5                  After access, AR0 is added to AR5 using
                       AR5 = circ(AR5 + AR0)       circular addressing.

 In each case, the content of the auxiliary register is used as the data-memory operand.
 After using the address in the auxiliary register, the ARAUs perform the specified
mathematical operation.

Memory-mapped register addressing is used to modify the memory-mapped
registers without affecting either the current data-page pointer (DP) value or the current
stack-pointer (SP) value.
 Because DP and SP do not need to be modified in this mode, the overhead for writing
to a register is minimal.
 Memory-mapped register addressing works for both direct and indirect addressing.
 In addition to registers, any scratch-pad RAM located on data page 0 can be modified
by using memory-mapped register addressing.
 Figure shows how memory-mapped addresses are generated.
 Addresses are generated by:

• Forcing the nine most significant bits (MSBs) of data- memory address to 0,
regardless of the current value of DP or SP when direct addressing is used
• Using the seven LSBs of the current auxiliary register value when indirect
addressing is used
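The two address-generation rules above reduce to masking with 7Fh. This sketch uses hypothetical helper names and takes PMST at 1Dh (the address given earlier for PMST) as the example target.

```python
# Sketch of memory-mapped register address generation (hypothetical names).
# Direct: the 9 MSBs are forced to 0, whatever DP or SP holds.
# Indirect: only the 7 LSBs of the current auxiliary register are used.

def mmr_address_direct(dma: int) -> int:
    return dma & 0x7F          # page 0, offset taken from the instruction


def mmr_address_indirect(arx: int) -> int:
    return arx & 0x7F          # 7 LSBs of ARx; upper 9 bits forced to 0


if __name__ == "__main__":
    # 409Dh has 1Dh in its 7 LSBs, so this reaches address 1Dh (PMST).
    print(hex(mmr_address_indirect(0x409D)))  # 0x1d
```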


Stack Addressing: Stack is used to automatically store the PC during the subroutines and
interrupts. It can also be used to store data values or context at the programmer’s discretion.
 The stack is filled from the highest to the lowest memory address. The processor uses
a 16-bit memory mapped register, the stack pointer (SP), to address the stack. SP
always points to the last element stored onto the stack.
 Four instructions access the stack using the stack addressing mode:
 PSHD pushes a data-memory value onto the stack.
 PSHM pushes a memory-mapped register onto the stack.
 POPD pops a data-memory value from the stack.
 POPM pops a memory-mapped register from the stack.
 Other operations also affect the stack and the stack pointer. The stack is used during
interrupts and subroutines to save and restore the PC contents.

 Fig shows an example of the stack and SP before and after a push of X2 into the stack
(PSHD X2).
 When a subroutine is called or an interrupt occurs, the return address is automatically
saved in the stack using a push operation. Instructions used for subroutine calls and
interrupts are CALA[D], CALL[D], CC[D], INTR, and TRAP.
 When a subroutine returns, the return address is retrieved from the stack using a pop
operation and loaded into the PC. Instructions used for returns from subroutines are
RET[D], RETE[D], RETEF[D], and RC[D].
 The FRAME instruction also affects the stack. This instruction adds a short
immediate offset to the stack pointer.
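Subroutine calls, returns and FRAME all reuse the same push/pop rules. The sketch below is a hypothetical model, not TI code: the caller supplies the return address explicitly (on real hardware it is the address of the instruction following the call), SP starts at an assumed 0x0100, and the FRAME offset is assumed to be an 8-bit signed value (-128 to 127).

```python
# Hypothetical model of CALL/RET and FRAME acting on the C54x stack.
# Assumptions: 16-bit words, SP starts at 0x0100, the caller supplies the
# return address, and the FRAME offset is an 8-bit signed value.


class StackCpu:
    def __init__(self, sp: int = 0x0100) -> None:
        self.mem = [0] * 0x10000  # simulated 64K-word data memory
        self.sp = sp
        self.pc = 0

    def _push(self, value: int) -> None:
        self.sp = (self.sp - 1) & 0xFFFF  # stack grows toward lower addresses
        self.mem[self.sp] = value & 0xFFFF

    def _pop(self) -> int:
        value = self.mem[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return value

    def call(self, target: int, return_addr: int) -> None:
        """CALL: save the return address on the stack, then jump."""
        self._push(return_addr)
        self.pc = target & 0xFFFF

    def ret(self) -> None:
        """RET: load the PC from the top of the stack."""
        self.pc = self._pop()

    def frame(self, k: int) -> None:
        """FRAME k: add a short signed offset to SP (e.g. to reserve locals)."""
        if not -128 <= k <= 127:
            raise ValueError("FRAME offset must fit in 8 signed bits")
        self.sp = (self.sp + k) & 0xFFFF


if __name__ == "__main__":
    cpu = StackCpu()
    cpu.call(0x3000, return_addr=0x2002)  # hypothetical addresses
    cpu.frame(-4)                         # reserve 4 words for locals
    cpu.frame(4)                          # release them before returning
    cpu.ret()
    print(hex(cpu.pc), hex(cpu.sp))
```

The FRAME pair must be balanced before RET; otherwise the pop reads a local variable instead of the saved return address.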

Memory Space of TMS320C54xx Processors:


 The C54x DSP memory is organized into three individually selectable spaces:
program, data, and I/O.
 Within any of these spaces, RAM, ROM, EPROM, EEPROM, or memory-mapped
peripherals can reside either on-chip or off-chip.
 The program and data spaces each span 64K words, giving a total addressability of
128K words; program memory is extendable up to 8192K words on devices that support
extended program addressing.
 Data memory: stores the data required to run programs and holds the memory-mapped
registers.
 Program memory: stores program instructions and tables used in the execution of
programs.
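The 128K and 8192K figures follow directly from address widths. A minimal worked check in Python, assuming 16-bit addresses for the program and data spaces and a 23-bit program address when extended addressing is available:

```python
# Worked check of the C54x address-range figures.
# Assumptions: 16-bit addresses per space, 23-bit extended program address.

K = 1024

words_per_space = 2 ** 16               # one 16-bit address space
program_and_data = 2 * words_per_space  # program space + data space
extended_program = 2 ** 23              # 23-bit extended program address

print(words_per_space // K, "K words per space")  # 64
print(program_and_data // K, "K words total")     # 128
print(extended_program // K, "K words extended")  # 8192
```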

Program Control:
 The program control unit contains the program counter (PC), the PC-related hardware,
the hardware stack, repeat counters and status registers.
 The PC addresses memory in several ways, namely:
 Branch: The PC is loaded with the immediate value following the branch
instruction.
 Subroutine call: The PC is loaded with the immediate value following the call
instruction.
 Interrupt: The PC is loaded with the address of the appropriate interrupt vector.
 Instructions such as BACC, CALA, etc.: The PC is loaded with the contents of
the accumulator low word.
 End of a block repeat loop: The PC is loaded with the contents of the block repeat
program address start register.
 Return: The PC is loaded from the top of the stack.
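The list above amounts to a selection rule for the next PC value. The following Python sketch encodes it; the event labels and keyword arguments (`immediate`, `vector`, `acc`, `rsa`, `stack_top`) are hypothetical names chosen for this illustration, and the model ignores pipeline effects and delayed instructions.

```python
# Hypothetical model of the sources of the next PC value on the C54x.
# Pipeline effects and delayed ([D]) instructions are ignored.


def next_pc(event: str, *, immediate: int = 0, vector: int = 0, acc: int = 0,
            rsa: int = 0, stack_top: int = 0) -> int:
    if event in ("branch", "call"):
        return immediate & 0xFFFF  # immediate value following the opcode
    if event == "interrupt":
        return vector & 0xFFFF     # address of the interrupt vector
    if event == "accumulator":     # e.g. BACC, CALA
        return acc & 0xFFFF        # low word of the accumulator
    if event == "block_repeat_end":
        return rsa & 0xFFFF        # block repeat start address register
    if event == "return":
        return stack_top & 0xFFFF  # value popped from the top of the stack
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    print(hex(next_pc("accumulator", acc=0x12_3456_789A)))  # low word only
```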

Instruction and Programming:


 This section introduces some commonly used C54x instructions.
 The C54x instruction set can be divided into five types of operation:
 Load and store operation
 Arithmetic operation
 Logical operation
 Program-control operation
 Special instructions
 Load/Store Operations:
 The basic operations for moving data are LD (load), LDM (load MMR), ST (store),
STM (store an immediate value into an MMR), STL (store accumulator low into memory),
and STH (store accumulator high into memory).
 These instructions can perform the necessary data movement in the processor using
any of the addressing modes.
 LD #2,DP; the DP is loaded with the constant 2 to point at data page 2.
 LDM AR1,A; the contents of AR1 are loaded into accumulator A.
 Parallel load and multiply instructions; example: LD||MAC
 Parallel store and add, store and subtract instructions; ST||ADD, ST||SUB
 Parallel store and multiply instructions; ST||MPY, ST||MAC
 Miscellaneous load-type and store-type instructions; MVKD, MVPD
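As a worked example of `LD #2,DP`, the sketch below computes which data-memory addresses direct addressing reaches once DP = 2. It assumes DP-based direct addressing (CPL = 0 in ST1), where the 16-bit address is the 9-bit DP followed by the 7-bit offset from the instruction; `direct_address` and `page_bounds` are hypothetical helper names.

```python
# Sketch of DP-based direct addressing.
# Assumption: CPL = 0, so address = 9-bit DP concatenated with a 7-bit offset.

PAGE_SIZE = 128  # words per data page (2**7)


def direct_address(dp: int, dma: int) -> int:
    """Form the 16-bit data-memory address from DP and the 7-bit offset."""
    return ((dp & 0x1FF) << 7) | (dma & 0x7F)


def page_bounds(dp: int) -> tuple[int, int]:
    """First and last address reachable on the page selected by DP."""
    start = direct_address(dp, 0)
    return start, start + PAGE_SIZE - 1


if __name__ == "__main__":
    lo, hi = page_bounds(2)  # effect of LD #2,DP
    print(hex(lo), hex(hi))
```

With DP = 2, every direct-addressed operand lands between 0x0100 and 0x017F, so a single `LD #2,DP` makes that whole 128-word block reachable with short 7-bit offsets.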

 Arithmetic Operations:
 Add instructions; ADD (add to accumulator), ADDC (add to accumulator with carry)
 Subtract instructions; SUB (subtract from ACC), SUBB (subtract from ACC with
borrow)
 Multiply instructions; MPY, MPYA (multiply by accumulator A)
 Multiply-accumulate instructions; MAC, MACD (multiply by program memory and
accumulate with delay)
 Multiply-subtract instructions; MAS, MASA (multiply by ACC A and subtract)
 Double (32-bit operand) instructions; DADD, DSUB
 Application-specific instructions; EXP, LMS
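Most of these arithmetic instructions write their results into a 40-bit accumulator (32 bits plus 8 guard bits). The Python sketch below models that register to show why the guard bits matter when many products are summed. It assumes integer products (no fractional-mode shift) and no saturation (overflow mode OVM = 0); `add`, `sub` and `mpy` are hypothetical helpers, not TI intrinsics.

```python
# Simplified model of a 40-bit two's-complement accumulator (32 bits + 8 guard bits).
# Assumptions: integer products (no fractional shift) and no saturation (OVM = 0).

ACC_BITS = 40


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add(acc: int, operand: int) -> int:
    return to_signed(acc + operand, ACC_BITS)  # wraps like the hardware register


def sub(acc: int, operand: int) -> int:
    return to_signed(acc - operand, ACC_BITS)


def mpy(x: int, y: int) -> int:
    """Multiply two 16-bit signed operands into the accumulator range."""
    return to_signed(to_signed(x, 16) * to_signed(y, 16), ACC_BITS)


if __name__ == "__main__":
    acc = 0
    for _ in range(256):
        acc = add(acc, mpy(0x8000, 0x8000))  # (-32768) * (-32768) = 2**30 each
    print(acc == 2 ** 38, acc > 2 ** 31 - 1)  # fits in 40 bits, not in 32
```

Summing 256 products of -32768 × -32768 gives 2^38, which would overflow a 32-bit signed register but still fits in the 40-bit accumulator.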

 Logical operations:
 AND instructions; AND (logical AND data or a constant with the ACC),
ANDM (logical AND a constant with the contents of data memory)
 OR instructions; OR (logical OR data or a constant with the ACC), ORM (logical OR
a constant with the contents of data memory)
 XOR instructions; XOR (logical XOR data or a constant with the ACC), XORM (logical
XOR a constant with the contents of data memory)
 Shift and rotate instructions; ROL (rotate left), SFTL (logical shift)
 Test instructions; BIT (copy the bit under test to the TC bit in register ST0),
CMPM (compare a data-memory value with a constant and set TC accordingly)
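A few of these logical and test instructions are easy to model directly on integers. The sketch below is a simplified, hypothetical model: 16-bit data words, a 40-bit accumulator, and the carry bit C passed in and returned explicitly. The ROL model follows our understanding of the instruction (bits 31 to 0 rotate left through C and guard bits 39 to 32 are cleared) and should be checked against the TI instruction set reference.

```python
# Simplified models of a few C54x logical/test instructions.
# Assumptions: 16-bit data-memory words, a 40-bit accumulator, and an explicit
# carry flag C; the Python function names are hypothetical.

WORD_MASK = 0xFFFF


def andm(mem_word: int, lk: int) -> int:
    """ANDM: AND a constant into a data-memory word."""
    return mem_word & lk & WORD_MASK


def orm(mem_word: int, lk: int) -> int:
    """ORM: OR a constant into a data-memory word."""
    return (mem_word | lk) & WORD_MASK


def xorm(mem_word: int, lk: int) -> int:
    """XORM: XOR a constant into a data-memory word."""
    return (mem_word ^ lk) & WORD_MASK


def cmpm(mem_word: int, lk: int) -> int:
    """CMPM: return the TC value, 1 if the word equals the constant."""
    return int((mem_word & WORD_MASK) == (lk & WORD_MASK))


def rol(acc: int, carry: int) -> tuple[int, int]:
    """ROL: rotate bits 31-0 left through C; guard bits 39-32 are cleared."""
    low32 = acc & 0xFFFF_FFFF
    new_carry = (low32 >> 31) & 1
    new_acc = ((low32 << 1) & 0xFFFF_FFFF) | (carry & 1)
    return new_acc, new_carry


if __name__ == "__main__":
    print(hex(andm(0x00FF, 0x0F0F)), cmpm(0x1234, 0x1234))
    print(rol(0xFF_8000_0001, 0))
```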

Program-Control Operations: allow the user to control the program flow


 Branch instructions: B (branch unconditionally), BACC (branch to the address in the
accumulator)
 Call instructions: CALL (call a subroutine), CALA (call the subroutine at the
location specified by the accumulator)
 Interrupt instructions: INTR, TRAP (software interrupts), RESET (software reset)
 Return instructions: RET (return from a subroutine to the calling program)
 Repeat instructions: RPT (repeat the next instruction), RPTB (block repeat up to the
label)
 Stack manipulating instructions: PSHD (push a data-memory value onto the stack),
POPD (pop the top of the stack to data memory)
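RPT and RPTB both take a repeat count that is one less than the number of executions. A minimal Python sketch of that convention, assuming RPT #N executes the next instruction N + 1 times and RPTB executes its block BRC + 1 times; `rpt`, `rptb` and `clear_next` are hypothetical names:

```python
# Sketch of the repeat-count convention of RPT and RPTB.
# Assumption: a count of N means N + 1 executions.

from typing import Callable


def rpt(count: int, instruction: Callable[[], None]) -> None:
    """RPT #count: execute the single following instruction count + 1 times."""
    for _ in range(count + 1):
        instruction()


def rptb(brc: int, block: Callable[[], None]) -> None:
    """RPTB: execute the block up to its end label BRC + 1 times."""
    for _ in range(brc + 1):
        block()


if __name__ == "__main__":
    buffer = [7] * 8
    position = iter(range(len(buffer)))

    def clear_next() -> None:
        # stands in for a store through an auto-incremented auxiliary register
        buffer[next(position)] = 0

    rpt(len(buffer) - 1, clear_next)  # RPT #7 clears all 8 words
    print(buffer)
```

Loading the count as "iterations minus one" is the detail most often gotten wrong when converting a loop bound into an RPT or RPTB setup.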
Multiply instruction (MPY)


Multiply and Accumulate Instruction (MAC)

 Syntax: MAC Xmem, Ymem, src, dst
 Operands: Xmem, Ymem → dual data-memory operands
src, dst → accumulator A or B
 Execution: (Xmem) × (Ymem) + (src) → dst
(Xmem) → T (T is loaded with the Xmem value)
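The MAC data flow above is the core of an FIR filter: each tap multiplies a sample by a coefficient and adds the product to the running sum. This Python sketch applies the listed execution step in a loop. It assumes 16-bit signed operands, integer products (no fractional shift), a wrapping 40-bit accumulator, and hypothetical function names (`mac`, `fir_output`).

```python
# Sketch of the MAC Xmem, Ymem, src, dst data flow applied to an FIR-style sum.
# Assumptions: 16-bit signed operands, integer products, 40-bit wrapping accumulator.

ACC_BITS = 40


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(xmem: int, ymem: int, src: int) -> tuple[int, int]:
    """Return (dst, t): dst = Xmem * Ymem + src, and T is loaded with Xmem."""
    product = to_signed(xmem, 16) * to_signed(ymem, 16)
    dst = to_signed(product + src, ACC_BITS)
    t = xmem & 0xFFFF
    return dst, t


def fir_output(samples: list[int], coeffs: list[int]) -> int:
    acc = 0  # accumulator cleared before the loop
    for x, h in zip(samples, coeffs):
        acc, _ = mac(x, h, acc)  # src and dst are the same accumulator
    return acc


if __name__ == "__main__":
    print(fir_output([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```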
