Mylecture TMS320C5x Architecture
2/21/2014
What is a word? A word is a fixed-size group of bits that the processor's instruction set and/or hardware handles as a unit. Most registers in a processor are word-sized, and in many (not all) architectures a word is the largest piece of data that can be transferred to or from working memory in a single operation.
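On the C5x the word size is 16 bits (the memory map later in these notes is counted in 16-bit words). The minimal sketch below assumes a standard C compiler with `<stdint.h>` and shows how a 32-bit quantity, such as the accumulator contents, spans two 16-bit words. The helper names `split_words` and `join_words` are hypothetical.

```c
#include <stdint.h>

/* One C5x word is 16 bits wide; uint16_t models a single word. */
typedef uint16_t c5x_word;

/* Split a 32-bit value into its high and low 16-bit words. */
static void split_words(uint32_t value, c5x_word *high, c5x_word *low)
{
    *high = (c5x_word)(value >> 16);      /* upper 16 bits */
    *low  = (c5x_word)(value & 0xFFFFu);  /* lower 16 bits */
}

/* Recombine two 16-bit words into one 32-bit value. */
static uint32_t join_words(c5x_word high, c5x_word low)
{
    return ((uint32_t)high << 16) | low;
}
```

For example, splitting 0x12345678 yields the words 0x1234 and 0x5678, and joining them restores the original value.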
2/21/2014 Abhishek Kumar Srivastava 2
The following characteristics make this family the ideal choice for a wide range of processing applications: i. Very flexible instruction set ii. Inherent operational flexibility iii. High-speed performance iv. Innovative, parallel architectural design v. Cost-effectiveness
TMS320C5x Overview
The C5x generation consists of the C50, C51, C52, C53, C53S, C56, C57, and C57S DSPs, which are fabricated by CMOS integrated-circuit technology. The operational flexibility and speed of the C5x are the result of combining an advanced Harvard architecture (which has separate buses for program memory and data memory), a CPU with application-specific hardware logic, on-chip peripherals, on-chip memory, and a highly specialized instruction set. The C5x is designed to execute up to 50 million instructions per second (MIPS).
Bus Structure
Separate program and data buses allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, while data is being multiplied in the multiplier, a previous product can be loaded into, added to, or subtracted from the accumulator and, at the same time, a new address can be generated. Such parallelism supports a powerful set of arithmetic, logic, and bit-manipulation operations that can all be performed in a single machine cycle. In addition, the C5x includes control mechanisms to manage interrupts, repeated operations, and function calls.
The C5x architecture is built around four major buses: 1. Program bus (PB) 2. Program address bus (PAB) 3. Data read bus (DB) 4. Data read address bus (DAB)
The PAB provides addresses in program memory space for both reads and writes. The PB carries the instruction code and immediate operands from program memory space to the CPU. The DB interconnects various elements of the CPU to data memory space: it is an internal bus that carries data from data memory to the central arithmetic logic unit (CALU) and the auxiliary register arithmetic unit (ARAU). The DAB carries the data memory address used by the CPU for reads.
In simple words:
PAB carries program memory addresses from the CPU to program memory.
PB carries instructions and operands from program memory to the CPU.
DAB carries data memory addresses from the CPU to data memory.
DB carries data from data memory to the CPU.
The program and data buses can work together to transfer data from on-chip data memory and internal or external program memory to the multiplier for single-cycle multiply/accumulate operations.
The C5x CPU maintains source-code compatibility with the C1x and C2x generations while achieving higher performance and greater versatility. Improvements include a 32-bit accumulator buffer (a register that temporarily stores the contents of the accumulator, ACC), additional scaling capabilities, and a host of new instructions. Data management has been improved through new block move instructions (which use a 16-bit address register to move blocks of data) and memory-mapped register instructions.
Auxiliary register compare register (ARCR): The 16-bit ARCR is used for address boundary comparison. The CMPR instruction compares the ARCR to the selected AR and places the result of the compare in the TC bit of ST1 (TC=0 if result false else TC=1). For indirect data memory addressing, the address of the desired memory location is placed into the selected auxiliary register. These registers are referenced with a 3-bit auxiliary register pointer (ARP) that is loaded with a value from 0 through 7, designating AR0 through AR7, respectively. The contents of these registers (AR & ARP) can also be stored in data memory or used as inputs to the CALU.
The auxiliary registers and the ARP can be loaded from data memory, the ACC, the product register, or with an immediate operand defined in the instruction. The auxiliary register file (AR0–AR7) is connected to the ARAU. The ARAU can auto-index (auto-increment or auto-decrement) the current AR while the data memory location is being addressed, indexing either by 1 or by the contents of the index register (INDX). As a result, data accesses do not require the CALU for address manipulation, leaving the CALU free for other operations in parallel.
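The ARP/AR/INDX/ARCR mechanism described above can be modelled in a few lines of C. This is a simplified sketch, not TI's implementation: the `post_mod_t` options are hypothetical names for the auto-index choices, the comparison is assumed to be unsigned 16-bit, and the CMPR condition codes (0 "=", 1 "<", 2 ">", 3 "!=") are assumed from the C5x instruction set rather than stated in these notes.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified model of the auxiliary register file and the ARAU. */
typedef struct {
    uint16_t ar[8];   /* AR0-AR7 */
    uint8_t  arp;     /* 3-bit pointer selecting the current AR */
    uint16_t indx;    /* index register used for step sizes other than 1 */
    uint16_t arcr;    /* auxiliary register compare register */
    bool     tc;      /* TC bit of ST1, written by CMPR */
} arau_t;

/* Hypothetical names for the post-modification options. */
typedef enum { POST_NONE, POST_INC, POST_DEC, POST_ADD_INDX, POST_SUB_INDX } post_mod_t;

/* Indirect access: return the address held in AR(ARP), then update that AR.
   The update is done by the ARAU, so the CALU is not needed for it. */
static uint16_t arau_indirect(arau_t *s, post_mod_t mod)
{
    uint16_t *cur = &s->ar[s->arp & 7u];
    uint16_t addr = *cur;
    switch (mod) {
    case POST_INC:      *cur = (uint16_t)(*cur + 1u);      break;
    case POST_DEC:      *cur = (uint16_t)(*cur - 1u);      break;
    case POST_ADD_INDX: *cur = (uint16_t)(*cur + s->indx); break;
    case POST_SUB_INDX: *cur = (uint16_t)(*cur - s->indx); break;
    case POST_NONE:     break;
    }
    return addr;
}

/* CMPR: compare AR(ARP) with ARCR and place the result in TC
   (assumed codes: 0 '=', 1 '<', 2 '>', 3 '!='). */
static void cmpr(arau_t *s, unsigned cm)
{
    uint16_t a = s->ar[s->arp & 7u];
    switch (cm & 3u) {
    case 0:  s->tc = (a == s->arcr); break;
    case 1:  s->tc = (a <  s->arcr); break;
    case 2:  s->tc = (a >  s->arcr); break;
    default: s->tc = (a != s->arcr); break;
    }
}
```

A typical use is walking a table with AR2 stepping by INDX and then using CMPR against ARCR to detect the end of the table, with TC deciding a conditional branch.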
Program Controller
The program controller contains logic circuitry that decodes the operational instructions, manages the CPU pipeline, stores the status of CPU operations, and decodes the conditional operations.
The parallelism of the architecture lets the C5x perform three concurrent memory operations in any given machine cycle (when the operands reside in DARAM rather than SARAM): fetch an instruction, read an operand, and write an operand.
The program controller consists of these elements: 1. 16 bit Program counter (PC) 2. 16 bit Status registers ST0, ST1, processor mode status register (PMST) and circular buffer control register (CBCR) 3. (8 X 16)-bit Hardware stack 4. Address generation logic 5. Instruction register 6. Interrupt flag register and interrupt mask register
PMST: A memory-mapped register (MMR) that contains status and control bits. The PMST resides in the memory-mapped register space of data memory page 0. It can be acted upon directly by the CALU and the parallel logic unit (PLU), and it has an associated 1-level-deep shadow register stack for automatic context saving when an interrupt trap is taken.
PMST bit fields (most significant first):
IPTR (bits 15–11): Interrupt vector pointer. These bits select any of 32 2K-word pages where the interrupt vectors reside.
AVIS: Address visibility bit. Enables/disables visibility of the internal program address at the address pins.
OVLY: Enables/disables addressing of the on-chip single-access RAM (SARAM) in data memory space.
RAM: Enables/disables addressing of the on-chip SARAM in program memory space.
MP/MC: Enables/disables addressing of the on-chip ROM in program memory space (MP/MC = 0 selects the on-chip ROM; MP/MC = 1 removes it).
NDX: NDX = 0: any C2x-compatible instruction that modifies or writes AR0 also modifies or writes INDX and ARCR. NDX = 1: such instructions modify or write only AR0.
TRM: TRM = 0: any C2x-compatible instruction that loads TREG0 also loads TREG1 and TREG2. TRM = 1: such instructions load only TREG0.
BRAF (bit 0): Block repeat active flag. Indicates whether a block repeat is currently active.
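Because IPTR occupies bits 15–11 and each page is 2K (0x800) words, the interrupt vector table starts at IPTR × 0x800. The sketch below, with hypothetical helper names, extracts IPTR and BRAF from a PMST value and rewrites IPTR while leaving the other bits unchanged; it uses only the bit positions given in the table above.

```c
#include <stdint.h>

/* IPTR is PMST bits 15-11: 5 bits, so 32 possible pages. */
static unsigned pmst_iptr(uint16_t pmst)
{
    return (pmst >> 11) & 0x1Fu;
}

/* Each IPTR page is 2K = 0x800 words, so the vectors start at IPTR * 0x800. */
static uint16_t vector_table_base(uint16_t pmst)
{
    return (uint16_t)(pmst_iptr(pmst) << 11);
}

/* BRAF is PMST bit 0: 1 while a block repeat is active. */
static int pmst_braf(uint16_t pmst)
{
    return pmst & 1u;
}

/* Return pmst with IPTR replaced by page; bits 10-0 are preserved. */
static uint16_t pmst_set_iptr(uint16_t pmst, unsigned page)
{
    return (uint16_t)((pmst & 0x07FFu) | ((page & 0x1Fu) << 11));
}
```

For instance, IPTR = 1 places the vectors at program address 0x0800, and IPTR = 31 places them at 0xF800, the last 2K page of the 64K program space.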
CBCR: A memory-mapped register (MMR) that enables/disables the two circular buffers (bits CENB1 and CENB2) and defines which auxiliary registers are mapped to them (the AR selected by CAR1 for circular buffer 1 and by CAR2 for circular buffer 2).
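The effect of mapping an AR to a circular buffer can be sketched as a wrap-around post-increment. This is a minimal model under stated assumptions: the buffer limits are taken as a start and an end address (on the C5x these are assumed to live in separate start/end address registers, CBSR1/CBER1 and CBSR2/CBER2, which these notes do not describe), only a step of 1 is modelled, and `circ_buf_t` and `circ_post_inc` are hypothetical names.

```c
#include <stdint.h>

/* Assumed circular buffer description. */
typedef struct {
    uint16_t start;   /* first address of the buffer (assumed start register) */
    uint16_t end;     /* last address of the buffer (assumed end register) */
    int enabled;      /* models the CENBx enable bit in CBCR */
} circ_buf_t;

/* Return the current address in *ar, then post-increment it.
   When the buffer is enabled and *ar sits on the end address,
   the increment wraps *ar back to the start address instead. */
static uint16_t circ_post_inc(const circ_buf_t *cb, uint16_t *ar)
{
    uint16_t addr = *ar;
    if (cb->enabled && *ar == cb->end)
        *ar = cb->start;
    else
        *ar = (uint16_t)(*ar + 1u);
    return addr;
}
```

With a buffer spanning 0x0200–0x0203, successive accesses visit 0x0200, 0x0201, 0x0202, 0x0203 and then 0x0200 again, which is the behaviour FIR delay lines typically rely on.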
Hardware stack: The stack, which is 16 bits wide and 8 levels deep, is accessible via the PUSH and POP instructions.
Whenever the contents of the PC are pushed onto the top of the stack (TOS), the previous contents of each level are pushed down one level, and the bottom (eighth) location of the stack is lost. Therefore, data is lost if more than eight successive pushes occur before a pop.
Address generation logic (program and data): Logic circuitry that generates the addresses for program memory and data memory reads and writes. It can generate one address per machine cycle.
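The push behaviour of the 8-level hardware stack described above can be modelled directly. In this sketch `level[0]` is the TOS, and the pop behaviour at the bottom (the bottom value is kept, so popping an emptied stack keeps returning it) is an assumption not stated in these notes; the type and function names are hypothetical.

```c
#include <stdint.h>

#define STACK_DEPTH 8

/* 16-bit wide, 8-level deep stack; level[0] is the top of stack (TOS). */
typedef struct { uint16_t level[STACK_DEPTH]; } hw_stack_t;

/* Push: every level moves down one place; the old bottom value is lost. */
static void hw_push(hw_stack_t *s, uint16_t value)
{
    for (int i = STACK_DEPTH - 1; i > 0; --i)
        s->level[i] = s->level[i - 1];
    s->level[0] = value;
}

/* Pop: return the TOS and move every level up one place.
   Assumption: the bottom level keeps its value (it is duplicated upward). */
static uint16_t hw_pop(hw_stack_t *s)
{
    uint16_t top = s->level[0];
    for (int i = 0; i < STACK_DEPTH - 1; ++i)
        s->level[i] = s->level[i + 1];
    return top;
}
```

Pushing nine values (1 through 9) and then popping eight returns 9 down to 2: the first value pushed has fallen off the bottom, which is the data loss the text warns about for deeply nested calls.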
Interrupt flag register (IFR): A memory-mapped register that indicates pending interrupts. Interrupt mask register (IMR): A memory-mapped register used to mask external and internal interrupts. Writing a 1 to an IMR bit position enables the corresponding interrupt, provided the interrupt mode bit (INTM) is 0.
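The IFR/IMR/INTM rule above reduces to a bitwise AND gated by INTM. The sketch below, with a hypothetical function name, returns the set of pending interrupts that would be serviced; it deliberately says nothing about which bit corresponds to which interrupt or about priority, and it does not model the nonmaskable RS and NMI interrupts.

```c
#include <stdint.h>
#include <stdbool.h>

/* Maskable interrupts that can be taken now: flagged as pending in IFR,
   enabled by a 1 in the matching IMR bit, and not globally masked. */
static uint16_t serviceable_interrupts(uint16_t ifr, uint16_t imr, bool intm)
{
    if (intm)            /* INTM = 1 globally masks all maskable interrupts */
        return 0;
    return ifr & imr;    /* only pending AND enabled bits survive */
}
```

For example, with two interrupts pending (IFR = 0x0009) but only one enabled (IMR = 0x0001), only bit 0 is serviceable, and setting INTM blocks both.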
Significance of the various bits of ST0 and ST1:
a) ARP (auxiliary register pointer): These bits select the AR to be used in indirect addressing. When the ARP is loaded, its previous value is copied to the auxiliary register buffer (ARB) in ST1.
b) OVM (overflow mode): This bit enables/disables accumulator overflow saturation mode in the ALU. OVM = 0: an overflowed result is loaded into the accumulator without modification. OVM = 1: an overflowed result is replaced by either the most positive value (7FFF FFFFh) or the most negative value (8000 0000h) of the 32-bit accumulator.
c) OV (overflow flag): This bit indicates an arithmetic overflow in the ALU.
d) INTM (interrupt mode): This bit globally masks or enables all maskable interrupts. INTM has no effect on the nonmaskable RS and NMI interrupts.
e) DP (data memory page pointer): These 9 bits (bits 8–0) specify the current data memory page. In direct addressing, the DP bits are concatenated with the 7 LSBs of the instruction word to form a 16-bit data memory address (similar to the 8051's 2K-page access of program ROM).
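Two of the ST0 fields above lend themselves to a short model: OVM-controlled saturation with the OV flag, and DP-based direct addressing (9 + 7 = 16 address bits). This is a sketch with hypothetical function names; it treats the accumulator as a 32-bit two's-complement value, only ever sets OV (it never clears it), and relies on the usual wrap-around when converting an out-of-range value back to `int32_t`.

```c
#include <stdint.h>
#include <stdbool.h>

/* ALU add into the 32-bit accumulator.
   OVM = 1: saturate to 7FFF FFFFh or 8000 0000h on overflow.
   OVM = 0: keep the wrapped (unmodified) result. OV is set on overflow. */
static int32_t alu_add(int32_t acc, int32_t operand, bool ovm, bool *ov)
{
    int64_t wide = (int64_t)acc + (int64_t)operand;   /* exact sum */
    if (wide > INT32_MAX || wide < INT32_MIN) {
        *ov = true;
        if (ovm)
            return wide > INT32_MAX ? INT32_MAX : INT32_MIN;
        return (int32_t)(uint32_t)wide;               /* two's-complement wrap */
    }
    return (int32_t)wide;
}

/* Direct addressing: 9-bit DP concatenated with the 7 LSBs of the instruction. */
static uint16_t direct_address(uint16_t dp, uint16_t instr)
{
    return (uint16_t)(((dp & 0x1FFu) << 7) | (instr & 0x7Fu));
}
```

Each DP value therefore selects a 128-word page, and the 512 possible pages cover the full 64K-word data space; for example, DP = 4 with an instruction offset of 0x25 addresses 0x0225.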
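ST1 bit layout:
Bits 15–13: ARB (auxiliary register buffer)
Bit 12: CNF
Bit 11: TC
Bit 10: SXM
Bit 9: C
Bits 8–7: reserved (read as 1)
Bit 6: HM
Bit 5: reserved (read as 1)
Bit 4: XF
Bits 3–2: reserved (read as 1)
Bits 1–0: PM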
CNF (on-chip RAM configuration control bit): This 1-bit field maps the on-chip dual-access RAM block 0 (DARAM B0) into either data memory space or program memory space. DARAM is memory that can be read from and written to in the same clock cycle. When CNF = 0, DARAM B0 is mapped into data memory space (set by CLRC CNF); when CNF = 1, DARAM B0 is mapped into program memory space (set by SETC CNF). CNF can also be modified by the LST #1 instruction.
TC (test/control flag bit): This 1-bit flag stores the result of ALU or parallel logic unit (PLU) test-bit operations. The state of TC determines whether conditional branch, call, and return instructions are executed.
SXM (sign-extension mode bit): This 1-bit field enables/disables sign extension in arithmetic operations. SXM also affects the shift accumulator right (SFR) instruction: when SXM = 1, SFR performs an arithmetic right shift, preserving the sign of the ACC data; when SXM = 0, SFR performs a logical shift, shifting out the LSB and shifting in a 0 at the MSB.
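The two SFR behaviours selected by SXM differ only in what enters the MSB. The sketch below shifts an unsigned 32-bit accumulator (so C's implementation-defined signed right shift is avoided) and, as an assumption consistent with shifts affecting the carry bit, places the shifted-out LSB in C. The function name `sfr` is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* SFR: shift the 32-bit accumulator right by one bit.
   SXM = 1: arithmetic shift (sign bit replicated into the MSB).
   SXM = 0: logical shift (0 shifted into the MSB).
   Assumed: the bit shifted out of the LSB is stored in the carry bit C. */
static uint32_t sfr(uint32_t acc, bool sxm, bool *carry)
{
    *carry = (acc & 1u) != 0;           /* LSB shifted out */
    uint32_t shifted = acc >> 1;        /* unsigned shift: MSB becomes 0 */
    if (sxm)
        shifted |= acc & 0x80000000u;   /* restore the sign bit for SXM = 1 */
    return shifted;
}
```

For a negative accumulator such as 8000 0002h, SXM = 1 gives C000 0001h (still negative, value halved), while SXM = 0 gives 4000 0001h.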
C (carry bit): This 1-bit field indicates a carry or borrow from an arithmetic operation in the ALU; shift and rotate instructions also affect C.
HM (hold mode bit): This 1-bit field determines whether the CPU stops or continues execution when it acknowledges an active HOLD signal (for example, from a DMA controller).
XF (XF pin status bit): This bit determines the level of the external flag (XF) output pin, which lets software signal external devices. XF is set high at device reset.
PM (product shift mode bits): This 2-bit field selects the product shifter (P-SCALER) mode, i.e. the shift applied to the PREG (product register) output on its way into the ALU:
PM = 00: no shift
PM = 01: left-shifted 1 bit; LSB zero-filled
PM = 10: left-shifted 4 bits; 4 LSBs zero-filled
PM = 11: right-shifted 6 bits; 6 LSBs lost; product always sign-extended
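The PM table above maps directly onto a switch over the 2-bit field. This sketch, with a hypothetical function name, applies the selected P-SCALER shift to a 32-bit PREG value; the right shift is done on an unsigned value and the sign bits are filled in explicitly, matching the "always sign-extended" note for PM = 11.

```c
#include <stdint.h>

/* Apply the product shifter mode selected by PM to the PREG output. */
static uint32_t pscaler(uint32_t preg, unsigned pm)
{
    switch (pm & 3u) {
    case 0:  return preg;         /* 00: no shift */
    case 1:  return preg << 1;    /* 01: left 1 bit, LSB zero-filled */
    case 2:  return preg << 4;    /* 10: left 4 bits, 4 LSBs zero-filled */
    default:                      /* 11: right 6 bits, 6 LSBs lost, sign-extended */
        return (preg >> 6) | ((preg & 0x80000000u) ? 0xFC000000u : 0u);
    }
}
```

For example, a product of FFFF FF00h (-256) with PM = 11 becomes FFFF FFFCh (-4), whereas a plain logical shift would have produced a large positive value.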
On-Chip Memory
The C5x architecture contains a considerable amount of on-chip memory to aid system performance and integration: 1. Program read-only memory (ROM) 2. Data/program dual-access RAM (DARAM) 3. Data/program single-access RAM (SARAM). The C5x has a total address range of 224K 16-bit words, divided into four individually selectable memory spaces: 1. 64K-word program memory space, 2. 64K-word local data memory space, 3. 64K-word input/output port space, and 4. 32K-word global data memory space (64K + 64K + 64K + 32K = 224K words).
Program ROM
All C5x DSPs carry a 16-bit on-chip mask-programmable ROM. The C50 and C57S DSPs have boot loader code resident in the on-chip ROM (the main function of the boot loader is to transfer code from an external source to program memory at power-up); all other C5x DSPs offer the boot loader code as an option. This code is used to boot programs from slower external ROM or EPROM into fast on-chip or external RAM. Once the custom program has been booted into RAM, the boot ROM space can be removed from program memory space by setting the MP/MC bit in the processor mode status register (PMST).
The on-chip ROM is selected at reset by driving the MP/MC pin low. If the on-chip ROM is not selected, the C5x devices start execution from off-chip memory. The on-chip ROM may be configured with or without boot loader code. However, the on-chip ROM is intended for your specific program. Once the program is in its final form, you can submit the ROM code to Texas Instruments for implementation into your device.
The microprocessor/microcomputer (MP/MC) mode is available on all ROM-coded TMS320 DSP devices when accesses to either on-chip or off-chip memory are required. The microprocessor mode is used to develop, test, and refine a system application. In this mode of operation, the TMS320 acts as a standard microprocessor by using external program memory. When the algorithm has been finalized, the code can be submitted to Texas Instruments for masking into the on-chip program ROM. At that time, the TMS320 becomes a microcomputer that executes customized programs from the on-chip ROM. Should the code need changing or upgrading, the TMS320 can once again be used in the microprocessor mode.
DARAM improves the operational speed of the C5x CPU. The CPU operates with a four-deep pipeline (fetch, decode, execute, write back), in which it reads data in the third stage and writes data in the fourth stage. Hence, for a given instruction sequence, the second instruction could be reading data at the same time that the first instruction is writing data. Because DARAM can be accessed twice in one machine cycle, the CPU can read from and write to it in the same cycle.
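The read/write overlap described above follows from simple pipeline arithmetic. Under the simplifying assumption that one instruction enters the pipeline per cycle with no stalls, instruction i (counting from 0) occupies stage s (also from 0) in cycle i + s. The hypothetical helpers below show that instruction i + 1 reads data in the same cycle that instruction i writes it.

```c
/* Four-deep pipeline, no stalls: instruction i is in stage s during cycle i + s. */
enum { PIPE_DEPTH = 4 };

static unsigned stage_cycle(unsigned i, unsigned s) { return i + s; }

/* Data is read in the third stage (index 2) and written in the fourth (index 3). */
static unsigned read_cycle(unsigned i)  { return stage_cycle(i, 2); }
static unsigned write_cycle(unsigned i) { return stage_cycle(i, 3); }

/* Which instruction occupies stage s during cycle c, or -1 if none yet. */
static int instr_in_stage(unsigned c, unsigned s)
{
    return (s < PIPE_DEPTH && c >= s) ? (int)(c - s) : -1;
}
```

In cycle 3, for example, instruction 0 is writing while instruction 1 is reading; if both target the same memory block, only a dual-access block (DARAM) can serve both in that one cycle.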
The SARAM is divided into 1K- and/or 2K-word blocks contiguous in address memory space. All C5x CPUs support parallel accesses to these SARAM blocks. However, one SARAM block can be accessed only once per machine cycle. In other words, the CPU can read from or write to one SARAM block while accessing another SARAM block.
SARAM supports more flexible address mapping than DARAM because SARAM can be mapped to both program and data memory space simultaneously. However, because of simultaneous program and data mapping, an instruction fetch and data fetch that could be performed in one machine cycle with DARAM (using PB and DB) may take two machine cycles with SARAM.
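The SARAM/DARAM cycle cost can be illustrated with a small counting model. This is a deliberate simplification, assuming that different blocks work in parallel, that a SARAM block serves one access per machine cycle and a DARAM block two, and ignoring bus and pipeline effects; `cycles_needed` and `MAX_BLOCKS` are hypothetical.

```c
#include <stddef.h>

#define MAX_BLOCKS 16   /* hypothetical number of memory blocks in the model */

/* block_of_access[k] is the block (0..MAX_BLOCKS-1) touched by access k;
   is_daram[b] is nonzero if block b is dual-access.
   Returns the machine cycles the busiest block needs to serve its accesses. */
static unsigned cycles_needed(const int *block_of_access, size_t n_access,
                              const int *is_daram)
{
    unsigned count[MAX_BLOCKS] = {0};
    unsigned cycles = (n_access > 0) ? 1u : 0u;

    for (size_t k = 0; k < n_access; ++k)
        count[block_of_access[k]]++;

    for (int b = 0; b < MAX_BLOCKS; ++b) {
        unsigned per_cycle = is_daram[b] ? 2u : 1u;
        unsigned need = (count[b] + per_cycle - 1u) / per_cycle;  /* ceiling */
        if (need > cycles)
            cycles = need;
    }
    return cycles;
}
```

Two accesses to the same DARAM block fit in one cycle, two accesses to the same SARAM block need two, and two accesses to different SARAM blocks again fit in one, which matches the text's point about parallel SARAM blocks.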
On-Chip Peripherals
All C5x DSPs have the same CPU structure; however, they have different on-chip peripherals connected to their CPUs. The on-chip peripherals available on C5x DSPs are: clock generator, hardware timer, software-programmable wait-state generators, parallel I/O ports, host port interface (HPI), serial port, buffered serial port (BSP), time-division multiplexed (TDM) serial port, and user-maskable interrupts.
Clock Generator
The clock generator consists of an internal oscillator and a phase-locked loop (PLL) circuit. The clock generator can be driven internally by a crystal resonator circuit or driven externally by a clock source. The PLL circuit can generate an internal CPU clock by multiplying the clock source by a specific factor, so you can use a clock source with a lower frequency than that of the CPU. The PLL has a maximum operating frequency of 28.6 MHz.
PLL
A phase-locked loop or phase lock loop (PLL) is a control system that generates an output signal whose phase is related to the phase of an input "reference" signal. It is an electronic circuit consisting of a variable frequency oscillator and a phase detector. This circuit compares the phase of the input signal with the phase of the signal derived from its output oscillator and adjusts the frequency of its oscillator to keep the phases matched. The signal from the phase detector is used to control the oscillator in a feedback loop.
Consequently, a phase-locked loop can track an input frequency, or it can generate a frequency that is a multiple of the input frequency. The former property is used for demodulation, and the latter property is used for indirect frequency synthesis. Phase-locked loops are widely employed in radio, telecommunications, computers and other electronic applications.
Hardware Timer
A 16-bit hardware timer with a 4-bit prescaler is available. This programmable timer clocks at a rate between 1/2 and 1/32 of the machine cycle rate (CLKOUT1), depending on the timer's divide-down ratio (for comparison, the 8051 timer runs at 1/12 of the crystal frequency). CLKOUT1 is the master clock output signal; it cycles at the machine-cycle rate of the CPU. The frequency of CLKOUT1 is one-half the crystal oscillating frequency or CLKIN rate (20–40 MHz), although in some clock modes the CLKOUT1 rate equals the CLKIN rate. The internal machine cycle is bounded by the rising edges of this signal. The timer can be stopped, restarted, reset, or disabled by specific status bits.
The timer is driven by a pre-scaler which is decremented by 1 at every CLKOUT1 cycle. A timer interrupt (TINT) is generated each time the counter decrements to 0. The timer provides a convenient means of performing periodic I/O or other functions. When the timer is stopped (TSS = 1), the internal clocks to the timer are shut off, allowing the circuit to run in a low-power mode of operation.
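The prescaler-plus-counter arrangement above can be simulated cycle by cycle. This is a simplified sketch: the register names PSC/TDDR (4-bit prescaler counter and its reload value) and TIM/PRD (16-bit counter and its period) are assumed C5x names not given in these notes, and the exact cycle on which TINT fires relative to the counter reaching 0 is simplified. What the model does show is the resulting interrupt period of (TDDR + 1) × (PRD + 1) CLKOUT1 cycles.

```c
#include <stdint.h>

/* Assumed timer registers. */
typedef struct {
    uint8_t  psc, tddr;   /* 4-bit prescaler counter and its reload value */
    uint16_t tim, prd;    /* 16-bit timer counter and its reload (period) value */
} c5x_timer_t;

/* Load both counters from their reload values, as after a timer reload. */
static void timer_reload(c5x_timer_t *t)
{
    t->psc = t->tddr & 0xFu;
    t->tim = t->prd;
}

/* Advance by one CLKOUT1 cycle; returns 1 on the cycle TINT is generated. */
static int timer_tick(c5x_timer_t *t)
{
    if (t->psc != 0) { t->psc--; return 0; }  /* prescaler still counting down */
    t->psc = t->tddr & 0xFu;                  /* prescaler expired: reload it...  */
    if (t->tim != 0) { t->tim--; return 0; }  /* ...and clock the main counter */
    t->tim = t->prd;                          /* counter exhausted: reload, TINT */
    return 1;
}

/* Closed form of the TINT period, in CLKOUT1 cycles. */
static uint32_t tint_period(uint8_t tddr, uint16_t prd)
{
    return (uint32_t)((tddr & 0xFu) + 1u) * ((uint32_t)prd + 1u);
}
```

For example, with CLKOUT1 at 20 MHz (50 ns per cycle), TDDR = 3 and PRD = 9 give a TINT every 40 cycles, i.e. every 2 µs.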
Serial Port
Three different kinds of serial ports are available: a general-purpose serial port, a time-division multiplexed (TDM) serial port, and a buffered serial port (BSP). Each C5x contains at least one general-purpose, high-speed, synchronous, full-duplex serial port interface that provides direct communication with serial devices such as codecs, serial analog-to-digital (A/D) converters, and other serial systems. The serial port is capable of operating at up to one-fourth of the machine cycle rate (CLKOUT1). The serial port transmitter and receiver are double-buffered and individually controlled by maskable external interrupt signals. Data is framed either as bytes or as words.
User-Maskable Interrupts
Four external interrupt lines (INT1–INT4) and five internal interrupts (a timer interrupt and four serial port interrupts) are user-maskable. When an interrupt service routine (ISR) is executed, the contents of the program counter are saved on the 8-level hardware stack, and the contents of eleven specific CPU registers are automatically saved (shadowed) on a 1-level-deep stack (the shadow registers). When a return-from-interrupt instruction is executed, the CPU register contents are restored.
Test/Emulation
On the C50, LC50, C51, LC51, C53, LC53, C57S and LC57S, an IEEE standard 1149.1 (JTAG) interface with boundary scan capability is used for emulation and test. This logic provides the boundary scan to and from the interfacing devices. It can be used to test pin-to-pin continuity and to perform operational tests on devices that are peripheral to the C5x.