Embedded Systems Unit 8 Notes
Outline
Memory System
Types of memory
Caches
Input/Output
Ingo Sander [email protected]
September 4, 2007
Memory System
Most instructions in a RISC processor can execute in a single clock cycle
BUT access to the main memory (typically in SDRAM) is slow
If the memory access time can be shortened, the system would perform considerably better
Memory Performance
Memory Bandwidth
Rate at which information can be transferred from the memory system
If R is the number of requests that the memory can serve simultaneously and L is the latency, then BW = R/L
Example: a 32-bit memory with latency 20 ns has a bandwidth BW = 32 bit / 20 ns = 1.6 Gbit/s = 200 MByte/s
Latency
Time between the following two time instances:
the time instance where the processor issues a request to the memory
the time instance where the requested data arrives and is available for use by the processor
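The bandwidth example can be checked with a few lines of C. This is a minimal sketch: the helper names `bandwidth_bits_per_s` and `bits_to_mbyte_per_s` are assumptions made for illustration, and the model is a memory that delivers `width_bits` bits once per `latency_ns` nanoseconds.

```c
/* Bandwidth in bit/s of a memory that delivers width_bits bits
 * once every latency_ns nanoseconds (hypothetical helper). */
double bandwidth_bits_per_s(double width_bits, double latency_ns)
{
    return width_bits / (latency_ns * 1e-9);
}

/* Convert bit/s to MByte/s, taking 1 MByte as 10^6 bytes. */
double bits_to_mbyte_per_s(double bits_per_s)
{
    return bits_per_s / 8.0 / 1e6;
}
```

For a 32-bit memory with 20 ns latency this gives 1.6 Gbit/s, which corresponds to 200 MByte/s.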
Types of memory
ROM (Read Only Memory)
Mask-programmable
Flash programmable (can be reprogrammed, but has long access times)
Synchronous DRAM
Clock signal is used internally to pipeline accesses
Memory must be fast enough to respond to a request
A request takes multiple clock cycles
Flash issues
Flash is programmed at system voltages
Erasure time is long
Must be erased in blocks
Limited number of erasures
A Flash Memory is very useful in combination with SRAM or SDRAM devices, since it can load these devices at power-on
Caches
Large fast memories are too expensive, but small fast memories are feasible
A cache memory is a small but fast memory that is located near the CPU to reduce memory access times
Ideally the processor only needs to access the cache and not the main memory
Memory is a bottleneck
While the CPU is fast, each memory access takes a long time and slows down the system
Caches can increase the performance if most memory requests do not need to access the main memory
[Figure: CPU (fast) connected over a bus (slow) to memory (very slow), shown without and with a cache (fast)]
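How much a cache helps depends on how often a request can be served from it. The sketch below uses the commonly used average-access-time estimate (hit time plus miss rate times miss penalty). This formula is not stated in these notes; the function name and the numbers in the usage note are assumptions for illustration only.

```c
/* Average memory access time in ns (hypothetical helper).
 * hit_time_ns:     time to access the cache
 * miss_rate:       fraction of accesses that miss, between 0 and 1
 * miss_penalty_ns: extra time needed to fetch from main memory */
double avg_access_time_ns(double hit_time_ns, double miss_rate,
                          double miss_penalty_ns)
{
    return hit_time_ns + miss_rate * miss_penalty_ns;
}
```

With an assumed hit time of 2 ns, a miss rate of 5 % and a miss penalty of 100 ns, the average access takes about 7 ns instead of roughly 100 ns for every access.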
Cache operation
Many main memory locations are mapped onto one cache entry
May have caches for:
instructions;
data;
data + instructions (unified).
[Figure: cache operation between CPU and main memory; © 2000 Wolf (Morgan Kaufmann)]
Terms
Cache hit: required location is in cache.
Cache miss: required location is not in cache.
Working set: set of locations used by program in a time interval.
Types of misses
Compulsory (cold): location has never been accessed.
Capacity: working set is too large.
Conflict: multiple locations in working set map to same cache entry.
Write operations
Write-through: immediately copy each write to main memory
Causes unnecessary memory communication
Memory always has a valid copy of the cache block
Write-back: write to main memory only when the location is removed from the cache
Tries to minimize communication with memory
Memory may have an invalid copy of the cache block
Memory must be updated when a cache block is replaced
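The difference between the two policies can be made concrete by counting main-memory writes. The following is a toy model, not a real cache: it tracks a single one-byte cache line, and all type and function names (`toy_system`, `write_through`, `write_back`, `evict`) are assumptions introduced for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* One cache line holding a single byte, enough to compare the policies. */
typedef struct {
    bool valid;
    bool dirty;              /* only meaningful for write-back */
    uint8_t data;
} toy_line;

typedef struct {
    uint8_t memory;          /* main-memory copy of the location */
    unsigned memory_writes;  /* how often main memory was written */
    toy_line line;
} toy_system;

/* Write-through: update the cache and main memory on every write. */
void write_through(toy_system *s, uint8_t value)
{
    s->line.valid = true;
    s->line.data = value;
    s->memory = value;
    s->memory_writes++;
}

/* Write-back: update only the cache and mark the line dirty. */
void write_back(toy_system *s, uint8_t value)
{
    s->line.valid = true;
    s->line.dirty = true;
    s->line.data = value;
}

/* Replacement: a dirty line must be copied to memory before it is dropped. */
void evict(toy_system *s)
{
    if (s->line.valid && s->line.dirty) {
        s->memory = s->line.data;
        s->memory_writes++;
    }
    s->line.valid = false;
    s->line.dirty = false;
}
```

Ten consecutive writes to the same location cost ten memory writes with write-through, but only one (at eviction) with write-back. Until that eviction, main memory still holds the stale value.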
Replacement
Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location.
Two popular strategies:
Random.
Least-recently used (LRU).
In a write-back cache, replacing a modified (dirty) cache entry also means writing its contents back to memory. Thus a cache miss can be expensive!
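LRU can be sketched with a time stamp per entry: every access stamps the entry, and the victim is the entry with the oldest stamp. This is one simple way to model the policy, not necessarily how hardware implements it. The names `lru_state`, `lru_touch`, `lru_victim` and the constant `LRU_WAYS` are assumptions for illustration, and counter overflow is ignored.

```c
#include <stddef.h>

#define LRU_WAYS 4  /* assumed number of candidate entries */

typedef struct {
    unsigned last_used[LRU_WAYS]; /* time stamp of the most recent access */
    unsigned now;                 /* access counter used as a clock */
} lru_state;

/* Record an access to entry `way`. */
void lru_touch(lru_state *s, size_t way)
{
    s->last_used[way] = ++s->now;
}

/* Return the entry whose last access lies furthest in the past. */
size_t lru_victim(const lru_state *s)
{
    size_t victim = 0;
    for (size_t i = 1; i < LRU_WAYS; i++) {
        if (s->last_used[i] < s->last_used[victim])
            victim = i;
    }
    return victim;
}
```

After accessing entries 0, 1, 2, 3 and then 0 again, entry 1 is the least recently used and would be replaced next.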
Cache organizations
Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented).
Direct-mapped: each memory location maps onto exactly one cache entry.
N-way set-associative: each memory location can go into one of N entries.
[Figure: CPU, cache and main memory]
Direct-mapped cache
A direct-mapped cache consists of several cache lines, where each cache line has a status bit, a tag and data (cache block)
Each memory location has a fixed mapping to exactly one cache line!
[Figure: direct-mapped cache. Each cache line holds a status bit, a tag and a cache block of four words (Wd 0 to Wd 3); the main-memory blocks (Block 0, Block 1, ...) map onto the cache lines]
Direct-mapped cache
[Figure: direct-mapped cache example. Main memory: blocks 0 to 2047 at addresses 0x0000, 0x0020, ..., 0xFFE0. Cache: lines 0 to 63, each with a valid bit (1 bit), a tag (5 bits) and data (32 bytes, 8 words)]
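The fixed mapping of a direct-mapped cache follows from splitting the address into fields. The sketch below assumes the geometry suggested by the example labels (16-bit addresses up to 0xFFE0, 32-byte blocks, 64 cache lines), which gives 5 offset bits, 6 line-index bits and 5 tag bits. The names `cache_addr` and `split_address` are assumptions introduced here.

```c
#include <stdint.h>

#define DM_OFFSET_BITS 5u  /* assumed: 32-byte cache block */
#define DM_INDEX_BITS  6u  /* assumed: 64 cache lines */

typedef struct {
    unsigned tag;     /* compared with the tag stored in the line */
    unsigned line;    /* selects the cache line */
    unsigned offset;  /* selects the byte inside the block */
} cache_addr;

/* Split a 16-bit address into tag, line index and byte offset. */
cache_addr split_address(uint16_t addr)
{
    cache_addr a;
    a.offset = addr & ((1u << DM_OFFSET_BITS) - 1u);
    a.line   = (addr >> DM_OFFSET_BITS) & ((1u << DM_INDEX_BITS) - 1u);
    a.tag    = (unsigned)addr >> (DM_OFFSET_BITS + DM_INDEX_BITS);
    return a;
}
```

Under these assumptions, addresses 0x0020 and 0x0824 both select line 1 but carry different tags (0 and 1), so they cannot be in the cache at the same time. This is exactly a conflict miss situation.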
[Figure: two-way set-associative cache example. Main memory is divided into blocks; the cache has sets 0 to 31, each with Way 0 and Way 1; every entry holds a valid bit (1 bit), a tag (6 bits) and data (32 bytes, 8 words); the address offset field has 5 bits]
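A lookup in a set-associative cache selects one set and then compares the tag against every way of that set. The sketch below assumes a two-way cache with 32 sets and 32-byte blocks (5 offset bits, 5 set-index bits, 6 tag bits for a 16-bit address), as the example labels suggest. The names `sa_cache`, `sa_fill` and `sa_lookup` are assumptions, and data storage is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define SA_SETS        32u  /* assumed number of sets */
#define SA_WAYS        2u   /* assumed associativity */
#define SA_OFFSET_BITS 5u   /* assumed: 32-byte block */
#define SA_SET_BITS    5u   /* log2(SA_SETS) */

typedef struct {
    bool valid;
    unsigned tag;
} sa_entry;

typedef struct {
    sa_entry e[SA_SETS][SA_WAYS];
} sa_cache;

/* Place the block containing addr into the given way of its set. */
void sa_fill(sa_cache *c, uint16_t addr, unsigned way)
{
    unsigned set = (addr >> SA_OFFSET_BITS) & (SA_SETS - 1u);
    c->e[set][way].valid = true;
    c->e[set][way].tag = (unsigned)addr >> (SA_OFFSET_BITS + SA_SET_BITS);
}

/* Return the way that holds addr, or -1 on a cache miss. */
int sa_lookup(const sa_cache *c, uint16_t addr)
{
    unsigned set = (addr >> SA_OFFSET_BITS) & (SA_SETS - 1u);
    unsigned tag = (unsigned)addr >> (SA_OFFSET_BITS + SA_SET_BITS);
    for (unsigned w = 0; w < SA_WAYS; w++) {
        if (c->e[set][w].valid && c->e[set][w].tag == tag)
            return (int)w;
    }
    return -1;
}
```

With two ways, the blocks at 0x0020 and 0x0820 can both reside in set 1 at the same time. A third block that maps to the same set (for example 0x1020) misses and forces a replacement.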
Set-Associative Caches
One-way set associative (direct-mapped)
[Figure: one-way set-associative (direct-mapped) cache with eight blocks (sets 0 to 7), each holding a tag and data]
In a fully-associative cache there is complete freedom where to place a block in the cache
But all blocks have to be searched for the correct tag pattern
To achieve acceptable performance, the tags must be searched in parallel
Example caches
StrongARM:
16 Kbyte, 32-way, 32-byte block instruction cache.
16 Kbyte, 32-way, 32-byte block data cache (write-back).
Nios II
512 Bytes to 64 KBytes, direct-mapped I- and D-cache with a cache block size of 4 (D), 16 (D) or 32 (I and D) Bytes
Input/Output
Input/Output devices are used to communicate with the environment
An example is a UART (Universal Asynchronous Receiver/Transmitter)
These devices (like other peripheral devices) are controlled by reading and writing registers
[Figure: I/O device with data register, register select, data bus, control signals, input and output]
Serial communication
Characters are transmitted separately
Memory-Mapped I/O
Peripheral components can be connected to the processor by memory-mapped I/O
The components are reached via a dedicated range of the address space
Memory-mapped I/O requires extra hardware for address decoding
Memory-Mapped I/O
The chip-enable output has to be active when the input of the decoder is a correct address
Other address bits are used for register select
The decoder can be implemented with a small block of programmable logic or custom hardware (VHDL)
[Figure: the CPU address bus feeds a decoder and the register select of the device; the data bus connects CPU and device; the device provides the interface to the environment]
Symbolic names can be defined for memory locations
    #define MEM_LOCATION 0x18
Functions can be defined to access memory
peek can be used to read a memory location (byte)
    char peek(char *location) { return *location; }
poke can be used to write to a memory location (byte)
    void poke(char *location, char newval) { *location = newval; }
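When the location is a device register rather than ordinary memory, the pointer is usually declared `volatile`, so that the compiler neither removes nor reorders the accesses. The variant below is a sketch under that assumption; the names `peek8` and `poke8` are introduced here and are not part of the original notes.

```c
#include <stdint.h>

/* Read one byte from a (device) location. The volatile qualifier
 * forces the compiler to perform the access every time. */
uint8_t peek8(volatile uint8_t *location)
{
    return *location;
}

/* Write one byte to a (device) location. */
void poke8(volatile uint8_t *location, uint8_t newval)
{
    *location = newval;
}
```

On a real target the argument would be a fixed address cast to a pointer, for example `(volatile uint8_t *)MEM_LOCATION`. On a desktop machine the functions can be tried out with the address of an ordinary variable.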
The registers can now be accessed in the address range 0x1000 (R0) to 0x1007 (R7)
    movia r1, 0x1002
    movi r3, 0x08
    stb r3, (r1)    # sets bit 3 and clears all other bits of device register R2
Don't do this!
Programmers may make mistakes that the compiler would not make (e.g. memory alignment)
The HAL (Hardware Abstraction Layer) offers optimized device drivers to access peripheral devices and memory
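The store to device register R2 can also be written in C. The sketch below contrasts a plain write, which sets bit 3 and clears all other bits, with a read-modify-write that sets bit 3 and leaves the other bits unchanged. The function names and the idea of passing the register base as a pointer are assumptions for illustration, and whether a read-modify-write is safe depends on the device.

```c
#include <stdint.h>

#define REG_R2 2u  /* offset of device register R2 from the base address */

/* Plain write: bit 3 is set, all other bits of R2 are cleared. */
void write_only_bit3(volatile uint8_t *regs)
{
    regs[REG_R2] = 0x08;
}

/* Read-modify-write: bit 3 is set, the other bits of R2 are kept. */
void set_bit3(volatile uint8_t *regs)
{
    regs[REG_R2] |= 0x08;
}
```

On the target, `regs` would point to the base address of the device (0x1000 in the example). For experiments on a host machine an eight-byte array can stand in for the register bank.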
[Figure: device registers at addresses 0x1000 and 0x1001; bit 5 of the status register is the BF flag]
while (*current_char != '\0') {
    poke(SendBuf, *current_char++);
    /* Mask needed, since other bits in status register may not be zero */
    while ((peek(Status) & 0x20) != 0)
        ;
}
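The busy-wait loop can be packaged as a function that works on a register block. This is a sketch under assumptions: the layout `uart_regs` (a status register followed by a send buffer) and the name `send_string` are hypothetical, and bit 5 (mask 0x20) is taken to signal that the device is not yet ready for the next character.

```c
#include <stdint.h>

#define BF_MASK 0x20u  /* bit 5 of the status register */

/* Hypothetical register layout of the device. */
typedef struct {
    volatile uint8_t status;    /* bit 5 set: device still busy */
    volatile uint8_t send_buf;  /* character to transmit */
} uart_regs;

/* Send a zero-terminated string with busy-waiting.
 * Returns the number of characters written to the device. */
unsigned send_string(uart_regs *dev, const char *s)
{
    unsigned sent = 0;
    while (*s != '\0') {
        dev->send_buf = (uint8_t)*s++;
        sent++;
        /* Mask the status so that other bits do not matter. */
        while ((dev->status & BF_MASK) != 0)
            ;  /* busy-wait until the device is ready again */
    }
    return sent;
}
```

On a host machine a `uart_regs` variable can stand in for the device, as long as bit 5 of `status` is clear. If the bit stayed set the inner loop would never terminate, since nothing in the simulation clears it.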
Interrupt I/O
Busy/wait is very inefficient.
CPU can't do other work while testing the device.
Hard to do simultaneous I/O.
Interrupt Scheme
[Figure: the device sends an interrupt request to the CPU]
Interrupt behavior
Based on subroutine call mechanism
Interrupt forces next instruction to be a subroutine call to a predetermined location
Return address is saved to resume executing foreground program
Programming Interrupt
Foreground Program: do something until an interrupt event occurs
Interrupt Vector: branch to interrupt handler
Interrupt Handler: save registers, handle interrupt, restore registers, restore PC, clear interrupt disable flag
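The branch through the interrupt vector can be modelled in C with a table of function pointers. This is only a host-side sketch: on a real processor the hardware (or a small assembly stub) performs the branch and the saving and restoring of registers and PC. All names (`vector_table`, `register_handler`, `dispatch`, `example_handler`) and the number of interrupt sources are assumptions.

```c
#define NUM_IRQS 4u  /* assumed number of interrupt sources */

typedef void (*isr_fn)(void);

isr_fn vector_table[NUM_IRQS];    /* one handler entry per interrupt */
volatile unsigned handled_count;  /* incremented by the example handler */

/* Install a handler for interrupt number irq. */
void register_handler(unsigned irq, isr_fn handler)
{
    if (irq < NUM_IRQS)
        vector_table[irq] = handler;
}

/* Stand-in for the hardware: branch to the handler via the vector. */
void dispatch(unsigned irq)
{
    if (irq < NUM_IRQS && vector_table[irq] != 0)
        vector_table[irq]();
}

/* Example handler: a real one would service the device here. */
void example_handler(void)
{
    handled_count++;
}
```

Calling `dispatch` for an interrupt without an installed handler does nothing in this model.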
The system cannot do anything while it waits for a new character or until the sender is ready
System resources are utilized very inefficiently!
The system designer needs to find the right amount of parallelization and the right buffer size!
[Figure: circular buffer with head and tail pointers; © 2000 Wolf (Morgan Kaufmann)]
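A circular buffer with head and tail indices is a common way to decouple an interrupt handler from the foreground program. The sketch below assumes one producer (the handler calls `rb_put`) and one consumer (the foreground program calls `rb_get`). The names, the capacity `RB_SIZE`, and the choice that `head` is the read position and `tail` the write position are assumptions; the original figure may use the opposite convention.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8u  /* assumed capacity; one slot always stays empty */

typedef struct {
    volatile uint8_t data[RB_SIZE];
    volatile unsigned head;  /* next slot to read  (foreground program) */
    volatile unsigned tail;  /* next slot to write (interrupt handler) */
} ring_buffer;

/* Store one character; returns false if the buffer is full. */
bool rb_put(ring_buffer *rb, uint8_t c)
{
    unsigned next = (rb->tail + 1u) % RB_SIZE;
    if (next == rb->head)
        return false;  /* full: the character is dropped */
    rb->data[rb->tail] = c;
    rb->tail = next;
    return true;
}

/* Fetch one character; returns false if the buffer is empty. */
bool rb_get(ring_buffer *rb, uint8_t *c)
{
    if (rb->head == rb->tail)
        return false;  /* empty */
    *c = rb->data[rb->head];
    rb->head = (rb->head + 1u) % RB_SIZE;
    return true;
}
```

The buffer size determines how long the foreground program may be busy before characters are lost: with `RB_SIZE` set to 8, at most seven characters can be waiting.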
Prioritized Interrupts
Some CPUs (such as the Nios II) support several interrupt levels in hardware
Otherwise extra hardware (priority decoder) can be used to create several levels of interrupt
Interrupt prioritization
Masking: an interrupt with priority lower than the current priority is not recognized until the pending interrupt is complete.
Non-maskable interrupt (NMI): highest priority, never masked.
Often used for power-down.
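The masking rule can be written as a small decision function. This sketch assumes that a larger number means a higher priority, that a request with the same priority as the current one is also held back, and that the value `NMI_PRIORITY` is reserved for the non-maskable interrupt. These conventions and the function name are assumptions made for illustration.

```c
#include <stdbool.h>

#define NMI_PRIORITY 255u  /* assumed value reserved for the NMI */

/* Decide whether a request is recognized while the CPU runs
 * at current_priority (0 stands for the foreground program). */
bool interrupt_recognized(unsigned request_priority,
                          unsigned current_priority)
{
    if (request_priority == NMI_PRIORITY)
        return true;  /* the NMI is never masked */
    return request_priority > current_priority;
}
```

A request that is not recognized stays pending and is served once the handler with the higher priority has finished.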
[Figure: sequence diagram with the lifelines :foreground, :A, :B and :C; © 2000 Wolf (Morgan Kaufmann)]
Summary
Peripherals can be made accessible to software by memory-mapped I/O
Two basic approaches for communication with an I/O device:
polling: the processor checks whether data has arrived
interrupt: the processor is notified when data has arrived