Lecture 7: Embedded Software
Curriculum
– CPU: ARM Cortex-M (Week 4)
– Memory and Interfaces (Weeks 5-6)
– Real-time Operating Systems (Weeks 10-12)
– Project (Week 15)
Laboratory for Smart Integrated Systems
Objectives
In this lecture you will be introduced to:
– The definition of embedded software
– Some useful structures for embedded software
development.
– Models of programs, such as data flow and
control flow graphs.
– An introduction to compilation methods.
– Analyzing and optimizing programs for
performance, size, and power consumption.
Outline
• Embedded Software
• Embedded Software components
• Representations of programs
• Assembly and linking
• Compilation flow
• Summary
Embedded Software
Definition
“Embedded software is computer software, written to control
machines or devices that are not typically thought of as
computers. It is typically specialized for the particular hardware
that it runs on and has time and memory constraints. This term
is sometimes used interchangeably with firmware.”
Embedded Software Overview
How does firmware execute?
– Steps shown in the figures: reset interrupt vector; int i = 0; int x = 8.
– main() is typically implemented as an endless loop that is either interrupt-driven or uses polling to control external interaction or internal events.
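The endless-loop structure can be sketched in C as follows. This is a minimal host-side sketch, not target firmware: the names `button_isr`, `poll_sensor`, `loop_once`, and `superloop` are hypothetical, the interrupt is simulated by a plain function call, and the loop is bounded so that it terminates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag shared between the (hypothetical) interrupt handler and the main loop.
   volatile tells the compiler it can change outside normal program flow. */
static volatile bool event_pending = false;
static uint32_t events_handled = 0;
static uint32_t polls_done = 0;

/* Hypothetical ISR: on a real target the hardware would invoke this. */
void button_isr(void) { event_pending = true; }

/* Hypothetical polled input; always reports "nothing" in this sketch. */
static bool poll_sensor(void) { polls_done++; return false; }

/* One pass through the body of the main loop. */
void loop_once(void)
{
    if (event_pending) {        /* interrupt-driven path */
        event_pending = false;
        events_handled++;
    }
    (void)poll_sensor();        /* polling path */
}

/* On a target, main() would call loop_once() inside for (;;) and never return.
   Here the loop is bounded so the sketch can finish on a host. */
uint32_t superloop(uint32_t iterations)
{
    for (uint32_t i = 0; i < iterations; i++) {
        if (i == 1) button_isr();   /* simulate one interrupt arriving */
        loop_once();
    }
    return events_handled;
}
```

Calling `superloop(4)` runs four passes, handles the one simulated interrupt, and polls the input on every pass.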
Development Environment
• Host: a computer running the programming tools used for development
• Target: the HW on which the code will run
• After the program is written, compiled, assembled, and linked, it is transferred to the target
• Example from the figure: x86 host, MSP430 target
SW Development Environment
Figure: Keil™ uVision®: Editor → Source code → Start Debug Session → Simulated Microcontroller (Processor, Memory, I/O)
Start
        ; direction register
        LDR R1,=GPIO_PORTD_DIR_R
        LDR R0,[R1]
        ORR R0,R0,#0x0F
        ; make PD3-0 output
        STR R0,[R1]
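The load, OR, store sequence in the assembly fragment is a read-modify-write of a direction register. A rough C equivalent is sketched below; `fake_gpio_portd_dir` is a hypothetical stand-in for the memory-mapped register so the sketch can run on a host, and `make_pd3_0_output` is an illustrative name.

```c
#include <stdint.h>

/* Stand-in for the memory-mapped direction register. On a real target the
   register would live at a fixed address from the vendor header (assumption). */
static volatile uint32_t fake_gpio_portd_dir = 0;

/* Set bits 3..0 of a direction register, leaving the other bits unchanged. */
void make_pd3_0_output(volatile uint32_t *dir_reg)
{
    uint32_t value = *dir_reg;   /* LDR R0,[R1]     */
    value |= 0x0Fu;              /* ORR R0,R0,#0x0F */
    *dir_reg = value;            /* STR R0,[R1]     */
}
```

The OR keeps whatever direction bits were already set for the upper pins, which is why the register is read before it is written.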
What If Real HW Not Available?
• Development board:
– Before real hardware is built, software can be developed
and tested using development boards
– Development boards usually have the same CPU as the
end product and provide many IO peripherals for the
developed software to use
as if it were running on the
real end product
• Tools for program development
– Integrated Development Environment
(IDE): cross compiler, linker, loader, …
– OS and related libraries and packages
Cross Compiler
• Runs on the host but generates code for the target
– The target usually has a different architecture from the host. Hence the compiler on the host has to produce binary instructions that the target will understand
Software state machine
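State machine example
Seat belt controller:
- Two inputs: seat sensor and belt sensor
- One timer
- One output: buzzer
State diagram (transitions written as input/output):
- idle: no seat/- (stay in idle); seat/timer on (go to seated)
- seated: no belt and no timer/- (stay in seated); belt/timer off (go to belted); no belt and timer over/buzzer on (go to buzzer)
- belted: no seat/- (go to idle); no belt/timer on (go to seated)
- buzzer: belt/buzzer off (go to belted); no seat/buzzer off (go to idle)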
State machine example
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
switch (state) { /* check the current state */
case IDLE:
    if (seat) { state = SEATED; timer_on = TRUE; }
    /* default case is self-loop */
    break;
case SEATED:
    if (belt) state = BELTED; /* won’t hear the buzzer */
    else if (timer_over) state = BUZZER; /* didn’t put on belt in time */
    /* default case is self-loop */
    break;
case BELTED:
    if (!seat) state = IDLE; /* person left */
    else if (!belt) state = SEATED; /* person still in seat */
    break;
case BUZZER:
    if (belt) state = BELTED; /* belt is on---turn off buzzer */
    else if (!seat) state = IDLE; /* no one in seat--turn off buzzer */
    break;
}
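The switch statement can be wrapped in a step function that is called once per sample of the inputs. The sketch below is one possible arrangement, not the only one: the type names `belt_state_t` and `belt_outputs_t`, the function name `belt_step`, and the way the outputs are reported are assumptions made for illustration.

```c
#include <stdbool.h>

typedef enum { IDLE, SEATED, BELTED, BUZZER } belt_state_t;

/* Outputs produced by one step (illustrative names). */
typedef struct {
    bool timer_on;    /* request to start the timer on this step */
    bool buzzer_on;   /* buzzer output after this step */
} belt_outputs_t;

/* Advance the seat belt state machine by one step and return the new state. */
belt_state_t belt_step(belt_state_t state, bool seat, bool belt,
                       bool timer_over, belt_outputs_t *out)
{
    out->timer_on = false;
    switch (state) {
    case IDLE:
        if (seat) { state = SEATED; out->timer_on = true; }
        break;                                  /* otherwise self-loop */
    case SEATED:
        if (belt) state = BELTED;               /* won't hear the buzzer */
        else if (timer_over) state = BUZZER;    /* belt not fastened in time */
        break;
    case BELTED:
        if (!seat) state = IDLE;                /* person left */
        else if (!belt) { state = SEATED; out->timer_on = true; }
        break;
    case BUZZER:
        if (belt) state = BELTED;               /* belt is on */
        else if (!seat) state = IDLE;           /* no one in the seat */
        break;
    }
    out->buzzer_on = (state == BUZZER);         /* buzzer sounds only in BUZZER */
    return state;
}
```

A caller would keep the state in a variable and feed it back on every step, for example `s = belt_step(s, seat, belt, timer_over, &out);`.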
Project
Seat belt controller
• Hardware requirements:
– 2 user LEDs
– 2 user push buttons
– Timer
– Option: 4-digit 7-segment LCD module
Project
Seat belt controller
• Description: The controller turns on a buzzer if a person sits in
a seat and does not fasten the seat belt within a fixed amount
of time.
• This system has two inputs and two outputs.
– SW1 represents the sensor that detects when a person has sat down,
– SW2 represents the seat belt sensor that tells when the belt is fastened,
– Red LED is the output that represents the buzzer.
– Blue LED is the output that represents the belted state.
– LCD (option): displays current state of the system
• A timer sets the required time interval before the buzzer turns on.
Signal processing and circular buffer
• Commonly used in signal processing:
– new data constantly arrives and must be processed in
real time;
– each datum has a limited lifetime.
Figure: data stream (x1 to x10) with the processing window at time t and at time t+1, and the corresponding circular buffer.
Circular buffers
• Indexes locate the currently used data and the current input data:
– Before: input → d1, d2, d3, use → d4
– After: use → d5, input → d2, d3, d4
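A circular buffer of this kind can be sketched in C with one array and one index that wraps around. The size `CB_SIZE` and the names `circ_buf_t`, `cb_init`, `cb_put`, and `cb_get` are illustrative choices, not taken from the slides.

```c
#include <stddef.h>

#define CB_SIZE 4   /* window length; an arbitrary choice for this sketch */

typedef struct {
    int data[CB_SIZE];
    size_t pos;      /* index where the next input sample will be written */
} circ_buf_t;

void cb_init(circ_buf_t *cb)
{
    for (size_t i = 0; i < CB_SIZE; i++) cb->data[i] = 0;
    cb->pos = 0;
}

/* Store a new sample, overwriting the oldest one. */
void cb_put(circ_buf_t *cb, int sample)
{
    cb->data[cb->pos] = sample;
    cb->pos = (cb->pos + 1) % CB_SIZE;   /* wrap around at the end */
}

/* Return the sample that arrived `age` steps ago (0 = most recent).
   The caller must keep age below CB_SIZE. */
int cb_get(const circ_buf_t *cb, size_t age)
{
    size_t idx = (cb->pos + CB_SIZE - 1 - age) % CB_SIZE;
    return cb->data[idx];
}
```

After five samples have been written into a four-entry buffer, the fifth sample occupies the slot that held the first one, which matches the before and after pictures of the buffer.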
Queues
• Are used whenever data may arrive and depart at somewhat unpredictable times
• Implementation:
– Declare the array
– The variables head and tail keep track of the two ends of the queue
– Two error conditions: Reading from an empty queue and writing to a full queue.
Figure: head and tail positions in the queue array as elements d1 to d5 are added and removed.
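The array-based implementation with head and tail indexes can be sketched as below. This is one common variant, assumed here for illustration: one array slot is left unused so that a full queue can be told apart from an empty one, and both error conditions are reported through the return value. All names (`queue_t`, `q_enqueue`, `q_dequeue`, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define Q_SIZE 4    /* array length; one slot stays unused to tell full from empty */

typedef struct {
    int q[Q_SIZE];
    size_t head;    /* index of the next element to read */
    size_t tail;    /* index of the next free slot to write */
} queue_t;

void q_init(queue_t *qu) { qu->head = 0; qu->tail = 0; }

bool q_empty(const queue_t *qu) { return qu->head == qu->tail; }
bool q_full(const queue_t *qu)  { return (qu->tail + 1) % Q_SIZE == qu->head; }

/* Returns false (error) when writing to a full queue. */
bool q_enqueue(queue_t *qu, int val)
{
    if (q_full(qu)) return false;
    qu->q[qu->tail] = val;
    qu->tail = (qu->tail + 1) % Q_SIZE;
    return true;
}

/* Returns false (error) when reading from an empty queue. */
bool q_dequeue(queue_t *qu, int *val)
{
    if (q_empty(qu)) return false;
    *val = qu->q[qu->head];
    qu->head = (qu->head + 1) % Q_SIZE;
    return true;
}
```

With `Q_SIZE` set to 4 the queue holds at most three elements; a fourth `q_enqueue` fails until something has been dequeued.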
Models of programs
Data flow graph
Single assignment form
Original code:
x = a + b;
y = c - d;
z = x * y;
y = b + d;
Single assignment form:
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
Data flow graph
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
Figure: DFG for this code (inputs a, b, c, d; operator nodes + and -; values x and y).
DFGs and partial orders
- The DFG shows the order in which the operations are performed in the program.
- Partial order: [a+b, c-d]; [b+d, x*y]. The operations within each pair can be done in either order.
- The DFG helps us determine feasible reorderings of the operations, which can reduce pipeline or cache conflicts.
Figure: DFG with inputs a, b, c, d; the + and - nodes produce x and y; the * and + nodes produce z and y1.
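The freedom given by the partial order can be checked with a small C sketch: the two functions below compute the single-assignment block in program order and in one reordering that swaps the operations within each pair. The names `block_result_t`, `run_program_order`, and `run_reordered` are illustrative.

```c
/* Results of the single-assignment block. */
typedef struct { int x, y, z, y1; } block_result_t;

/* Program order: x, y, z, y1. */
block_result_t run_program_order(int a, int b, int c, int d)
{
    block_result_t r;
    r.x  = a + b;
    r.y  = c - d;
    r.z  = r.x * r.y;
    r.y1 = b + d;
    return r;
}

/* A reordering permitted by the partial order: the operations within
   each pair are swapped, but x and y are still computed before z. */
block_result_t run_reordered(int a, int b, int c, int d)
{
    block_result_t r;
    r.y  = c - d;
    r.x  = a + b;
    r.y1 = b + d;
    r.z  = r.x * r.y;
    return r;
}
```

Both functions return the same four values for any inputs, because no operation is moved ahead of a value it depends on.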
Control-data flow graph
• CDFG: models both data operations (arithmetic and other computations) and control operations (conditionals).
• Uses data flow graphs as components, adding constructs to describe control.
• Two types of nodes:
– Decision node: describes all the types of control in a sequential program;
– Data flow node: represents a basic block.
Data flow node
x = a + b;
y = c + d;
Control Nodes
Figure: equivalent forms of a decision node: a cond node with T and F branches, and a value node with branches v1 to v4.
CDFG example
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
case c1: bb4(); break;
case c2: bb5(); break;
case c3: bb6(); break;
}
Figure: CDFG in which cond1 selects bb1() (T) or bb2() (F), both paths flow into bb3(), and test1 selects bb4(), bb5(), or bb6() for c1, c2, or c3.
for loop
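for (i=0; i<N; i++)
    loop_body();
equivalent:
i=0;
while (i<N) {
    loop_body();
    i++;
}
Figure: CDFG with the node i=0, a loop test whose T branch leads to loop_body() and i++ and back to the test, and an End node.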
Assembly and linking
• Last steps in compilation: the output describes the instructions and data in binary format.
Multiple-module programs
Assemblers
• Major tasks:
– generate binary representation for symbolic
assembly instructions;
– translate labels into addresses;
– handle pseudo-ops (data, etc.).
• Generally one-to-one translation.
• Assembly labels: represent locations of instructions and data
        ORG 100         ; pseudo-op
label1  ADR r4,c
Two-pass assembly
• Pass 1:
– generate symbol table:
• determining the address of each label
• Pass 2:
– generate binary instructions
• using the label values computed in the first pass
Symbol table
        ORG 0x7
        ADD r0,r1,r2
xx      ADD r3,r4,r5
        CMP r0,r3
yy      SUB r5,r6,r7
Symbol table after the first pass:
xx  0xB
yy  0x13
Symbol table generation
Symbol table example
        ORG 0x7
        ADD r0,r1,r2    ; PLC=0x7
xx      ADD r3,r4,r5    ; PLC=0xB
        CMP r0,r3       ; PLC=0xF
yy      SUB r5,r6,r7    ; PLC=0x13
Symbol table:
xx  0xB
yy  0x13
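The first pass can be sketched in C as a walk over the source lines that advances a program location counter (PLC) and records its value at every label. This is a simplified illustration, assuming every instruction occupies 4 bytes as in the example; the types `asm_line_t` and `symbol_t` and the functions `first_pass` and `lookup` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define INSTR_BYTES 4u   /* every instruction occupies 4 bytes, as in the example */
#define MAX_SYMBOLS 8

typedef struct { const char *label; const char *text; } asm_line_t; /* label may be NULL */
typedef struct { const char *name; unsigned addr; } symbol_t;

/* Pass 1: advance the PLC line by line and record the PLC value of
   every labeled line. Returns the number of symbols stored. */
size_t first_pass(const asm_line_t *lines, size_t n, unsigned origin,
                  symbol_t *table)
{
    unsigned plc = origin;              /* starting value set by ORG */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (lines[i].label != NULL && count < MAX_SYMBOLS) {
            table[count].name = lines[i].label;
            table[count].addr = plc;
            count++;
        }
        plc += INSTR_BYTES;
    }
    return count;
}

/* Lookup of the kind pass 2 performs. Returns 1 and sets *addr if found. */
int lookup(const symbol_t *table, size_t count, const char *name, unsigned *addr)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *addr = table[i].addr;
            return 1;
        }
    }
    return 0;
}
```

Fed the four instructions of the example with an origin of 0x7, the sketch yields xx = 0xB and yy = 0x13, the same values as the symbol table above.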
Relative address generation
Linking
• Combines several object modules into a
single executable module.
– An assembly language program is usually written as several smaller pieces and uses library routines, rather than being written as a single large file.
• Jobs:
– put modules in order;
– resolve labels across modules.
External References and entry points
Module 1:
xxx     ADD r1,r2,r3    ; xxx: entry point
        B a             ; a: external reference
yyy     %1
Module 2:
a       ADR r4,yyy
        ADD r3,r4,r5
Module ordering
• Code modules must be placed in absolute positions in the
memory space.
• Load map or linker flags control the order of modules.
– Some data structures and instructions must be put at precise
memory locations
– Many different types of memory may be installed at different
address ranges.
module1
module2
module3
Dynamic linking
Reentrancy
Compilation
• Why do we need to understand the compilation process?
– Knowing how a high-level language program is translated into instructions: interrupt handling instructions, placement of data and instructions in memory, etc.
– Understanding how code is generated can help you meet your performance goals:
• either by writing high-level code that gets compiled into the instructions you want
• or by recognizing when you must write your own assembly code.
Compilation
• Compilation process:
– compilation = translation + optimization
• The high-level language program is translated into the
lower-level form of instructions;
• optimizations try to generate better instruction
sequences
• Compiler determines quality of code:
– use of CPU resources;
– memory access scheduling;
– code size.
Basic compilation phases
Figure: compilation flow ending with machine-independent optimizations → machine-dependent optimizations → assembly code.
Statement translation and optimization
Arithmetic expressions
x = a*b + 5*(c-d)
Figure: the expression and its DFG (inputs a, b, c, d and the constant 5; operator nodes * and -).
Arithmetic expressions, cont’d.
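Figure: DFG with its operator nodes numbered in evaluation order: 1 (a*b), 2 (c-d), 3 (5 times node 2), 4 (node 1 plus node 3).
Assembly code:
        ADR r4,a        ; get address for a
        LDR r1,[r4]     ; load a
        ADR r4,b
        LDR r2,[r4]     ; load b
        MUL r3,r1,r2    ; node 1: a*b
        ADR r4,c
        LDR r1,[r4]     ; load c
        ADR r4,d
        LDR r5,[r4]     ; load d
        SUB r6,r1,r5    ; node 2: c-d
        MUL r7,r6,#5    ; node 3: 5*(c-d); a real ARM MUL needs the 5 in a register
        ADD r8,r7,r3    ; node 4: sum of the two products
        ADR r1,x
        STR r8,[r1]     ; store the result in x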
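The same walk over the DFG can be written in C with one operation per statement, which is close to what the code generator sees before it picks registers. The temporaries `t1` to `t4` and the function name `eval_expression` are illustrative.

```c
/* x = a*b + 5*(c-d), one operation per statement, in the order
   the DFG nodes are numbered (1: *, 2: -, 3: *, 4: +). */
int eval_expression(int a, int b, int c, int d)
{
    int t1 = a * b;      /* node 1 */
    int t2 = c - d;      /* node 2 */
    int t3 = 5 * t2;     /* node 3 */
    int t4 = t1 + t3;    /* node 4 */
    return t4;           /* the value stored to x */
}
```

Each temporary corresponds to one register in the assembly listing, so the number of temporaries that are live at the same time hints at the number of registers needed.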
Compiled code for arithmetic expressions
Control code generation
if (a+b > 0)
    x = 5;
else
    x = 7;
Figure: the statement and its CDFG: the test a+b>0 leads to x=5 on the T branch and to x=7 on the F branch.
Control code generation, cont’d.
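Figure: CDFG with its nodes numbered: 1 (test a+b>0), 2 (x=5, T branch), 3 (x=7, F branch).
        ADR r5,a
        LDR r1,[r5]     ; load a
        ADR r5,b
        LDR r2,[r5]     ; load b
        ADD r3,r1,r2
        CMP r3,#0       ; node 1: test a+b > 0
        BLE label3      ; condition false: go to the false block
        MOV r3,#5       ; node 2: true block
        ADR r5,x
        STR r3,[r5]     ; x = 5
        B stmtend
label3  MOV r3,#7       ; node 3: false block
        ADR r5,x
        STR r3,[r5]     ; x = 7
stmtend ...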
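The branch structure of the generated code can be mirrored in C with explicit jumps. The sketch below is for illustration only (ordinary C code would use if/else); the function name `select_x` and the labels `false_block` and `end_if` are hypothetical.

```c
/* if (a+b > 0) x = 5; else x = 7; written with explicit branches. */
int select_x(int a, int b)
{
    int x;
    int sum = a + b;                 /* compute the tested value */
    if (sum <= 0) goto false_block;  /* inverted test: skip the true block */
    x = 5;                           /* true block */
    goto end_if;                     /* jump over the false block */
false_block:
    x = 7;                           /* false block */
end_if:
    return x;
}
```

The test is inverted because the branch jumps to the false block, so the true block can follow the comparison directly.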
Compiled code for control
Procedure stacks
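proc1(int a) {
    proc2(5);
}
Figure: stack with frames for proc1 and proc2, the growth direction marked from the high address, and FP (frame pointer) and SP (stack pointer) marked; the argument 5 is accessed relative to SP.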
ARM procedure linkage
• APCS (ARM Procedure Call Standard):
Compiled procedure call code
ldr r3, [fp, #-32] ; get e
str r3, [sp, #0] ; put into p1()’s stack frame
ldr r0, [fp, #-16] ; put a into r0
ldr r1, [fp, #-20] ; put b into r1
ldr r2, [fp, #-24] ; put c into r2
ldr r3, [fp, #-28] ; put d into r3
bl p1 ; call p1()
mov r3, r0 ; move return value into r3
str r3, [fp, #-36] ; store into y in stack frame
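Judging from the comments in the listing, the call being compiled has five arguments and stores its result in y. A C sketch of such a call is given below; the body of `p1`, the values of the arguments, and the function name `caller` are placeholders made up for illustration.

```c
/* Hypothetical callee with five int parameters; the body is a placeholder. */
int p1(int a, int b, int c, int d, int e)
{
    return a + b + c + d + e;
}

/* Caller: under the ARM procedure call standard the first four arguments
   travel in r0-r3, the fifth goes on the stack, and the result comes back in r0. */
int caller(void)
{
    int a = 1, b = 2, c = 3, d = 4, e = 5;
    int y = p1(a, b, c, d, e);
    return y;
}
```

This matches the listing: e is stored through sp before the call, a to d are loaded into r0 to r3, and the return value is copied out of r0 into the slot for y.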
Data structure transformations
One-dimensional arrays
• The array name a is a pointer to a[0]; the elements a[0], a[1], a[2] follow in consecutive locations.
• a[1] = *(a + 1)
Two-dimensional arrays
• Row-major layout (N rows, M columns):
a[0,0]
a[0,1]
...
a[1,0]
a[1,1]
...
• a[i,j] = a[i*M+j]
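The index calculation i*M+j can be tried out in C on a flat array that stores the rows one after another. The dimensions, the sample data in `flat`, and the function names are assumptions chosen for this sketch.

```c
#include <stddef.h>

#define N 2   /* rows    */
#define M 3   /* columns */

/* Offset of element [i][j] when the rows are stored one after another. */
size_t row_major_index(size_t i, size_t j) { return i * M + j; }

/* Read element [i][j] through a pointer to the first element. */
int read_flat(const int *base, size_t i, size_t j)
{
    return base[row_major_index(i, j)];
}

/* Sample data: row 0 is 10, 11, 12 and row 1 is 20, 21, 22. */
static const int flat[N * M] = { 10, 11, 12, 20, 21, 22 };
```

Stepping j moves to the neighbouring location, while stepping i skips a whole row of M elements.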
Structures
• A structure is implemented as a contiguous block of memory
• Fields within structures are static offsets:
– Fields in the structure can be accessed using constant offsets to the
base address of the structure
struct {
    int field1;
    char field2;
} mystruct;
Figure: aptr points to the start of the structure; field1 occupies the first 4 bytes, so field2 is at *(aptr+4).
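The constant offsets can be inspected in C with `offsetof`. In the sketch below the structure tag `mystruct_tag` and the function `read_field2` are illustrative names; the offset of field2 equals the size of an int, which is 4 bytes on the 32-bit targets discussed here.

```c
#include <stddef.h>   /* offsetof */

struct mystruct_tag {
    int  field1;
    char field2;
};

/* Read field2 the way compiled code does: base address plus a constant byte offset. */
char read_field2(const struct mystruct_tag *aptr)
{
    const unsigned char *base = (const unsigned char *)aptr;
    return (char)base[offsetof(struct mystruct_tag, field2)];
}
```

Because the offset is known at compile time, the access costs a single load with an immediate offset from the base register.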
Compiler Optimizations
Expression simplification
• Constant folding:
– N+1 = 8+1 = 9 (N has been declared as a constant)
• Algebraic:
– a*b + a*c = a*(b+c) (Why? One multiplication instead of two.)
• Strength reduction:
– a*2 = a<<1
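Each simplification can be written out as a before and after pair in C to confirm that the value does not change. The function names are illustrative, and the shift example uses unsigned operands as an assumption, so that the shift is well defined for every input.

```c
#define N 8   /* a compile-time constant, as on the slide */

/* Constant folding: the compiler can replace N + 1 with 9. */
int folded(void) { return N + 1; }

/* Algebraic: two multiplications and one addition ... */
int before_algebraic(int a, int b, int c) { return a * b + a * c; }
/* ... become one addition and one multiplication. */
int after_algebraic(int a, int b, int c)  { return a * (b + c); }

/* Strength reduction: a multiplication by 2 replaced by a left shift. */
unsigned before_strength(unsigned a) { return a * 2u; }
unsigned after_strength(unsigned a)  { return a << 1; }
```

A compiler applies these rewrites on its own when optimization is enabled; writing them by hand mainly helps when reading the generated assembly.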
Dead code elimination
• Dead code:
#define DEBUG 0
if (DEBUG) dbg(p1);
• Can be eliminated by analysis of control flow, constant folding.
Figure: control flow for the test: the branch to dbg(p1); is taken only for 1, and the condition is always 0.
Procedure inlining
Register allocation
• Goals:
– choose register to hold each variable;
– determine lifespan of variable in the register.
• Basic case: within basic block.
Register lifetime graph
w = a + b; /* t=1 */
x = c + w; /* t=2 */
y = c + d; /* t=3 */
Figure: lifetime chart with one row per variable (a, b, c, d, w, x, y) plotted against time steps 1, 2, 3.
QUIZ
Original order:
w = a + b; /* statement 1 */
x = c + d; /* statement 2 */
y = x + w; /* statement 3 */
z = a - b; /* statement 4 */
Reordered:
w = a + b; /* statement 1 */
z = a - b; /* statement 4 */
x = c + d; /* statement 2 */
y = x + w; /* statement 3 */
Instruction scheduling
• When is an instruction executed, and which resources does it use?
Reservation table
• A reservation table relates instruction execution time slots to CPU resources.
Time/instr A B
instr1 X
instr2 X X
instr3 X
instr4 X
Instruction selection
Figure: an expression graph of * and + nodes, and the templates MUL, ADD, and MADD (a * node feeding a + node) used to cover it.
Summary
Quiz
Q1: State machine example, circular buffer, Queues?
Q2: For each basic block given below, rewrite it in single-assignment
form, and then draw the data flow graph for that form.
a). x = a + b;
    y = c + d;
    z = x + e;
b). r = a + b - c;
    s = 2 * r;
    t = b - d;
    r = d + e;
c). a = q - r;
    b = a + t;
    a = r + s;
    c = t - u;
d). w = a - b + c;
    x = w - d;
    y = x - 2;
    w = a + b - c;
    z = y + d;
    y = b * c;
Quiz
Q3: Draw the CDFG for the following code fragments:
a).
if (y == 2) { r = a + b; s = c - d; }
else r = a - c;
b).
x = 1;
if (y == 2) { r = a + b; s = c - d; }
else { r = a - c; }
c).
x = 2;
while (x < 40) {
    x = foo[x];
}
d).
for (i = 0; i < N; i++)
    x[i] = a[i]*b[i];
e).
for (i = 0; i < N; i++) {
    if (a[i] == 0)
        x[i] = 5;
    else
        x[i] = a[i]*b[i];
}
Quiz
Q5: Show the contents of the assembler’s symbol table at the end of code
generation for each line of the following programs:
a.
        ORG 200
p1:     ADR r4,a
        LDR r0,[r4]
        ADR r4,e
        LDR r1,[r4]
        ADD r0,r0,r1
        CMP r0,r1
        BNE q1
p2:     ADR r4,e
b.
        ORG 100
p1:     CMP r0,r1
        BEQ x1
p2:     CMP r0,r2
        BEQ x2
p3:     CMP r0,r3
        BEQ x3
c.
        ORG 200
S1:     ADR r2,a
        LDR r0,[r2]
S2:     ADR r2,b
        LDR r2,a
        ADD r1,r1,r2
Quiz
Q6: Draw the CDFG for the following C code before and after applying
dead code elimination to the if statement:
#define DEBUG 0
proc1();
if (DEBUG) debug_stuff();
switch (foo) {
case A: a_case();
case B: b_case();
default: default_case();
}
Q7: Unroll the loop below:
for (i = 0; i < 32; i++)
x[i] = a[i] * c[i];
a. two times
b. three times
Quiz
Q8: Apply loop fusion or loop distribution to these code fragments as
appropriate. Identify the technique you use and write the modified code.
a.
for (i=0; i<N; i++)
    z[i] = a[i] + b[i];
for (i=0; i<N; i++)
    w[i] = a[i] - b[i];
b.
for (i=0; i<N; i++) {
    x[i] = c[i]*d[i];
    y[i] = x[i] * e[i];
}
c.
for (i=0; i<N; i++) {
    for (j=0; j<M; j++) {
        c[i][j] = a[i][j] + b[i][j];
        x[j] = x[j] * c[i][j];
    }
    y[i] = a[i] + x[j];
}
Q9: What is software pipelining? Give an example of software pipelining.