Chapter 01 See Program Running
Chapter 1
Computer and Assembly Language
Modified by Dr. Jonathan Phillips for USU ECE 3710, Fall 2018
Embedded Systems
Amazon Warehouse
Kiva Robot
Why do we learn Assembly?
• Assembly isn’t “just another language”.
• It helps you understand how the processor works.
• Hand-written assembly can run faster than compiled high-level code, so performance-critical code is sometimes written in assembly.
    • Use profiling tools to find the performance bottleneck, then rewrite that code section in assembly.
    • Latency-sensitive applications, such as aircraft controllers
    • Standard C has no operators for some operations available on ARM processors, such as ROR (Rotate Right) and RRX (Rotate Right Extended).
• Hardware/processor-specific code
    • Processor boot code
    • Device drivers
    • A test-and-set atomic assembly instruction can be used to implement locks and semaphores.
• Cost-sensitive applications
    • Embedded devices where code size is limited, such as washing machine controllers and automobile controllers
• The best applications are written by those who have mastered assembly language or who fully understand the low-level implementation of the high-level language statements they choose.
Why ARM processors?
As of 2005, 98% of the more than one billion mobile phones sold each year used ARM processors.
iPhone 7
Teardown
A10 processor:
• 64-bit system on chip (SoC)
• ARMv8-A core
Apple Watch
Apple S1 Processor
32-bit ARMv7-A compatible
# of Cores: 1
CMOS Technology: 28 nm
L1 cache 32 KB data
L2 cache 256 KB
GPU PowerVR SGX543
Nest Learning Thermostat
source: ifixit.com
[Figure: memory, with addresses starting at 0x00000000]
Computer Architecture
Von Neumann: Instructions and data are stored in the same memory.
Harvard: Data and instructions are stored in separate memories.
ARM Cortex-M Series Family
Von Neumann: Instructions and data are stored in the same memory.
Harvard: Data and instructions are stored in separate memories.
Levels of Program Code
C Program --(Compile)--> Assembly Program --(Assemble)--> Machine Program

C Program:
int main(void){
  int i;
  int total = 0;
  for (i = 0; i < 10; i++) {
    total += i;
  }
  while(1); // Dead loop
}

Machine Program:
0010000100000000
0010000000000000
1110000000000001
0100010000000001
0001110001000000
0010100000001010
1101110011111011
1011111100000000
1110011111111110
See a Program Run
C Code:
int main(void){
  int a = 0;
  int b = 1;
  int c;
  c = a + b;
  return 0;
}

Assembly Code (produced by the compiler):
MOVS r1, #0x00    ; int a = 0
MOVS r2, #0x01    ; int b = 1
ADDS r3, r1, r2   ; c = a + b
MOVS r0, #0x00    ; set return value
BX   lr           ; return

Machine Code:
In Binary          In Hex
0010000100000000   2100   ; MOVS r1, #0x00
0010001000000001   2201   ; MOVS r2, #0x01
0001100010001011   188B   ; ADDS r3, r1, r2
0010000000000000   2000   ; MOVS r0, #0x00
0100011101110000   4770   ; BX lr
Processor Registers
• Fastest way to read and write
• Registers are within the processor chip
• A register stores a 32-bit value
• STM32L has
    • R0-R12: 13 general-purpose registers
    • R13: Stack pointer (Shadow of MSP or PSP)
    • R14: Link register (LR)
    • R15: Program counter (PC)
    • Special registers (xPSR, BASEPRI, PRIMASK, etc)

[Figure: register file, each register 32 bits wide]
General-purpose registers: Low registers R0-R7, High registers R8-R12
R13 (SP), banked as R13 (MSP) and R13 (PSP); R14 (LR); R15 (PC)
Special-purpose registers: xPSR, BASEPRI, PRIMASK, FAULTMASK, CONTROL
Program Execution
Program Counter (PC) is a register that holds the memory address of the next instruction to be fetched from the memory.
1. Fetch the instruction at the PC address
2. Decode the instruction
3. Execute the instruction

Memory content   Memory address
4770             0x080001B4
2000             0x080001B2
188B             0x080001B0   <- PC
2201             0x080001AE
2100             0x080001AC

PC = 0x080001B0
Instruction = 188B (the 16-bit halfword at PC), or 2000188B (the 32-bit word at PC), or 8B180020 (the same four bytes listed in memory order, lowest address first)
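Three-stage pipeline:
Fetch, Decode, Execute
Pipelining keeps the hardware resources busy: while one instruction executes, the next is decoded and the one after that is fetched.
Each fetch brings in 32 bits, so one 32-bit instruction or two 16-bit instructions can be fetched at a time.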
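Example:
Calculate the Sum of an Array
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;
int main(void){
  int i;
  total = 0;
  for (i = 0; i < 10; i++) {
    total += a[i];
  }
  while(1);
}
The instructions are stored in Instruction Memory (Flash); the data is stored in Data Memory (RAM).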
Example:
Calculate the Sum of an Array
Instruction Memory (Flash)
Starting memory address: 0x08000000

Machine code (binary)                     Assembly
0010 0001 0000 0000                               MOVS r1, #0x00
0100 1010 0000 1000                               LDR  r2, =total_addr
0110 0000 0001 0001                               STR  r1, [r2, #0x00]
0010 0000 0000 0000                               MOVS r0, #0x00
1110 0000 0000 1000                               B    Check
0100 1001 0000 0111                       Loop:   LDR  r1, =a_addr
1111 1000 0101 0001 0001 0000 0010 0000           LDR  r1, [r1, r0, LSL #2]
0100 1010 0000 0100                               LDR  r2, =total_addr
0110 1000 0001 0010                               LDR  r2, [r2, #0x00]
0100 0100 0001 0001                               ADD  r1, r1, r2
0100 1010 0000 0011                               LDR  r2, =total_addr
0110 0000 0001 0001                               STR  r1, [r2, #0x00]
0001 1100 0100 0000                               ADDS r0, r0, #1
0010 1000 0000 1010                       Check:  CMP  r0, #0x0A
1101 1011 1111 0100                               BLT  Loop
1011 1111 0000 0000                               NOP
1110 0111 1111 1110                       Self:   B    Self
Example:
Calculate the Sum of an Array
Data Memory (RAM)
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;
Assume the starting memory address of the data memory is 0x20000000.

Memory address (in bytes)   Memory content   Variable
0x20000000                  0x0001           a[0] = 0x00000001
0x20000002                  0x0000
0x20000004                  0x0002           a[1] = 0x00000002
0x20000006                  0x0000
0x20000008                  0x0003           a[2] = 0x00000003
0x2000000A                  0x0000
0x2000000C                  0x0004           a[3] = 0x00000004
0x2000000E                  0x0000
0x20000010                  0x0005           a[4] = 0x00000005
0x20000012                  0x0000
0x20000014                  0x0006           a[5] = 0x00000006
0x20000016                  0x0000
0x20000018                  0x0007           a[6] = 0x00000007
0x2000001A                  0x0000
0x2000001C                  0x0008           a[7] = 0x00000008
0x2000001E                  0x0000
0x20000020                  0x0009           a[8] = 0x00000009
0x20000022                  0x0000
0x20000024                  0x000A           a[9] = 0x0000000A
0x20000026                  0x0000
0x20000028                  0x0000           total = 0x00000000
0x2000002A                  0x0000
Loading Code and Data into Memory
• Stack is mandatory
• Heap is used only if dynamic allocation (e.g. malloc, calloc) is used.
View of a Binary Program
[Figures from st.com]
STM32L4 Memory Map (from st.com)