Chapter 01 See Program Running

This document provides an overview of embedded systems with ARM Cortex-M microcontrollers. It discusses why assembly language is important to learn, particularly for performance-critical and hardware-specific applications. It also provides background on ARM processors, describing their prevalence in devices like smartphones, Apple Watch, and smart home appliances. The document explains the memory architecture of ARM Cortex-M microcontrollers and compares von Neumann and Harvard architectures. It outlines the different levels of program code from high-level languages to assembly to machine code. Finally, it describes how programs are executed using processor registers like the program counter.
Embedded Systems with ARM Cortex-M Microcontrollers in Assembly Language and C

Chapter 1
Computer and Assembly Language

Dr. Yifeng Zhu


Electrical and Computer Engineering
University of Maine

Modified by Dr. Jonathan Phillips for USU ECE 3710, Fall 2018

1
Embedded Systems

2
Amazon Warehouse

Kiva Robot

3
Why do we learn Assembly?
 Assembly isn’t “just another language”.
 It helps you understand how the processor works.
 Hand-written assembly can run faster than compiled high-level code, so performance-critical code is sometimes written in assembly.
   Use profiling tools to find the performance bottleneck and rewrite that code section in assembly.
   Latency-sensitive applications, such as aircraft controllers
   Standard C has no direct operators for some operations available on ARM processors, such as ROR (Rotate Right) and RRX (Rotate Right Extended), so compilers do not always use them.
 Hardware/processor-specific code
   Processor booting code
   Device drivers
   A test-and-set atomic assembly instruction can be used to implement locks and semaphores.
 Cost-sensitive applications
   Embedded devices where code size is limited, such as washing machine controllers and automobile controllers
 The best applications are written by those who've mastered assembly language or fully understand the low-level implementation of the high-level language statements they're choosing.
4
Why ARM processors?
 As of 2005, 98% of the more than one billion mobile phones sold each year used ARM processors.
 As of 2009, ARM processors accounted for approximately 90% of all embedded 32-bit RISC processors.
 In 2010 alone, 6.1 billion ARM-based processors were shipped, representing 95% of smartphones, 35% of digital televisions and set-top boxes, and 10% of mobile computers.
 As of 2014, over 50 billion ARM processors had been produced.

5
iPhone 7
Teardown

A10 processor:
• 64-bit system on chip (SoC)
• ARMv8-A core

6
Apple Watch
 Apple S1 Processor
 32-bit ARMv7-A compatible
 # of Cores: 1
 CMOS Technology: 28 nm
 L1 cache 32 KB data
 L2 cache 256 KB
 GPU PowerVR SGX543

7
Nest Learning Thermostat
 ST Microelectronics STM32L151VB ultra-low-power 32 MHz ARM Cortex-M3 MCU
source: ifixit.com
8
Memory
 Memory is arranged as a series of “locations”
 Each location has a unique “address”
 Each location holds a byte (byte-addressable)
   e.g. the memory location at address 0x080001B0 contains the byte value 0x70, i.e., 112
 The number of locations in memory is limited
   e.g. 4 GB of RAM
   1 Gigabyte (GB) = 2^30 bytes
   2^32 locations = 4,294,967,296 locations!
 Values stored at each location can represent either program data or program instructions
   e.g. the value 0x70 might be the code used to tell the processor to add two values together

Figure: memory, from address 0x00000000 (bottom) to 0xFFFFFFFF (top)
Address (32 bits)   Data (8 bits)
0x080001B0          70
0x080001AF          BC
0x080001AE          18
0x080001AD          01
0x080001AC          A0
9
Computer Architecture
Von Neumann: Instructions and data are stored in the same memory.
Harvard: Data and instructions are stored in separate memories.
10
ARM Cortex-M Series Family
Von Neumann: Instructions and data are stored in the same memory.
Harvard: Data and instructions are stored in separate memories.

Memory architecture   Core              Architecture version
Von Neumann           ARM Cortex-M0     ARMv6-M
Von Neumann           ARM Cortex-M0+    ARMv6-M
Von Neumann           ARM Cortex-M1     ARMv6-M
Von Neumann           ARM Cortex-M23    ARMv8-M
Harvard               ARM Cortex-M3     ARMv7-M
Harvard               ARM Cortex-M4     ARMv7E-M
Harvard               ARM Cortex-M7     ARMv7E-M
Harvard               ARM Cortex-M33    ARMv8-M
12
Levels of Program Code
C Program -> (Compile) -> Assembly Program -> (Assemble) -> Machine Program

C Program
int main(void){
    int i;
    int total = 0;
    for (i = 0; i < 10; i++) {
        total += i;
    }
    while(1); // Dead loop
}

Assembly Program
[Figure: assembly listing not recovered]

Machine Program
0010000100000000
0010000000000000
1110000000000001
0100010000000001
0001110001000000
0010100000001010
1101110011111011
1011111100000000
1110011111111110

 High-level language
   Level of abstraction closer to problem domain
   Provides for productivity and portability
 Assembly language
   Textual representation of instructions
 Hardware representation
   Binary digits (bits)
   Encoded instructions and data
13
See a Program Run
C Code
int main(void){
    int a = 0;
    int b = 1;
    int c;
    c = a + b;
    return 0;
}

Assembly Code (produced by the compiler)
MOVS r1, #0x00     ; int a = 0
MOVS r2, #0x01     ; int b = 1
ADDS r3, r1, r2    ; c = a + b
MOVS r0, #0x00     ; set return value
BX lr              ; return

Machine Code
In Binary          In Hex
0010000100000000   2100    ; MOVS r1, #0x00
0010001000000001   2201    ; MOVS r2, #0x01
0001100010001011   188B    ; ADDS r3, r1, r2
0010000000000000   2000    ; MOVS r0, #0x00
0100011101110000   4770    ; BX lr
14
Processor Registers
 Fastest way to read and write
 Registers are within the processor chip
 A register stores a 32-bit value
 STM32L has
   R0-R12: 13 general-purpose registers
   R13: Stack pointer (Shadow of MSP or PSP)
   R14: Link register (LR)
   R15: Program counter (PC)
   Special registers (xPSR, BASEPRI, PRIMASK, etc.)

Figure: register file, each register 32 bits wide
General-purpose registers: R0-R7 (low registers), R8-R12 (high registers)
R13 (SP), banked as R13 (MSP) and R13 (PSP); R14 (LR); R15 (PC)
Special-purpose registers: xPSR, BASEPRI, PRIMASK, FAULTMASK, CONTROL

15
Program Execution
 Program Counter (PC) is a register that holds the memory address of the next instruction to be fetched from the memory.

Execution cycle:
1. Fetch instruction at PC address
2. Decode the instruction
3. Execute the instruction

Address      Data
0x080001B4   4770
0x080001B2   2000
0x080001B0   188B   <- PC
0x080001AE   2201
0x080001AC   2100

PC = 0x080001B0
Instruction = 188B or 2000188B or 8B180020

16
Three-stage pipeline:
Fetch, Decode, Execute
 Pipelining allows hardware resources to be fully utilized
 One 32-bit instruction or two 16-bit instructions can be fetched.

Pipeline of 32-bit instructions

17
Three-stage pipeline:
Fetch, Decode, Execute
 Pipelining allows hardware resources to be fully utilized
 One 32-bit instruction or two 16-bit instructions can be fetched.

Clock cycle          1        2        3        4        5        6
Instruction i        Fetch    Decode   Execute
Instruction i + 1             Fetch    Decode   Execute
Instruction i + 2                      Fetch    Decode   Execute
Instruction i + 3                               Fetch    Decode   Execute

Pipeline of 16-bit instructions


18
Machine codes are stored in memory
Figure: the CPU (registers r0-r12, r13 sp, r14 lr, r15 pc, and the ALU) next to memory, which spans addresses 0x00000000 to 0xFFFFFFFF
Address      Data
0x080001B4   4770
0x080001B2   2000
0x080001B0   188B
0x080001AE   2201
0x080001AC   2100
19
Fetch Instruction: pc = 0x080001AC
Decode Instruction: 2100 = MOVS r1, #0x00
Registers: r15 (pc) = 0x080001AC
Memory: 0x080001AC: 2100, 0x080001AE: 2201, 0x080001B0: 188B, 0x080001B2: 2000, 0x080001B4: 4770
20
Execute Instruction: MOVS r1, #0x00
Registers: r15 (pc) = 0x080001AC, r1 = 0x00000000
Memory: 0x080001AC: 2100, 0x080001AE: 2201, 0x080001B0: 188B, 0x080001B2: 2000, 0x080001B4: 4770
21
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 2201 = MOVS r2, #0x01
Registers: r15 (pc) = 0x080001AE, r2 = 0x00000001, r1 = 0x00000000
Memory: 0x080001AC: 2100, 0x080001AE: 2201, 0x080001B0: 188B, 0x080001B2: 2000, 0x080001B4: 4770
22
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 188B = ADDS r3, r1, r2
Registers: r15 (pc) = 0x080001B0, r3 = 0x00000001, r2 = 0x00000001, r1 = 0x00000000
Memory: 0x080001AC: 2100, 0x080001AE: 2201, 0x080001B0: 188B, 0x080001B2: 2000, 0x080001B4: 4770
23
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 2000 = MOVS r0, #0x00
Registers: r15 (pc) = 0x080001B2, r3 = 0x00000001, r2 = 0x00000001, r1 = 0x00000000, r0 = 0x00000000
Memory: 0x080001AC: 2100, 0x080001AE: 2201, 0x080001B0: 188B, 0x080001B2: 2000, 0x080001B4: 4770
24
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 4770 = BX lr
Registers: r15 (pc) = 0x080001B4, r3 = 0x00000001, r2 = 0x00000001, r1 = 0x00000000, r0 = 0x00000000
Memory: 0x080001AC: 2100, 0x080001AE: 2201, 0x080001B0: 188B, 0x080001B2: 2000, 0x080001B4: 4770
25
Example:
Calculate the Sum of an Array

int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;

int main(void){
    int i;
    total = 0;
    for (i = 0; i < 10; i++) {
        total += a[i];
    }
    while(1);
}

26
Example:
Calculate the Sum of an Array

Figure: the CPU is connected to instruction memory, data memory, and I/O devices.

Instruction Memory (Flash)
Starting memory address 0x08000000
int main(void){
    int i;
    total = 0;
    for (i = 0; i < 10; i++) {
        total += a[i];
    }
    while(1);
}

Data Memory (RAM)
Starting memory address 0x20000000
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;

27
Example:
Calculate the Sum of an Array

Instruction Memory (Flash)
Starting memory address 0x08000000
int main(void){
    int i;
    total = 0;
    for (i = 0; i < 10; i++) {
        total += a[i];
    }
    while(1);
}

Machine code                               Assembly
0010 0001 0000 0000                                MOVS r1, #0x00
0100 1010 0000 1000                                LDR r2, =total_addr
0110 0000 0001 0001                                STR r1, [r2, #0x00]
0010 0000 0000 0000                                MOVS r0, #0x00
1110 0000 0000 1000                                B Check
0100 1001 0000 0111                        Loop:   LDR r1, =a_addr
1111 1000 0101 0001 0001 0000 0010 0000            LDR r1, [r1, r0, LSL #2]
0100 1010 0000 0100                                LDR r2, =total_addr
0110 1000 0001 0010                                LDR r2, [r2, #0x00]
0100 0100 0001 0001                                ADD r1, r1, r2
0100 1010 0000 0011                                LDR r2, =total_addr
0110 0000 0001 0001                                STR r1, [r2, #0x00]
0001 1100 0100 0000                                ADDS r0, r0, #1
0010 1000 0000 1010                        Check:  CMP r0, #0x0A
1101 1011 1111 0100                                BLT Loop
1011 1111 0000 0000                                NOP
1110 0111 1111 1110                        Self:   B Self

28
Example:
Calculate the Sum of an Array

Data Memory (RAM)
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int total;
Assume the starting memory address of the data memory is 0x20000000

Memory address (in bytes)   Memory content   Variable
0x20000000                  0x0001           a[0] = 0x00000001
0x20000002                  0x0000
0x20000004                  0x0002           a[1] = 0x00000002
0x20000006                  0x0000
0x20000008                  0x0003           a[2] = 0x00000003
0x2000000A                  0x0000
0x2000000C                  0x0004           a[3] = 0x00000004
0x2000000E                  0x0000
0x20000010                  0x0005           a[4] = 0x00000005
0x20000012                  0x0000
0x20000014                  0x0006           a[5] = 0x00000006
0x20000016                  0x0000
0x20000018                  0x0007           a[6] = 0x00000007
0x2000001A                  0x0000
0x2000001C                  0x0008           a[7] = 0x00000008
0x2000001E                  0x0000
0x20000020                  0x0009           a[8] = 0x00000009
0x20000022                  0x0000
0x20000024                  0x000A           a[9] = 0x0000000A
0x20000026                  0x0000
0x20000028                  0x0000           total = 0x00000000
0x2000002A                  0x0000
29
Loading Code and Data into Memory


• Stack is mandatory
• Heap is used only if
dynamic allocation (e.g.
malloc, calloc) is used.

32
View of a Binary Program

33
[Slides 34-37: figures only, credited "from st.com"; one figure is labeled STM32L4]
Memory Map
[Figure not recovered]
38
