Module 2 Part B (Mces 21cs43)

Uploaded by

gokul.shreeraj

Microcontroller and Embedded Systems 21CS43

C COMPILERS AND OPTIMIZATION

● Optimizing code takes time and reduces source code readability. Usually, it’s only worth
optimizing functions that are frequently executed and important for performance.
● We recommend you use a performance profiling tool, found in most ARM simulators, to
find these frequently executed functions.
● Document nonobvious optimizations with source code comments to aid maintainability.
● C compilers have to translate your C function literally into assembler so that it works for
all possible inputs.
● In practice, many of the input combinations are not possible or won’t occur. Let’s start by
looking at an example of the problems the compiler faces.
● The memclr function clears N bytes of memory at address data.
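A minimal sketch of such a function is given here so that the points that follow have something concrete to refer to. The signature (a char pointer and a signed count) is an assumption based on the description above.

```c
/* Clear N bytes of memory starting at address data.
   Sketch only: the exact signature is assumed, not taken from a listing. */
void memclr(char *data, int N)
{
    for (; N > 0; N--) {
        *data = 0;   /* one byte store per iteration */
        data++;
    }
}
```

Even for a function this small, the compiler has to generate code that is correct for N equal to 0, for any alignment of data, and for any value of N.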

● No matter how advanced the compiler, it does not know whether N can be 0 on input or
not. Therefore the compiler needs to test for this case explicitly before the first iteration
of the loop.
● The compiler doesn’t know whether the data array pointer is four-byte aligned or not. If it
is four-byte aligned, then the compiler can clear four bytes at a time using an int store
rather than a char store.
● Nor does it know whether N is a multiple of four or not. If N is a multiple of four, then
the compiler can repeat the loop body four times or store four bytes at a time using an int
store.
To keep our examples concrete, we have tested them using the following specific C compilers:
❖ armcc from ARM Developer Suite version 1.1 (ADS1.1). You can license this compiler,
or a later version, directly from ARM.
❖ arm-elf-gcc version 2.95.2. This is the ARM target for the GNU C compiler, gcc, and is
freely available.
We have used armcc from ADS1.1 to generate the example assembler output in this book. The
following short script shows you how to invoke armcc on a C file test.c. You can use this to
reproduce our examples.

By default armcc has full optimizations turned on (the -O2 command line switch). The -Otime
switch optimizes for execution efficiency rather than space and mainly affects the layout of for
and while loops. If you are using the gcc compiler, then the following short script generates a
similar assembler output listing:

Basic C Data Types

ARM supports operations on different data types.
The values we can load (or store) are signed or unsigned words, halfwords, or bytes. The load and
store instructions take a suffix for each type: H or SH for halfwords (for example LDRH, LDRSH),
B or SB for bytes (LDRB, LDRSB), and no suffix for words (LDR). The difference between signed and
unsigned data types is:
Signed data types can hold both positive and negative values, so their largest positive value is
about half that of the unsigned type of the same width.
Unsigned data types cannot hold negative values, so the same number of bits covers zero and a
larger range of positive values.
● ARM processors have 32-bit registers and 32-bit data processing operations. The ARM
architecture is a RISC load/store architecture.
● In other words you must load values from memory into registers before acting on them.
There are no arithmetic or logical instructions that manipulate values in memory directly.

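● The ARMv4 architecture and above support signed 8-bit and 16-bit loads and stores
directly, through new instructions.
● ARMv5 adds instruction support for 64-bit loads and stores. This is available in ARM9E
and later cores.
● Before ARMv4, ARM processors were not good at handling signed 8-bit or any 16-bit
values. Therefore ARM C compilers define char to be an unsigned 8-bit value, rather than a
signed 8-bit value as is typical in many other compilers.
● Compilers armcc and gcc use the following datatype mappings for an ARM target: char is
an unsigned 8-bit byte, short is a signed 16-bit halfword, int and long are signed 32-bit
words, and long long is a signed 64-bit double word.
● The unsigned char type can cause problems when porting code from another processor
architecture. A common example is using a char type variable i as a loop counter, with loop
continuation condition i ≥ 0.
● As i is unsigned for the ARM compilers, the loop will never terminate. Fortunately armcc
produces a warning in this situation: unsigned comparison with 0.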
● Compilers also provide an override switch to make char signed. For example, the command
line option -fsigned-char will make char signed on gcc.
● The command line option -zc will have the same effect with armcc.
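The porting problem described above can be sketched as follows. To make the behavior the same on any compiler, the sketch spells out signed char and unsigned char explicitly instead of relying on plain char. The function names and the limit parameter (a safety guard so that the unsigned version returns) are hypothetical.

```c
/* Counts down from start to 0 with a signed counter.
   The loop ends when i becomes -1, after start + 1 iterations. */
int count_down_signed(signed char start)
{
    int iterations = 0;
    signed char i;
    for (i = start; i >= 0; i--) {
        iterations++;
    }
    return iterations;
}

/* Same loop with an unsigned counter, as plain char behaves on ARM.
   i >= 0 is always true (0 wraps to 255), so only the guard stops the loop. */
int count_down_unsigned(unsigned char start, int limit)
{
    int iterations = 0;
    unsigned char i;
    for (i = start; i >= 0; i--) {
        if (++iterations >= limit) {
            break;   /* hypothetical guard; the original loop would spin forever */
        }
    }
    return iterations;
}
```

Calling count_down_signed(3) performs four iterations, whereas count_down_unsigned(3, 1000) only stops because the guard fires.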

Local Variable Types

● ARMv4-based processors can efficiently load and store 8-, 16-, and 32-bit data. However,
most ARM data processing operations are 32-bit only.
● For this reason, you should use a 32-bit datatype, int or long, for local variables wherever
possible.
● Avoid using char and short as local variable types, even if you are manipulating an 8- or
16-bit value.
● The one exception is when you want wrap-around to occur. If you require modulo
arithmetic of the form 255 + 1 = 0, then use the char type.
● The following code checksums a data packet containing 64 words. It shows why you
should avoid using char for local variables.
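A sketch of the two variants follows: one with a char loop counter and one with an unsigned int counter. The name checksum_v1 is assumed by analogy with checksum_v2, which is referred to later, and the bodies are reconstructed from the description.

```c
/* Checksum of a 64-word packet using a char loop counter.
   With an 8-bit counter the compiler typically has to reduce i to the
   range 0..255 after each increment, which costs an extra instruction. */
int checksum_v1(int *data)
{
    char i;
    int sum = 0;
    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}

/* Same checksum with a 32-bit counter: no range reduction is needed. */
int checksum_v2(int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same result; the difference is only in the code the compiler has to generate for the loop counter.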
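Suppose the data packet contains 16-bit values and we need a 16-bit checksum, so that the data
and the running sum both become short. The loop is now three instructions longer than the loop
for example checksum_v2 earlier! There are two reasons for the extra instructions: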
● The LDRH instruction does not allow for a shifted address offset as the LDR instruction
did in checksum_v2. Therefore the first ADD in the loop calculates the address of item i
in the array. The LDRH loads from an address with no offset. LDRH has fewer
addressing modes than LDR as it was a later addition to the ARM instruction set.
● The cast reducing total + array[i] to a short requires two MOV instructions. The compiler
shifts left by 16 and then right by 16 to implement a 16-bit sign extend. The shift right is
a sign-extending shift so it replicates the sign bit to fill the upper 16 bits.
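The two costs above can be seen in a sketch of the 16-bit checksum, followed by a variant that sidesteps both. The names checksum_v3 and checksum_v4 are assumed to continue the checksum_v2 numbering, and the bodies are reconstructions.

```c
/* 16-bit data and a 16-bit running sum.
   data[i] needs an explicit address calculation before the halfword load,
   and the cast back to short costs a shift-left/shift-right pair. */
short checksum_v3(short *data)
{
    unsigned int i;
    short sum = 0;
    for (i = 0; i < 64; i++) {
        sum = (short)(sum + data[i]);
    }
    return sum;
}

/* Keep the running sum in an int and walk the array with a pointer.
   The cast to short happens once, at the end, and *(data++) needs no
   index-to-address calculation inside the loop. */
short checksum_v4(short *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++) {
        sum += *(data++);
    }
    return (short)sum;
}
```

For inputs whose total fits in 16 bits the two versions return the same value.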

FUNCTION ARGUMENT TYPES

Consider the following simple function, which adds two 16-bit values, halving the second, and
returns a 16-bit sum:

● The input values a, b, and the return value will be passed in 32-bit ARM registers. Should
the compiler assume that these 32-bit values are in the range of a short type, that is,
−32,768 to +32,767?
● Or should the compiler force values to be in this range by sign-extending the lowest 16
bits to fill the 32-bit register?
● The compiler must make compatible decisions for the function caller and callee. Either
the caller or callee must perform the cast to a short type.
● If the compiler passes arguments wide, then the callee must reduce function arguments to
the correct range. If the compiler passes arguments narrow, then the caller must reduce
the range.
● If the compiler returns values wide, then the caller must reduce the return value to the
correct range. If the compiler returns values narrow, then the callee must reduce the range
before returning the value.
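A sketch of the function under discussion, together with an int-based variant that leaves nothing for the caller or callee to narrow. The name add_v1 and both bodies are assumptions reconstructed from the description; add_wide is a hypothetical name.

```c
/* Adds a and half of b using 16-bit argument and return types.
   Somewhere between caller and callee, the values must be reduced to
   the range of short, which costs extra instructions. */
short add_v1(short a, short b)
{
    return (short)(a + (b >> 1));
}

/* Same arithmetic with int arguments and return value: every value already
   fills a 32-bit register, so no sign extension is required. */
int add_wide(int a, int b)
{
    return a + (b >> 1);
}
```

For small non-negative inputs such as add_v1(10, 6) and add_wide(10, 6), both return 13.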

SIGNED VERSUS UNSIGNED TYPES

It is more efficient to use unsigned types for divisions. The compiler converts unsigned
divisions by a power of two directly into right shifts. For general divisions, the divide
routine in the C library is faster for unsigned types.
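As a small illustration, consider averaging two values. The function names here are hypothetical. Signed division in C rounds toward zero, so for a negative sum a plain arithmetic shift would give the wrong answer (for example, -3 / 2 is -1, whereas shifting -3 right by one gives -2), and the compiler has to add a correction step. The unsigned version needs no correction.

```c
/* Signed average: the division by 2 must round toward zero,
   so it cannot be compiled to a bare arithmetic shift. */
int average_signed(int a, int b)
{
    return (a + b) / 2;
}

/* Unsigned average: the division by 2 is exactly a logical shift right. */
unsigned int average_unsigned(unsigned int a, unsigned int b)
{
    return (a + b) / 2;
}
```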

C LOOPING STRUCTURES

LOOPS WITH A FIXED NUMBER OF ITERATIONS

The following code shows how the compiler treats a loop with an incrementing count i++.
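A sketch of an incrementing loop and its decrementing counterpart is given here. The names checksum_v5 and checksum_v6 are assumed to continue the numbering that leads up to checksum_v7, and the bodies are reconstructions. Counting up typically needs an add, a compare against 64, and a branch on each iteration, whereas counting down to zero lets the subtract set the flags for the branch.

```c
/* Incrementing counter: each iteration increments i, compares it
   with 64, and branches. */
int checksum_v5(int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++) {
        sum += *(data++);
    }
    return sum;
}

/* Decrementing counter that ends at zero: the decrement itself can
   provide the loop termination test. */
int checksum_v6(int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 64; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}
```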

● For an unsigned loop counter i we can use either of the loop continuation conditions
i!=0 or i>0.
● As i can’t be negative, they are the same condition. For a signed loop counter, it is
tempting to use the condition i>0 to continue the loop.
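● You might expect the compiler to implement such a loop with just two instructions: a
SUBS that decrements i and sets the flags, followed by a BGT branch back to the top of the
loop.
● In fact, the compiler generates a SUB, a CMP against zero, and then the BGT. It is not
being inefficient: it must handle the case i = -0x80000000 correctly, where the two sequences
give different answers. The condition i!=0 avoids the extra instruction, so prefer it for
both signed and unsigned counters.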

LOOPS USING A VARIABLE NUMBER OF ITERATIONS

Now suppose we want our checksum routine to handle packets of arbitrary size. We pass in a
variable N giving the number of words in the data packet. Using the lessons from the last section
we count down until N = 0 and don’t require an extra loop counter i.

The checksum_v7 example shows how the compiler handles a for loop with a variable
number of iterations N.
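A sketch of checksum_v7 consistent with this description; the body is a reconstruction, with N used directly as the down-counter.

```c
/* Checksum of N words. N itself counts down to zero, so no separate
   loop counter is needed. N may be 0, in which case the loop body
   never runs. */
int checksum_v7(int *data, unsigned int N)
{
    int sum = 0;
    for (; N != 0; N--) {
        sum += *(data++);
    }
    return sum;
}
```

Because N may be zero, the compiler has to test N once before entering the loop as well as at the end of each iteration.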

● On ARM7 or ARM9 processors the subtract takes one cycle and the branch three cycles,
giving an overhead of four cycles per loop.
● You can save some of these cycles by unrolling a loop—repeating the loop body several
times, and reducing the number of loop iterations by the same proportion.
● For example, let’s unroll our packet checksum example four times.
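A sketch of the checksum unrolled four times. The name checksum_v9 is an assumption, and the function assumes that N is a nonzero multiple of four; it is not safe for other values.

```c
/* Checksum unrolled four times.
   Assumption: N is a nonzero multiple of 4. The subtract and branch
   overhead is now paid once per four words instead of once per word. */
int checksum_v9(int *data, unsigned int N)
{
    int sum = 0;
    do {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;
    } while (N != 0);
    return sum;
}
```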

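Unrolling raises two questions: how many times should the loop be unrolled, and what happens
if the number of loop iterations is not a multiple of the unroll amount?

To start with the first question, only unroll loops that are important for the overall performance
of the application. Otherwise unrolling will increase the code size with little performance benefit.
Unrolling may even reduce performance by evicting more important code from the cache.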

For the second question, try to arrange it so that array sizes are multiples of your unroll amount.
If this isn’t possible, then you must add extra code to take care of the leftover cases. This
increases the code size a little but keeps the performance high.
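One way to handle the leftover cases is a second, short loop after the unrolled one. The following is a sketch under that assumption; the name checksum_v10 is hypothetical.

```c
/* Checksum of N words for any N, unrolled four times.
   The first loop handles N/4 groups of four words; the second
   handles the remaining N & 3 words one at a time. */
int checksum_v10(int *data, unsigned int N)
{
    unsigned int i;
    int sum = 0;

    for (i = N / 4; i != 0; i--) {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    for (i = N & 3; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}
```

The second loop runs at most three times, so its per-iteration overhead has little effect on overall performance.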

SUMMARY: Writing Loops Efficiently

REGISTER ALLOCATION
● The compiler attempts to allocate a processor register to each local variable you use in a
C function.
● It will try to use the same register for different local variables if the uses of the variables
do not overlap.
● When there are more local variables than available registers, the compiler stores the
excess variables on the processor stack.
● These variables are called spilled or swapped out variables since they are written out to
memory (in a similar way to how virtual memory is swapped out to disk).
● Spilled variables are slow to access compared to variables allocated to registers.

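First let’s look at the number of processor registers the ARM C compilers have available for
allocating variables. The table below shows the standard register names and usage when following
the ARM-Thumb procedure call standard (ATPCS), which is used in code generated by C
compilers.

Register   ATPCS name   Usage
r0-r3      a1-a4        Argument registers; hold the first four function arguments and the return value
r4-r8      v1-v5        General variable registers
r9         v6 / sb      Variable register, or static base
r10        v7 / sl      Variable register, or stack limit
r11        v8 / fp      Variable register, or frame pointer
r12        ip           Scratch register
r13        sp           Stack pointer
r14        lr           Link register
r15        pc           Program counter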

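● Provided the compiler is not using software stack checking or a frame pointer, it can
freely use registers r0 to r12 and r14 to hold variables. In theory, then, the C compiler can
assign 14 variables to registers without spillage.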

● In practice, some compilers use a fixed register such as r12 for intermediate scratch
working and do not assign variables to this register.
● Also, complex expressions require intermediate working registers to evaluate. Therefore,
to ensure good assignment to registers, you should try to limit the internal loop of
functions to using at most 12 local variables.
● If the compiler does need to swap out variables, then it chooses which variables to swap
out based on frequency of use.
● A variable used inside a loop counts multiple times. You can guide the compiler as to
which variables are important by ensuring these variables are used within the innermost
loop.
● The register keyword in C hints that a compiler should allocate the given variable to
a register.
● However, different compilers treat this keyword in different ways, and different
architectures have a different number of available registers (for example, Thumb and
ARM).
● Therefore we recommend that you avoid using register and rely on the compiler’s
normal register allocation routine.

FUNCTION CALLS
● The ARM Procedure Call Standard (APCS) defines how to pass function arguments and
return values in ARM registers.
● The more recent ARM-Thumb Procedure Call Standard (ATPCS) covers ARM and
Thumb interworking as well.
● The first four integer arguments are passed in the first four ARM registers: r0, r1, r2, and
r3. Subsequent integer arguments are placed on the full descending stack, ascending in
memory. Function return integer values are passed in r0.
● This description covers only integer or pointer arguments. Two-word arguments such as
long long or double are passed in a pair of consecutive argument registers and
returned in r0, r1.
● The compiler may pass structures in registers or by reference according to command line
compiler options.
● The first point to note about the procedure call standard is the four-register rule.
● Functions with four or fewer arguments are far more efficient to call than functions with
five or more arguments.
● For functions with four or fewer arguments, the compiler can pass all the arguments in
registers.
● For functions with more arguments, both the caller and callee must access the stack for
some arguments.
● Note that for C++ the first argument to an object method is the this pointer. This
argument is implicit and additional to the explicit arguments.
● If your C function needs more than four arguments, or your C++ method more than three
explicit arguments, then it is almost always more efficient to use structures.
● Group related arguments into structures, and pass a structure pointer rather than multiple
arguments. Which arguments are related will depend on the structure of your software.

The next example illustrates the benefits of using a structure pointer. First we show a typical
routine to insert N bytes from array data into a queue. We implement the queue using a cyclic
buffer with start address Q_start (inclusive) and end address Q_end (exclusive).
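A sketch of such a routine with five separate arguments. The body of queue_bytes_v1 is a reconstruction from the description; Q_ptr is assumed to be the current write position, and the updated position is returned to the caller.

```c
/* Insert N bytes from data into a cyclic buffer [Q_start, Q_end).
   Q_ptr is the current insert position; the new position is returned.
   With five arguments, the fifth (N) has to be passed on the stack. */
char *queue_bytes_v1(char *Q_start, char *Q_end, char *Q_ptr,
                     char *data, unsigned int N)
{
    for (; N != 0; N--) {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end) {
            Q_ptr = Q_start;   /* wrap around to the start of the buffer */
        }
    }
    return Q_ptr;
}
```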

Example
The following code creates a Queue structure and passes this to the function to reduce the
number of function arguments.
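A sketch of the structure-based version. The field names mirror the queue parameters named earlier (Q_start, Q_end) plus an assumed current position Q_ptr; the body is a reconstruction.

```c
/* The three queue pointers grouped into one structure. */
typedef struct {
    char *Q_start;   /* start of the cyclic buffer (inclusive) */
    char *Q_end;     /* end of the cyclic buffer (exclusive)   */
    char *Q_ptr;     /* current insert position                */
} Queue;

/* Insert N bytes from data into the queue. Only three arguments are
   needed, so all of them can be passed in registers. */
void queue_bytes_v2(Queue *queue, char *data, unsigned int N)
{
    char *Q_ptr = queue->Q_ptr;
    char *Q_end = queue->Q_end;

    for (; N != 0; N--) {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end) {
            Q_ptr = queue->Q_start;   /* wrap around */
        }
    }
    queue->Q_ptr = Q_ptr;   /* store the updated position back */
}
```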

● The queue_bytes_v2 is one instruction longer than queue_bytes_v1, but it is in fact more efficient overall.
● The second version has only three function arguments rather than five. Each call to the
function requires only three register setups.
● This compares with four register setups, a stack push, and a stack pull for the first
version. There is a net saving of two instructions in function call overhead.
● There are likely further savings in the callee function, as it only needs to assign a single
register to the Queue structure pointer, rather than three registers in the nonstructured
case.

Example
The function uint_to_hex converts a 32-bit unsigned integer into an array of eight
hexadecimal digits. It uses a helper function nybble_to_hex, which converts a digit d in the
range 0 to 15 to a hexadecimal digit.
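A sketch of the two functions. The bodies are reconstructions; the use of uint32_t, upper-case hex letters, and most-significant-digit-first output order are assumptions.

```c
#include <stdint.h>

/* Convert a value d in the range 0..15 to its hexadecimal character. */
char nybble_to_hex(unsigned int d)
{
    if (d < 10) {
        return (char)(d + '0');
    }
    return (char)(d - 10 + 'A');
}

/* Write the eight hexadecimal digits of in to out, most significant
   digit first. out must have room for 8 characters; no terminator
   is written. */
void uint_to_hex(char *out, uint32_t in)
{
    unsigned int i;
    for (i = 8; i != 0; i--) {
        in = (in << 4) | (in >> 28);           /* rotate left by one hex digit */
        *(out++) = nybble_to_hex(in & 15);     /* lowest nybble is the next digit */
    }
}
```

Because nybble_to_hex is small and defined before its caller in the same file, it is a natural candidate for inlining.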

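When the caller and callee are in the same C file, the compiler can reduce function call
overhead further; in the ideal case it inlines the callee, removing the call overhead entirely.
The compiler will only inline small functions. You can ask the compiler to inline a function using
the inline keyword, although this keyword is only a hint and the compiler may ignore it.
Inlining large functions can lead to big increases in code size without much performance
improvement.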

POINTER ALIASING
● Two pointers are said to alias when they point to the same address.
● If you write to one pointer, it will affect the value you read from the other pointer. In a
function, the compiler often doesn’t know which pointers can alias and which pointers can’t.
● The compiler must be very pessimistic and assume that any write to a pointer may affect
the value read from any other pointer, which can significantly reduce code efficiency.
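A sketch of the kind of function where this matters, together with a variant that removes the problem by reading the value once into a local variable. The names timers_v1 and timers_v2 and the bodies are assumptions built around the pointers timer1 and step.

```c
/* Add *step to two timers. The compiler must reload *step after the
   write to *timer1, because timer1 and step might point to the same int. */
void timers_v1(int *timer1, int *timer2, int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}

/* Read *step once into a local. A local variable whose address is never
   taken cannot be aliased, so it can stay in a register. */
void timers_v2(int *timer1, int *timer2, int *step)
{
    int local_step = *step;
    *timer1 += local_step;
    *timer2 += local_step;
}
```

When the pointers really do alias, the two versions give different results, which is exactly why the compiler cannot make this transformation on its own.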

● Note that the compiler loads from step twice. Usually a compiler optimization called
common subexpression elimination would kick in so that *step was only evaluated
once, and the value reused for the second occurrence.
● However, the compiler can’t use this optimization here. The pointers timer1 and step
might alias one another.
● In other words, the compiler cannot be sure that the write to timer1 doesn’t affect the
read from step.
● If they do alias, the second value of *step is different from the first and has the value
*timer1. The compiler must therefore insert an extra load instruction.

Example

Consider the following example, which reads and then checksums a data packet:
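A sketch of such a routine. The body of checksum_next_packet is a reconstruction; the get_next_packet shown here is a hypothetical stand-in that returns a fixed four-word packet so that the example is self-contained.

```c
/* Hypothetical stand-in for the real packet source: returns the address
   of a fixed packet and writes its size (in words) to *N. */
static int example_packet[4] = {1, 2, 3, 4};

int *get_next_packet(int *N)
{
    *N = 4;
    return example_packet;
}

/* Read the next packet and checksum it. Assumes the packet has at
   least one word. Note that the address of the local N is passed to
   another function. */
int checksum_next_packet(void)
{
    int *data;
    int N, sum = 0;

    data = get_next_packet(&N);
    do {
        sum += *(data++);
    } while (--N);
    return sum;
}
```

Since the address of N is taken, a compiler may have to keep N in memory rather than in a register; copying N into a second local variable after the call is one common way to avoid this.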

Here get_next_packet is a function returning the address and size of the next data
packet. The previous code compiles to