EE234 - Lec - 04

This document is a lecture on Assembly Language Programming for the AVR XMEGA Microcontroller, covering shift and rotate instructions, Boolean instructions, and creating time delays using program loops. It includes examples of assembly code for shifting a 32-bit number and generating delays, as well as discussions on stack operations and subroutine management. The document also addresses issues related to parameter passing and local variable allocation in subroutines.


AVR XMEGA Microcontroller

Lecture 4

Assembly Language Programming


By
Dr. Han-Way Huang
Minnesota State University, Mankato

02/25/2025 1
Shift and Rotate Instructions
• Useful in bit field manipulation
• Useful for bit field extraction

lsl rd ; bit 7 of rd is shifted to the C flag, 0 is shifted into bit 0
lsr rd ; 0 is shifted into bit 7, bit 0 is shifted to the C flag
rol rd ; C flag is shifted into bit 0, bit 7 is shifted to the C flag
ror rd ; C flag is shifted into bit 7, bit 0 is shifted to the C flag
asr rd ; arithmetic shift right: bit 7 duplicates itself, bit 0 is shifted to the C flag
swap rd ; upper 4 bits and lower 4 bits are swapped
Example 4.1 Let [r0] = 0x9B and C = 0. What will be the contents of
r0 and C after the execution of each of the following instructions?
(a) lsl r0 (b) lsr r0 (c) ror r0 (d) rol r0 (e) asr r0 (f) swap r0
Solution:
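The solution table from the slide is not reproduced in this text. As a rough check, the six instructions can be modeled in Python (a model, not AVR code), assuming each instruction is applied separately to the initial values [r0] = 0x9B and C = 0. The function names are illustrative only.

```python
# Each function takes (register value, carry) and returns (new value, new carry).
def lsl(r, c):
    return (r << 1) & 0xFF, (r >> 7) & 1

def lsr(r, c):
    return r >> 1, r & 1

def rol(r, c):
    return ((r << 1) & 0xFF) | c, (r >> 7) & 1

def ror(r, c):
    return (r >> 1) | (c << 7), r & 1

def asr(r, c):
    return (r >> 1) | (r & 0x80), r & 1   # bit 7 is kept

def swap(r, c):
    return ((r << 4) & 0xF0) | (r >> 4), c   # C is not affected

for name, op in [("lsl", lsl), ("lsr", lsr), ("ror", ror),
                 ("rol", rol), ("asr", asr), ("swap", swap)]:
    value, carry = op(0x9B, 0)
    print(f"{name:4s} r0 = 0x{value:02X}, C = {carry}")
```

Under these assumptions the model gives (a) 0x36, C = 1; (b) 0x4D, C = 1; (c) 0x4D, C = 1; (d) 0x36, C = 1; (e) 0xCD, C = 1; (f) 0xB9, C = 0.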

Multiple-Byte Shift
• For a k-byte value stored at loc, loc+1, …, loc+k-1, the byte at loc is the least significant byte, whereas the byte at loc+k-1 is the most significant byte.
• A right-shift operation should start at the most significant byte (loc+k-1).
• A left-shift operation should start at the least significant byte (loc).

To shift right
Step 1
Shift the byte at loc+k-1 to the right.
Step 2
Rotate the byte at loc+k-2 to the right.
Step 3
Repeat step 2 for the bytes located at loc+k-3 until loc.

To Shift Left
Step 1
Shift the byte at loc to the left.
Step 2
Rotate the byte at loc+1 to the left.
Step 3
Repeat Step 2 for the remaining bytes until reaching the byte at loc+k-1.

Example 4.2 Write a program to shift the 32-bit number stored at
0x2000~0x2003 in data memory to the right four places.
Solution:
.include <atxmega128A1def.inc>
.def lpCnt = r19
.cseg
.org 0x00
jmp start
.org 0xF6
start: ldi lpCnt, 4 ; shift four places
loop: lds r0, 0x2000 ; least significant byte
lds r1, 0x2001
lds r2, 0x2002
lds r3, 0x2003 ; most significant byte
sloop: lsr r3 ; most significant byte
ror r2 ; second most significant byte
ror r1 ; second least significant byte
ror r0 ; least significant byte
dec lpCnt
brne sloop
sts 0x2000, r0
sts 0x2001, r1
sts 0x2002, r2
sts 0x2003, r3
here: rjmp here
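
The multi-byte right shift can also be sketched as a Python model, which may help when checking the register-level program by hand. This is only an illustration: the function name and the list representation (least significant byte first, as at loc, loc+1, ...) are assumptions, not part of the lecture.

```python
def shift_right_multibyte(data, places):
    """Shift a little-endian list of bytes right by `places` bits."""
    data = list(data)
    for _ in range(places):
        carry = 0                                # lsr shifts 0 into the MSB
        for i in reversed(range(len(data))):     # from loc+k-1 down to loc
            new_carry = data[i] & 1              # bit 0 goes to the C flag
            data[i] = (data[i] >> 1) | (carry << 7)   # ror brings C into bit 7
            carry = new_carry
    return data

# 0x12345678 stored at 0x2000~0x2003, least significant byte first
print([hex(b) for b in shift_right_multibyte([0x78, 0x56, 0x34, 0x12], 4)])
```

For the sample value 0x12345678 the model returns the bytes of 0x01234567.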

Boolean Instructions
Table 3.8 A summary of AVR Boolean instructions
Mnemonics     Description                   Operation
and Rd, Rr    Logical AND                   Rd ← [Rd] ∧ [Rr]
andi Rd, k    Logical AND with immediate    Rd ← [Rd] ∧ k
or Rd, Rr     Logical OR                    Rd ← [Rd] ∨ [Rr]
ori Rd, k     Logical OR with immediate     Rd ← [Rd] ∨ k
eor Rd, Rr    Exclusive OR                  Rd ← [Rd] ⊕ [Rr]
com Rd        One's complement              Rd ← 0xFF – [Rd]
neg Rd        Two's complement              Rd ← 0x00 – [Rd]
sbr Rd, k     Set bit(s) in register        Rd ← [Rd] ∨ k
cbr Rd, k     Clear bit(s) in register      Rd ← [Rd] ∧ (0xFF – k)
tst Rd        Test for zero or minus        Rd ← [Rd] ∧ [Rd]
clr Rd        Clear register                Rd ← [Rd] ⊕ [Rd]
ser Rd        Set register                  Rd ← 0xFF

Applications of Boolean Instructions
Clear a few bits of a register
andi r16, 0xF0 ; clear the lower 4 bits
Set a few bits of a register to 1
ori r16, 0x44 ; set bits 6 and 2 of r16 to 1
Toggle a few bits of a register
ldi r17, 0xCC
eor r16, r17 ; toggle bits 7, 6, 3, & 2 in r16
Find the One’s Complement of a Register
com r16
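
The masks used above can be checked with a short Python model. The starting value 0xAD is an arbitrary assumption chosen for illustration.

```python
r16 = 0xAD                    # 1010 1101, arbitrary sample value

cleared = r16 & 0xF0          # andi r16, 0xF0: lower 4 bits cleared
set_bits = r16 | 0x44         # ori r16, 0x44: bits 6 and 2 set
toggled = r16 ^ 0xCC          # eor r16, r17 with r17 = 0xCC: bits 7, 6, 3, 2 toggled
complement = 0xFF - r16       # com r16: one's complement

for name, value in [("andi", cleared), ("ori", set_bits),
                    ("eor", toggled), ("com", complement)]:
    print(f"{name:4s} 0x{value:02X} = {value:08b}")
```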

Create Time Delay Using Program Loops
• Instruction execution takes time
• Time delay can be created by executing an appropriate number of instructions.
Method:
Step 1
Select a sequence of instructions that takes a certain number of CPU clock cycles to execute.
Step 2
Repeat the chosen instruction sequence for an appropriate number of times.
ldi r21, 250
loop0: push r0 ; 2 CPU clock cycles
pop r0 ; 2 CPU cycles
push r0
pop r0
push r0
pop r0
push r0
pop r0
push r0
pop r0
push r0
pop r0
push r0
pop r0
nop ; 1 CPU clock cycle
dec r21 ; 1 CPU clock cycle
brne loop0 ; 2 (1) cycle when branch is taken (not taken)

The instruction sequence on the previous page can be shortened to
ldi r21, 250
loop0: ldi r20, 4 ; 1 CPU clock cycle
loopi: push r0 ; 2 CPU clock cycles
pop r0 ; 2 CPU clock cycles
dec r20 ; 1 CPU clock cycle
brne loopi ; 2 (1) cycles when branch is taken (not taken)
dec r21 ; 1 cycle
brne loop0 ; 2 (1) cycles when branch is taken (not taken)

By loading 250 into r21, the previous loop creates a delay of about 0.25 ms,
assuming that the CPU clock is 32 MHz.
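
The delay of the shortened loop can be estimated by counting cycles. The sketch below is a Python model that uses the cycle counts given in the comments of the listing (push and pop 2 cycles each, ldi and dec 1 cycle, brne 2 cycles when taken and 1 when not taken); actual counts on a particular core may differ, and the helper name is hypothetical.

```python
CPU_HZ = 32_000_000   # CPU clock assumed in the slides

def loop_cycles(body_cycles, count):
    """Cycles for `count` passes of: body, dec (1), brne (2 taken, 1 on exit)."""
    return count * (body_cycles + 1 + 2) - 1

inner = loop_cycles(2 + 2, 4)        # loopi: push + pop, 4 passes
outer = loop_cycles(1 + inner, 250)  # loop0: ldi r20 plus the inner loop, 250 passes
total = 1 + outer                    # plus the initial ldi r21, 250
delay_us = total / CPU_HZ * 1e6

print(inner, total, delay_us)
```

With these counts the model gives 7750 cycles, or roughly 242 µs, which is close to the 0.25 ms quoted above.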

Instruction Sequence that Creates 50 ms delay:
ldi r17, 200
loop1: ldi r21, 250
loop0: ldi r20, 4 ; 1 CPU clock cycle
loopi: push r0 ; 2 CPU clock cycles
pop r0 ; 2 CPU clock cycles
dec r20 ; 1 CPU clock cycle
brne loopi ; 2 (1) cycles when branch is taken (not taken)
dec r21 ; 1 cycle
brne loop0 ; 2 (1) cycles when branch is taken (not taken)
dec r17
brne loop1

Creating Longer Delay
• Use multi-layer program loops
• An instruction sequence that creates a delay of 1 s is as follows:
ldi r18,20
loop2: ldi r17, 200
loop1: ldi r21, 250
loop0: ldi r20, 4
loopi: push r0
pop r0
dec r20
brne loopi
dec r21
brne loop0
dec r17
brne loop1
dec r18
brne loop2

Stack Data Structure
• Elements can only be accessed from the top.
• Add a new element to the stack by pushing.
• Remove an element from the stack by pulling (or popping).
• Has a pointer that points either to the top element or to the location above the top element (for AVR MCU).
Set Up the Stack Pointer
ldi r16, low(RAMEND)
out CPU_SPL, r16
ldi r16, high(RAMEND)
out CPU_SPH, r16

Instructions for Stack Operation
pop rd ; SP ← [SP] + 1; rd ← mem([SP])
push rd ; mem([SP]) ← [rd]; SP ← [SP] – 1
Use of Stack
• Store return addresses for subroutine calls and interrupt service
• Temporary storage
• Store local variables for subroutine execution
• Holding place for return values of subroutine calls

A Simple Subroutine
; --------------------------------------------------------------------------------------------
; This subroutine swaps the contents of r16 & r17
; --------------------------------------------------------------------------------------------
swapRegs: push r16
mov r16, r17
pop r17
ret
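
The push and pop behavior described earlier, and the way swapRegs relies on it, can be modeled in a few lines of Python. The class name and the RAMEND value of 0x3FFF are assumptions made for this sketch.

```python
class AvrStack:
    """Model of the AVR stack: push stores then decrements SP, pop increments then loads."""
    def __init__(self, ramend=0x3FFF):   # RAMEND value is an assumption
        self.mem = {}
        self.sp = ramend

    def push(self, value):
        self.mem[self.sp] = value & 0xFF   # mem([SP]) <- [rd]
        self.sp -= 1                       # SP <- [SP] - 1

    def pop(self):
        self.sp += 1                       # SP <- [SP] + 1
        return self.mem[self.sp]           # rd <- mem([SP])

def swap_regs(stack, r16, r17):
    stack.push(r16)        # push r16
    r16 = r17              # mov r16, r17
    r17 = stack.pop()      # pop r17
    return r16, r17

stack = AvrStack()
print(swap_regs(stack, 0x12, 0x34))
```

After the call the stack pointer is back at its starting value, which is what allows ret to find the return address.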

Subroutine to Generate a Delay of 250 µs

delay250us:
ldi r21, 250
loopo: ldi r20, 4
loopi: push r0
pop r0
dec r20
brne loopi
dec r21
brne loopo
ret

Flexibility can be added to this subroutine to make it more useful.

delayby250us:
ldi r21, 250
loopo: ldi r20, 4
loopi: push r0
pop r0
dec r20
brne loopi
dec r21
brne loopo
dec r16
brne delayby250us
ret
How to Call (to create 50-ms delay)
ldi r16, 200
call delayby250us
delayby50ms:
ldi r17, 200
loop3: ldi r21, 250
loop2: ldi r20, 4
loop1: push r0
pop r0
dec r20
brne loop1
dec r21
brne loop2
dec r17
brne loop3
dec r16
brne delayby50ms
ret

Issues Related to Subroutine Call
• Parameter passing
• Local variable allocation and de-allocation
• Result returning

Parameter Passing
• Using CPU registers (r0~r31)
• Using stack
• Using global memory

Local Variable Allocation & Deallocation
• Temporary variables and results are needed for the execution of a subroutine.
• Temporary variables and results are useful only during the execution of the subroutine.
• They should be allocated in the stack for easy allocation and deallocation.
• When its local variables are allocated in the stack, a subroutine can be made reentrant (for example, it can call itself).
• Best allocated in CPU registers for AVR: why?

Allocating k Bytes in Stack for Local Variables
in YL, CPU_SPL ; transfer SP to Y
in YH, CPU_SPH ; "
sbiw YL, k
out CPU_SPL, YL
out CPU_SPH, YH
Made into a Macro
.macro allocStk
in YL, CPU_SPL
in YH, CPU_SPH
sbiw YL, @0
out CPU_SPL, YL
out CPU_SPH, YH
.endmacro
How to Call?
allocStk k ; call to allocate k bytes in stack

Macro for Deallocating Local Variables in Stack
.macro deallocStk
in YL, CPU_SPL
in YH, CPU_SPH
adiw YL, @0
out CPU_SPL, YL
out CPU_SPH, YH
.endmacro

How to Invoke?
deallocStk k ; call to deallocate k bytes in stack

AVR Stack Frame

How to Return Results?
• Returned in registers, stack, or global memory
• Best returned in registers
• If results are to be returned in the stack, the stack slot to hold the result should be allocated by the caller.

Accessing Local Variables in Stack

To Read locVark:
in YL, CPU_SPL
in YH, CPU_SPH
ldd rj, Y+k

To Write into locVark:
in YL, CPU_SPL
in YH, CPU_SPH
std Y+k, rj
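
The allocation, access, and deallocation of local variables can be pictured with a small Python model of data memory. The RAMEND value, the sample data, and the helper names are assumptions for illustration only.

```python
RAMEND = 0x3FFF                  # assumed top of internal SRAM
mem = bytearray(RAMEND + 1)      # model of data memory
sp = RAMEND                      # stack pointer after initialization

def alloc_stk(sp, k):
    return sp - k                # allocStk k: SP <- SP - k

def dealloc_stk(sp, k):
    return sp + k                # deallocStk k: SP <- SP + k

sp = alloc_stk(sp, 3)            # reserve 3 bytes of local variables
y = sp                           # in YL, CPU_SPL / in YH, CPU_SPH
mem[y + 1] = 0xAA                # std Y+1, rj  (write local variable 1)
mem[y + 3] = 0xCC                # std Y+3, rj  (write local variable 3)
first = mem[y + 1]               # ldd rj, Y+1  (read local variable 1)
sp = dealloc_stk(sp, 3)          # release the local variables

print(hex(y), hex(first), hex(sp))
```

Because SP points to the free location just past the newest stack byte, the k-th local variable sits at offset k (starting from 1) from Y in this model.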

Register Usage Convention
• Both the subroutine and its caller use registers, so they may interfere with each other.
• Interference must be avoided to ensure the correct execution of the program.
• A recommendation for the use of AVR registers is given in Table 5.1.
Table 5.1 Recommendation for register usage
Name                        Usage
r4~r7, r12~r15, r28, r29    Callee saved
r0~r3, r8~r11, r20, r21     Caller saved
r16~r19, r30, r31           Parameter passing
r22~r27                     Result returning

Instructions for Making Subroutine Call

Table 5.2 Subroutine call instructions

Instruction   Operation
call k        PC ← k; stack ← PC + 2; SP ← SP – 2 (devices with 16-bit PC); SP ← SP – 3 (devices with 22-bit PC)
eicall        PC(15:0) ← Z(15:0); PC(21:16) ← EIND; stack ← PC + 1; SP ← SP – 3
icall         PC(15:0) ← Z(15:0); PC(21:16) ← 0 (devices with 22-bit PC); stack ← PC + 1; SP ← SP – 2 (devices with 16-bit PC); SP ← SP – 3 (devices with 22-bit PC)
rcall k       PC ← PC + k + 1; stack ← PC + 1; SP ← SP – 2 (devices with 16-bit PC); SP ← SP – 3 (devices with 22-bit PC)
ret           PC(15:0) ← stack (devices with 16-bit PC); SP ← SP + 2; PC(21:0) ← stack (devices with 22-bit PC); SP ← SP + 3

A Few Examples of Subroutines
delay50ms:
ldi r19, 200
loop3: ldi r21, 250
loop2: ldi r20, 4
loop1: push r0
pop r0
dec r20
brne loop1
dec r21
brne loop2
dec r19
brne loop3
ret

delayby50ms:
ldi r19, 200
loop3: ldi r21, 250
loop2: ldi r20, 4
loop1: push r0
pop r0
dec r20
brne loop1
dec r21
brne loop2
dec r19
brne loop3
dec r16
brne delayby50ms
ret

ldi r16, 20
call delayby50ms ; create 1 s delay
Subroutine to Multiply two 16-bit Unsigned Integers
P and Q are two 16-bit unsigned integers
$P = P_H P_L = P_H \times 2^{8} + P_L$
$Q = Q_H Q_L = Q_H \times 2^{8} + Q_L$

$P \times Q = P_H Q_H \times 2^{16} + (P_H Q_L + Q_H P_L) \times 2^{8} + P_L Q_L$

[Figure 5.4 16-bit by 16-bit multiplication. The partial products PLQL, PHQL, PLQH, and PHQH are aligned from lsb to msb and added to form the final product P × Q, which occupies addresses PR (lsb) through PR + 3 (msb).]

Subroutine to Multiply two 16-bit Unsigned Integers
Incoming arguments: r17:r16 & r19:r18
Result returned in: r22~r25 (lsb to msb)

; ------------------------------------------------------------------
; The first and second numbers are passed in r17:r16 & r19:r18.
; The product is returned in r25..r22.
; ------------------------------------------------------------------
.def p1 = r22 ; lsb of the product
.def p2 = r23
.def p3 = r24
.def p4 = r25 ; msb of the product
.def PH = r17 ; high byte of multiplicand
.def PL = r16 ; low byte of multiplicand
.def QH = r19 ; high byte of multiplier
.def QL = r18 ; low byte of multiplier
.def zero = r2 ; holds 0 when adding the carry (register choice is an assumption)
mul16U: mul PL, QL
movw p1, r0 ; (p2:p1) <-- QL x PL
mul PH, QH
movw p3, r0 ; (p4:p3) <-- QH x PH
mul PL, QH ; compute PL x QH
add p2, r0 ; add partial product to p3:p2
adc p3, r1 ; “
clr zero ; add carry to p4
adc p4, zero ; “
mul PH, QL ; compute PH x QL
add p2, r0 ; add partial product to p3:p2
adc p3, r1 ; “
adc p4, zero ; add carry to p4
ret
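
To check the order in which the partial products are accumulated, the subroutine can be mirrored byte by byte in Python. This is a sketch of the same sequence of 8-bit operations, not the lecture's code; the function names are illustrative.

```python
def mul8(a, b):
    """Model of the AVR mul instruction: returns (r0, r1) = (low, high) bytes."""
    prod = a * b
    return prod & 0xFF, prod >> 8

def mul16u(p, q):
    pl, ph = p & 0xFF, p >> 8
    ql, qh = q & 0xFF, q >> 8
    p1, p2 = mul8(pl, ql)                # (p2:p1) <- QL x PL
    p3, p4 = mul8(ph, qh)                # (p4:p3) <- QH x PH
    for a, b in ((pl, qh), (ph, ql)):    # the two cross products
        r0, r1 = mul8(a, b)
        s = p2 + r0                      # add  p2, r0
        p2, carry = s & 0xFF, s >> 8
        s = p3 + r1 + carry              # adc  p3, r1
        p3, carry = s & 0xFF, s >> 8
        p4 = (p4 + carry) & 0xFF         # adc  p4, zero
    return p1 | (p2 << 8) | (p3 << 16) | (p4 << 24)

print(hex(mul16u(0x1234, 0x5678)))
```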

Write an instruction sequence to multiply the 16-bit numbers stored
at data memory 0x2000~0x2001 & 0x2010~0x2011 and store the
product at data memory 0x2020~0x2023.
Solution:
lds r16, 0x2000
lds r17, 0x2001
lds r18, 0x2010
lds r19, 0x2011
call mul16U
sts 0x2020, r22
sts 0x2021, r23
sts 0x2022, r24
sts 0x2023, r25

Algorithm for Multiplying two Signed 16-bit Numbers
Step 1
Multiply two operands disregarding the sign.
Step 2
If op1 is negative, subtract op2 from the upper half of the
product.
Step 3
If op2 is negative, subtract op1 from the upper half of the
product.
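
The three steps can be verified with a short Python model that treats the operands as raw 16-bit patterns and compares the corrected product with ordinary signed multiplication. The function names are illustrative assumptions.

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement value."""
    return x - 0x10000 if x & 0x8000 else x

def mul16s(op1, op2):
    product = op1 * op2                    # Step 1: unsigned multiply
    upper, lower = product >> 16, product & 0xFFFF
    if op1 & 0x8000:                       # Step 2: op1 is negative
        upper = (upper - op2) & 0xFFFF
    if op2 & 0x8000:                       # Step 3: op2 is negative
        upper = (upper - op1) & 0xFFFF
    return (upper << 16) | lower           # 32-bit two's-complement pattern

a, b = 0xFFFE, 0x0003                      # -2 times 3
print(hex(mul16s(a, b)))                   # pattern of -6
```

The correction works because a negative 16-bit operand read as unsigned is too large by 2^16, so the unsigned product is too large by 2^16 times the other operand.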

Signed 16-bit Multiplication Subroutine
• Incoming arguments
  • Multiplicand: r17:r16
  • Multiplier: r19:r18
• Result
  • Product: r22~r25 (lsb to msb)

mul16s: call mul16U
sbrs r17, 7 ; check the sign of the first number
rjmp chk2 ; first number is positive, check 2nd number
sub r24, r18 ; subtract the second number from upper half of
sbc r25, r19 ; product
chk2: sbrs r19, 7 ; check the sign of the second number
rjmp doneMs ; second number is positive, prepare return
sub r24, r16 ; subtract the first number from upper half of
sbc r25, r17 ; product
doneMs: ret
.include "mul16U.asm"

Writing Subroutine to Perform Unsigned Division
• The shift-and-subtract method is often used to carry out the division.

[Figure A16. The shift-and-subtract divider hardware. Labeled blocks: R register, Q register (msb to lsb), P register, ALU, and controller; labeled signals: C, Set LSB, Write R, Shift left.]

16-bit Unsigned Division Algorithm
Step 1
Load the divisor, the dividend, and 0 into the P, Q, and R registers, respectively. lpCnt ← 16.
Step 2
Shift R:Q to the left one place.
Step 3
Subtract P from R and place the difference back to R if the difference is non-negative.
Step 4
Set bit 0 of Q to 1 if the difference computed in Step 3 is non-negative. Otherwise, set it to 0.
Step 5
lpCnt ← lpCnt – 1; if (lpCnt > 0) go to Step 2; else Stop.
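
The five steps can be sketched in Python to see how the quotient bits are produced one at a time. This is a model of the algorithm, not of the AVR registers: R is kept as an ordinary integer here, whereas a 16-bit R register would lose a bit shifted out of its top position (which can only happen for divisors above 0x8000). The function name is an assumption.

```python
def div16u(dividend, divisor):
    """Shift-and-subtract division; returns (quotient, remainder)."""
    r, q = 0, dividend & 0xFFFF            # Step 1: R <- 0, Q <- dividend
    for _ in range(16):                    # lpCnt <- 16
        r = (r << 1) | (q >> 15)           # Step 2: shift R:Q left one place
        q = (q << 1) & 0xFFFF
        if r >= divisor:                   # Step 3: is R - P non-negative?
            r -= divisor
            q |= 1                         # Step 4: set bit 0 of Q
    return q, r                            # Step 5: stop after 16 passes

print(div16u(1234, 5))
```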

; ------------------------------------------------------------------
; Dividend and divisor are passed in r17:r16 & r19:r18, respectively.
; Quotient is returned in r25:r24; remainder is returned in r23:r22.
; ------------------------------------------------------------------
.def lpCnt = r20
.def RL = r22
.def RH = r23
.def QL = r24
.def QH = r25
.def PL = r18
.def PH = r19
div16U: ldi lpCnt, 16 ; set up loop count
movw QL, r16 ; load dividend into Q register
clr RL ; initialize R register to 0
clr RH ; "
dvloop: lsl QL ; shift R:Q to left one place
rol QH ; "
rol RL ; "
rol RH ; "
cp RL, PL ; compare R with P (unsigned comparison)
cpc RH, PH ; "
brlo nxtb ; R < P: leave bit 0 of Q at 0
ori QL, 0x01 ; set bit 0 of Q to 1
sub RL, PL ; put difference in R
sbc RH, PH ; "
nxtb: dec lpCnt
brne dvloop
ret

Example. Write a program to find all elements in an array of 16-bit numbers divisible
by 5 and save them in data memory starting from 0x2000. The array has 30 elements.
.include <atxmega128A1def.inc>
.def lpCnt = r20
.dseg
.org 0x2000
result: .byte 60 ; room for up to 30 16-bit results
.cseg
.org 0x00
jmp start
.org 0xF6
start: ldi r16, low(RAMEND)
out CPU_SPL, r16
ldi r16, high(RAMEND)
out CPU_SPH, r16
call setCPUClkto32Mwith32MIntOsc
ldi ZL, low(array<<1)
ldi ZH, high(array<<1)
ldi YL, low(result)
ldi YH, high(result)
ldi lpCnt, 30

loop: lpm r16, Z+
lpm r17, Z+
ldi r18, 5
ldi r19, 0
call div16U
cpi r22, 0 ; is the remainder equal to 0?
brne false
st Y+, r16
st Y+, r17
false: dec lpCnt
brne loop
here: jmp here
.include "sysclk_xmega.asm"
.include "div16U.asm"
array: .dw 1234, 2345, 3456, 4567, 5678
.dw 1122, 2233, 3344, 4455, 5566

Convert an Internal Binary Number into a BCD String
xx: number to be converted
quo: quotient of a division
rem: remainder of a division
ptr: pointer to the buffer to store the resultant string

Step 1
Push 0 into the stack.
Step 2
quo ← xx / 10; rem ← xx mod 10;
Step 3
Stack ← rem + 0x30;
Step 4
If quo ≠ 0, xx ← quo and go to Step 2; else, next step.
Step 5
Pop the string out of the stack and store it in the buffer.
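
The role of the stack in reversing the digit order can be seen in a Python sketch of the five steps. A list stands in for the hardware stack and a bytearray for the buffer; both, along with the function name, are assumptions for illustration. Only unsigned values are handled here.

```python
def bin_to_bcd_string(xx):
    stack = [0]                        # Step 1: push 0 (end-of-string marker)
    while True:
        quo, rem = divmod(xx, 10)      # Step 2: divide by 10
        stack.append(rem + 0x30)       # Step 3: push the ASCII digit
        if quo == 0:                   # Step 4: repeat while quotient is not 0
            break
        xx = quo
    buffer = bytearray()
    while True:                        # Step 5: pop digits, most significant first
        ch = stack.pop()
        if ch == 0:
            break
        buffer.append(ch)
    buffer.append(0)                   # terminate the string with a NULL character
    return bytes(buffer)

print(bin_to_bcd_string(1234))
```

The digits are generated least significant first, so pushing them and popping them afterwards yields the string in reading order.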

Example 6.4 Write a subroutine that converts a 16-bit binary
number held in r16:r17 into an ASCII BCD string and stores the string
in a buffer pointed to by the Z pointer.
Solution:
Parameters Passed:
• 16-bit number to be converted: in r16:r17
• Pointer to the buffer to hold the converted string: in Z

.def sign = r12
.equ NULL =0
bin2BCD:clr sign
push sign
sbrs r17,7 ; check the sign of the number
rjmp normal
inc sign ; indicate sign is negative
com r16 ; find the magnitude of the given number
com r17 ; "
movw r24,r16 ; "
adiw r24,1 ; "
movw r16,r24 ; "
normal: ldi r18,10 ; set divisor to 10
clr r19 ; “
call div16U
ldi r26,0x30
add r22,r26 ; convert remainder digit to ASCII code

push r22
sbiw r24,0 ; is quotient 0?
breq popStk ; quotient is 0, pop string out of stack
movw r16, r24; quotient is not 0, continue to divide
rjmp normal ; loop
popStk: sbrs sign,0 ; check the sign of the number
rjmp popLp ; do nothing if positive
ldi r17,'-' ; push a minus sign into stack
push r17
popLp: pop r17
cpi r17,NULL ; is it a NULL character?
breq exit
st Z+,r17 ; store the popped character in the buffer
rjmp popLp
exit: st z,r17 ; terminate the string with a NULL character
ret

Example 6.5 Write a program to find all 4-digit decimal numbers
that have the following property:
The sum of the square of the upper half and the square of the lower half of the
given number equals the original number.
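
Before reading the assembly solution, the property can be explored with a brute-force search in Python. This sketch only states the property; it is not the lecture's program, and the names are illustrative.

```python
def has_property(n):
    upper, lower = divmod(n, 100)          # upper half and lower half of the digits
    return upper * upper + lower * lower == n

matches = [n for n in range(1000, 10000) if has_property(n)]
print(matches)
```

For example, 1233 qualifies because 12 × 12 + 33 × 33 = 144 + 1089 = 1233.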

.include <atxmega128A1def.inc>
.def kL = r4 ; test number (start from 1000)
.def kH = r5 ; “
.def TOPL = r6 ; register to hold upper bound
.def TOPH = r7 ; (10000)
.dseg
.org 0x2000
buf: .byte 20
.cseg
.org 0x00
jmp start
.org 0xF6

start: ldi r16,low(RAMEND) ; set up stack pointer
out CPU_SPL,r16 ; “
ldi r17,high(RAMEND) ; “
out CPU_SPH,r17 ; “
call setCPUClkto32Mwith32MIntOsc
ldi r16,low(1000)
ldi r17,high(1000)
movw kL,r16 ; transfer 1000 to kL:kH
ldi r16,low(10000)
ldi r17,high(10000)
movw TOPL,r16 ; place 10000 in TOPL:TOPH
ldi ZL,0 ; use Z as buffer pointer
ldi ZH,0x20 ; "
floop: movw r16,kL ; pass number to be tested in r16:r17
call test
cpi r22,1 ; does test subroutine return 1?
brne next ; no, check next numbers
st Z+,kL ; save the number
st Z+,kH ; "
next: movw r28,kL ; increment kL:kH by 1
adiw r28,1 ; "
movw kL,r28 ; "
cp kL,TOPL
brne floop
cp kH,TOPH
brne floop
done: jmp done

test: ldi r18,100
clr r19
call div16U
mul r22, r22 ; compute rem * rem
movw r22, r0 ; transfer back to r22:r23
mul r24, r24 ; compute quo * quo
add r0, r22 ; compute rem*rem + quo*quo
adc r1, r23 ; “
cp r0, r16 ; compare with r16:r17
brne false ; branch if low bytes are not equal
cp r1, r17 ; compare high bytes
brne false
ldi r22, 1 ; returns 1 if the given number has the property
ret
false: clr r22
ret
.include "div16U.asm"
.include "sysClock_xmega.asm"

Finding the Square Root (Successive Approximation Method)
SAR: successive approximation register
mask: mask to set a bit in an 8-bit register to 1
lpCnt: loop count
Temp: temporary variable

[Figure 5.5 Successive-approximation method for finding square root. Flowchart: Start; SAR[n-1, ..., 0] ← 0 and i ← n-1; SAR[i] ← 1; if SAR * SAR > num, then SAR[i] ← 0; if i = 0, stop; otherwise i ← i-1 and repeat from SAR[i] ← 1.]

Algorithm
Step 1
sar ← 0; mask ← 0x80; lpCnt ← 8; temp ← 0
Step 2
temp ← sar OR mask // guess bit i is 1
Step 3
If ((temp * temp) ≤ num) sar ← temp;
Step 4
mask ← mask SRL 1 (shift right logically one place);
Step 5
lpCnt ← lpCnt – 1
Step 6
If (lpCnt == 0) stop; else go to Step 2.
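
The six steps can be traced with a short Python sketch. It follows the algorithm as listed, with a for loop standing in for lpCnt; the function name and the `bits` parameter are assumptions added for illustration.

```python
def sqrt_sar(num, bits=8):
    """Successive approximation: returns the largest sar with sar * sar <= num."""
    sar = 0
    mask = 1 << (bits - 1)             # Step 1: mask <- 0x80 for 8 bits
    for _ in range(bits):              # lpCnt passes
        temp = sar | mask              # Step 2: guess that this bit is 1
        if temp * temp <= num:         # Step 3: keep the guess if it is not too large
            sar = temp
        mask >>= 1                     # Step 4: move to the next lower bit
    return sar

print(sqrt_sar(1234))
```

For 1234 the model settles on 35, since 35 × 35 = 1225 and 36 × 36 = 1296.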

Drawback of the Successive Approximation Algorithm
• Too pessimistic: the computed square root tends to be too small
• The better approximation may be [SAR] + 1 instead of [SAR]
• Need to compare ([SAR]+1)^2 – num with num – [SAR]^2
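
The comparison in the last bullet can be added to a model of the search as a final rounding step. The Python sketch below is self-contained and illustrative; its names are assumptions, and the result is not truncated to 8 bits as a register would be.

```python
def sqrt_sar_rounded(num, bits=8):
    sar = 0
    mask = 1 << (bits - 1)
    for _ in range(bits):                  # successive approximation (rounds down)
        temp = sar | mask
        if temp * temp <= num:
            sar = temp
        mask >>= 1
    d1 = num - sar * sar                   # distance from [SAR]^2 up to num
    d2 = (sar + 1) * (sar + 1) - num       # distance from num up to ([SAR]+1)^2
    return sar if d1 < d2 else sar + 1     # pick the closer candidate

print(sqrt_sar_rounded(5000))
```

For 5000 the search alone gives 70, but 71 × 71 = 5041 is closer to 5000 than 70 × 70 = 4900, so the rounded result is 71.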

Example 6.6 Write a subroutine that can find the square root of a
16-bit number. The 16-bit number of which the square root is to be
found is passed in r16: r17 and the square root is returned in r22.

.def mask = r20
.def sar = r22
.def tmp = r1
.def lpcnt = r21
.def qL = r16
.def qH = r17
SqRoot16: ldi mask, 0x80
clr sar
ldi lpcnt, 8
sqLoop1: mov tmp, sar ; make a guess of a sar bit
or tmp, mask ; "
mul tmp, tmp ; compute tmp * tmp (the guess squared)
cp qL, r0 ; compare sar * sar with q
cpc qH, r1 ; "
brlo nextb
or sar, mask ; keep the guess
nextb: lsr mask
dec lpcnt
brne sqLoop1

; ------------------------------------------------------------------
; Find out whether sar or sar+1 is closer to the true square root by comparing
; whether q – sar*sar or (sar+1)*(sar+1) – q is smaller.
; ------------------------------------------------------------------
mul sar, sar ; compute D1 = q - sar * sar
movw r8, qL ; "
sub r8, r0 ; "
sbc r9, r1 ; "
mov r0, sar ; compute D2 = (sar+1)*(sar+1) - q
inc r0 ; "
mul r0, r0 ; "
sub r0, qL ; "
sbc r1, qH ; "
cp r8, r0 ; compare D1 with D2
cpc r9, r1 ; "
brlo selSar ; choose sar if D1 < D2
inc sar ; otherwise choose sar + 1
selSar: ret ; return the square root in r22
Example 6.7 Write a program to find the square roots of the elements of an array of 16-bit
numbers.
Solution:
.include <atxmega128A1Udef.inc>
.def llCnt = r25
.cseg
.org 0x00
jmp start
.org 0xF6
start: ldi r16, low(RAMEND)
out CPU_SPL, r16
ldi r16, high(RAMEND)
out CPU_SPH, r16
call setCPUClkto32Mwith32MIntOsc
ldi llCnt, 12
ldi YL,0 ; Y points to SRAM
ldi YH,0x20 ; “
ldi ZL,low(array<<1) ; Z points to array
ldi ZH,high(array<<1) ; “
mloop: lpm r16,Z+ ; fetch the next 16-bit number
lpm r17,Z+ ; “
call SqRoot16
st Y+, r22
dec llCnt
brne mloop
again: jmp again
array: .dw 1234,2345,3456,4567,5678,6789
.dw 1601,2026,2506,3509,3600,5000

Bubble Sort
• Go through the array as many iterations as one less than the number of array or file elements.
• In each iteration, compare each adjacent pair of elements from the lowest toward the highest array index. Swap the adjacent elements if they are not in order.
• The sorting efficiency can be improved by keeping track of whether swapping has been done in an iteration. If no swapping has been done, the sorting algorithm should stop.
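
The three points above can be put together in a short Python sketch, including the early exit when an iteration performs no swap. The variable names loosely follow the description; the sample data is taken from the array used later in the lecture, and the function name is an assumption.

```python
def bubble_sort(array):
    array = list(array)
    iteration = len(array) - 1             # one less than the number of elements
    while iteration > 0:
        is_sorted = True                   # assume sorted until a swap happens
        for i in range(iteration):         # compare adjacent pairs, low to high index
            if array[i] > array[i + 1]:
                array[i], array[i + 1] = array[i + 1], array[i]
                is_sorted = False
        if is_sorted:                      # no swap in this iteration: stop early
            break
        iteration -= 1
    return array

data = [12, 91, 20, 33, 45, 72, 24, 19, 17, 101]
print(bubble_sort(data))
```

Each iteration moves the largest remaining element to the end, which is why the inner loop can shrink by one comparison per iteration.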

[Figure 5.9 Logic flow of bubble sort. Flowchart: iteration ← N - 1; at the start of each iteration, sorted ← 1, inner ← iteration, i ← 0; if array[i] > array[i+1], swap array[i] & array[i+1] and set sorted ← 0; then inner ← inner - 1 and i ← i + 1; repeat until inner = 0; stop if sorted = 1; otherwise iteration ← iteration - 1 and start the next iteration unless iteration = 0, in which case stop.]

Stack frame of bubble sort subroutine

[Figure 5.10 Stack frame for bubble sort subroutine. From SP toward higher addresses: the local variables sorted, iCnt, and eCnt (offsets 1 to 3), the saved r28, the saved r29, and ret_addr.]

.include <atxmega128A1Udef.inc>
.equ SPL = CPU_SPL ; comment out for MEGA devices
.equ SPH = CPU_SPH ; "
.macro allocStk ; this macro allocates space for local variables
in r28,SPL
in r29,SPH
sbiw r28,@0
out SPL,r28
out SPH,r29
.endmacro
.macro deallocStk ; this macro deallocates space used by local variables
in r28,SPL
in r29,SPH
adiw r28,@0
out SPL,r28
out SPH,r29
.endmacro

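.equ NN = 30
.equ sorted = 1 ; offset of the sorted flag from Y (assumed from Figure 5.10)
.equ iCnt = 2 ; offset of the inner loop count from Y (assumed)
.equ eCnt = 3 ; offset of the iteration count from Y (assumed)
.equ true = 1 ; value of the sorted flag when no swap occurred
.def lpcnt = r21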
.dseg
.org 0x2000
array: .byte 40
.cseg
.org 0x00
rjmp start
.org 0xF6
start: ldi r20,low(RAMEND) ; initialize stack pointer
out SPL,r20 ; "
ldi r20,high(RAMEND) ; "
out SPH,r20 ; "
ldi ZL,low(xarr<<1) ; set up pointer to the array in program memory
ldi ZH,high(xarr<<1) ; "
ldi XL,low(array) ; set up pointer to the buffer array in data memory
ldi XH,high(array) ; "
ldi lpcnt,NN ; set up loop count
cLoop: lpm r0,z+ ; copy the array from program memory to data memory
st x+,r0 ; so that it can be sorted
dec lpcnt ; "
brne cLoop ; "
ldi r16,low(array) ; pass array pointer
ldi r17,high(array) ; "
ldi r18,NN ; pass array count
call bubble
again: rjmp again
; ------------------------------------------------------------------
; The next subroutine uses bubble sort algorithm to sort an array in data memory.
; The array count is passed in r18 and the pointer to the array is passed in r16~r17.
; All array elements are nonnegative.
; ------------------------------------------------------------------
bubble: push YH
push YL
allocStk 3 ; allocate 3 bytes for local variables
in YL, SPL ; set Y point to the byte above the top of stack
in YH, SPH ; "
dec r18 ; initialize iteration count to NN - 1
std Y+eCnt, r18 ; "
eLoop: ldd r18,Y+eCnt ; set up inner loop count
std Y+iCnt, r18 ; "
movw ZL, r16 ; place array base address in Z
ldi r20, 1 ; set flag to indicate array is sorted
std Y+sorted, r20 ; "
iloop: ld r8, z ; fetch element array[i]
ldd r9, z+1 ; fetch element array[i+1]
cp r8, r9 ; compare array[i] with array[i+1]
brlo next

st Z, r9 ; swap array[i] with array[i+1]
std Z+1, r8 ; "
clr r20 ; indicate array not sorted
std Y+sorted, r20 ; "
next: adiw ZL, 1 ; increment array pointer by 1
ldd r20,Y+iCnt ; decrement inner loop count
dec r20 ; "
std Y+iCnt, r20 ; "
brne iloop ; continue if inner loop count is not 0
; at the end of an iteration
ldd r20,Y+sorted ; check array sorted flag
cpi r20,true ; "
breq done ; stop if sorted flag is true (1)
ldd r20,Y+eCnt ; decrement iteration loop count
dec r20 ; "
std Y+eCnt,r20 ; "
brne eLoop

done: deallocStk 3
pop YL
pop YH
ret
xarr: .db 12,91,20,33,45,72,24,19,17,101
.db 11,92,21,34,44,71,25,18,16,131
.db 41,43,49,50,99,79,89,98,37,59
// End of program
