EE234 - Lec - 04
Lecture 4
02/25/2025 1
Shift and Rotate Instructions
Useful in bit field manipulation
Useful for bit field extraction
Multiple-Byte Shift
For a k-byte value stored at loc, loc+1, …, loc+k-1, the byte
at loc is the least significant byte whereas the byte at loc+k-1
is the most significant byte.
Right-shift operation should start at the most significant
byte (loc+k-1).
Left-shift operation should start at the least significant byte
(loc).
To shift right
Step 1
Shift the byte at loc+k-1 to the right.
Step 2
Rotate the byte at loc+k-2 to the right through the carry.
Step 3
Repeat Step 2 for the bytes at loc+k-3 down to loc.
To shift left
Step 1
Shift the byte at loc to the left.
Step 2
Rotate the byte at loc+1 to the left through the carry.
Step 3
Repeat Step 2 for the remaining bytes until reaching the byte at loc+k-1.
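The two procedures above can be modeled on a host machine. The following is a minimal C sketch, not AVR code: the function names shift_right_1 and shift_left_1 are hypothetical, and the carry variable stands in for the C flag that lsr/ror and lsl/rol pass from one byte to the next.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift a k-byte little-endian value (buf[0] = least significant byte)
   right by one bit. Processing starts at the most significant byte, and
   the bit that falls out of each byte enters bit 7 of the next lower byte. */
void shift_right_1(uint8_t *buf, size_t k)
{
    uint8_t carry = 0;                        /* lsr shifts a 0 into bit 7 */
    for (size_t i = k; i-- > 0; ) {
        uint8_t next_carry = buf[i] & 0x01u;  /* bit shifted out of this byte */
        buf[i] = (uint8_t)((buf[i] >> 1) | (carry << 7));
        carry = next_carry;
    }
}

/* Shift a k-byte little-endian value left by one bit. Processing starts
   at the least significant byte, and the bit that falls out of each byte
   enters bit 0 of the next higher byte. */
void shift_left_1(uint8_t *buf, size_t k)
{
    uint8_t carry = 0;                        /* lsl shifts a 0 into bit 0 */
    for (size_t i = 0; i < k; i++) {
        uint8_t next_carry = (uint8_t)(buf[i] >> 7);
        buf[i] = (uint8_t)((buf[i] << 1) | carry);
        carry = next_carry;
    }
}
```

Calling shift_right_1 four times on the bytes 0x78, 0x56, 0x34, 0x12 (the value 0x12345678) leaves 0x67, 0x45, 0x23, 0x01, which is 0x01234567.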
Example 4.2 Write a program to shift the 32-bit number stored at
0x2000~0x2003 in data memory to the right four places.
Solution:
.include <atxmega128A1def.inc>
.cseg
.def lpCnt = r19
.org 0x00
jmp start
.org 0xF6
start: ldi lpCnt, 4 ; shift four places
loop: lds r0, 0x2000 ; least significant byte
lds r1, 0x2001
lds r2, 0x2002
lds r3, 0x2003 ; most significant byte
sloop: lsr r3 ; most significant byte
ror r2 ; second most significant byte
ror r1 ; second least significant byte
ror r0 ; least significant byte
dec lpCnt
brne sloop
sts 0x2000, r0
sts 0x2001, r1
sts 0x2002, r2
sts 0x2003, r3
here: rjmp here
Boolean Instructions
Table 3.8 A summary of AVR Boolean instructions
Mnemonics      Description                  Operation
and Rd, Rr     Logical AND                  Rd ← [Rd] ∧ [Rr]
andi Rd, k     Logical AND with immediate   Rd ← [Rd] ∧ k
or Rd, Rr      Logical OR                   Rd ← [Rd] ∨ [Rr]
ori Rd, k      Logical OR with immediate    Rd ← [Rd] ∨ k
eor Rd, Rr     Exclusive OR                 Rd ← [Rd] ⊕ [Rr]
com Rd         One’s complement             Rd ← 0xFF – [Rd]
neg Rd         Two’s complement             Rd ← 0x00 – [Rd]
sbr Rd, k      Set bit(s) in register       Rd ← [Rd] ∨ k
cbr Rd, k      Clear bit(s) in register     Rd ← [Rd] ∧ (0xFF – k)
tst Rd         Test for zero or minus       Rd ← [Rd] ∧ [Rd]
clr Rd         Clear register               Rd ← [Rd] ⊕ [Rd]
ser Rd         Set register                 Rd ← 0xFF
Applications of Boolean Instructions
Clear a few bits of a register
andi r16, 0xF0 ; clear the lower 4 bits
Set a few bits of a register to 1
ori r16, 0x44 ; set bits 6 and 2 of r16 to 1
Toggle a few bits of a register
ldi r17, 0xCC
eor r16, r17 ; toggle bits 7, 6, 3, & 2 in r16
Find the One’s Complement of a Register
com r16
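The same four mask operations can be written in C to check a mask before it goes into an instruction. This is only an illustrative sketch; the helper names are hypothetical. Note that clear_bits takes the bits to clear (like cbr), whereas andi takes the bits to keep.

```c
#include <stdint.h>

/* Clear the bits selected by mask: r AND (NOT mask), as cbr does. */
uint8_t clear_bits(uint8_t r, uint8_t mask)
{
    return (uint8_t)(r & (uint8_t)~mask);
}

/* Set the bits selected by mask: r OR mask, as ori/sbr do. */
uint8_t set_bits(uint8_t r, uint8_t mask)
{
    return (uint8_t)(r | mask);
}

/* Toggle the bits selected by mask: r XOR mask, as eor does. */
uint8_t toggle_bits(uint8_t r, uint8_t mask)
{
    return (uint8_t)(r ^ mask);
}

/* One's complement: 0xFF - r, as com does. */
uint8_t ones_complement(uint8_t r)
{
    return (uint8_t)(0xFFu - r);
}
```

For example, clear_bits(0xAB, 0x0F) gives 0xA0, the same result as andi with 0xF0.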
Create Time Delay Using Program Loops
Instruction execution takes time
Time delay can be created by executing an appropriate number
of instructions.
Method:
Step 1
Select a sequence of instructions that takes a certain number of
CPU clock cycles to execute.
Step 2
Repeat the chosen instruction sequence for an appropriate
number of times.
ldi r21, 250
loop0: push r0 ; 2 CPU clock cycles
pop r0 ; 2 CPU cycles
push r0
pop r0
push r0
pop r0
push r0
pop r0
push r0
pop r0
push r0
pop r0
push r0
pop r0
nop ; 1 CPU clock cycle
dec r21 ; 1 CPU clock cycle
brne loop0 ; 2 (1) cycles when branch is taken (not taken)
The instruction sequence on the previous page can be shortened to:
ldi r21, 250
loop0: ldi r20, 4 ; 1 CPU clock cycle
loopi: push r0 ; 2 CPU clock cycles
pop r0 ; 2 CPU clock cycles
dec r20 ; 1 CPU clock cycle
brne loopi ; 2 (1) cycles when branch is taken (not taken)
dec r21 ; 1 cycle
brne loop0 ; 2 (1) cycles when branch is taken (not taken)
By loading 250 into r21, the previous loop can create a delay of about 0.25 ms,
assuming that the CPU clock is 32 MHz.
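The delay can be checked by adding up cycles. The sketch below is a host-side C calculation, not AVR code; it assumes the per-instruction cycle counts written in the comments above (push 2, pop 2, nop 1, ldi 1, dec 1, brne 2 when taken and 1 when not taken), and the function names are hypothetical.

```c
#include <stdint.h>

/* Cycles spent in a counted loop: each pass runs the body, a dec (1 cycle),
   and a brne that costs 2 cycles when taken and 1 on the final pass. */
uint32_t loop_cycles(uint32_t count, uint32_t body_cycles)
{
    return count * (body_cycles + 1u + 2u) - 1u;
}

/* Unrolled version: ldi r21, then 250 passes of seven push/pop pairs and a nop. */
uint32_t unrolled_delay_cycles(void)
{
    return 1u + loop_cycles(250u, 7u * (2u + 2u) + 1u);
}

/* Nested version: ldi r21, then 250 passes of (ldi r20 + inner loop of 4 passes). */
uint32_t nested_delay_cycles(void)
{
    uint32_t inner = loop_cycles(4u, 2u + 2u);   /* loopi: push + pop */
    return 1u + loop_cycles(250u, 1u + inner);
}

/* Convert a cycle count to microseconds for a given CPU clock in Hz. */
double cycles_to_us(uint32_t cycles, double f_cpu_hz)
{
    return (double)cycles * 1e6 / f_cpu_hz;
}
```

Under these assumptions the unrolled loop takes 8000 cycles (250 µs at 32 MHz), while the nested loop takes 7750 cycles (about 242 µs), because each outer pass of the nested version is 31 cycles rather than 32.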
Instruction Sequence that Creates 50 ms delay:
ldi r17, 200
loop1: ldi r21, 250
loop0: ldi r20, 4 ; 1 CPU clock cycle
loopi: push r0 ; 2 CPU clock cycles
pop r0 ; 2 CPU clock cycles
dec r20 ; 1 CPU clock cycle
brne loopi ; 2 (1) cycles when branch is taken (not taken)
dec r21 ; 1 cycle
brne loop0 ; 2 (1) cycles when branch is taken (not taken)
dec r17
brne loop1
Creating Longer Delay
Use multi-layer program loops
An instruction sequence that creates a delay of about 1 s is as follows:
ldi r18,20
loop2: ldi r17, 200
loop1: ldi r21, 250
loop0: ldi r20, 4
loopi: push r0
pop r0
dec r20
brne loopi
dec r21
brne loop0
dec r17
brne loop1
dec r18
brne loop2
Stack Data Structure
Elements can only be accessed from the top.
Add a new element to the stack by pushing.
Remove an element from the stack by pulling (or popping).
Has a pointer that points either to the top element or to the
location above the top element (for AVR MCU).
Setup the Stack Pointer
ldi r16, low(RAMEND)
out CPU_SPL, r16
ldi r16, high(RAMEND)
out CPU_SPH, r16
Instructions for Stack Operation
pop rd ; SP ← [SP] + 1; rd ← mem([SP])
push rd ; mem([SP]) ← [rd]; SP ← [SP] – 1
Use of Stack
Store return address of subroutine call and interrupt service.
Temporary storage
Store local variables for subroutine execution
Holding place for return values for subroutine call
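The push and pop semantics above (store then decrement, increment then load) can be modeled in a few lines of C. This is a toy model under stated assumptions: RAM_SIZE, the Stack type, and the function names are hypothetical, and the 64-byte memory only stands in for the real data space ending at RAMEND.

```c
#include <stdint.h>

#define RAM_SIZE 64u                 /* hypothetical tiny RAM for illustration */

typedef struct {
    uint8_t  mem[RAM_SIZE];
    uint16_t sp;                     /* points to the next free byte */
} Stack;

/* Like loading SP with RAMEND: the stack starts at the top of RAM. */
void stack_init(Stack *s)
{
    s->sp = (uint16_t)(RAM_SIZE - 1u);
}

/* push: mem([SP]) <- value; SP <- [SP] - 1 */
void stack_push(Stack *s, uint8_t value)
{
    s->mem[s->sp] = value;
    s->sp--;
}

/* pop: SP <- [SP] + 1; return mem([SP]) */
uint8_t stack_pop(Stack *s)
{
    s->sp++;
    return s->mem[s->sp];
}
```

Pushing 0x11 and then 0x22 and popping twice returns 0x22 first, then 0x11, and leaves the pointer back at its initial value. The sketch does no overflow or underflow checking, just as the hardware does none.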
A Simple Subroutine
; --------------------------------------------------------------------------------------------
; This subroutine swaps the contents of r16 & r17
; --------------------------------------------------------------------------------------------
swapRegs: push r16
mov r16, r17
pop r17
ret
Subroutine to Generate a Delay of 250 µs
delay250us:
ldi r21, 250
loopo: ldi r20, 4
loopi: push r0
pop r0
dec r20
brne loopi
dec r21
brne loopo
ret
delayby250us:
ldi r21, 250
loopo: ldi r20, 4
loopi: push r0
pop r0
dec r20
brne loopi
dec r21
brne loopo
dec r16
brne delayby250us
ret
How to Call (to create 50-ms delay)
ldi r16, 200
call delayby250us
delayby50ms:
ldi r17, 200
loop3: ldi r21, 250
loop2: ldi r20, 4
loop1: push r0
pop r0
dec r20
brne loop1
dec r21
brne loop2
dec r17
brne loop3
dec r16
brne delayby50ms
ret
Issues Related to Subroutine Call
Parameter passing
Local variable allocation and de-allocation
Result returning
Parameter Passing
Using CPU registers (r0~r31)
Using stack
Using global memory
Local Variable Allocation & Deallocation
Temporary variables and results are needed for the execution
of a subroutine.
Temporary variables and results are useful only during the
execution of the subroutine.
Should be allocated in the stack for easy allocation and
deallocation.
When allocated in the stack, a subroutine can be made
reentrant (can call itself).
Best allocated in CPU registers for AVR—why?
Allocating k Bytes in Stack for Local Variables
in YL, CPU_SPL ; transfer SP to Y
in YH, CPU_SPH ; “
sbiw YL, k
out CPU_SPL, YL
out CPU_SPH, YH
Made into a Macro
.macro allocStk
in YL, CPU_SPL
in YH, CPU_SPH
sbiw YL, @0
out CPU_SPL, YL
out CPU_SPH, YH
.endmacro
How to Call?
allocStk k ; call to allocate k bytes in stack
Macro for Deallocating Local Variables in Stack
.macro deallocStk
in YL, CPU_SPL
in YH, CPU_SPH
adiw YL, @0
out CPU_SPL, YL
out CPU_SPH, YH
.endmacro
How to Invoke?
deallocStk k ; call to deallocate k bytes in stack
AVR Stack Frame
How to Return Results?
Returned in registers, stack, or global memory
Best returned in registers
If results are to be returned in the stack, the stack slot to hold the
result should be allocated by the caller.
Accessing Local Variables in Stack
Register Usage Convention
Both the subroutine and the caller need to use registers, so
interference might exist.
Interference must be avoided to ensure the correct execution of the
program.
A recommendation for the use of AVR registers is given in Table 5.1.
Table 5.1 Recommendation for register usage
Name                        Usage
r4~r7, r12~r15, r28, r29    Callee saved
r0~r3, r8~r11, r20, r21     Caller saved
r16~r19, r30, r31           Parameter passing
r22~r27                     Result returning
Instructions for Making Subroutine Call
A Few Examples of Subroutines
delay50ms:
ldi r19, 200
loop3: ldi r21, 250
loop2: ldi r20, 4
loop1: push r0
pop r0
dec r20
brne loop1
dec r21
brne loop2
dec r19
brne loop3
ret
delayby50ms:
ldi r19, 200
loop3: ldi r21, 250
loop2: ldi r20, 4
loop1: push r0
pop r0
dec r20
brne loop1
dec r21
brne loop2
dec r19
brne loop3
dec r16
brne delayby50ms
ret
ldi r16, 20
call delayby50ms ; create 1 s
Subroutine to Multiply two 16-bit Unsigned Integers
P and Q are two 16-bit unsigned integers
P = PH:PL = PH × 2^8 + PL
Q = QH:QL = QH × 2^8 + QL
Partial products, aligned by byte position (msb on the left, lsb on the right):
          byte 3   byte 2   byte 1   byte 0
                            [    PLQL     ]
                   [    PHQL     ]
                   [    PLQH     ]
        + [    PHQH     ]
Subroutine to Multiply two 16-bit Unsigned Integers
Incoming arguments: r16:r17 & r18:r19
Result returned in: r22~r25 (lsb to msb)
; ------------------------------------------------------------------------------
; The first and second numbers are passed in r17:r16 & r19:r18.
; The product is returned in r25..r22.
; ------------------------------------------------------------------------------
.def p1 = r22 ; lsb of the product
.def p2 = r23
.def p3 = r24
.def p4 = r25 ; msb of the product
.def PH = r17 ; high byte of multiplicand
.def PL = r16 ; low byte of multiplicand
.def QH = r19 ; high byte of multiplier
.def QL = r18 ; low byte of multiplier
.def zero = r2 ; register holding 0 (the choice of r2 is an assumption)
mul16U: mul PL, QL
movw p1, r0 ; (p2:p1) <-- QL x PL
mul PH, QH
movw p3, r0 ; (p4:p3) <-- QH x PH
mul PL, QH ; compute PL x QH
add p2, r0 ; add partial product to p3:p2
adc p3, r1 ; "
clr zero ; clr leaves the carry flag unchanged
adc p4, zero ; add carry to p4
mul PH, QL ; compute PH x QL
add p2, r0 ; add partial product to p3:p2
adc p3, r1 ; "
adc p4, zero ; add carry to p4
ret
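The partial-product method can be checked with a short C model. This is a sketch rather than a translation of the subroutine: the name mul16u and the local variable names are hypothetical, and the 32-bit accumulator plays the role of the four product registers.

```c
#include <stdint.h>

/* Multiply two 16-bit unsigned integers using four 8 x 8 partial products. */
uint32_t mul16u(uint16_t p, uint16_t q)
{
    uint8_t pl = (uint8_t)(p & 0xFFu), ph = (uint8_t)(p >> 8);
    uint8_t ql = (uint8_t)(q & 0xFFu), qh = (uint8_t)(q >> 8);

    uint32_t product = (uint32_t)pl * ql;      /* PL x QL, weight 2^0  */
    product += ((uint32_t)ph * qh) << 16;      /* PH x QH, weight 2^16 */
    product += ((uint32_t)pl * qh) << 8;       /* PL x QH, weight 2^8  */
    product += ((uint32_t)ph * ql) << 8;       /* PH x QL, weight 2^8  */
    return product;
}
```

For example, mul16u(0xFFFF, 0xFFFF) returns 0xFFFE0001, the largest possible product, so the sum never overflows 32 bits.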
Write an instruction sequence to multiply the 16-bit numbers stored
at data memory 0x2000~0x2001 & 0x2010~0x2011 and store the
product at data memory 0x2020~0x2023.
Solution:
lds r16, 0x2000
lds r17, 0x2001
lds r18, 0x2010
lds r19, 0x2011
call mul16U
sts 0x2020, r22
sts 0x2021, r23
sts 0x2022, r24
sts 0x2023, r25
Algorithm for Multiplying two Signed 16-bit Numbers
Step 1
Multiply two operands disregarding the sign.
Step 2
If op1 is negative, subtract op2 from the upper half of the
product.
Step 3
If op2 is negative, subtract op1 from the upper half of the
product.
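The three steps can be verified with a C model. This is a minimal sketch with hypothetical names; it assumes a two's-complement target, since the final conversion from uint32_t to int32_t is implementation-defined in C.

```c
#include <stdint.h>

/* Signed 16 x 16 multiplication built on an unsigned multiply. */
int32_t mul16s(int16_t op1, int16_t op2)
{
    uint16_t u1 = (uint16_t)op1;
    uint16_t u2 = (uint16_t)op2;

    /* Step 1: multiply the operands disregarding the sign. */
    uint32_t product = (uint32_t)u1 * u2;
    uint16_t upper = (uint16_t)(product >> 16);

    /* Step 2: if op1 is negative, subtract op2 from the upper half. */
    if (op1 < 0)
        upper = (uint16_t)(upper - u2);

    /* Step 3: if op2 is negative, subtract op1 from the upper half. */
    if (op2 < 0)
        upper = (uint16_t)(upper - u1);

    product = ((uint32_t)upper << 16) | (product & 0xFFFFu);
    return (int32_t)product;   /* assumes two's-complement conversion */
}
```

The correction works because a negative operand x is read by the unsigned multiply as x + 2^16, which adds the other operand times 2^16 to the product; subtracting it from the upper half removes that excess.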
Signed 16-bit Multiplication Subroutine
Incoming argument
Multiplicand: r16: r17
Multiplier: r18: r19
Result
Product: r22~r25
mul16s: call mul16U
sbrs r17, 7 ; check the sign of the first number
rjmp chk2 ; first number is positive, check 2nd number
sub r24, r18 ; subtract the second number from upper half of
sbc r25, r19 ; product
chk2: sbrs r19, 7 ; check the sign of the second number
rjmp doneMs ; second number is positive, prepare return
sub r24, r16 ; subtract the first number from upper half of
sbc r25, r17 ; product
doneMs: ret
.include "mul16U.asm"
Writing Subroutine to Perform Unsigned Division
The shift-and-subtract method is often used to carry out the division.
[Figure: division hardware with the R register and Q register (msb to lsb), the P register, the ALU, the carry C, and a controller that sets the LSB of Q]
16-bit Unsigned Division Algorithm
Step 1
Load divisor, dividend, and 0 into the P, Q, and R registers. lpCnt ← 16.
Step 2
Shift R:Q to the left one place.
Step 3
Subtract P from R and place the difference back to R if the
difference is non-negative.
Step 4
Set bit 0 of Q to 1 if the difference computed in Step 3 is
non-negative. Otherwise, set it to 0.
Step 5
lpCnt ← lpCnt – 1; if (lpCnt > 0) go to Step 2; else Stop.
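The five steps map directly onto a small C function. The following is a sketch for checking the algorithm on a host machine; the DivResult type and the names div16u and lp_cnt are hypothetical, and division by zero is not handled.

```c
#include <stdint.h>

typedef struct {
    uint16_t quotient;
    uint16_t remainder;
} DivResult;

/* 16-bit unsigned shift-and-subtract division. */
DivResult div16u(uint16_t dividend, uint16_t divisor)
{
    uint16_t r = 0;            /* R register */
    uint16_t q = dividend;     /* Q register; becomes the quotient */

    for (int lp_cnt = 16; lp_cnt > 0; lp_cnt--) {
        /* Step 2: shift R:Q left one place (msb of Q moves into R). */
        r = (uint16_t)((r << 1) | (q >> 15));
        q = (uint16_t)(q << 1);

        /* Steps 3 and 4: subtract P from R if the result is non-negative,
           and record the outcome in bit 0 of Q. */
        if (r >= divisor) {
            r = (uint16_t)(r - divisor);
            q |= 1u;
        }
    }

    DivResult result = { q, r };
    return result;
}
```

For example, div16u(1234, 10) yields a quotient of 123 and a remainder of 4.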
; ------------------------------------------------------------------------------
; Dividend and divisor are passed in r17:r16 & r19:r18, respectively.
; Quotient is returned in r25:r24; remainder is returned in r23:r22.
; ------------------------------------------------------------------------------
.def lpCnt = r20
.def RL = r22
.def RH = r23
.def QL = r24
.def QH = r25
.def PL = r18
.def PH = r19
div16U: ldi lpCnt, 16 ; set up loop count
movw QL, r16 ; load dividend into Q register
clr RL ; initialize R register to 0
clr RH ; "
dvloop: lsl QL ; shift R:Q to left one place
rol QH ; "
rol RL ; "
rol RH ; "
cp RL, PL ; compare R with P (unsigned)
cpc RH, PH ; "
brlo nxtb ; R < P: leave bit 0 of Q at 0
ori QL, 0x01 ; R >= P: set bit 0 of Q to 1
sub RL, PL ; put difference in R
sbc RH, PH ; "
nxtb: dec lpCnt
brne dvloop
ret
Example. Write a program to find all elements in an array of 16-bit numbers divisible
by 5 and save them in data memory starting from 0x2000. The array has 30 elements.
.include <atxmega128A1def.inc>
.def aCnt = r21 ; array loop count (r20 is used as lpCnt inside div16U)
.dseg
.org 0x2000
result: .byte 60 ; room for up to 30 16-bit results
.cseg
.org 0x00
jmp start
.org 0xF6
start: ldi r16, low(RAMEND)
out CPU_SPL, r16
ldi r16, high(RAMEND)
out CPU_SPH, r16
call setCPUClkto32Mwith32MIntOsc
ldi ZL, low(array<<1)
ldi ZH, high(array<<1)
ldi YL, low(result)
ldi YH, high(result)
ldi aCnt, 30
loop: lpm r16, Z+
lpm r17, Z+
ldi r18, 5
ldi r19, 0
call div16U
cpi r22, 0 ; is the remainder equal to 0?
brne false
st Y+, r16
st Y+, r17
false: dec aCnt
brne loop
here: jmp here
.include "sysclk_xmega.asm"
.include "div16U.asm"
array: .dw 1234, 2345, 3456, 4567, 5678
.dw 1122, 2233, 3344, 4455, 5566 ; remaining elements of the 30-element array are not shown
Convert an Internal Binary Number into a BCD String
xx: number to be converted
quo: quotient of division
rem: remainder of a division
ptr: pointer to the buffer to store the resultant string
Step 1
Push 0 into the stack.
Step 2
quo ← xx / 10; rem ← xx mod 10;
Step 3
Push rem + 0x30 onto the stack.
Step 4
If quo ≠ 0, xx ← quo and go to Step 2; else, next step.
Step 5
Pop the string out of the stack and store it in the buffer.
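The steps above can be tried out in C before writing the assembly version. This is a minimal sketch for unsigned values only: the function name bin_to_bcd_string is hypothetical, a small local array stands in for the hardware stack, and the caller is assumed to supply a buffer of at least 6 bytes (5 digits plus the terminating NULL).

```c
#include <stdint.h>

/* Convert a 16-bit unsigned number into a NULL-terminated decimal string. */
void bin_to_bcd_string(uint16_t xx, char *buf)
{
    char stack[6];             /* at most 5 digits plus the NULL marker */
    int top = 0;
    uint16_t quo;
    char ch;

    stack[top++] = '\0';       /* Step 1: push 0 as the end marker */

    do {
        quo = (uint16_t)(xx / 10);                /* Step 2 */
        stack[top++] = (char)(xx % 10 + 0x30);    /* Step 3: push ASCII digit */
        xx = quo;                                 /* Step 4 */
    } while (quo != 0);

    /* Step 5: pop the digits; they come out most significant first,
       and the NULL marker comes out last and terminates the string. */
    do {
        ch = stack[--top];
        *buf++ = ch;
    } while (ch != '\0');
}
```

Pushing the digits and popping them reverses their order, which is why the least significant digit, produced first by the division, ends up last in the string.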
Example 6.4 Write a subroutine that can convert a 16-bit binary
number held in r16:r17 into an ASCII BCD string and store the string
in a buffer pointed to by the Z pointer.
Solution:
Parameter Passed:
16-bit number to be converted: in r16:r17
Pointer to buffer to hold the converted string: in Z
.def sign = r12
.equ NULL = 0
bin2BCD: clr sign
push sign ; push a NULL to mark the end of the string
sbrs r17,7 ; check the sign of the number
rjmp normal
inc sign ; indicate sign is negative
com r16 ; find the magnitude of the given number
com r17 ; "
movw r24,r16 ; "
adiw r24,1 ; "
movw r16,r24 ; "
normal: ldi r18,10 ; set divisor to 10
clr r19 ; "
call div16U
ldi r26,0x30
add r22,r26 ; convert remainder digit to ASCII code
push r22
sbiw r24,0 ; is quotient 0?
breq popStk ; quotient is 0, pop string out of stack
movw r16, r24 ; quotient is not 0, continue to divide
rjmp normal ; loop
popStk: sbrs sign,0 ; check the sign of the number
rjmp popLp ; do nothing if positive
ldi r17,'-' ; push a minus sign into stack
push r17
popLp: pop r17
cpi r17,NULL ; is it a NULL character?
breq exit
st Z+,r17 ; store the character in the buffer
rjmp popLp
exit: st Z,r17 ; terminate the string with a NULL character
ret
Example 6.5 Write a program to find all 4-digit decimal numbers
that have the following property:
The sum of the square of the upper half and the square of the lower half of the
given number equals the original number.
.include <atxmega128A1def.inc>
.def kL = r4 ; test number (start from 1000)
.def kH = r5 ; “
.def TOPL = r6 ; register to hold upper bound
.def TOPH = r7 ; (10000)
.dseg
.org 0x2000
buf: .byte 20
.cseg
.org 0x00
jmp start
.org 0xF6
start: ldi r16,low(RAMEND) ; set up stack pointer
out CPU_SPL,r16 ; “
ldi r17,high(RAMEND) ; “
out CPU_SPH,r17 ; “
call setCPUClkto32Mwith32MIntOsc
ldi r16,low(1000)
ldi r17,high(1000)
movw kL,r16 ; transfer 1000 to kL:kH
ldi r16,low(10000)
ldi r17,high(10000)
movw TOPL,r16 ; place 10000 in TOPL:TOPH
ldi ZL,0 ; use Z as buffer pointer
ldi ZH,0x20 ; "
floop: movw r16,kL ; pass number to be tested in r16:r17
call test
cpi r22,1 ; does test subroutine return 1?
brne next ; no, check next numbers
st Z+,kL ; save the number
st Z+,kH ; "
next: movw r28,kL ; increment kL:kH by 1
adiw r28,1 ; "
movw kL,r28 ; "
cp kL,TOPL
brne floop
cp kH,TOPH
brne floop
done: jmp done
test: ldi r18,100
clr r19
call div16U
mul r22, r22 ; compute rem * rem
movw r22, r0 ; transfer back to r22:r23
mul r24, r24 ; compute quo * quo
add r0, r22 ; compute rem*rem + quo*quo
adc r1, r23 ; “
cp r0, r16 ; compare with r16:r17
brne false ; branch if low bytes are not equal
cp r1, r17 ; compare high bytes
brne false
ldi r22, 1 ; returns 1 if the given number has the property
ret
false: clr r22
ret
.include "div16U.asm"
.include "sysClock_xmega.asm"
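The expected contents of the buffer can be predicted with a host-side C version of the same search. This is a sketch with hypothetical names (has_property, find_all); it mirrors the test subroutine by splitting the number with a division by 100.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if (upper half)^2 + (lower half)^2 equals n, else 0. */
int has_property(uint16_t n)
{
    uint16_t quo = (uint16_t)(n / 100);   /* upper two digits */
    uint16_t rem = (uint16_t)(n % 100);   /* lower two digits */
    return (uint32_t)quo * quo + (uint32_t)rem * rem == n;
}

/* Store every 4-digit number with the property in out (up to max_out
   entries) and return how many were stored. */
size_t find_all(uint16_t *out, size_t max_out)
{
    size_t count = 0;
    for (uint16_t k = 1000; k < 10000; k++) {
        if (has_property(k) && count < max_out)
            out[count++] = k;
    }
    return count;
}
```

The search finds two numbers, 1233 (12^2 + 33^2) and 8833 (88^2 + 33^2), so the assembly program should leave exactly four bytes in the buffer.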
Finding the Square Root (Successive Approximation Method)
SAR: successive approximation register
mask: mask to set a bit in an 8-bit register to 1
lpCnt: loop count
Temp: temporary variable
Start
SAR[n-1, ..., 0] ← 0; i ← n-1
Repeat:
SAR[i] ← 1
If SAR * SAR > num, then SAR[i] ← 0
If i = 0, stop; otherwise i ← i-1 and repeat
Stop
Algorithm
Step 1
sar ← 0; mask ← 0x80; lpCnt ← 8; temp ← 0
Step 2
temp ← sar OR mask // guess bit i is 1
Step 3
If ((temp * temp) ≤ num) sar ← temp;
Step 4
mask ← mask SRL 1 (shift right logically one place);
Step 5
lpCnt ← lpCnt – 1
Step 6
If (lpCnt == 0) stop; else go to Step 2.
Drawback of the Successive Approximation Algorithm
Too pessimistic: the square root tends to be too small
The better approximation may be [SAR] + 1 instead of [SAR]
Need to compare ([SAR]+1)^2 – num and num – [SAR]^2.
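Both the successive approximation loop and the rounding check can be modeled in C. This is a sketch with hypothetical function names. One deliberate difference from an 8-bit register result: sqrt16_nearest returns a 16-bit value so that the answer 256 (the nearest root for inputs of 65281 and above) can be represented.

```c
#include <stdint.h>

/* Successive approximation: try each bit from bit 7 down to bit 0 and
   keep it if the square of the guess does not exceed num. */
uint8_t sqrt16_floor(uint16_t num)
{
    uint8_t sar = 0;
    for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
        uint8_t temp = (uint8_t)(sar | mask);     /* guess that this bit is 1 */
        if ((uint32_t)temp * temp <= num)
            sar = temp;                           /* keep the guess */
    }
    return sar;
}

/* Choose between sar and sar + 1, whichever square is closer to num. */
uint16_t sqrt16_nearest(uint16_t num)
{
    uint32_t sar = sqrt16_floor(num);
    uint32_t d1 = num - sar * sar;                /* num - sar^2     */
    uint32_t d2 = (sar + 1) * (sar + 1) - num;    /* (sar+1)^2 - num */
    return (uint16_t)(d1 < d2 ? sar : sar + 1);
}
```

For num = 5000 the loop produces 70 (70^2 = 4900), but 71^2 = 5041 is closer, so sqrt16_nearest returns 71. The two differences can never be equal because their sum, 2 × sar + 1, is odd.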
Example 6.6 Write a subroutine that can find the square root of a
16-bit number. The 16-bit number of which the square root is to be
found is passed in r16: r17 and the square root is returned in r22.
.def mask = r20
.def sar = r22
.def tmp = r1
.def lpcnt = r21
.def qL = r16
.def qH = r17
SqRoot16: ldi mask, 0x80
clr sar
ldi lpcnt, 8
sqLoop1: mov tmp, sar ; make a guess of a sar bit
or tmp, mask ; "
mul tmp, tmp ; compute sar * sar
cp qL, r0 ; compare sar * sar with q
cpc qH, r1 ; "
brlo nextb
or sar, mask ; keep the guess
nextb: lsr mask
dec lpcnt
brne sqLoop1
; ------------------------------------------------------------------------------
; Find out whether sar or sar+1 is closer to the true square root by comparing
; whether q – sar*sar or (sar+1)^2 – q is smaller.
; ------------------------------------------------------------------------------
mul sar, sar ; compute D1 = q - sar * sar
movw r8, qL ; "
sub r8, r0 ; "
sbc r9, r1 ; "
mov r0, sar ; compute D2 = (sar+1)*(sar+1) - q
inc r0 ; "
mul r0, r0 ; "
sub r0, qL ; "
sbc r1, qH ; "
cp r8, r0 ; compare D1 with D2
cpc r9, r1 ; "
brlo selSar ; choose sar if D1 < D2
inc sar ; otherwise choose sar + 1
selSar: ret
Example 6.7 Write a program to find the square root of an array of 16-bit
numbers.
Solution:
.include <atxmega128A1Udef.inc>
.def llCnt = r25
.cseg
.org 0x00
jmp start
.org 0xF6
start: ldi r16, low(RAMEND)
out CPU_SPL, r16
ldi r16, high(RAMEND)
out CPU_SPH, r16
call setCPUClkto32Mwith32MIntOsc
ldi llCnt, 12
ldi YL,0 ; Y points to SRAM
ldi YH,0x20 ; “
ldi ZL,low(array<<1) ; Z points to array
ldi ZH,high(array<<1) ; “
mloop: lpm r16,Z+ ; fetch the next 16-bit number
lpm r17,Z+ ; “
call SqRoot16
st Y+, r22
dec llCnt
brne mloop
again: jmp again
array: .dw 1234,2345,3456,4567,5678,6789
.dw 1601,2026,2506,3509,3600,5000
Bubble Sort
Go through the array as many iterations as one less than the
number of array or file elements.
In each iteration, compare each adjacent pair of elements
from the lowest toward the highest array index. Swap the
adjacent elements if they are not in order.
The sorting efficiency can be improved by keeping track of
whether swapping has been done in an iteration. If no
swapping has been done, the sorting algorithm should stop.
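The description above, including the early exit, can be sketched in C as follows. The function name bubble_sort is hypothetical, and the elements are taken to be unsigned bytes sorted in ascending order.

```c
#include <stddef.h>
#include <stdint.h>

/* Bubble sort with early termination when a pass makes no swap. */
void bubble_sort(uint8_t *array, size_t n)
{
    if (n < 2)
        return;

    /* At most n - 1 iterations; each one scans one fewer pair. */
    for (size_t iteration = n - 1; iteration > 0; iteration--) {
        int sorted = 1;                      /* assume sorted until a swap occurs */

        for (size_t i = 0; i < iteration; i++) {
            if (array[i] > array[i + 1]) {   /* adjacent pair out of order */
                uint8_t tmp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = tmp;
                sorted = 0;
            }
        }

        if (sorted)                          /* no swap in this pass: done */
            break;
    }
}
```

Each pass moves the largest remaining element to the end of the unsorted part, which is why the inner loop can shrink by one pair per iteration.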
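iteration ← N - 1
Outer loop:
sorted ← 1; inner ← iteration; i ← 0
Inner loop:
If array[i] and array[i+1] are out of order: swap array[i] & array[i+1]; sorted ← 0
inner ← inner - 1; i ← i+1
If inner ≠ 0, repeat the inner loop
If sorted = 1, stop
iteration ← iteration - 1
If iteration ≠ 0, repeat the outer loop
Stop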
[Figure 5.10 Stack frame for bubble sort subroutine: from the top of the stack, SP, the three local variables sorted, iCnt, and eCnt (offsets 1 to 3), the saved r28 and r29, and ret_addr]
.include <atxmega128A1Udef.inc>
.equ SPL = CPU_SPL ; comment out for MEGA devices
.equ SPH = CPU_SPH ; "
.macro allocStk ; this macro allocate space for local variables
in r28,SPL
in r29,SPH
sbiw r28,@0
out SPL,r28
out SPH,r29
.endmacro
.macro deallocStk ; this macro deallocates space used by local variables
in r28,SPL
in r29,SPH
adiw r28,@0
out SPL,r28
out SPH,r29
.endmacro
.equ NN = 30
.def lpcnt = r21
.dseg
.org 0x2000
array: .byte 40
.cseg
.org 0x00
rjmp start
.org 0xF6
start: ldi r20,low(RAMEND) ; initialize stack pointer
out SPL,r20 ; "
ldi r20,high(RAMEND) ; "
out SPH,r20 ; "
ldi ZL,low(xarr<<1) ; set up pointer to the array in program memory
ldi ZH,high(xarr<<1) ; "
ldi XL,low(array) ; set up pointer to the buffer array in data memory
ldi XH,high(array) ; "
ldi lpcnt,NN ; set up loop count
cLoop: lpm r0,z+ ; copy the array from program memory to data memory
st x+,r0 ; so that it can be sorted
dec lpcnt ;"
brne cLoop ;"
ldi r16,low(array) ; pass array pointer
ldi r17,high(array) ; "
ldi r18,NN ; pass array count
call bubble
again: rjmp again
; ------------------------------------------------------------------------------
; The next subroutine uses bubble sort algorithm to sort an array in data memory.
; The array count is passed in r18 and the pointer to the array is passed in r16~r17.
; All array elements are nonnegative.
; ------------------------------------------------------------------------------
.equ sorted = 1 ; stack frame offsets from Y (values assumed from Figure 5.10)
.equ iCnt = 2 ; "
.equ eCnt = 3 ; "
.equ true = 1 ; value of the sorted flag when no swap was made
bubble: push YH
push YL
allocStk 3 ; allocate 3 bytes for local variables
in YL, SPL ; set Y point to the byte above the top of stack
in YH, SPH ; "
dec r18 ; initialize iteration count to NN - 1
std Y+eCnt, r18 ; "
eLoop: ldd r18,Y+eCnt ; set up inner loop count
std Y+iCnt, r18 ; "
movw ZL, r16 ; place array base address in Z
ldi r20, 1 ; set flag to indicate array is sorted
std Y+sorted, r20 ; "
iloop: ld r8, z ; fetch element array[i]
ldd r9, z+1 ; fetch element array[i+1]
cp r8, r9 ; compare array[i] with array[i+1]
brlo next
st Z, r9 ; swap array[i] with array[i+1]
std Z+1, r8 ; "
clr r20 ; indicate array not sorted
std Y+sorted, r20 ; "
next: adiw ZL, 1 ; increment array pointer by 1
ldd r20,Y+iCnt ; decrement inner loop count
dec r20 ; "
std Y+iCnt, r20 ; "
brne iloop ; continue if inner loop count is not 0
; at the end of an iteration
ldd r20,Y+sorted ; check array sorted flag
cpi r20,true ; "
breq done ; stop if sorted flag is true (1)
ldd r20,Y+eCnt ; decrement iteration loop count
dec r20 ; "
std Y+eCnt,r20 ; "
brne eLoop
done: deallocStk 3
pop YL
pop YH
ret
xarr: .db 12,91,20,33,45,72,24,19,17,101
.db 11,92,21,34,44,71,25,18,16,131
.db 41,43,49,50,99,79,89,98,37,59
// End of program