Assembler: Jian-hua Yeh (葉建華), Assistant Professor, Department of Information Science, Aletheia University (真理大學)
[Figure: Source Program → Assembler → Object Code → Linker → Executable Code → Loader]
Chapter 2 -- Outline
• Basic Assembler Functions
• Machine-dependent Assembler Features
• Machine-independent Assembler Features
• Assembler Design Options
Introduction to Assemblers
• Fundamental functions
– translating mnemonic operation codes to their machine
language equivalents
– assigning machine addresses to symbolic labels
• Machine dependency
– different machine instruction formats and codes
Example Program (Fig. 2.1)
• Purpose
– reads records from input device (code F1)
– copies them to output device (code 05)
– at the end of the file, writes EOF on the output device, then
RSUB to the operating system
Example Program (Fig. 2.1)
• Data transfer (RD, WD)
– a buffer is used to store record
– buffering is necessary for different I/O rates
– the end of each record is marked with a null character (hexadecimal 00)
– the end of the file is indicated by a zero-length record
Assembler Directives
• Pseudo-Instructions
– Not translated into machine instructions
– Providing information to the assembler
Object Program
• Header
Col. 1 H
Col. 2~7 Program name
Col. 8~13 Starting address (hex)
Col. 14-19 Length of object program in bytes (hex)
• Text
Col.1 T
Col.2~7 Starting address in this record (hex)
Col. 8~9 Length of object code in this record in bytes (hex)
Col. 10~69 Object code (60 columns, 6 hex digits per 3-byte instruction: (69-10+1)/6 = 10 instructions)
• End
Col.1 E
Col.2~7 Address of first executable instruction (hex) (END program_name)
Fig. 2.3
H COPY 001000 00107A
T 001000 1E 141033 482039 001036 281030 301015 482061 ...
T 00101E 15 0C1036 482061 081044 4C0000 454F46 000003 000000
T 002039 1E 041030 001030 E0205D 30203F D8205D 281030 …
T 002057 1C 101036 4C0000 F1 001000 041030 E02079 302064 …
T 002073 07 382064 4C0000 05
E 001000
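The record layouts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `header_record`, `text_record`, and `end_record` are hypothetical, and the records are emitted without the spaces that Fig. 2.3 uses for readability.

```python
def header_record(name, start, length):
    """H record: col 1 'H', cols 2-7 program name, cols 8-13 start, cols 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    """T record: col 1 'T', cols 2-7 start, cols 8-9 byte count, cols 10-69 object code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a Text record holds at most 60 hex digits (30 bytes)")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_exec):
    """E record: col 1 'E', cols 2-7 address of the first executable instruction."""
    return f"E{first_exec:06X}"


# Values taken from the first, last, and End lines of Fig. 2.3.
print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x2073, ["382064", "4C0000", "05"]))
print(end_record(0x1000))
```

The length field of a Text record counts bytes, so it is half the number of hex digits in the object code.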
Figure 2.1 (Pseudo code)
Program copy {
save return address;
cloop: call subroutine RDREC to read one record;
if length(record)=0 {
call subroutine WRREC to write EOF;
} else {
call subroutine WRREC to write one record;
goto cloop;
}
load return address
return to caller
}
An Example (Figure 2.1, Cont.)
EOR: character x'00'
Subroutine RDREC {
clear A, X register to 0;
rloop: read character from input device to A register
if not EOR {
store character into buffer[X];
X++;
if X < maximum length
goto rloop;
}
store X to length(record);
return
}
An Example (Figure 2.1, Cont.)
Subroutine WRREC {
clear X register to 0;
wloop: get character from buffer[X]
write character to output device
X++;
if X < length(record)
goto wloop;
return
}
Assembler’s functions
Example of Instruction Assemble
• Forward reference
Difficulties: Forward Reference
• Forward reference: reference to a label that is
defined later in the program.
Two Pass Assembler
• Pass 1
– Assign addresses to all statements in the program
– Save the values assigned to all labels for use in Pass 2
– Perform some processing of assembler directives
• Pass 2
– Assemble instructions
– Generate data values defined by BYTE, WORD
– Perform processing of assembler directives not done in Pass 1
– Write the object program and the assembly listing
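A minimal sketch of Pass 1 for the standard SIC format (every instruction is 3 bytes) may make the address assignment concrete. The names `OPTAB`, `statement_size`, and `pass1` are chosen for illustration, the opcode table is abbreviated, and the sample program is a shortened stand-in for Fig. 2.1, so its addresses differ from the full program.

```python
# Abbreviated opcode table: mnemonic -> machine opcode (standard SIC, 3 bytes each).
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "COMP": 0x28,
         "JEQ": 0x30, "J": 0x3C, "RSUB": 0x4C}


def statement_size(opcode, operand):
    """Number of bytes a statement occupies in the object program."""
    if opcode in OPTAB:
        return 3
    if opcode in ("START", "END"):
        return 0
    if opcode == "WORD":
        return 3
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":
        body = operand[2:-1]  # text between the quotes of C'...' or X'...'
        return len(body) if operand[0] == "C" else len(body) // 2
    raise ValueError(f"invalid operation code: {opcode}")


def pass1(lines):
    """Assign addresses, build SYMTAB, and return the intermediate lines."""
    symtab, intermediate = {}, []
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = start = int(operand, 16)
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr
        intermediate.append((locctr, label, opcode, operand))
        if opcode == "END":
            break
        locctr += statement_size(opcode, operand)
    return symtab, intermediate, locctr - start


program = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("CLOOP", "JSUB", "RDREC"),
    ("", "LDA", "LENGTH"),
    ("EOF", "BYTE", "C'EOF'"),
    ("RETADR", "RESW", "1"),
    ("", "END", "FIRST"),
]
symtab, intermediate, length = pass1(program)
for name, value in symtab.items():
    print(f"{name:<6} {value:04X}")
```

Pass 2 would then read the intermediate lines, look up each operand in `symtab`, and assemble the instructions; forward references such as `RETADR` are no longer a problem because every label already has a value.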
Two Pass Assembler
• Read from input line
– LABEL, OPCODE, OPERAND
[Figure: Source program → Pass 1 → Intermediate file → Pass 2 → Object codes]
Data Structures
OPTAB (operation code table)
• Content
– mnemonic, machine code (instruction format, length) etc.
• Characteristic
– static table
• Implementation
– array or hash table, easy for search
SYMTAB (symbol table)
• Content
– label name, value, flag, (type, length) etc.
• Characteristic
– dynamic table (insert, delete, search)
• Implementation
– hash table, non-random keys, hashing function
• Example (symbol, value)
COPY 1000
FIRST 1000
CLOOP 1003
ENDFIL 1015
EOF 1024
THREE 102D
ZERO 1030
RETADR 1033
LENGTH 1036
BUFFER 1039
RDREC 2039
Assembler Design
• Machine Dependent Assembler Features
– instruction formats and addressing modes
– program relocation
• Machine Independent Assembler Features
– literals
– symbol-defining statements
– expressions
– program blocks
– control sections and program linking
Machine Dependent Assembler Features
Translation
• Register translation
– register name (A, X, L, B, S, T, F, PC, SW) and their values
(0,1, 2, 3, 4, 5, 6, 8, 9)
– preloaded in SYMTAB
• Address translation
– Most register-memory instructions use program counter
relative or base relative addressing
– Format 3: 12-bit address field
• base-relative: 0~4095
• pc-relative: -2048~2047
– Format 4: 20-bit address field
– pc-relative addressing is tried first
Relative Addressing Modes
• PC-relative
– e.g. 10 0000 FIRST STL RETADR 17202D
• displacement= RETADR - PC = 30-3 = 2D
– e.g. 40 0017 J CLOOP 3F2FEC
• displacement= CLOOP - PC = 6 - 1A = -14 = FEC
• Base-relative
– base register is under the control of the programmer
– e.g. 12 LDB #LENGTH
– e.g. 13 BASE LENGTH
– e.g. 160 104E STCH BUFFER, X 57C003
• displacement= BUFFER - B = 0036 - 0033 = 3
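The displacement arithmetic in these examples can be checked with a short sketch. The function name `displacement` is hypothetical; it tries PC-relative addressing first and falls back to base-relative, using the Format 3 ranges given earlier.

```python
def displacement(target, pc, base=None):
    """Return (mode, 12-bit displacement field) for a Format 3 instruction."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        # A negative displacement is stored in 12-bit two's complement.
        return "pc", disp & 0xFFF
    if base is not None and 0 <= target - base <= 4095:
        return "base", target - base
    raise ValueError("displacement out of range; Format 4 would be needed")


# STL RETADR at 0000: PC = 0003, RETADR = 0030
print(displacement(0x0030, 0x0003))
# J CLOOP at 0017: PC = 001A, CLOOP = 0006
print(displacement(0x0006, 0x001A))
# STCH BUFFER,X at 104E: PC = 1051, BUFFER = 0036, B = 0033
print(displacement(0x0036, 0x1051, base=0x0033))
```

In the last case the PC-relative displacement would be 0036 - 1051, far below -2048, which is why the assembler has to use the base register.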
Address Translation
• Immediate addressing
– e.g. 55 0020 LDA #3 010003
– e.g. 133 103C +LDT #4096 75101000
– e.g. 12 0003 LDB #LENGTH 69202D
• the immediate operand is the symbol LENGTH
• the address of this symbol LENGTH is loaded into register B
• LENGTH=0033=PC+displacement=0006+02D
• if immediate mode is specified, the target address becomes the
operand
Address Translation (Cont.)
• Indirect addressing
– the target address is computed as usual (PC-relative or BASE-relative)
– only the n bit is set to 1
e.g. 70 002A J @RETADR 3E2003
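To see how the addressing flags end up in the object code, the following sketch packs the opcode, the n, i, x, b, p, e bits, and the displacement or address field. The function names are hypothetical; the opcodes used (LDA 00, LDT 74, LDB 68, J 3C) are the SIC/XE values that the examples above already imply.

```python
def encode_format3(opcode, n, i, x, b, p, disp):
    """Pack a Format 3 instruction: 6-bit opcode, n i x b p e flags, 12-bit disp."""
    first_byte = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1)  # e = 0 for Format 3
    return f"{first_byte:02X}{flags:X}{disp & 0xFFF:03X}"


def encode_format4(opcode, n, i, x, address):
    """Pack a Format 4 instruction: b = p = 0, e = 1, 20-bit address."""
    first_byte = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | 1
    return f"{first_byte:02X}{flags:X}{address & 0xFFFFF:05X}"


print(encode_format3(0x00, 0, 1, 0, 0, 0, 3))      # LDA #3
print(encode_format4(0x74, 0, 1, 0, 4096))         # +LDT #4096
print(encode_format3(0x68, 0, 1, 0, 0, 1, 0x02D))  # LDB #LENGTH
print(encode_format3(0x3C, 1, 0, 0, 0, 1, 0x003))  # J @RETADR
```

Immediate addressing sets i = 1 and n = 0, indirect addressing sets n = 1 and i = 0, and simple addressing sets both bits.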
Program Relocation
Relocatable Program
• Modification record
– Col 1 M
– Col 2-7 Starting location of the address field to be
modified, relative to the beginning of the program
– Col 8-9 length of the address field to be modified, in half-
bytes
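As a rough sketch of how an assembler might emit this record for a Format 4 instruction: the 20-bit address field starts one byte after the instruction and is 5 half-bytes long. The function names and the sample location 0006 are assumptions made for illustration.

```python
def modification_record(field_start, half_bytes):
    """M record: col 1 'M', cols 2-7 field start (hex), cols 8-9 length in half-bytes."""
    return f"M{field_start:06X}{half_bytes:02X}"


def modification_for_format4(instruction_loc):
    """A Format 4 address field begins at byte loc+1 and spans 5 half-bytes."""
    return modification_record(instruction_loc + 1, 5)


# Hypothetical Format 4 instruction located at relative address 0006.
print(modification_for_format4(0x0006))
```

Instructions that use PC-relative or base-relative addressing need no such record, because their displacement does not change when the program is moved.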
Object Code
Machine-Independent Assembler
Features
• Literals
• Symbol Defining Statement
• Expressions
• Program Blocks
• Control Sections and Program Linking
Literals
• Design idea
– Let programmers write the value of a constant operand as a part of the instruction that uses it.
– This avoids having to define the constant elsewhere in the
program and make up a label for it.
• Example
– e.g. 45 001A ENDFIL LDA =C'EOF' 032010
– 93 LTORG
– 002D * =C'EOF' 454F46
– e.g. 215 1062 WLOOP TD =X'05' E32011
Literals vs. Immediate Operands
• Immediate Operands
– The operand value is assembled as part of the machine
instruction
– e.g. 55 0020 LDA #3 010003
• Literals
– The assembler generates the specified value as a constant
at some other memory location
– e.g. 45 001A ENDFIL LDA =C'EOF' 032010
Literal - Implementation (1/3)
• Literal pools
– Normally literals are placed into a pool at the end of the
program
• see Fig. 2.10 (END statement)
– In some cases, it is desirable to place literals into a pool at
some other location in the object program
• assembler directive LTORG
• reason: keep the literal operand close to the instruction
Literal - Implementation (2/3)
• Duplicate literals
– e.g. 215 1062 WLOOP TD =X'05'
– e.g. 230 106B WD =X'05'
– The assembler should recognize duplicate literals and store only one copy of the specified data value
• Comparison of the defining expression
– Pitfall: the same literal name can denote different values, e.g. =* (the current value of LOCCTR)
• Comparison of the generated data value
– The benefits of comparing generated data values are usually not great enough to justify the additional complexity in the assembler
Literal - Implementation (3/3)
• LITTAB
– literal name, the operand value and length, the address
assigned to the operand
• Pass 1
– build LITTAB with literal name, operand value and length,
leaving the address unassigned
– when LTORG statement is encountered, assign an address
to each literal not yet assigned an address
• Pass 2
– search LITTAB for each literal operand encountered
– generate data values using BYTE or WORD statements
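The LITTAB handling described above might look roughly like this. The class `LitTab` and its method names are hypothetical; duplicates are detected by comparing the defining expression (the literal name), and only `C'...'` and `X'...'` literals are handled.

```python
def literal_value(literal):
    """Return the data value of a literal such as =C'EOF' or =X'05' as hex digits."""
    kind, body = literal[1], literal[3:-1]
    if kind == "C":
        return body.encode("ascii").hex().upper()
    if kind == "X":
        return body.upper()
    raise ValueError(f"unsupported literal: {literal}")


class LitTab:
    def __init__(self):
        self.entries = {}  # literal name -> value, length, address

    def add(self, literal):
        """Pass 1: record a literal operand; duplicates are stored only once."""
        if literal not in self.entries:
            value = literal_value(literal)
            self.entries[literal] = {"value": value,
                                     "length": len(value) // 2,
                                     "address": None}

    def place_pool(self, locctr):
        """At LTORG or END: give an address to every literal that has none yet."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr  # location counter after the literal pool


littab = LitTab()
littab.add("=C'EOF'")
after_pool = littab.place_pool(0x002D)   # LTORG at 002D, as in the example above
littab.add("=X'05'")
littab.add("=X'05'")                     # duplicate, ignored
print(littab.entries["=C'EOF'"], hex(after_pool))
```

In Pass 2 the assembler would look up each literal operand in `entries` and use the stored address as the target address of the instruction.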
• Defining symbols
– symbol EQU value
– value can be: constant, other symbol, expression
– making the source program easier to understand
– no forward reference
Symbol-Defining Statements
• Example 1
– MAXLEN EQU 4096
– +LDT #MAXLEN
+LDT #4096
• Example 2
– BASE EQU R1
– COUNT EQU R2
– INDEX EQU R3
• Example 3
– MAXLEN EQU BUFEND-BUFFER
ORG (origin)
• ORG
– indirectly assign values to symbols
– reset the location counter to the specified value
– ORG value
– value can be: constant, other symbol, expression
– no forward reference
SYMTAB
• None of the relative terms may enter into a multiplication or
division operation
• Errors:
– BUFEND+BUFFER
– 100-BUFFER
– 3*BUFFER
• The type of an expression
– keep track of the types of all symbols defined in the program
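One way to picture the type rule is to count relative terms by sign: the expression is absolute if they cancel, relative if exactly one positive relative term is left over, and an error otherwise. The sketch below assumes a hypothetical helper `expression_type`, handles only `+`, `-`, `*`, and `/` without parentheses, and does not treat `*` as the location counter.

```python
import re


def expression_type(expr, relative_symbols):
    """Classify an expression as 'absolute' or 'relative', or raise ValueError."""
    count = 0
    # Split into signed terms, e.g. "BUFEND-BUFFER" -> ("", "BUFEND"), ("-", "BUFFER")
    for sign, term in re.findall(r"([+-]?)([^+-]+)", expr.replace(" ", "")):
        factors = re.split(r"[*/]", term)
        relative = [f for f in factors if f in relative_symbols]
        if relative and len(factors) > 1:
            raise ValueError("relative term in multiplication or division")
        if relative:
            count += -1 if sign == "-" else 1
    if count == 0:
        return "absolute"
    if count == 1:
        return "relative"
    raise ValueError("relative terms are not properly paired")


relative_symbols = {"BUFFER", "BUFEND"}
print(expression_type("BUFEND-BUFFER", relative_symbols))
print(expression_type("BUFFER+100", relative_symbols))
```

The three error cases listed above (`BUFEND+BUFFER`, `100-BUFFER`, `3*BUFFER`) all raise `ValueError` in this sketch.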
Program Blocks - Implementation
• Pass 1
– each program block has a separate location counter
– each label is assigned an address that is relative to the start
of the block that contains it
– at the end of Pass 1, the latest value of the location counter
for each block indicates the length of that block
– the assembler can then assign to each block a starting
address in the object program
• Pass 2
– The address of each symbol can be computed by adding the assigned block starting address and the relative address of the symbol within that block
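The two steps can be sketched as follows. The block names, lengths, and symbol offsets are illustrative values chosen for this example, and the function names are hypothetical.

```python
def assign_block_starts(block_lengths):
    """End of Pass 1: place the blocks one after another, in block-number order."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


def absolute_address(symbol, symtab, starts):
    """Pass 2: block starting address plus the symbol's address within its block."""
    block, offset = symtab[symbol]
    return starts[block] + offset


# Illustrative block lengths (final location counter of each block after Pass 1).
block_lengths = {"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}
# Illustrative SYMTAB entries: symbol -> (block name, address relative to block start).
symtab = {"LENGTH": ("CDATA", 0x0003), "BUFFER": ("CBLKS", 0x0000)}

starts = assign_block_starts(block_lengths)
print({name: f"{addr:04X}" for name, addr in starts.items()})
print(f"{absolute_address('LENGTH', symtab, starts):04X}")
```

Because each block keeps its own location counter, the source statements of one block can be scattered through the program while the assembled code still ends up contiguous.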
Figure 2.12
• Each source line is given a relative address and a block number
• Object code
– It is not necessary to physically rearrange the generated
code in the object program
– see Fig. 2.13, Fig. 2.14
Control Sections and Program Linking
• Control sections
– can be loaded and relocated independently of the others
– are most often used for subroutines or other logical
subdivisions of a program
– secname CSECT
– the programmer can assemble, load, and manipulate each of
these control sections separately
– because of this, there should be some means for linking
control sections together
– example: instruction in one control section may need to refer
to instructions or data located in another section
– Fig. 2.15, 2.16
External Definition and References
• External definition
– EXTDEF name [, name]
– EXTDEF names symbols that are defined in this control section
and may be used by other sections
• External reference
– EXTREF name [,name]
– EXTREF names symbols that are used in this control section and
are defined elsewhere
• Example
– 15 0003 CLOOP +JSUB RDREC 4B100000
– 160 0017 +STCH BUFFER,X 57900000
– 190 0028 MAXLEN WORD BUFEND-BUFFER 000000
Implementation
• The assembler must include information in the object program that will
cause the loader to insert proper values where they are required
• Define record
– Col. 1 D
– Col. 2-7 Name of external symbol defined in this control section
– Col. 8-13 Relative address within this control section (hexadecimal)
– Col.14-73 Repeat information in Col. 2-13 for other external symbols
• Refer record
– Col. 1 R
– Col. 2-7 Name of external symbol referred to in this control section
– Col. 8-73 Name of other external reference symbols
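A short sketch of how the two record types could be built from the EXTDEF and EXTREF lists. The function names are hypothetical and the symbol addresses are illustrative; the Refer record is written with the record type R in column 1.

```python
def define_record(symbols):
    """D record: 'D', then a 6-column name and 6-digit hex address per symbol."""
    return "D" + "".join(f"{name:<6.6}{address:06X}" for name, address in symbols)


def refer_record(names):
    """R record: 'R', then one 6-column name per external reference."""
    return "R" + "".join(f"{name:<6.6}" for name in names)


# Illustrative EXTDEF symbols with addresses relative to the control section.
print(define_record([("BUFFER", 0x33), ("BUFEND", 0x1033), ("LENGTH", 0x2D)]))
# Illustrative EXTREF symbols.
print(refer_record(["RDREC", "WRREC"]))
```

Names shorter than six characters are padded with blanks so that every field stays in its fixed columns.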
Modification Record
• Modification record
– Col. 1 M
– Col. 2-7 Starting address of the field to be modified (hexadecimal)
– Col. 8-9 Length of the field to be modified, in half-bytes (hexadecimal)
– Col. 10 Modification flag (+ or -)
– Col.11-16 External symbol whose value is to be added to or subtracted from the indicated field
– Note: control section name is automatically an external symbol, i.e.
it is available for use in Modification records.
• Example
– Figure 2.17
– M00000405+RDREC
– M00000705+COPY
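On the loader side, applying such a record amounts to adding (or subtracting) the symbol's value in the indicated field. The sketch below is an assumption-laden illustration: `apply_modification` is a hypothetical helper, the control section is taken to be loaded at address 0, and the value 4063 for RDREC is made up for the example.

```python
def apply_modification(memory, record, symbol_values):
    """Apply one M record such as 'M00000405+RDREC' to a bytearray, in place."""
    address = int(record[1:7], 16)
    half_bytes = int(record[7:9], 16)
    sign = -1 if record[9] == "-" else 1
    value = symbol_values[record[10:].strip()]

    nbytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1          # low-order half-bytes to modify
    field = int.from_bytes(memory[address:address + nbytes], "big")
    updated = (field & ~mask) | ((field + sign * value) & mask)
    memory[address:address + nbytes] = updated.to_bytes(nbytes, "big")


# Three filler bytes, then +JSUB RDREC (4B100000) at relative address 0003.
memory = bytearray.fromhex("0000004B100000")
apply_modification(memory, "M00000405+RDREC", {"RDREC": 0x4063})
print(memory.hex().upper())
```

With an odd length such as 05, only the low-order 20 bits of the three bytes are changed, so the flag bits in the upper half-byte stay intact.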
External References in Expression
• Earlier definitions
– required all of the relative terms be paired in an expression
(an absolute expression), or that all except one be paired (a
relative expression)
• New restriction
– Both terms in each pair must be relative within the same
control section
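– Ex: BUFEND-BUFFER (legal: both symbols are in the same control section)
– Ex: RDREC-COPY (illegal: the symbols are in different control sections)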
Two-Pass Assembler with Overlay Structure
One-Pass Assemblers
• Main problem
– forward references
• data items
• labels on instructions
• Solution
– data items: require all such areas be defined before they are
referenced
– labels on instructions: no good solution
Load-and-go Assembler
• Characteristics
– Useful for program development and testing
– Avoids the overhead of writing the object program out and
reading it back
– Both one-pass and two-pass assemblers can be designed
as load-and-go.
– However, a one-pass assembler also avoids the overhead of an additional pass over the source program
– For a load-and-go assembler, the actual address must be known at assembly time, so an absolute program can be used
Forward Reference in One-pass Assembler
Load-and-go Assembler (Cont.)
• At the end of the program
– any SYMTAB entries that are still marked with * indicate
undefined symbols
– search SYMTAB for the symbol named in the END
statement and jump to this location to begin execution
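The forward-reference bookkeeping of a load-and-go assembler can be sketched as below. This is an illustration only: the class `OnePassSymtab` is hypothetical, a Python list stands in for the linked list of references, and the patched operand field is assumed to be the 2 bytes after the opcode of a standard SIC instruction with the index bit off.

```python
class OnePassSymtab:
    def __init__(self, memory):
        self.memory = memory   # bytearray holding the program being assembled
        self.defined = {}      # symbol -> address
        self.pending = {}      # undefined symbol -> addresses of operand fields to patch

    def reference(self, symbol, field_address):
        """An instruction whose operand field is at field_address uses symbol."""
        if symbol in self.defined:
            self._patch(field_address, self.defined[symbol])
        else:
            # Symbol not yet defined: remember where its value must be inserted.
            self.pending.setdefault(symbol, []).append(field_address)

    def define(self, symbol, address):
        """A label definition is reached: fill in all earlier references."""
        self.defined[symbol] = address
        for field_address in self.pending.pop(symbol, []):
            self._patch(field_address, address)

    def _patch(self, field_address, address):
        self.memory[field_address:field_address + 2] = address.to_bytes(2, "big")

    def undefined(self):
        """Symbols still unresolved at the end of the program (errors)."""
        return sorted(self.pending)


memory = bytearray(12)
symtab = OnePassSymtab(memory)
memory[0] = 0x48                  # JSUB RDREC at address 0, operand not yet known
symtab.reference("RDREC", 1)
memory[3] = 0x48                  # JSUB WRREC at address 3, never defined here
symtab.reference("WRREC", 4)
symtab.define("RDREC", 0x0009)    # RDREC is finally defined at address 0009
print(memory[:3].hex().upper(), symtab.undefined())
```

Any symbol still listed by `undefined()` when END is reached corresponds to an entry that is still marked with * in the description above.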
Producing Object Code
• Used when external working-storage devices are not available or are too slow (for the intermediate file between the two passes)
• Solution:
– When the definition of a symbol is encountered, the assembler must generate another Text record with the correct operand address
– The loader is used to complete forward references that could not be handled by the assembler
– The object program records must be kept in their original order when they are presented to the loader
• Example
– Use a linked list to keep track of the instructions whose operands depend on an undefined symbol
• Figure 2.21
Implementation Examples
• Microsoft MASM Assembler
• Sun Sparc Assembler
• IBM AIX Assembler
Microsoft MASM Assembler
• SEGMENT
– a program is a collection of segments; each segment is defined as belonging to a particular class: CODE, DATA, CONST, STACK
– registers: CS (code), SS (stack), DS (data), ES, FS, GS
– similar to program blocks in SIC
• ASSUME
– e.g. ASSUME ES:DATASEG2
MOV ES,AX
Microsoft MASM Assembler
• JUMP with forward reference
– near jump: 2 or 3 bytes
– far jump: 5 bytes
– e.g. JMP TARGET
• PUBLIC, EXTRN
– similar to EXTDEF, EXTREF in SIC
Sun Sparc Assembler
• Sections
– .TEXT, .DATA, .RODATA, .BSS
• Symbols
– global vs. weak
– similar to the combination of EXTDEF and EXTREF in SIC
• Delayed branches
– delay slots
– annulled branch instruction
Sun Sparc Assembler
(a) Original loop
LOOP: .
.
ADD %L2, %L3, %L4
CMP %L0, 10
BLE LOOP
.
(b) NOP in the delay slot
LOOP: .
.
ADD %L2, %L3, %L4
CMP %L0, 10
BLE LOOP
NOP
.
(c) ADD moved into the delay slot
LOOP: .
.
CMP %L0, 10
BLE LOOP
ADD %L2, %L3, %L4
.
Sun Sparc Assembler
• Similar to System/370
• Base relative addressing
– save instruction space, no absolute address
– base register table:
• general purpose registers can be used as base register
– easy for program relocation
• only data whose values are actual addresses need to be modified
– e.g. USING LENGTH, 1
– USING BUFFER, 4
– Similar to BASE in SIC
AIX Assembler for PowerPC
• Alignment
– instruction (2)
– data: halfword operand (2), fullword operand (4)
– Slack bytes
• .CSECT
– control sections: RO(read-only data), RW(read-write data),
PR(executable instructions), BS(uninitialized read/write data)
– dummy section