Chapter 2 Assemblers
-- 2.3 Machine-Independent Assembler Features
Outline
Literals
Symbol Defining Statement
Expressions
Program Blocks
Control Sections and Program Linking
Literals
Consider the following example
        :
        LDA     FIVE
        :
FIVE    WORD    5
        :
It is convenient to write the value of a constant
operand as a part of the instruction
        :
        LDA     =X'05'
        :
Literals
A literal is identified with the prefix =, followed
by a specification of the literal value
Examples: (Figure 2.10, pp.68)
Line  Loc   Label   Opcode  Operand   Object code
45    001A  ENDFIL  LDA     =C'EOF'   032010
93                  LTORG
      002D  *       =C'EOF'           454F46
215   1062  WLOOP   TD      =X'05'    E32011
230   106B          WD      =X'05'    DF2008
      1076  *       =X'05'            05
Object code 032010 (line 45)
op      nixbpe  disp
000000  110010  010
Literals vs. Immediate Operands
Literals
The assembler generates the specified value as a
constant at some other memory location
45    001A  ENDFIL  LDA     =C'EOF'   032010
Immediate Operands
The operand value is assembled as part of the machine
instruction
55    0020          LDA     #3        010003
Literals can be used in SIC, but immediate
operands are valid only in SIC/XE.
Literal Pools
Normally literals are placed into a pool at the end of
the program
see Fig. 2.10 (after the END statement)
In some cases, it is desirable to place literals into a
pool at some other location in the object program
Assembler directive LTORG
When the assembler encounters a LTORG statement,
it generates a literal pool (containing all
literal operands used since previous LTORG)
Reason: keep the literal operand close to the instruction that uses it
Otherwise the literal may be out of range for
PC-relative addressing
Duplicate literals
The same literal used more than once in the program
Only one copy of the specified value needs to be stored
For example, =X'05' in Figure 2.10 (pp. 68)
How to recognize duplicate literals
Compare the character strings defining them
Easier to implement, but has a potential problem
(see next)
e.g. =X'05'
Compare the generated data values
Better, but increases the complexity of the
assembler
e.g. =C'EOF' and =X'454F46'
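The two comparison strategies above can be sketched as follows. This is a minimal illustration, not the textbook's implementation: the helper names literal_bytes, same_by_name, and same_by_value are hypothetical, and only C'...' and X'...' literals are handled.

```python
def literal_bytes(literal: str) -> bytes:
    """Return the data bytes generated for a literal such as =C'EOF' or =X'05'."""
    if len(literal) < 5 or literal[0] != "=" or literal[2] != "'" or literal[-1] != "'":
        raise ValueError(f"malformed literal: {literal!r}")
    kind, text = literal[1], literal[3:-1]
    if kind == "C":
        return text.encode("ascii")   # one byte per character
    if kind == "X":
        return bytes.fromhex(text)    # two hex digits per byte
    raise ValueError(f"unsupported literal type: {kind!r}")


def same_by_name(a: str, b: str) -> bool:
    """Strategy 1: compare the character strings that define the literals."""
    return a == b


def same_by_value(a: str, b: str) -> bool:
    """Strategy 2: compare the generated data values."""
    return literal_bytes(a) == literal_bytes(b)
```

With these helpers, =C'EOF' and =X'454F46' differ by name but match by value, which is the extra case the second strategy catches.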
Problem of duplicate-literal recognition
* denotes the current value of the
location counter
BUFEND EQU * (P.68 Fig. 2.10)
Some literals may have the same name
but different values
        BASE    *
        LDB     =*
(cf. P.58 #LENGTH)
The literal =*, used repeatedly in the program, has the same name
but a different value each time
The literal =* represents an address in the
program, so the assembler must generate the
appropriate Modification records.
Literal table
LITTAB
Content
Literal name
Operand value and length
Address
LITTAB is often organized as a hash table, using the
literal name or value as the key
Implementation of Literals
Pass 1
Build LITTAB with literal name, operand value and length,
leaving the address unassigned
When LTORG or END statement is encountered, assign an address
to each literal not yet assigned an address
The location counter is updated to reflect the
number of bytes occupied by each literal
Pass 2
Search LITTAB for each literal operand encountered
Generate data values using BYTE or WORD statements
Generate Modification record for literals that represent an address
in the program
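The Pass 1 bookkeeping described above can be sketched roughly as below. The class and method names (LiteralEntry, LitTab, add, flush_pool) are assumptions made for illustration; a Python dict stands in for the hash table, and duplicates are recognized by name only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LiteralEntry:
    name: str                      # literal as written, e.g. "=C'EOF'"
    value: bytes                   # operand value; its length is len(value)
    address: Optional[int] = None  # unassigned until LTORG or END


class LitTab:
    def __init__(self) -> None:
        # dict keeps insertion order, so literals are pooled in order of first use
        self.entries: Dict[str, LiteralEntry] = {}

    def add(self, name: str, value: bytes) -> None:
        """Record a literal operand; a duplicate name is stored only once."""
        if name not in self.entries:
            self.entries[name] = LiteralEntry(name, value)

    def flush_pool(self, locctr: int) -> int:
        """On LTORG or END: assign addresses to pending literals, return new LOCCTR."""
        for entry in self.entries.values():
            if entry.address is None:
                entry.address = locctr
                locctr += len(entry.value)
        return locctr
```

For example, adding =C'EOF' and calling flush_pool(0x002D) places the literal at 002D and returns 0030; adding =X'05' twice and calling flush_pool(0x1076) stores one copy at 1076.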
Example: (pp. 67, Figure 2.9)
SYMTAB & LITTAB
SYMTAB
Name     Value
COPY     0
FIRST    0
CLOOP    6
ENDFIL   1A
RETADR   30
LENGTH   33
BUFFER   36
BUFEND   1036
MAXLEN   1000
RDREC    1036
RLOOP    1040
EXIT     1056
INPUT    105C
WRREC    105D
WLOOP    1062
LITTAB
Literal  Hex value  Length  Address
C'EOF'   454F46     3       002D
X'05'    05         1       1076
Symbol-Defining Statements
Assembler directive EQU
Allows the programmer to define symbols and specify their values
Syntax: symbol EQU value
To improve program readability, avoid using magic numbers,
and make it easier to find and change constant values
Replace
        +LDT    #4096
with
MAXLEN  EQU     4096
        +LDT    #MAXLEN
Define mnemonic names for registers
A       EQU     0
X       EQU     1
        RMO     A,X
Expressions are allowed
MAXLEN  EQU     BUFEND-BUFFER
Assembler directive ORG
Allows the programmer to reset the location counter (LOCCTR) to a specified value
Syntax:
ORG value
When ORG is encountered, the assembler resets its
LOCCTR to the specified value
ORG affects the values of all labels defined until the
next ORG
If the previous value of LOCCTR is automatically
remembered, we can return to the normal use of
LOCCTR by simply writing
ORG
Example: using ORG
In the data structure
SYMBOL: 6 bytes
VALUE: 3 bytes (one word)
FLAGS: 2 bytes
We want to refer to every field of each entry
If EQU statements are used
STAB    RESB    1100
SYMBOL  EQU     STAB
VALUE   EQU     STAB+6
FLAGS   EQU     STAB+9
Example: using ORG
If ORG statements are used
STAB    RESB    1100
        ORG     STAB          Set LOCCTR to STAB
SYMBOL  RESB    6
VALUE   RESW    1             Size of each field
FLAGS   RESB    2
        ORG     STAB+1100     Restore LOCCTR
We can fetch the VALUE field by
        LDA     VALUE,X
X = 0, 11, 22, ... for each entry
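To see why the ORG version yields the same field offsets as the EQU version, the location-counter arithmetic can be simulated. This is only a sketch under simplifying assumptions: assign_addresses is a hypothetical helper, it understands just RESB, RESW, and ORG, and an ORG operand is given as a (symbol, offset) pair rather than parsed from text.

```python
WORD_SIZE = 3  # a SIC word occupies 3 bytes


def assign_addresses(statements, start=0):
    """Return (symtab, final LOCCTR) for a list of (label, opcode, operand)."""
    symtab = {}
    locctr = start
    for label, opcode, operand in statements:
        if opcode == "ORG":
            # The symbol must already be defined (no forward references).
            symbol, offset = operand
            locctr = symtab[symbol] + offset
            continue
        if label is not None:
            symtab[label] = locctr
        if opcode == "RESB":
            locctr += operand
        elif opcode == "RESW":
            locctr += WORD_SIZE * operand
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return symtab, locctr


STATEMENTS = [
    ("STAB", "RESB", 1100),
    (None, "ORG", ("STAB", 0)),       # set LOCCTR to STAB
    ("SYMBOL", "RESB", 6),
    ("VALUE", "RESW", 1),
    ("FLAGS", "RESB", 2),
    (None, "ORG", ("STAB", 1100)),    # restore LOCCTR
]
SYMTAB, LOCCTR = assign_addresses(STATEMENTS)
ENTRY_SIZE = 6 + 3 + 2                            # bytes per table entry
OFFSETS = [i * ENTRY_SIZE for i in range(3)]      # index register values
```

Starting from address 0, SYMBOL, VALUE, and FLAGS land at STAB, STAB+6, and STAB+9, and LOCCTR ends at STAB+1100, as with the EQU definitions.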
Forward-Reference Problem
Forward reference is not allowed for either
EQU or ORG.
All terms in the value field must have been defined
previously in the program.
The reason is that all symbols must have been defined
during Pass 1 in a two-pass assembler.
Allowed:
ALPHA   RESW    1
BETA    EQU     ALPHA
Not allowed:
BETA    EQU     ALPHA
ALPHA   RESW    1
Expression
Assemblers allow the use of expressions as
operands
The assembler evaluates the expression and produces a single
operand address or value
Expressions consist of
Operators
+,-,*,/ (division is usually defined to produce an
integer result)
Individual terms
Constants
User-defined symbols
Special terms, e.g., *, the current value of LOCCTR
Examples
MAXLEN  EQU     BUFEND-BUFFER
STAB    RESB    (6+3+2)*MAXENTRIES
Relocation Problem in Expressions
Values of terms can be
Absolute (independent of program location)
constants
Relative (to the beginning of the program)
Address labels
* (value of LOCCTR)
Expressions can be
Absolute
Only absolute terms
MAXLEN  EQU     1000
Relative terms in pairs with opposite signs for each
pair
MAXLEN  EQU     BUFEND-BUFFER
Relative
All the relative terms except one can be paired as
described for absolute expressions. The remaining unpaired
relative term must have a positive sign.
STAB    EQU     OPTAB + (BUFEND - BUFFER)
Restriction of Relative Expressions
No relative terms may enter into a
multiplication or division operation
3 * BUFFER
Expressions that do not meet the conditions
of either absolute or relative should be
flagged as errors.
BUFEND + BUFFER
100 - BUFFER
Handling Relative Symbols in SYMTAB
To determine the type of an expression, we must
keep track of the types of all symbols defined in
the program.
We need a flag in the SYMTAB for indication.
Symbol  Type  Value
RETADR  R     0030
BUFFER  R     0036
BUFEND  R     1036
MAXLEN  A     1000
Legal: BUFEND - BUFFER
Illegal: BUFEND + BUFFER
Illegal: 100 - BUFFER
Illegal: 3 * BUFFER
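The absolute/relative rules can be checked mechanically by counting unpaired relative terms. The sketch below assumes the expression has already been parsed into signed terms, each a list of factors that are multiplied together; the classify function and this term representation are hypothetical, and division and parentheses are left out.

```python
from math import prod

# Type flag ("R" relative, "A" absolute) and value for each symbol
SYMTAB = {
    "RETADR": ("R", 0x0030),
    "BUFFER": ("R", 0x0036),
    "BUFEND": ("R", 0x1036),
    "MAXLEN": ("A", 0x1000),
}


def classify(terms, symtab=SYMTAB):
    """Return (type, value) for signed terms, or raise ValueError if illegal.

    terms is a list of (sign, factors): sign is +1 or -1, and factors is a
    list of symbol names or integer constants multiplied together.
    """
    value = 0
    unpaired = 0  # relative terms with '+' minus relative terms with '-'
    for sign, factors in terms:
        resolved = [symtab[f] if isinstance(f, str) else ("A", f) for f in factors]
        has_relative = any(kind == "R" for kind, _ in resolved)
        if has_relative and len(factors) > 1:
            raise ValueError("relative term used in a multiplication")
        value += sign * prod(v for _, v in resolved)
        if has_relative:
            unpaired += sign
    if unpaired == 0:
        return ("A", value)   # every relative term is paired with opposite sign
    if unpaired == 1:
        return ("R", value)   # exactly one unpaired, positive relative term
    raise ValueError("expression is neither absolute nor relative")
```

Here BUFEND - BUFFER is written as [(1, ["BUFEND"]), (-1, ["BUFFER"])] and 3 * BUFFER as [(1, [3, "BUFFER"])]. The first is classified as absolute; the second is rejected.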
Program Blocks
Allow the generated machine instructions and
data to appear in the object program in a
different order
Separating blocks for storing code, data, stack, and
larger data block
Program blocks vs. control sections
Program blocks
Segments of code that are rearranged
within a single object program unit
Control sections
Segments of code that are translated into
independent object program units
Program Blocks
Assembler directive: USE
USE [blockname]
At the beginning, statements are assumed to be part of
the unnamed (default) block
If no USE statements are included, the entire program
belongs to this single block
Each program block may actually contain several
separate segments of the source program
Example: pp. 79, Figure 2.11
Program Blocks
The assembler rearranges these segments to gather
together the pieces of each block and assign addresses
Separate the program into blocks in a particular order
The large buffer area is moved to the end of the object program
Program readability is better if data areas are placed in the source
program close to the statements that reference them.
Example: pp. 81, Figure 2.12
Three blocks are used
default: executable instructions
CDATA: all data areas that are a few words or less in length
CBLKS: all data areas that consist of larger
blocks of memory
Example: pp. 81, Figure 2.12
Block numbers: 0 = (default) block, 1 = CDATA block, 2 = CBLKS block
Loc   Block  Label   Opcode  Operand        Object code
0000  0      COPY    START   0
0000  0      FIRST   STL     RETADR         172063
0003  0      CLOOP   JSUB    RDREC          4B2021
0006  0              LDA     LENGTH         032060
0009  0              COMP    #0             290000
000C  0              JEQ     ENDFIL         332006
000F  0              JSUB    WRREC          4B203B
0012  0              J       CLOOP          3F2FEE
0015  0      ENDFIL  LDA     =C'EOF'        032055
0018  0              STA     BUFFER         0F2056
001B  0              LDA     #3             010003
001E  0              STA     LENGTH         0F2048
0021  0              JSUB    WRREC          4B2029
0024  0              J       @RETADR        3E203F
0000  1              USE     CDATA
0000  1      RETADR  RESW    1
0003  1      LENGTH  RESW    1
0000  2              USE     CBLKS
0000  2      BUFFER  RESB    4096
1000  2      BUFEND  EQU     *
1000         MAXLEN  EQU     BUFEND-BUFFER
Example: pp. 81, Figure 2.12
Loc   Block  Label   Opcode  Operand        Object code
0027  0              USE
0027  0      RDREC   CLEAR   X              B410
0029  0              CLEAR   A              B400
002B  0              CLEAR   S              B440
002D  0              +LDT    #MAXLEN        75101000
0031  0      RLOOP   TD      INPUT          E32038
0034  0              JEQ     RLOOP          332FFA
0037  0              RD      INPUT          DB2032
003A  0              COMPR   A,S            A004
003C  0              JEQ     EXIT           332008
003F  0              STCH    BUFFER,X       57A02F
0042  0              TIXR    T              B850
0044  0              JLT     RLOOP          3B2FEA
0047  0      EXIT    STX     LENGTH         13201F
004A  0              RSUB                   4F0000
0006  1              USE     CDATA
0006  1      INPUT   BYTE    X'F1'          F1
Example: pp. 81, Figure 2.12
Loc   Block  Label   Opcode  Operand        Object code
004D  0              USE
004D  0      WRREC   CLEAR   X              B410
004F  0              LDT     LENGTH         772017
0052  0      WLOOP   TD      =X'05'         E3201B
0055  0              JEQ     WLOOP          332FFA
0058  0              LDCH    BUFFER,X       53A016
005B  0              WD      =X'05'         DF2012
005E  0              TIXR    T              B850
0060  0              JLT     WLOOP          3B2FEF
0063  0              RSUB                   4F0000
0007  1              USE     CDATA
                     LTORG
0007  1      *       =C'EOF'                454F46
000A  1      *       =X'05'                 05
                     END     FIRST
Rearrange Codes into Program Blocks
Pass 1
A separate location counter for each program block
Save and restore LOCCTR when switching between
blocks
At the beginning of a block, LOCCTR is set to 0.
Assign each label an address relative to the start of the block
Store the block name or number in the SYMTAB along with the
assigned relative address of the label
Indicate the block length as the latest value of LOCCTR for each
block at the end of Pass1
Assign to each block a starting address in the object program by
concatenating the program blocks in a particular order
Block name  Block number  Address  Length
(default)   0             0000     0066
CDATA       1             0066     000B
CBLKS       2             0071     1000
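The last Pass 1 step, concatenating the blocks to obtain their starting addresses, amounts to a running sum over the block lengths. A minimal sketch, assuming the hypothetical helper assign_block_addresses and a dict that lists the blocks in the order they are concatenated:

```python
def assign_block_addresses(block_lengths, start=0):
    """Map each block name to its number, starting address, and length.

    block_lengths maps block name -> final LOCCTR value of that block,
    listed in the order in which the blocks are concatenated.
    """
    table = {}
    address = start
    for number, (name, length) in enumerate(block_lengths.items()):
        table[name] = {"number": number, "address": address, "length": length}
        address += length  # the next block begins where this one ends
    return table


BLOCKS = assign_block_addresses({"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000})
```

With the lengths from the table above, this places (default) at 0000, CDATA at 0066, and CBLKS at 0071.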
Rearrange Codes into Program Blocks
Pass 2
Calculate the address for each symbol relative to the
start of the object program by adding
The location of the symbol relative to the
start of its block
The starting address of this block
Example of Address Calculation (P.81)
20    0006  0        LDA     LENGTH         032060
The value of the operand (LENGTH)
Address 0003 relative to Block 1 (CDATA)
Address 0003+0066=0069 relative to program
When this instruction is executed
PC = 0009
disp = 0069 - 0009 = 0060
op      nixbpe  disp
000000  110010  060
=> 032060
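The calculation above can be reproduced in a few lines. This is an illustrative sketch only: format3_pc_relative is a hypothetical helper that covers just simple addressing (n=1, i=1) with PC-relative displacement, and BLOCK_START holds the block starting addresses assigned in Pass 1.

```python
def format3_pc_relative(opcode, target, pc, x=0):
    """Assemble a format-3 instruction using PC-relative addressing."""
    disp = target - pc
    if not -2048 <= disp <= 2047:
        raise ValueError("target out of range for PC-relative addressing")
    n, i, b, p, e = 1, 1, 0, 1, 0
    first_byte = (opcode & 0xFC) | (n << 1) | i         # 6 opcode bits + n, i
    xbpe = (x << 3) | (b << 2) | (p << 1) | e
    word = (first_byte << 16) | (xbpe << 12) | (disp & 0xFFF)
    return f"{word:06X}"


BLOCK_START = {"(default)": 0x0000, "CDATA": 0x0066, "CBLKS": 0x0071}

# LENGTH is at 0003 relative to the start of CDATA.
LENGTH_ADDRESS = BLOCK_START["CDATA"] + 0x0003
# LDA has opcode 00; PC holds the address of the next instruction (0009).
OBJECT_CODE = format3_pc_relative(0x00, LENGTH_ADDRESS, pc=0x0009)
```

The same helper gives 3F2FEE for J CLOOP at 0012 (opcode 3C, target 0003, PC 0015), where the displacement is negative and is stored as a 12-bit two's complement value.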
Program Blocks Loaded in Memory
(P.84 Fig. 2.14)
Source program
Lines     Segment
5-70      Default(1)
95-100    CDATA(1)     (not present in object program)
105       CBLKS(1)     (not present in object program)
125-180   Default(2)
185       CDATA(2)
210-245   Default(3)
253       CDATA(3)
Object program (in order)
Default(1), Default(2), CDATA(2), Default(3), CDATA(3)
Program loaded in memory
Relative address  Segment
0000              Default(1)
0027              Default(2)
004D              Default(3)
0066              CDATA(1)
006C              CDATA(2)
006D              CDATA(3)
0071              CBLKS(1)     (through 1070)
Object Program
It is not necessary to physically rearrange the
generated code in the object program
The assembler simply inserts the proper load address in
each Text record.
The loader will load the code into the correct place
Control Sections and Program Linking
Control sections
can be loaded and relocated independently of the other
control sections
are most often used for subroutines or other logical
subdivisions of a program
the programmer can assemble, load, and manipulate each
of these control sections separately
because of this, there should be some means for linking
control sections together
assembler directive: CSECT
secname CSECT
separate location counter for each control section
External Definition and Reference
Instructions in one control section may need to refer
to instructions or data located in another section
External definition
EXTDEF name [, name]
EXTDEF names symbols that are defined in this control section
and may be used by other sections
Ex: EXTDEF BUFFER, BUFEND, LENGTH
External reference
EXTREF name [, name]
EXTREF names symbols that are used in this control section and
are defined elsewhere
Ex: EXTREF RDREC, WRREC
To reference an external symbol, extended format
instruction is needed (why?)
Example: pp. 86, Figure 2.15
Source listing in three control sections: COPY, RDREC, WRREC
Each control section name is implicitly defined as an external symbol
External Reference Handling
Case 1 (P.87)
15    0003  CLOOP   +JSUB   RDREC          4B100000
The operand RDREC is an external reference.
The assembler
has no idea where RDREC is
inserts an address of zero
can only use extended format to provide
enough room (that is, relative addressing
for external reference is invalid)
The assembler generates information for each external
reference that will allow the loader to perform the
required linking.
External Reference Handling
Case 2
190   0028  MAXLEN  WORD    BUFEND-BUFFER  000000
There are two external references in the expression, BUFEND and
BUFFER.
The assembler
inserts a value of zero
passes information to the loader
Add to this data area the address of BUFEND
Subtract from this data area the address of BUFFER
Case 3
On line 107, BUFEND and BUFFER are defined in the same control
section and the expression can be calculated immediately.
107   1000  MAXLEN  EQU     BUFEND-BUFFER
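Cases 2 and 3 differ only in whether the terms are external references. A rough sketch of that decision for a one-word data field follows; assemble_word is a hypothetical helper, the expression is assumed to be pre-parsed into (sign, symbol) terms, and the Modification records use the layout M, address, length in half-bytes, +/- flag, symbol. The local addresses in the usage note are illustrative values.

```python
def assemble_word(terms, symtab, extref, field_address):
    """Assemble a WORD expression; return (object code, Modification records).

    terms is a list of (sign, symbol) with sign +1 or -1. Symbols listed in
    extref contribute zero now and one Modification record each; all other
    symbols are looked up in symtab and evaluated immediately.
    """
    value = 0
    records = []
    for sign, symbol in terms:
        if symbol in extref:
            flag = "+" if sign > 0 else "-"
            # 06 half-bytes: the whole 3-byte word is modified by the loader
            records.append(f"M{field_address:06X}06{flag}{symbol}")
        else:
            value += sign * symtab[symbol]
    return f"{value & 0xFFFFFF:06X}", records


EXPRESSION = [(1, "BUFEND"), (-1, "BUFFER")]
```

For Case 2, assemble_word(EXPRESSION, {}, {"BUFEND", "BUFFER"}, 0x0028) yields 000000 plus one record adding BUFEND and one subtracting BUFFER. When both symbols are local, as in Case 3, the same call with an empty extref set and a local symbol table (say BUFFER at 0033 and BUFEND at 1033) yields 001000 and no records.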
Object Code of Figure 2.15
Object-code listings for control sections COPY, RDREC, WRREC
Records for Object Program
The assembler must include information in the object
program that will cause the loader to insert proper
values where they are required
Define record (EXTDEF)
Col. 1      D
Col. 2-7    Name of external symbol defined in this control section
Col. 8-13   Relative address within this control section (hexadecimal)
Col. 14-73  Repeat information in Col. 2-13 for other external symbols
Refer record (EXTREF)
Col. 1      R
Col. 2-7    Name of external symbol referred to in this control section
Col. 8-73   Names of other external reference symbols
Records for Object Program
Modification record
Col. 1      M
Col. 2-7    Starting address of the field to be modified (hexadecimal)
Col. 8-9    Length of the field to be modified, in half-bytes (hexadecimal)
Col. 10     Modification flag (+ or -)
Col. 11-16  External symbol whose value is to be added to or subtracted
            from the indicated field
The control section name is automatically an external symbol, i.e.
it is available for use in Modification records.
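The Define and Refer layouts are fixed-width, so they are straightforward to format. The sketch below uses hypothetical helpers define_record and refer_record; names are padded to 6 columns, addresses are written as 6 hex digits, and the per-record column limit (column 73) is not enforced. The addresses in the usage note are illustrative.

```python
def define_record(symbols):
    """Build a Define record from a list of (name, relative address)."""
    # Each symbol takes 12 columns: 6 for the name, 6 for the hex address.
    return "D" + "".join(f"{name:<6}{address:06X}" for name, address in symbols)


def refer_record(names):
    """Build a Refer record from a list of external reference names."""
    return "R" + "".join(f"{name:<6}" for name in names)
```

For example, define_record([("BUFFER", 0x33), ("BUFEND", 0x1033), ("LENGTH", 0x2D)]) gives DBUFFER000033BUFEND001033LENGTH00002D, and refer_record(["RDREC", "WRREC"]) gives an R followed by the two names, each padded to 6 columns.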
Object Program of Figure 2.15
Object programs for control sections COPY, RDREC, WRREC
Expressions in Multiple Control Sections
Extended restriction
Both terms in each pair of an expression must be within the same
control section
Legal: BUFEND-BUFFER
Illegal: RDREC-COPY
How to enforce this restriction
When an expression involves external references, the assembler
cannot determine whether or not the expression is legal.
The assembler evaluates all of the terms it can, combines these to
form an initial expression value, and generates Modification
records.
The loader checks the expression for errors and finishes the
evaluation.
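The loader's half of this division of labor can be sketched as applying each Modification record to the loaded code. This is an assumption-laden illustration, not the textbook's loader: apply_modification and the estab dictionary (external symbol name to load address) are hypothetical, the load addresses in the usage note are made up, and the legality check on the expression is omitted.

```python
def apply_modification(memory, record, estab, csaddr=0):
    """Apply one Modification record (M, address, length, flag, symbol) in place.

    memory is a bytearray holding the loaded program, estab maps external
    symbol names to their load addresses, and csaddr is the load address of
    the control section that contains the record.
    """
    address = int(record[1:7], 16) + csaddr
    half_bytes = int(record[7:9], 16)
    sign = 1 if record[9] == "+" else -1
    symbol = record[10:].strip()
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[address:address + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    # Only the low-order half-bytes named by the record are changed.
    updated = (field & ~mask) | ((field + sign * estab[symbol]) & mask)
    memory[address:address + nbytes] = updated.to_bytes(nbytes, "big")
```

Applying M00002806+BUFEND and then M00002806-BUFFER to a zeroed word, with BUFEND loaded at 5033 and BUFFER at 4033, leaves 001000 in the field, which is the value BUFEND-BUFFER would have had if both symbols were local.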