ITSE 3242:
Systems Programming
Program execution and translation
Objective
Discuss program execution steps.
Understand what the different phases of program
translation do
Understand the different types of object code and how they
relate to program translation
Understand how linking works
Understand how loading works
Linking
Step 1: Take text segment from each .o file and put them together.
Step 2: Take data segment from each .o file, put them together, and
concatenate this onto end of text segments.
Step 3: Relocate and Resolve References
Go through un-resolved references and resolve them
Fill in all absolute addresses
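The three steps above can be sketched as a toy layout pass. This is only an illustration under simplifying assumptions, not how a real linker is implemented: the struct `object_file`, the functions `layout` and `relocate_data_ref`, and the base address used in the usage note are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one .o file: only segment sizes matter here. */
struct object_file {
    size_t text_size;   /* bytes of machine code */
    size_t data_size;   /* bytes of initialized data */
    size_t text_addr;   /* filled in by layout(): final address of text */
    size_t data_addr;   /* filled in by layout(): final address of data */
};

/* Steps 1 and 2: place all text segments back to back starting at base,
 * then place all data segments after the last text segment.
 * Returns the first free address after the laid-out image. */
size_t layout(struct object_file *objs, size_t n, size_t base)
{
    assert(objs != NULL || n == 0);
    size_t addr = base;
    for (size_t i = 0; i < n; i++) {      /* Step 1: text segments */
        objs[i].text_addr = addr;
        addr += objs[i].text_size;
    }
    for (size_t i = 0; i < n; i++) {      /* Step 2: data segments */
        objs[i].data_addr = addr;
        addr += objs[i].data_size;
    }
    return addr;
}

/* Step 3 (relocation): a reference stored as an offset within an object's
 * data segment becomes an absolute address once data_addr is known. */
size_t relocate_data_ref(const struct object_file *obj, size_t offset)
{
    assert(offset < obj->data_size);
    return obj->data_addr + offset;
}
```

With two objects whose text sizes are 0x100 and 0x80 and a base of 0x1000, the second text segment lands at 0x1100 and the first data segment at 0x1180.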
[Figure: the linker takes .o file 1 (text 1, data 1, info 1) and .o file 2 (text 2, data 2, info 2) as input and produces a.out, laid out as relocated text 1, relocated text 2, relocated data 1, relocated data 2.]
Relocation
Symbols Resolution
To resolve references:
search for reference (data or label) in all “user” symbol
tables
if not found, search library files (for example, for printf)
once absolute address is determined, fill in the machine
code appropriately
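The search order described above can be sketched as follows. This is a minimal illustration: the `symbol` struct, the `find` and `resolve` helpers, and the idea of one merged "user" table plus one library table are assumptions made for the example, not the data structures of a real linker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol-table entry: a name and its absolute address. */
struct symbol {
    const char *name;
    unsigned long addr;
};

/* Linear search of one table; returns the entry or NULL if absent. */
static const struct symbol *find(const struct symbol *tab, size_t n,
                                 const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Search the user symbol table first, then the library table.
 * On success stores the address in *addr and returns 1; returns 0 if the
 * reference stays unresolved (a real linker would report an error). */
int resolve(const struct symbol *user, size_t n_user,
            const struct symbol *lib, size_t n_lib,
            const char *name, unsigned long *addr)
{
    assert(name != NULL && addr != NULL);
    const struct symbol *s = find(user, n_user, name);
    if (s == NULL)
        s = find(lib, n_lib, name);
    if (s == NULL)
        return 0;
    *addr = s->addr;   /* this value would be patched into the machine code */
    return 1;
}
```

A reference to `printf` that is missing from the user table is found only on the second lookup, in the library table.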
In the context of a linker, there are three different kinds of
symbols:
Global symbols:
Symbols that are defined by module m and that can be referenced by other
modules.
Global linker symbols correspond to:
non-static C functions and
global variables that are defined in the module.
External symbols:
Global symbols that are referenced by module m but defined by some
other module.
Such symbols are called externals.
Local symbols:
Symbols that are defined and referenced exclusively by module m.
Local linker symbols correspond to C functions and global variables that are
defined with the static attribute.
Symbols
#include <stdio.h>
int errno;
int x = 27;
static int y = 53;
int main () {
    int sum = 23;
    printf ("hello,world\n");
    display();
}
Symbols defined in your program that can be used elsewhere: errno, x, and main
Symbols defined and used only in this program: y
Symbols defined elsewhere and used by your program: printf and display
Symbol Types
Symbol definitions are stored (by the compiler) in a symbol
table.
Symbol table keeps track of symbols used in the program.
The compiler exports each global symbol as either strong
or weak
Strong symbols:
Functions
Initialized global variables
Weak symbols:
Uninitialized global variables
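As a small illustration of this classification, the hypothetical module below annotates each file-scope definition with the kind of symbol the slides assign to it. All names are invented for the example.

```c
/* Hypothetical module: each definition is labelled strong or weak
 * according to the classification above. */

int limit = 10;          /* initialized global variable: strong */

int uninit_counter;      /* uninitialized global variable: weak
                            (zero at program start, like every global) */

int twice(int n)         /* function: strong */
{
    return 2 * n;
}
```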
Linker’s Symbol Rules
Rule 1: Multiple strong symbols with the same name are
not allowed in a single executable.
Each item can be defined only once
Otherwise: Linker error
Question: What will happen if the two programs are linked
together?
Rule 2: Given a strong symbol and multiple weak symbols
with the same name, the linker chooses the strong symbol.
References to the weak symbol resolve to the strong symbol
Question: What will happen if the two programs are linked
together and the program is executed?
Rule 3: If there are multiple weak symbols with the same
name, the linker can pick an arbitrary one
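The three rules can be expressed as a small decision procedure. The sketch below is a simulation, not linker source code: the `definition` record, the `choose_definition` function, and its return codes are hypothetical, and "arbitrary" in Rule 3 is modelled as "the first weak definition seen".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum strength { WEAK, STRONG };

/* Hypothetical record: one definition of a symbol in some module. */
struct definition {
    const char *name;
    enum strength strength;
    int module;            /* index of the defining module */
};

/* Apply the three rules to all definitions of `name`.
 * Returns the index of the module whose definition is chosen,
 * -1 if the symbol is not defined anywhere,
 * -2 if two strong definitions clash (Rule 1: link error). */
int choose_definition(const struct definition *defs, size_t n,
                      const char *name)
{
    assert(name != NULL);
    int chosen = -1;
    int have_strong = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, name) != 0)
            continue;
        if (defs[i].strength == STRONG) {
            if (have_strong)
                return -2;              /* Rule 1: duplicate strong symbol */
            have_strong = 1;
            chosen = defs[i].module;    /* Rule 2: strong beats weak */
        } else if (chosen == -1) {
            chosen = defs[i].module;    /* Rule 3: any weak one will do;
                                           this sketch keeps the first */
        }
    }
    return chosen;
}
```

Whether a real toolchain follows these rules depends on its settings. Recent GCC releases (10 and later) default to -fno-common, under which an uninitialized global defined in two files is reported as a multiple definition instead of being merged.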
Linking cont.
Avoid global variables if you can. Otherwise:
Use static if you can
Initialize if you define a global variable
Use extern if you reference an external global variable
Static Variables
In C, the keyword static affects the lifetime and linkage (visibility) of a variable
A static global variable, declared at the top of a source file, is visible only within the
source file.
The linker will not resolve any reference to it from another object file
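The effect of static on linkage and lifetime can be seen in a single source file. The following is a minimal sketch; every identifier in it (`call_count`, `shared_limit`, `next_id`, `under_limit`, `id_is_under_limit`) is hypothetical.

```c
/* File-scope static: internal linkage, invisible to other object files. */
static int call_count = 0;

/* Initialized global: other source files may reference it. */
int shared_limit = 3;

/* In another source file, the same variable would be declared (not
 * defined) with:  extern int shared_limit;  */

/* Function-scope static: keeps its value between calls. */
int next_id(void)
{
    static int id = 0;   /* initialized once, lives for the whole program */
    call_count++;
    return ++id;
}

/* static function: callable only from this source file. */
static int under_limit(int value)
{
    return value < shared_limit;
}

/* Non-static wrapper: the only way other files can reach under_limit(). */
int id_is_under_limit(void)
{
    return under_limit(next_id());
}
```

Another object file can call `next_id` or read `shared_limit`, but a reference to `call_count` or `under_limit` from outside this file would stay unresolved at link time.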
Packaging common Libraries
How to package functions commonly used by programmers?
Like printf, scanf, strcmp.
Option 1: Put all functions in a single source file.
Programmers link one big object file into their programs, which is very inefficient.
gcc -o myprog myprog.o somebiglibraryfile.o
Option 2: Put each routine in a separate object file.
Programmers explicitly link the appropriate object files into their programs,
which is a real pain for the programmer.
gcc -o myprog myprog.o printf.o scanf.o strcmp.o .....
Solution: Static libraries
Bundle multiple object files into a single archive file (file extension “.a”).
The linker can also take archive files as input: it searches the .o files
within the .a file for needed references and links them into the
executable.
gcc -o myprog myprog.o /usr/lib/libc.a
We can create a static library file using the UNIX ar command
ar rs libc.a atoi.o printf.o random.o ...
Commonly used static libraries
libc.a (the C standard library)
2.8 MB archive of 1400 object files.
I/O, memory allocation, signal handling, string handling, date and time, math
libm.a (the C math library)
0.5 MB archive of 400 object files.
floating point math (sin, cos, tan, log, exp, sqrt, …)
Static libraries have the following disadvantages:
Lots of code duplication in the resulting executable files
Every C program needs the standard C library.
e.g., Every program calling printf() would have a copy of the printf()
code in the executable. Very wasteful!
OS would have to allocate memory for the standard C library routines being
used by every running program!
Any changes to system libraries would require relinking every binary!
Solution: Shared libraries
Shared libraries are object files containing code and data that are loaded and
linked into an application dynamically, at either load‐time or run‐time
On UNIX, “.so” filename extension is used
On Windows, “.dll” filename extension is used (dynamic link libraries)
When the OS runs a program, it checks whether the executable was
linked against any shared library (.so) files.
If so, it performs the linking and loading of the shared
libraries on the fly.
Example: gcc -o myprog main.o /usr/lib/libc.so
We can create our own shared libs using gcc -shared
gcc -shared -o mylib.so main.o swap.o
Dynamic Linking
Dynamic linking can occur when an executable is first loaded
and run (load-time linking)
Common case for Linux, handled automatically by the dynamic
linker (ld-linux.so)
Standard C library (libc.so) is usually dynamically linked
Dynamic linking can also occur after a program has begun
execution (run-time linking)
Shared library routines can be shared by multiple
processes.
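Run-time linking is available to C programs on Linux through the dlopen interface. The sketch below loads the math library after the program has started and calls `cos` through a pointer obtained from `dlsym`. The wrapper name `cos_via_dlopen` is hypothetical, and the library file name "libm.so.6" is an assumption that holds on typical glibc-based Linux systems but not everywhere.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Load the shared math library at run time and call cos(x) through it.
 * Returns 0 on success and stores the result in *out; returns -1 if the
 * library or the symbol cannot be found.
 * Assumption: the library is installed under the name "libm.so.6". */
int cos_via_dlopen(double x, double *out)
{
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* POSIX allows converting the void * from dlsym to a function pointer. */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine == NULL) {
        dlclose(handle);
        return -1;
    }

    *out = cosine(x);
    dlclose(handle);
    return 0;
}
```

On older glibc versions the program must also be linked with -ldl; newer versions provide dlopen in libc itself.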
Executable File Formats
The system has a format by which it expects the code and data of a
program to be laid out on disk, which we call an executable file format.
Each system has its own file format, but the major ones that have been
used are outlined here:
a.out (Assembler OUTput): the oldest UNIX format, which did not have
adequate support for dynamic linking.
COFF (Common Object File Format): an older UNIX format that is no
longer used, but forms the basis for some other executable file formats
used today.
PE (Portable Executable): the Windows executable format, which
includes a COFF section as well as extensions to deal with dynamic linking
and things like .NET code.
ELF (Executable and Linkable Format): the modern UNIX/Linux format.
Mach-O: the Mac OS X format, based on the Mach research kernel
developed at CMU in the mid-1980s.
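Most of these formats can be told apart by the magic number at the start of the file. The following sketch checks only those leading bytes. The function `guess_format` and the enum `exe_format` are hypothetical names, and a match on the magic number alone does not prove the rest of the file is well formed (for example, "MZ" also begins plain MS-DOS executables).

```c
#include <stddef.h>
#include <string.h>

enum exe_format { FMT_UNKNOWN, FMT_ELF, FMT_PE, FMT_MACHO };

/* Guess the executable format from the first bytes of a file.
 * Only the leading magic number is checked; a real tool would go on to
 * validate the rest of the header. */
enum exe_format guess_format(const unsigned char *buf, size_t len)
{
    static const unsigned char elf[4] = { 0x7f, 'E', 'L', 'F' };
    /* Mach-O magic 0xfeedface (32-bit) and 0xfeedfacf (64-bit), in both
     * byte orders as they can appear on disk. */
    static const unsigned char macho[4][4] = {
        { 0xfe, 0xed, 0xfa, 0xce }, { 0xce, 0xfa, 0xed, 0xfe },
        { 0xfe, 0xed, 0xfa, 0xcf }, { 0xcf, 0xfa, 0xed, 0xfe },
    };

    if (len >= 4 && memcmp(buf, elf, 4) == 0)
        return FMT_ELF;
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FMT_PE;      /* PE files begin with an MS-DOS "MZ" stub */
    if (len >= 4)
        for (size_t i = 0; i < 4; i++)
            if (memcmp(buf, macho[i], 4) == 0)
                return FMT_MACHO;
    return FMT_UNKNOWN;
}
```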
Loading Files
Input: Executable Code (e.g., a.out),
Output: (program is run)
Executable files are stored on disk. When one is run, the
loader’s job is to load it into memory and start it running.
In practice, the loader is part of the operating system (OS);
loading is one of the OS tasks
Functions of a loader
Reads executable file’s header to determine size of text and data
segments
Creates new address space for program large enough to hold text
and data segments, along with a stack segment
Copies instructions and data from executable file into the new
address space.
Copies arguments passed to the program onto the stack
Initializes machine registers
Jumps to a start-up routine (which in turn calls main) that copies the program’s
arguments from the stack to registers and sets the PC
If main routine returns, start-up routine terminates program with
exit system call
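The first few loader functions listed above can be imitated with a toy model. This sketch is far simpler than a real OS loader: the header layout `exe_header`, the `image` struct, and the function `load_image` are all hypothetical, and the "address space" is just one heap block.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, greatly simplified executable header. */
struct exe_header {
    size_t text_size;
    size_t data_size;
    size_t entry;        /* offset of the first instruction within text */
};

/* Hypothetical loaded image: one flat block holding text, data, stack. */
struct image {
    unsigned char *memory;
    size_t size;
    size_t pc;           /* program counter: where execution would start */
    size_t sp;           /* stack pointer: stack grows down from the top */
};

/* Read the sizes from the header, allocate an address space big enough for
 * text + data + stack, and copy the segments in. `file` points at the
 * bytes that follow the header (text first, then data).
 * Returns 0 on success, -1 on a bad header or if allocation fails. */
int load_image(const struct exe_header *hdr, const unsigned char *file,
               size_t stack_size, struct image *img)
{
    if (hdr->entry >= hdr->text_size)
        return -1;                       /* entry point outside the text */
    img->size = hdr->text_size + hdr->data_size + stack_size;
    img->memory = calloc(img->size, 1);  /* zero-filled address space */
    if (img->memory == NULL)
        return -1;
    memcpy(img->memory, file, hdr->text_size + hdr->data_size);
    img->pc = hdr->entry;                /* initialize "registers" */
    img->sp = img->size;
    return 0;
}
```

The caller owns `img->memory` and releases it with `free` when the simulated program is done.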
Dynamic Loading
Routine is not loaded until it is called
Better memory-space utilization;
unused routines are never loaded.
Useful when large amounts of code are needed to handle
infrequently occurring cases.