Assembly Language
Assembly language programs are tightly coupled with (and specific to) a target computer
architecture – as opposed to higher-level programming languages, which are generally
platform-independent. More sophisticated assemblers extend the basic translation of
program instructions with mechanisms to facilitate program development, control the
assembly process, and aid debugging.
Assembly language was once widely used for all aspects of programming, but today it
tends to be used more narrowly, primarily when direct hardware manipulation or unusual
performance issues are involved. Typical uses are device drivers, low-level embedded
systems, and real-time systems. These applications benefit from the execution speed that
carefully written assembly code can achieve.
See the terminology section, below, regarding inconsistent use of the terms assembly and
assembler.
Contents
• 1 Key concepts
o 1.1 Assembler
o 1.2 Assembly language
• 2 Language design
• 3 Use of assembly language
o 3.1 Historical perspective
o 3.2 Current usage
o 3.3 Typical applications
• 4 Related terminology
• 5 Further details
• 6 Example listing of assembly language source code
• 7 See also
• 8 References
• 9 Books
• 10 External links
Key concepts
Assembler
Assemblers are generally simpler to write than compilers for high-level languages, and
have been available since the 1950s. Modern assemblers, especially for RISC based
architectures, such as MIPS, Sun SPARC and HP PA-RISC, optimize instruction
scheduling to exploit the CPU pipeline efficiently.
Note that, in normal professional usage, the term assembler is often used ambiguously: It
is frequently used to refer to an assembly language itself, rather than to the assembler
utility. Thus: "CP/CMS was written in S/360 assembler" as opposed to "ASM-H was a
widely-used S/370 assembler."
Assembly language
For example, an x86/IA-32 processor can execute the following binary instruction as
expressed in machine language:
• Binary: 10110000 01100001 (Hexadecimal: 0xb061)
The equivalent assembly language statement is easier to read and remember:
• mov al, 61h
This instruction means:
• Move the hexadecimal value 61 (97 decimal) into the processor register named
"al".
The mnemonic "mov" is an operation code or opcode, and was chosen by the instruction
set designer to abbreviate "move." A comma-separated list of arguments or parameters
follows the opcode; this is a typical assembly language statement.
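The correspondence between the statement and its two machine-code bytes can be sketched in a few lines of Python. This is only an illustration, not how any particular assembler is written: the helper name encode_mov_r8_imm8 and the REG8 table are made up for this example, and the sketch covers only the single x86 instruction form "move an 8-bit immediate into an 8-bit register", whose opcode byte is 0xB0 plus the register number.

```python
# Register numbers for the 8-bit x86 registers, as used in the opcode byte.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}

def encode_mov_r8_imm8(register: str, value: int) -> bytes:
    """Encode `mov <8-bit register>, <immediate>` as two bytes (hypothetical helper)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in one byte")
    opcode = 0xB0 + REG8[register.lower()]  # 0xB0..0xB7 selects the register
    return bytes([opcode, value])

machine_code = encode_mov_r8_imm8("al", 0x61)
print(machine_code.hex())                           # b061
print(" ".join(f"{b:08b}" for b in machine_code))   # 10110000 01100001
```

The printed bytes match the binary and hexadecimal forms quoted above, which is the sense in which one assembly statement maps to one machine instruction.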
Every computer architecture has its own machine language. On this level, each
instruction is simple enough to be executed using a relatively small number of electronic
circuits. Computers differ by the number and type of operations they support. For
example, a new 64-bit machine would have different circuitry from a 32-bit machine. They
may also have different sizes and numbers of registers, and different representations of
data types in storage. While most general-purpose computers are able to carry out
essentially the same functionality, the ways they do so differ; the corresponding assembly
languages reflect these differences.
Language design
Instructions (statements) in assembly language are generally very simple, unlike in a
high-level language. Each instruction typically consists of an operation or opcode (or,
simply, instruction) plus zero or more operands. Most instructions refer to a single value,
or pair of values. An instruction coded in the language usually corresponds directly to a
single executable machine language instruction.
• Data definitions. Additional directives let the programmer reserve storage areas
for reference by machine language statements. Storage can typically be initialized
with literal numbers, strings, and other primitive data types.
• Labels. Data definitions are referenced using names (labels or symbols) assigned
by the programmer, and typically reference constants, variables, or structure
elements. Labels can also be assigned to code locations, i.e. subroutine entry
points or GOTO destinations. Most assemblers provide flexible symbol
management, letting programmers manage different namespaces, automatically
calculate offsets within data structures, and assign labels that refer to literal values
or the result of simple computations performed by the assembler.
• Comments. As in most other computer languages, comments can be added to
assembly source code; the assembler ignores them.
• Macros. A macro is a label that stands for some sequence of text lines. This
sequence of text lines may include a sequence of instructions. When the assembler
finds the macro label in a source code file, it expands the macro by replacing the
macro label with the text lines that the label represents. The name of the macro
requires only a small amount of typing, and is then replaced by several lines of code
which are assembled just as though they had appeared in the source code file all
along (Duntemann). Macros may also be supplied by a vendor or manufacturer to
encapsulate a particular operation.
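The macro expansion described in the last item can be sketched as plain text substitution. The Python below is a minimal illustration under stated assumptions: the function expand_macros, the macro name SAVE_REGS, and its body are all hypothetical, and macro parameters and nested macros are left out.

```python
def expand_macros(source_lines, macros):
    """Replace each line that consists of a macro name with the macro's body.

    `macros` maps a macro name to the list of text lines it stands for.
    Parameters and nested macros are left out to keep the sketch short.
    """
    expanded = []
    for line in source_lines:
        name = line.strip()
        if name in macros:
            expanded.extend(macros[name])   # substitute the stored text lines
        else:
            expanded.append(line)           # ordinary line: pass through
    return expanded

# Hypothetical macro that saves two registers on the stack.
macros = {"SAVE_REGS": ["    push ax", "    push bx"]}
source = ["start:", "    SAVE_REGS", "    mov al, 61h"]

for line in expand_macros(source, macros):
    print(line)
```

After expansion, the assembler sees the two push lines exactly as if they had been typed in place of the macro name.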
Such capabilities are borrowed from higher-level language designs. They can greatly
simplify the problems of coding and maintaining low-level code. Raw assembly source
code as generated by compilers or disassemblers – i.e. without comments, meaningful
symbols, or data definitions – is quite difficult to read.
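To make the role of labels concrete, the following Python sketch resolves code labels to addresses in two passes, which is one common way to handle references to labels defined later in the file. It is a toy under loud assumptions: the function resolve_labels is invented for this example, and every instruction is assumed to occupy two bytes, which real instruction sets with variable-length encodings do not guarantee.

```python
def resolve_labels(source_lines, instruction_size=2):
    """Two-pass label resolution for a toy assembly dialect.

    Assumes every instruction occupies `instruction_size` bytes, which is a
    simplification; real instruction sets often have variable-length encodings.
    """
    # Pass 1: record the address of each label.
    symbols, address = {}, 0
    for line in source_lines:
        text = line.strip()
        if text.endswith(":"):
            symbols[text[:-1]] = address
        elif text:
            address += instruction_size

    # Pass 2: replace label operands with their numeric addresses.
    resolved = []
    for line in source_lines:
        text = line.strip()
        if not text or text.endswith(":"):
            continue
        parts = text.split()
        mnemonic, operands = parts[0], parts[1:]
        operands = [str(symbols.get(op, op)) for op in operands]
        resolved.append(" ".join([mnemonic] + operands))
    return symbols, resolved

program = ["start:", "    dec cx", "    jnz start", "done:", "    jmp done"]
symbols, resolved = resolve_labels(program)
print(symbols)    # {'start': 0, 'done': 4}
print(resolved)   # ['dec cx', 'jnz 0', 'jmp 4']
```

The resolved output is roughly what a disassembler recovers: the names start and done are gone and only bare addresses remain, which is why such listings are harder to read.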
Although there might be some unusual exceptions, most assembly languages share
the basic characteristics described above.
There has been little apparent demand for more sophisticated assemblers since the decline
of large-scale assembly language development.
There has always been debate over the usefulness and performance of assembly language
relative to high-level languages, though this gets less attention today. Assembly language
has specific niche uses where it is important; see below. But in general, modern
optimizing compilers are claimed to render high-level languages into code that runs at
least as fast as hand-written assembly, despite some counter-examples that can be created.
The complexity of modern processors makes effective hand-optimization increasingly
difficult. Moreover, and to the dismay of efficiency lovers, increasing processor
performance has meant that most CPUs sit idle most of the time, with delays caused by
predictable bottlenecks such as I/O operations and paging. This has made raw code
execution speed a non-issue for most programmers (hence the increasing use of
interpreted languages without apparent performance impact).
There are really only a handful of situations where today's expert practitioners would
choose assembly language:
• When a stand-alone binary executable is required, i.e. one that must execute
without recourse to the run-time components or libraries associated with a
high-level language; this is perhaps the most common situation. This is typical of
embedded programs that have only a small amount of memory available and run on
devices intended for a single purpose. Examples include telephones, automobile
fuel and ignition systems, air-conditioning control systems, security systems, and
sensors.
• When interacting directly with the hardware, e.g., in a device driver. Examples
include drivers for sound and video cards, hard drives, modems, and printers. For
example, printer manufacturers create a different driver for each model they sell
and for each operating system it supports.
• When using processor-specific instructions not exploited by or available to the
compiler. A common example is the bitwise rotation instruction at the core of
many encryption algorithms.
• When extreme optimization is required, e.g., in an inner loop in a processor-
intensive algorithm. Game programmers are experts at writing code that takes
advantage of the capabilities of hardware features in systems enabling the games
to run faster.
• When a system with severe resource constraints (e.g., an embedded system) must
be hand-coded to maximize the use of limited resources; but this is becoming less
common as processor price/performance improves.
• When no high-level language exists, e.g., on a new or specialized processor.
• When a real-time program needs precise timing and responses, as in simulations,
flight navigation systems, and hospital equipment that monitors patients. For
example, if a patient's heart behaves irregularly, the machine must detect it
quickly and initiate the proper response quickly so that the patient does not die.
Hand-written assembly avoids the overhead of code that must be interpreted or
translated at run time, which helps achieve a fast response.
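As an illustration of the bitwise rotation mentioned in the list above, the Python sketch below shows what a high-level language without a rotate operator has to spell out with shifts and masks. Many processors provide this operation as a single instruction (for example ROL on x86). The function name rotate_left_32 is chosen for this example only.

```python
def rotate_left_32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by `count` bits using shifts and masks."""
    count %= 32
    value &= 0xFFFFFFFF
    # Bits shifted out on the left re-enter on the right.
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF

print(hex(rotate_left_32(0x80000001, 1)))   # 0x3
print(hex(rotate_left_32(0x12345678, 8)))   # 0x34567812
```

Whether a given compiler recognizes this shift-and-mask pattern and emits the single rotate instruction depends on the compiler and target. That uncertainty is the motivation for reaching for assembly in this case.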
Few programmers today need to use assembly language on a daily basis. For most
applications, a higher-level language such as C or C++ is generally chosen.
Nevertheless, assembly language is still taught in most Computer Science and Electronic
Engineering programs. Although few programmers today regularly work with assembly
language as a tool, the underlying concepts remain very important. Such fundamental
topics as binary arithmetic, memory allocation, stack processing, character set encoding,
interrupt processing, and compiler design would be hard to study in detail without a grasp
of how a computer operates at the hardware level. Since a computer's behavior is
fundamentally defined by its instruction set, the logical way to learn such concepts is to
study an assembly language. Most modern computers have similar instruction sets.
Therefore, studying a single assembly language is sufficient to learn (i) the basic
concepts, (ii) how to recognize situations where the use of assembly language might be
appropriate, and (iii) how efficient executable code can be created from high-level
languages.[2]
Typical applications
Hand-coded assembly language is typically used in a system's BIOS. This low-level code
is used, among other things, to initialize and test the system hardware prior to booting the
OS, and is stored in ROM. Once a certain level of hardware initialization has taken place,
execution transfers to other code, typically written in higher level languages; but the code
running immediately after power is applied is usually written in assembly language. The
same is true of most boot loaders.
Many compilers translate high-level languages into assembly language before producing
machine code, allowing the assembly code to be viewed for debugging and optimization
purposes.
Relatively low-level languages, such as C, often provide special syntax to embed
assembly language directly in the source code. Programs using such facilities, such as the
Linux kernel, can then construct abstractions utilizing different assembly language on
each hardware platform. The system's portable code can then utilize these processor-
specific components through a uniform interface.
Assembly language is also valuable in reverse engineering, since many programs are
distributed only in machine code form, and machine code is usually easy to translate into
assembly language and carefully examine in this form, but very difficult to translate into
a higher-level language. Tools such as the Interactive Disassembler make extensive use of
disassembly for such a purpose.
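The reason machine code translates back into assembly so readily is that each instruction encoding maps to exactly one mnemonic form. The Python sketch below decodes a single x86 instruction family, the two-byte "move 8-bit immediate into 8-bit register" form with opcodes 0xB0 to 0xB7. It is only an illustration: the function name disassemble_mov_r8_imm8 is invented here, and a real disassembler such as the one named above handles the full instruction set and much more.

```python
# 8-bit x86 register names, indexed by the low three bits of the opcode.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def disassemble_mov_r8_imm8(code: bytes):
    """Decode a byte string made up only of `mov r8, imm8` instructions."""
    lines, offset = [], 0
    while offset < len(code):
        opcode = code[offset]
        if not 0xB0 <= opcode <= 0xB7 or offset + 1 >= len(code):
            raise ValueError(f"unsupported or truncated instruction at offset {offset}")
        register = REG8[opcode - 0xB0]
        immediate = code[offset + 1]
        lines.append(f"mov {register}, 0x{immediate:02x}")
        offset += 2
    return lines

print(disassemble_mov_r8_imm8(bytes([0xB0, 0x61])))   # ['mov al, 0x61']
```

Recovering the original variable names, comments, or high-level control structure is a different matter, since none of that information is present in the bytes.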
Related terminology
• Assembly language or assembler language is commonly called assembly,
assembler, ASM, or symbolic machine code. A generation of IBM mainframe
programmers called it BAL for Basic Assembly Language.
Note: Calling the language assembler is of course potentially confusing and
ambiguous, since this is also the name of the utility program that translates
assembly language statements into machine code. Some may regard this as
imprecision or error. However, this usage has been common among professionals
and in the literature for decades.[3] Similarly, the assembler for some early
computers was called an assembly program.[4]
• The use of the word assembly dates from the early years of computers (cf. short
code, speed code/"speedcoding").
• A cross assembler (see cross compiler) produces code for one type of processor,
but runs on another. This technology is particularly important when developing
software for new processors.
Further details
For almost any personal computer, mainframe, embedded system, or game console, past or
present, at least one assembler, and possibly dozens, has been written. For some
examples, see the list of assemblers.
On Unix systems, the assembler is traditionally called as, although it is not a single body
of code, being typically written anew for each port. A number of Unix variants use GAS.
Within processor groups, each assembler has its own dialect. Some assemblers can read
another assembler's dialect; for example, TASM can read old MASM code, but not the
reverse. FASM and NASM have similar syntax, but each supports different macros, which
can make code difficult to translate from one to the other. The basics are all the
same, but the advanced features differ.
Also, assembly can sometimes be portable across different operating systems on the same
type of CPU. Calling conventions between operating systems often differ slightly or not
at all, and with care it is possible to gain some portability in assembly language, usually
by linking with a C library that does not change between operating systems. However, it
is not possible to link portably with C libraries that require the caller to use preprocessor
macros that may change between operating systems.
See also
• Little man computer - an educational computer model with a base-10 assembly
language
• x86 assembly language - the assembly language for common Intel 80x86
microprocessors
• Compiler
• Disassembler
• List of assemblers
• Instruction set
References
1. David Salomon, Assemblers and Loaders, 1993.
2. Hyde, op. cit., Foreword ("Why would anyone learn this stuff?")
3. Stroustrup, Bjarne, The C++ Programming Language, Addison-Wesley, 1986,
ISBN 0-201-12078-X: "C++ was primarily designed so that the author and his
friends would not have to program in assembler, C, or various modern high-level
languages. [use of the term assembler to mean assembly language]"
4. Saxon, James, and Plette, William, Programming the IBM 1401, Prentice-Hall,
1962, LoC 62-20615. [use of the term assembly program]
5. Murdocca, Miles J.; Vincent P. Heuring (2000). Principles of Computer
Architecture. Prentice-Hall. ISBN 0-201-43664-7.
6. Principles of Computer Architecture (POCA) – ARCTools virtual computer
available for download to execute referenced code, accessed August 24, 2005
Books
• Programming from the Ground Up Online version of the introductory assembly
programming book.
• The Art of Assembly Language Programming by Randall Hyde
• Computer-Books.us, Online Assembly Language Books
• PC Assembly Language by Dr Paul Carter
• PC Assembly Tutorial using NASM and GCC by Paul Carter
• Programming from the Ground Up by Jonathan Bartlett
• The x86 ASM Book by the ASM Community
• Dominic Sweetman: See MIPS Run. Morgan Kaufmann Publishers. ISBN 1-
55860-410-3
• Robert Britton: MIPS Assembly Language Programming. Prentice Hall. ISBN
0-13-142044-5
• John Waldron: Introduction to RISC Assembly Language Programming.
Addison Wesley. ISBN 0-201-39828-1
• Jeff Duntemann: Assembly Language Step-by-Step
External links
• WinAsm Studio, The Assembly IDE - Free Downloads, Source Code, a free
Assembly IDE, a lot of open source programs to download and a popular Board
• The ASM Community, a great ASM programming resource including a
Messageboard and an ASM Wiki Book
• Intel Assembly 80x86 CodeTable (a cheat sheet reference)
• MenuetOS - hobby Operating System for the PC written entirely in 64bit
assembly language
• List of resources; books, websites, newsgroups, and IRC channels
• Unix Assembly Language Programming
• PPR: Learning Assembly Language
• CodeTeacher
• Assembly Language Programming Examples
• Typed Assembly Language (TAL)
• Authoring Windows Applications In Assembly Language
• RosAsm assembler/ RosAsm assembly Forum
• RosAsm Programming Examples
• 80x86 emulator
• AVR Assembler
• The Program Transformation Wiki
• GoAsm - a component of the free "Go" tools: 32-bit and 64-bit Windows
programming for x86 and AMD64/EM64T
• GNU lightning is a library that generates assembly language code at run-time
which is useful for Just-In-Time compilers
• "information on assembly programming under different platforms: IA32 (x86),
IA64 (Itanium), x86-64, SPARC, Alpha, or whatever platform we find
contributors for."
• "Terse: Algebraic Assembly Language for x86"
• Iczelion's Win32 Assembly Tutorial
• SB-Assembler for most 8-bit processors/controllers
• Assembly Tutorials BeginnersCode.com
• IBM z/Architecture Principles of Operation IBM manuals on mainframe machine
language and internals.
• IBM High Level Assembler IBM manuals on mainframe assembler language.
• Tools and tutorials for x86 programmers
• Assembly Optimization Tips by Mark Larson