
Assembly Language

- Assembly language is a low-level programming language that uses mnemonics to represent machine code instructions. An assembler translates assembly language code into machine code.
- Assembly language is tightly coupled with specific computer architectures and is used when direct hardware manipulation or performance is important. Common uses are device drivers, embedded systems, and real-time systems.
- Assembly languages provide labels, macros, and other features to make programming easier while still generating efficient machine code instructions.

Uploaded by

magimittt
Copyright
© Attribution Non-Commercial (BY-NC)

Assembly language

An assembly language is a low-level language used in the writing of computer programs.
Assembly language uses mnemonics, abbreviations or words that make a complex instruction
easier to remember and make programming in assembler an easier task. The goal of using
mnemonics is to replace the more error-prone and time-consuming practice, used with the
very first computers, of programming directly in a target computer's numeric machine code.
An assembly language program is translated into the target computer's machine code by a
utility program called an assembler. An assembler is distinct from a compiler in that it
generally performs one-to-one (isomorphic) translations from mnemonic statements into
machine instructions. A compiler, by contrast, translates an entire high-level program as
a body, and an interpreter translates and executes a program one statement at a time.

Assembly language programs are tightly coupled with (and specific to) a target computer
architecture – as opposed to higher-level programming languages, which are generally
platform-independent. More sophisticated assemblers extend the basic translation of
program instructions with mechanisms to facilitate program development, control the
assembly process, and aid debugging.

Assembly language was once widely used for all aspects of programming, but today it
tends to be used more narrowly, primarily when direct hardware manipulation or unusual
performance issues are involved. Typical uses are device drivers, low-level embedded
systems, and real-time systems. These applications benefit from the speed and precise
control of hand-written machine instructions.

See the terminology section, below, regarding inconsistent use of the terms assembly and
assembler.

Contents
• 1 Key concepts
o 1.1 Assembler
o 1.2 Assembly language
• 2 Language design
• 3 Use of assembly language
o 3.1 Historical perspective
o 3.2 Current usage
o 3.3 Typical applications
• 4 Related terminology
• 5 Further details
• 6 Example listing of assembly language source code
• 7 See also
• 8 References
• 9 Books

• 10 External links

Key concepts
Assembler

Typically a modern assembler creates object code by translating assembly instruction
mnemonics into opcodes, and by resolving symbolic names for memory locations and
other entities.[1] The use of symbolic references is a key feature of assemblers, saving
tedious calculations and manual address updates after program modifications. Most
assemblers also include macro facilities for performing textual substitution, e.g., to
generate common short sequences of instructions to run inline, instead of in a subroutine.
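
As a rough illustration of symbol resolution, the sketch below makes two passes over a toy
program: the first records the address of each label, and the second substitutes those
addresses for symbolic operands. The statement syntax, the fixed four-byte instruction
size, and the names assemble_symbols and INSTRUCTION_SIZE are assumptions made for this
example, not features of any particular assembler.

```python
INSTRUCTION_SIZE = 4  # assumption: every instruction occupies 4 bytes


def assemble_symbols(lines, origin=0):
    """Resolve label operands in a toy program using two passes."""
    symbols = {}
    statements = []

    # Pass 1: assign an address to every statement and record each label.
    address = origin
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if ":" in text:
            label, text = text.split(":", 1)
            symbols[label.strip()] = address
            text = text.strip()
        if text:
            statements.append((address, text))
            address += INSTRUCTION_SIZE

    # Pass 2: replace symbolic operands with the addresses found in pass 1.
    resolved = []
    for address, text in statements:
        mnemonic, _, operand_text = text.partition(" ")
        operands = [op.strip() for op in operand_text.split(",") if op.strip()]
        operands = [str(symbols.get(op, op)) for op in operands]
        resolved.append((address, mnemonic, operands))
    return symbols, resolved


# Hypothetical program; "jmp done" refers to a label defined later.
program = [
    "start: load r1, 10",
    "loop:  sub r1, 1",
    "       jnz loop",
    "       jmp done",
    "done:  halt",
]
symbols, resolved = assemble_symbols(program, origin=2048)
for address, mnemonic, operands in resolved:
    print(address, mnemonic, ", ".join(operands))
```

Because labels are collected before any operand is substituted, the forward reference to
done resolves without the programmer calculating its address by hand, and inserting a new
statement only requires running the assembler again.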

Assemblers are generally simpler to write than compilers for high-level languages, and
have been available since the 1950s. Modern assemblers, especially for RISC-based
architectures, such as MIPS, Sun SPARC and HP PA-RISC, optimize instruction
scheduling to exploit the CPU pipeline efficiently.

More sophisticated high-level assemblers provide language abstractions such as:

• Advanced control structures
• High-level procedure/function declarations and invocations
• High-level abstract data types, including structures/records, unions, classes, and
sets
• Sophisticated macro processing

See Language design below for more details.

Note that, in normal professional usage, the term assembler is often used ambiguously: it
is frequently used to refer to an assembly language itself, rather than to the assembler
utility. Thus: "CP/CMS was written in S/360 assembler" as opposed to "ASM-H was a
widely used S/370 assembler."

Assembly language

A program written in assembly language consists of a series of instruction mnemonics
that, when translated by an assembler, correspond to a stream of executable instructions
that can be loaded into memory and executed.

For example, an x86/IA-32 processor can execute the following binary instruction as
expressed in machine language:
• Binary: 10110000 01100001 (Hexadecimal: 0xb061)

The equivalent assembly language representation is easier to remember (more mnemonic):

• mov al, 061h

This instruction means:

• Move the hexadecimal value 61 (97 decimal) into the processor register named
"al".

The mnemonic "mov" is an operation code or opcode, and was chosen by the instruction
set designer to abbreviate "move." A comma-separated list of arguments or parameters
follows the opcode; this is a typical assembly language statement.
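
The translation of this one statement can be sketched in a few lines. The example below
handles only the "move an 8-bit immediate into an 8-bit register" form, whose x86 encoding
is the byte B0 plus the register number followed by the immediate byte. The function names
encode_mov_r8_imm8 and parse_immediate are made up for this illustration; a real assembler
handles many more forms.

```python
# Register numbers for the 8-bit x86 registers, as used in the opcode byte.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def parse_immediate(text):
    """Parse '061h' (hex with an h suffix) or a plain decimal such as '97'."""
    text = text.strip().lower()
    if text.endswith("h"):
        return int(text[:-1], 16)
    return int(text, 10)


def encode_mov_r8_imm8(statement):
    """Encode 'mov <8-bit register>, <immediate>' as two bytes."""
    mnemonic, _, operand_text = statement.strip().partition(" ")
    if mnemonic.lower() != "mov":
        raise ValueError("only 'mov' is handled in this sketch")
    register, immediate = [part.strip() for part in operand_text.split(",")]
    value = parse_immediate(immediate)
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate does not fit in 8 bits")
    # The opcode byte is 0xB0 plus the register number; the immediate follows.
    return bytes([0xB0 + REG8[register.lower()], value])


machine_code = encode_mov_r8_imm8("mov al, 061h")
print(machine_code.hex())                          # b061
print(" ".join(f"{b:08b}" for b in machine_code))  # 10110000 01100001
```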

Transforming assembly into machine language is accomplished by an assembler, and the
reverse by a disassembler. Unlike in high-level languages, there is usually a 1-to-1
correspondence between simple assembly statements and machine language instructions.
However, in some cases, an assembler may provide pseudoinstructions which expand
into several machine language instructions to provide commonly needed functionality.
For example, for a machine that lacks a "branch if greater or equal" instruction, an
assembler may provide a pseudoinstruction that expands to the machine's "set if less
than" and "branch if zero (on the result of the set instruction)". Most full-featured
assemblers also provide a rich macro language (discussed below) which is used by
vendors and programmers to generate more complex code and data sequences.
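
The "branch if greater or equal" case can be sketched as a simple rewriting step. The
mnemonics below (bge, slt, beq) and the scratch register $at are modeled loosely on
MIPS-style assembly and are assumptions for this example, as is the function name
expand_pseudo.

```python
def expand_pseudo(statement, scratch="$at"):
    """Expand 'bge rs, rt, label' into two real instructions; pass others through."""
    mnemonic, _, operand_text = statement.strip().partition(" ")
    operands = [op.strip() for op in operand_text.split(",") if op.strip()]
    if mnemonic == "bge":
        rs, rt, label = operands
        return [
            f"slt {scratch}, {rs}, {rt}",      # scratch = 1 if rs < rt, else 0
            f"beq {scratch}, $zero, {label}",  # branch when scratch is 0, i.e. rs >= rt
        ]
    return [statement.strip()]


for line in expand_pseudo("bge $t0, $t1, done"):
    print(line)
```

One source statement therefore produces two machine instructions, which is why the 1-to-1
correspondence holds only for simple statements.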

Every computer architecture has its own machine language. On this level, each
instruction is simple enough to be executed using a relatively small number of electronic
circuits. Computers differ in the number and type of operations they support. For
example, a 64-bit machine has different circuitry than a 32-bit machine. They
may also have different sizes and numbers of registers, and different representations of
data types in storage. While most general-purpose computers are able to carry out
essentially the same functionality, the ways they do so differ; the corresponding assembly
languages reflect these differences.

Multiple sets of mnemonics or assembly-language syntax may exist for a single
instruction set, typically instantiated in different assembler programs. In these cases, the
most popular one is usually that supplied by the manufacturer and used in its
documentation.

Language design
Instructions (statements) in assembly language are generally very simple, unlike in a
high-level language. Each instruction typically consists of an operation or opcode (or,
simply, instruction) plus zero or more operands. Most instructions refer to a single value,
or pair of values. An instruction coded in the language usually corresponds directly to a
single executable machine language instruction.

The following elements are common to assembly language programs:

• Data definitions. Additional directives let the programmer reserve storage areas
for reference by machine language statements. Storage can typically be initialized
with literal numbers, strings, and other primitive data types.
• Labels. Data definitions are referenced using names (labels or symbols) assigned
by the programmer, and typically reference constants, variables, or structure
elements. Labels can also be assigned to code locations, i.e. subroutine entry
points or GOTO destinations. Most assemblers provide flexible symbol
management, letting programmers manage different namespaces, automatically
calculate offsets within data structures, and assign labels that refer to literal values
or the result of simple computations performed by the assembler.
• Comments. As in most other computer languages, comments that the assembler
ignores can be added to assembly source code.
• Macros. A macro is a label that stands for some sequence of text lines, which
may include a sequence of instructions. When the assembler finds the macro label
in a source code file at assembly time, it expands the macro by replacing the
label with the text lines that the label represents. The name of the macro
requires a small amount of typing to input, and is then replaced by several lines
of code which are assembled just as though they had appeared in the source code
file all along (Duntemann). Macros are also supplied by a vendor or manufacturer
to encapsulate a particular operation.
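
Macro expansion as textual substitution can be sketched as follows. The macro table
format, the %a and %b parameter markers, and the swap macro itself are hypothetical and
chosen only for this illustration; real macro assemblers each have their own definition
syntax.

```python
def expand_macros(lines, macros):
    """Replace each macro invocation with its body, substituting arguments."""
    output = []
    for line in lines:
        name, _, arg_text = line.strip().partition(" ")
        if name in macros:
            params, body = macros[name]
            args = [a.strip() for a in arg_text.split(",") if a.strip()]
            if len(args) != len(params):
                raise ValueError(f"{name} expects {len(params)} argument(s)")
            for body_line in body:
                for param, arg in zip(params, args):
                    body_line = body_line.replace(param, arg)
                output.append(body_line)
        else:
            output.append(line.strip())
    return output


# Hypothetical macro: swap two registers through a scratch register named tmp.
macros = {
    "swap": (["%a", "%b"], ["mov tmp, %a", "mov %a, %b", "mov %b, tmp"]),
}
source = ["mov r1, 5", "swap r1, r2", "halt"]
expanded = expand_macros(source, macros)
print("\n".join(expanded))
```

The expansion happens entirely before translation, so the three generated mov lines are
assembled exactly as if they had been typed in place of the swap line.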

Such capabilities are borrowed from higher-level language designs. They can greatly
simplify the problems of coding and maintaining low-level code. Raw assembly source
code as generated by compilers or disassemblers – i.e. without comments, meaningful
symbols, or data definitions – is quite difficult to read.

Although there might be some unusual exceptions, most assembly languages share
the basic characteristics above. Some assemblers go further:

• Some assemblers include quite sophisticated macro languages, incorporating
such high-level language elements as symbolic variables, conditionals, string
manipulation, and arithmetic operations, all usable during the execution of a given
macro, and allowing macros to save context or exchange information. Thus a
macro might emit a large number of assembly language instructions or data
definitions, based on the macro arguments. This could be used to generate
record-style data structures or "unrolled" loops, for example, or could generate
entire algorithms based on complex parameters. An organization using assembly
language that has been heavily extended using such a macro suite may arguably
be considered to be working in a (slightly) higher-level language, since such
programmers are not working with a computer's lowest-level conceptual
elements.
• Some assemblers have incorporated structured programming elements to encode
execution flow. The earliest example of this approach was in the Concept-14
macro set developed by Marvin Zloof at IBM's Thomas Watson Research Center,
which extended the S/370 macro assembler with IF/ELSE/ENDIF and similar
control flow blocks. This was a way to reduce or eliminate the use of GOTO
operations in assembly code, one of the main factors causing spaghetti code in
assembly language. This approach was widely accepted in the early 80s (the latter
days of large-scale assembly language use).
• A curious design was A-natural, a "stream-oriented" assembler for 8080/Z80
processors from Whitesmiths Ltd. (developers of the Unix-like Idris and what was
reported to be the first commercial C compiler). The language was classified as an
assembler, because it worked with raw machine elements such as opcodes,
registers, and memory references; but it incorporated an expression syntax to
indicate execution order. Parentheses and other special symbols, along with
block-oriented structured programming constructs, controlled the sequence of the
generated instructions. A-natural was built as the object language of a C compiler,
rather than for hand-coding, but its logical syntax won some fans.
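
The structured programming elements mentioned in the list above can be pictured as a
rewriting step that turns IF/ELSE/ENDIF blocks into branches and generated labels. The
sketch below is a minimal illustration of that idea only; the branch_unless and jump
mnemonics, the label naming scheme, and the function name lower_structured are all
hypothetical and do not reproduce Concept-14 or any other real macro set.

```python
def lower_structured(lines):
    """Rewrite IF/ELSE/ENDIF blocks as conditional branches and generated labels."""
    output = []
    stack = []   # one [else_label, end_label, seen_else] entry per open IF
    counter = 0
    for line in lines:
        text = line.strip()
        keyword, _, condition = text.partition(" ")
        if keyword == "IF":
            counter += 1
            entry = [f"else_{counter}", f"end_{counter}", False]
            stack.append(entry)
            # Skip the IF body when the condition does not hold.
            output.append(f"branch_unless {condition.strip()}, {entry[0]}")
        elif keyword == "ELSE":
            entry = stack[-1]
            entry[2] = True
            output.append(f"jump {entry[1]}")  # end of the IF body: skip the ELSE body
            output.append(f"{entry[0]}:")
        elif keyword == "ENDIF":
            else_label, end_label, seen_else = stack.pop()
            if not seen_else:
                output.append(f"{else_label}:")
            output.append(f"{end_label}:")
        else:
            output.append(text)
    if stack:
        raise ValueError("IF without matching ENDIF")
    return output


source = ["IF r1 == 0", "mov r2, 1", "ELSE", "mov r2, 2", "ENDIF", "halt"]
lowered = lower_structured(source)
print("\n".join(lowered))
```

The programmer writes no explicit jumps or labels; the generated ones are produced
mechanically, which is how such macro sets reduced hand-written GOTO operations.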

There has been little apparent demand for more sophisticated assemblers since the decline
of large-scale assembly language development.

Use of assembly language


Historical perspective

Historically, a large number of programs have been written entirely in assembly
language. Operating systems were almost exclusively written in assembly language until
the widespread acceptance of C in the 1970s and early 1980s. Many commercial
applications were written in assembly language as well, including much of the
IBM mainframe software written by large corporations. COBOL and FORTRAN
eventually displaced much of this work, although a number of large organizations
retained assembly-language application infrastructures well into the 1980s.

Most early microcomputers relied on hand-coded assembly language, including most
operating systems and large applications. This was because these systems had severe
resource constraints, imposed idiosyncratic memory and display architectures, and
provided limited, buggy system services. Perhaps more important was the lack of
first-class high-level language compilers suitable for microcomputer use. A psychological
factor may have also played a role: the first generation of microcomputer programmers
retained a hobbyist, "wires and pliers" attitude. Typical examples of large assembly
language programs from this time are the MS-DOS operating system, the early IBM PC
spreadsheet program Lotus 1-2-3, and almost all popular games for the Commodore 64.
Even into the 1990s, most console video games were written in assembly, including most
games for the Mega Drive/Genesis and the Super Nintendo Entertainment System.[citation
needed] The popular arcade game NBA Jam (1993) is another example.

Current usage

There has always been debate over the usefulness and performance of assembly language
relative to high-level languages, though this gets less attention today. Assembly language
has specific niche uses where it is important; see below. But in general, modern
optimizing compilers are claimed to render high-level languages into code that runs at
least as fast as hand-written assembly, despite some counter-examples that can be created.
The complexity of modern processors makes effective hand-optimization increasingly
difficult. Moreover, and to the dismay of efficiency lovers, increasing processor
performance has meant that most CPUs sit idle most of the time, with delays caused by
predictable bottlenecks such as I/O operations and paging. This has made raw code
execution speed a non-issue for most programmers (hence the increasing use of
interpreted languages without apparent performance impact).

There are really only a handful of situations where today's expert practitioners would
choose assembly language:

• When a stand-alone binary executable is required, i.e. one that must execute
without recourse to the run-time components or libraries associated with a
high-level language; this is perhaps the most common situation. These are typically
embedded programs that occupy only a small amount of memory on devices intended
for single-purpose tasks. Examples include telephones, automobile fuel and
ignition systems, air-conditioning control systems, security systems, and sensors.
• When interacting directly with the hardware, e.g., in a device driver. Examples
include drivers for sound and video cards, hard drives, modems, and printers. For
example, printer manufacturers create a separate driver for each model and each
supported operating system.
• When using processor-specific instructions not exploited by or available to the
compiler. A common example is the bitwise rotation instruction at the core of
many encryption algorithms.
• When extreme optimization is required, e.g., in an inner loop in a
processor-intensive algorithm. Game programmers, for example, write code that
exploits specific hardware features so that games run faster.
• When a system with severe resource constraints (e.g., an embedded system) must
be hand-coded to maximize the use of limited resources; but this is becoming less
common as processor price/performance improves.
• When no high-level language exists, e.g., on a new or specialized processor.
• When real-time programs need precise timing and responses, such as simulations,
flight navigation systems, and hospital equipment to monitor patients. For
example, if a human heart behaves irregularly, the machine must be notified
quickly and the proper response must be initiated quickly so the patient does
not die. Assembly code avoids the overhead of code that must be translated at run
time, which helps to guarantee a fast response.

Few programmers today need to use assembly language on a daily basis. For most
applications, a higher-level language such as C or C++ is generally chosen.
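
The bitwise rotation mentioned in the list above shows why a processor-specific
instruction can matter. Many high-level languages have no rotate operator, so the
operation is spelled out with shifts, an OR, and a mask, as in the sketch below; a
processor's rotate instruction does the same work in a single step. The function name
rotl32 and the 32-bit word size are choices made for this example.

```python
def rotl32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    value &= 0xFFFFFFFF
    # Bits shifted out of the top re-enter at the bottom.
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


print(hex(rotl32(0x12345678, 8)))   # 0x34567812
print(hex(rotl32(0x80000001, 1)))   # 0x3
```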

Nevertheless, assembly language is still taught in most Computer Science and Electronic
Engineering programs. Although few programmers today regularly work with assembly
language as a tool, the underlying concepts remain very important. Such fundamental
topics as binary arithmetic, memory allocation, stack processing, character set encoding,
interrupt processing, and compiler design would be hard to study in detail without a grasp
of how a computer operates at the hardware level. Since a computer's behavior is
fundamentally defined by its instruction set, the logical way to learn such concepts is to
study an assembly language. Most modern computers have similar instruction sets.
Therefore, studying a single assembly language is sufficient to: i) learn the basic
concepts; ii) recognize situations where the use of assembly language might be
appropriate; and iii) see how efficient executable code can be created from high-level
languages.[2]

Typical applications

Hand-coded assembly language is typically used in a system's BIOS. This low-level code
is used, among other things, to initialize and test the system hardware prior to booting the
OS, and is stored in ROM. Once a certain level of hardware initialization has taken place,
execution transfers to other code, typically written in higher level languages; but the code
running immediately after power is applied is usually written in assembly language. The
same is true of most boot loaders.

Many compilers render high-level languages into assembly first before fully compiling,
allowing the assembly code to be viewed for debugging and optimization purposes.
Relatively low-level languages, such as C, often provide special syntax to embed
assembly language directly in the source code. Programs using such facilities, such as the
Linux kernel, can then construct abstractions utilizing different assembly language on
each hardware platform. The system's portable code can then utilize these processor-
specific components through a uniform interface.

Assembly language is also valuable in reverse engineering, since many programs are
distributed only in machine code form, and machine code is usually easy to translate into
assembly language and carefully examine in this form, but very difficult to translate into
a higher-level language. Tools such as the Interactive Disassembler make extensive use of
disassembly for such a purpose.
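
How directly machine code maps back to assembly can be seen in a toy disassembler. The
sketch below decodes only the two-byte x86 "move 8-bit immediate into 8-bit register"
instructions (opcode bytes B0 through B7) and is in no way representative of a real tool's
coverage; the function name disassemble_mov_imm8 is made up for this example.

```python
# 8-bit x86 register names in opcode order: 0xB0 selects al, 0xB1 selects cl, ...
REG8_NAMES = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def disassemble_mov_imm8(code):
    """Decode a run of two-byte 'mov r8, imm8' instructions (opcodes 0xB0-0xB7)."""
    listing = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        if not 0xB0 <= opcode <= 0xB7 or offset + 1 >= len(code):
            raise ValueError(f"unsupported or truncated instruction at offset {offset}")
        register = REG8_NAMES[opcode - 0xB0]
        immediate = code[offset + 1]
        listing.append(f"mov {register}, {immediate:03X}h")
        offset += 2
    return listing


print(disassemble_mov_imm8(bytes.fromhex("b061b1ff")))
# ['mov al, 061h', 'mov cl, 0FFh']
```

Recovering the mnemonics is mechanical; recovering the original labels, comments, and
high-level structure is not, which is why translating to a higher-level language is so
much harder.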

Related terminology
• Assembly language or assembler language is commonly called assembly,
assembler, ASM, or symbolic machine code. A generation of IBM mainframe
programmers called it BAL for Basic Assembly Language.
Note: Calling the language assembler is potentially confusing and
ambiguous, since this is also the name of the utility program that translates
assembly language statements into machine code. Some may regard this as
imprecision or error. However, this usage has been common among professionals
and in the literature for decades.[3] Similarly, some early computers called
their assembler an assembly program.[4]

• The computational step where an assembler is run, including all macro
processing, is known as assembly time.

• The use of the word assembly dates from the early years of computers (cf. short
code, speed code/"speedcoding").

• A cross assembler (see cross compiler) produces code for one type of processor,
but runs on another. This technology is particularly important when developing
software for new processors.

Further details
For any given personal computer, mainframe, embedded system, or game console, both
past and present, at least one assembler, and possibly dozens, has been written. For
some examples, see the list of assemblers.

On Unix systems, the assembler is traditionally called as, although it is not a single body
of code, being typically written anew for each port. A number of Unix variants use GAS.

Within processor groups, each assembler has its own dialect. Some assemblers can read
another assembler's dialect; for example, TASM can read old MASM code, but not the
reverse. FASM and NASM have similar syntax, but each supports different macros, which
can make code difficult to translate between them. The basics are all the same, but
the advanced features differ.

Also, assembly can sometimes be portable across different operating systems on the same
type of CPU. Calling conventions between operating systems often differ slightly or not
at all, and with care it is possible to gain some portability in assembly language, usually
by linking with a C library that does not change between operating systems. However, it
is not possible to link portably with C libraries that require the caller to use preprocessor
macros that may change between operating systems.

For example, many things in libc depend on the preprocessor to do OS-specific,
C-specific things to the program before compiling. In fact, some functions and symbols are
not even guaranteed to exist outside of the preprocessor. Worse, the size and field order of
structs, as well as the size of certain typedefs such as off_t, are entirely unavailable in
assembly language, and differ even between versions of Linux, making it impossible to
portably call functions in libc other than ones that only take simple integers and pointers
as parameters.

Some higher level computer languages, such as C, support inline assembly, where
relatively brief sections of assembly code can be embedded into the high level language
code. Borland Pascal also had a built-in assembler, invoked with the keyword "asm".
It was mainly used to create mouse and COM-port drivers. The Forth
programming language commonly contains an assembler used in CODE words.

Many people use an emulator to debug assembly-language programs.

Example listing of assembly language source code


Addr   Label      Instruction          Object code[5]
                  .begin
                  .org 2048
       a_start    .equ 3000
2048              ld length,%
2064              be done              00000010 10000000 00000000 00000110
2068              addcc %r1,-4,%r1     10000010 10000000 01111111 11111100
2072              addcc %r1,%r2,%r4    10001000 10000000 01000000 00000010
2076              ld %r4,%r5           11001010 00000001 00000000 00000000
2080              ba loop              00010000 10111111 11111111 11111011
2084              addcc %r3,%r5,%r3    10000110 10000000 11000000 00000101
2088   done:      jmpl %r15+4,%r0      10000001 11000011 11100000 00000100
2092   length:    20                   00000000 00000000 00000000 00010100
2096   address:   a_start              00000000 00000000 00001011 10111000
                  .org a_start
3000   a:

Example of a selection of instructions (for a virtual computer[6]) with the corresponding
address in memory where each instruction will be placed. These addresses are not static,
see memory management. Accompanying each instruction is the generated (by the
assembler) object code that coincides with the virtual computer's architecture (or ISA).
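
The object code for the addcc lines in the listing can be reproduced by packing bit fields.
The sketch below assumes the SPARC-style arithmetic format that those bit patterns appear
to follow: a 2-bit op field (binary 10), a 5-bit destination register, a 6-bit op3 field,
a 5-bit first source register, a 1-bit immediate flag, and then either a 13-bit signed
constant or a 5-bit second source register. That layout is an assumption about the
listing's virtual computer, and the names encode_arith and as_bits are made up for this
example.

```python
def encode_arith(op3, rd, rs1, operand2, immediate=False):
    """Pack the fields of a SPARC-style arithmetic instruction into 32 bits."""
    word = (0b10 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
    if immediate:
        # i = 1: the low 13 bits hold a signed constant in two's complement.
        word |= (1 << 13) | (operand2 & 0x1FFF)
    else:
        # i = 0: the low 5 bits hold the second source register number.
        word |= operand2 & 0x1F
    return word


def as_bits(word):
    """Format a 32-bit word as four space-separated groups of 8 bits."""
    bits = f"{word:032b}"
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))


ADDCC = 0b010000  # op3 value assumed for addcc

# addcc %r1,-4,%r1
print(as_bits(encode_arith(ADDCC, rd=1, rs1=1, operand2=-4, immediate=True)))
# 10000010 10000000 01111111 11111100

# addcc %r1,%r2,%r4
print(as_bits(encode_arith(ADDCC, rd=4, rs1=1, operand2=2)))
# 10001000 10000000 01000000 00000010
```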

See also
• Little man computer - an educational computer model with a base-10 assembly
language
• x86 assembly language - the assembly language for common Intel 80x86
microprocessors
• Compiler
• Disassembler
• List of assemblers
• Instruction set

References
1. David Salomon, Assemblers and Loaders, 1993.
2. Hyde, op. cit., Foreword ("Why would anyone learn this stuff?").
3. Stroustrup, Bjarne, The C++ Programming Language, Addison-Wesley, 1986,
ISBN 0-201-12078-X: "C++ was primarily designed so that the author and his
friends would not have to program in assembler, C, or various modern high-level
languages." [use of the term assembler to mean assembly language]
4. Saxon, James, and Plette, William, Programming the IBM 1401, Prentice-Hall,
1962, LoC 62-20615. [use of the term assembly program]
5. Murdocca, Miles J.; Vincent P. Heuring (2000). Principles of Computer
Architecture. Prentice-Hall. ISBN 0-201-43664-7.
6. Principles of Computer Architecture (POCA), ARCTools virtual computer
available for download to execute referenced code, accessed August 24, 2005.

Books
• Jonathan Bartlett: Programming from the Ground Up (online version of the
introductory assembly programming book)
• Randall Hyde: The Art of Assembly Language Programming
• Computer-Books.us, Online Assembly Language Books
• Dr Paul Carter: PC Assembly Language
• Paul Carter: PC Assembly Tutorial using NASM and GCC
• The ASM Community: The x86 ASM Book
• Dominic Sweetman: See MIPS Run. Morgan Kaufmann Publishers.
ISBN 1-55860-410-3
• Robert Britton: MIPS Assembly Language Programming. Prentice Hall.
ISBN 0-13-142044-5
• John Waldron: Introduction to RISC Assembly Language Programming.
Addison Wesley. ISBN 0-201-39828-1
• Jeff Duntemann: Assembly Language Step-by-Step

External links


• WinAsm Studio, The Assembly IDE - Free Downloads, Source Code, a free
Assembly IDE, a lot of open source programs to download and a popular Board
• The ASM Community, a great ASM programming resource including a
Messageboard and an ASM Wiki Book
• Intel Assembly 80x86 CodeTable (a cheat sheet reference)
• MenuetOS - hobby Operating System for the PC written entirely in 64bit
assembly language
• List of resources; books, websites, newsgroups, and IRC channels
• Unix Assembly Language Programming
• PPR: Learning Assembly Language
• CodeTeacher
• Assembly Language Programming Examples
• Typed Assembly Language (TAL)
• Authoring Windows Applications In Assembly Language
• RosAsm assembler/ RosAsm assembly Forum
• RosAsm Programming Examples
• 80x86 emulator
• AVR Assembler
• The Program Transformation Wiki
• GoAsm - a component of the free "Go" tools: 32-bit and 64-bit Windows
programming for x86 and AMD64/EM64T
• GNU lightning is a library that generates assembly language code at run-time
which is useful for Just-In-Time compilers
• "information on assembly programming under different platforms: IA32 (x86),
IA64 (Itanium), x86-64, SPARC, Alpha, or whatever platform we find
contributors for."
• "Terse: Algebraic Assembly Language for x86"
• Iczelion's Win32 Assembly Tutorial
• SB-Assembler for most 8-bit processors/controllers
• Assembly Tutorials BeginnersCode.com
• IBM z/Architecture Principles of Operation IBM manuals on mainframe machine
language and internals.
• IBM High Level Assembler IBM manuals on mainframe assembler language.
• Tools and tutorials for x86 programmers
• Assembly Optimization Tips by Mark Larson
