Lecture 3
Information Storage
Rather than accessing individual bits in memory,
most computers use blocks of 8 bits, or bytes, as the smallest addressable
unit of memory.
A machine-level program views memory as a very large array of bytes,
referred to as virtual memory.
Every byte of memory is identified by a unique number, known as its
address,
and the set of all possible addresses is known as the virtual address space.
Information Storage
This virtual address space is just a conceptual image presented to the
machine-level program.
The actual implementation uses a combination of dynamic random access
memory (DRAM), flash memory, disk storage, special hardware,
and operating system software to provide the program with what appears
to be a monolithic byte array.
Hexadecimal Notation
A single byte consists of 8 bits.
In binary notation, its value ranges from (00000000)2 to (11111111)2.
When viewed as a decimal integer, its value ranges from (0)10 to (255)10.
Neither notation is very convenient for describing bit patterns.
Binary notation is too verbose,
while with decimal notation it is tedious to convert to and from bit
patterns.
Hexadecimal Notation
Instead, we write bit patterns as base-16, or hexadecimal numbers.
Hexadecimal (or simply “hex”) uses digits ‘0’ through ‘9’ along with
characters ‘A’ through ‘F’ to represent 16 possible values.
Figure 2.2 shows the decimal and binary values associated with the 16
hexadecimal digits.
Written in hexadecimal, the value of a single byte can range from (00)16 to
(FF)16.
Hexadecimal Notation
In C, numeric constants starting with 0x or 0X are interpreted as being in
hexadecimal.
The characters ‘A’ through ‘F’ may be written in either upper- or lowercase.
(e.g., 0xFa1D37b)
Hexadecimal Notation
A common task in working with machine-level programs
is to manually convert between decimal, binary, and hexadecimal
representations of bit patterns.
Converting between binary and hexadecimal is straightforward, since it can
be performed one hexadecimal digit at a time.
For example, given the binary number 1111001010110110110011,
you convert it to hexadecimal by first splitting it into groups of 4 bits
each, starting from the right (the leftmost group may have fewer than 4 bits).
Hexadecimal Notation
Practice Problem 2.1 (solution page 179)
Practice Problem 2.2 (solution page 179)
Practice Problem 2.3 (solution page 180)
Practice Problem 2.4 (solution page 180)
Data Sizes
Every computer has a word size, indicating the nominal size of
pointer data.
Since a virtual address is encoded by such a word,
the most important system parameter determined by the word size
is the maximum size of the virtual address space.
Data Sizes
That is, for a machine with a w-bit word size,
virtual addresses can range from 0 to 2^w − 1, giving the program
access to at most 2^w bytes.
A 32-bit word size limits the virtual address space to 4 gigabytes (4 GB).
A 64-bit word size leads to a virtual address space of 16 exabytes.
Data Sizes
Most 64-bit machines can also run programs compiled for use on 32-bit
machines, a form of backward compatibility.
So, for example, when a program prog.c
is compiled with the directive
linux> gcc -m32 prog.c
then this program will run correctly on either a 32-bit or a 64-bit machine.
Data Sizes
On the other hand, a program compiled with the directive
linux> gcc -m64 prog.c
will only run on a 64-bit machine.
We will therefore refer to programs as being either “32-bit programs” or
“64-bit programs,”
since the distinction lies in how a program is compiled, rather than the type
of machine on which it runs.
Data Sizes
Computers and compilers support multiple data formats,
using different ways to encode data, such as integer and floating point,
as well as different lengths.
For example, many machines have instructions for manipulating single
bytes,
as well as integers represented as 2-, 4-, and 8-byte quantities.
They also support floating-point numbers represented as 4- and 8-byte
quantities.
Addressing and Byte Ordering
For program objects that span multiple bytes,
we must establish two conventions:
what the address of the object will be, and how we will order the bytes
in memory.
In virtually all machines, a multi-byte object is stored as a contiguous
sequence of bytes,
with the address of the object given by the smallest address of the
bytes used.
Addressing and Byte Ordering
For example, suppose a variable x of type int has address 0x100;
that is, the value of the address expression &x is 0x100.
Then (assuming data type int has a 32-bit representation)
the 4 bytes of x would be stored in memory locations 0x100, 0x101,
0x102, and 0x103.
Some machines choose to store the object in memory ordered from
least significant byte to most,
while other machines store it from most to least.
Addressing and Byte Ordering
The former convention,
where the least significant byte comes first is referred to as little endian.
The latter convention where the most significant byte comes first is
referred to as big endian.
Most Intel-compatible machines operate exclusively in little-endian
mode.
Addressing and Byte Ordering
On the other hand, most machines from IBM and Oracle operate in
big-endian mode.
Many recent microprocessor chips are bi-endian,
meaning that they can be configured to operate as either little- or big-
endian machines.
Addressing and Byte Ordering
Suppose the variable x of type int and at address 0x100 has a hexadecimal
value of 0x01234567.
The ordering of the bytes within the address range 0x100 through 0x103
depends on the type of machine:
Addressing and Byte Ordering
In practice, however, byte ordering becomes fixed once a particular
operating system is chosen.
For example, ARM microprocessors, used in many cell phones, have
hardware that can operate in either little- or big-endian mode,
but the two most common operating systems for these chips,
Android (from Google) and iOS (from Apple), operate only in little-endian
mode.
Integer Representation
Integers are whole numbers or fixed-point numbers with the radix point
fixed after the least-significant bit.
They stand in contrast to real numbers, or floating-point numbers, where the
position of the radix point varies.
Integers and floating-point numbers are treated differently in computers:
they have different representations and are processed differently
(e.g., floating-point numbers are processed in a dedicated floating-point
unit).
Integer Representation
Computers use a fixed number of bits to represent an integer.
The commonly used bit-lengths for integers are 8, 16, 32, and 64 bits.
Besides bit-lengths, there are two representation schemes for integers:
1. Unsigned Integers: can represent zero and positive integers.
2. Signed Integers: can represent zero, positive and negative integers.
Integer Representation
Three representation schemes have been proposed for signed integers:
a) Sign-Magnitude representation
b) 1's Complement representation
c) 2's Complement representation
Integer Representation
You, as the programmer, need to decide on the bit-length and
representation scheme for your integers,
depending on your application's requirements.
Suppose you need a counter for counting a small quantity from 0 up
to 200.
You might choose the 8-bit unsigned integer scheme, as no
negative numbers are involved.
n-bit Unsigned Integers
Unsigned integers can represent zero and positive integers, but not
negative integers.
The value of an unsigned integer is interpreted as "the magnitude of its
underlying binary pattern".
An n-bit pattern can represent 2^n distinct integers.