Lecture 1
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition [edited NN] 1
Carnegie Mellon
[Figure: voltage scale for binary signals, marked at 0.0V, 0.2V, 0.9V, and 1.1V]
https://en.wikipedia.org/wiki/Character_encoding https://en.wikipedia.org/wiki/Punched_tape
REPRESENTING CHARACTERS
Character encoding
❑ ASCII (American Standard Code for Information Interchange)
❑ 128 characters, including 95 printable ones, stored using 1 byte (= 8 bits) per
character. Fine for English but not for most other languages: French (’ç’), German,
Greek, Chinese, …
❑ Unicode (industry standard developed by the Unicode consortium since 1990)
❑ Latest: over 143,000 characters covering 154 modern and historic scripts
❑ Standard defines UTF-8, UTF-16, and UTF-32 (each can represent anything
the others can represent, but their sizes differ; a UTF-32 character is always 4 bytes long)
❑ ex: UTF-8, dominantly used by websites (over 90%), uses one byte for the first
128 “code points” (a code point is the index of the character in the table), and up to
4 bytes for other characters. The first 128 Unicode code points are the ASCII characters,
which means that any ASCII text is also a UTF-8 text
❑ EBCDIC (Extended Binary Coded Decimal Interchange Code) → from IBM,
disappearing
How many languages in the world?
Around 6900, see
https://www.linguisticsociety.org/content/how-many-languages-are-there-world
nb: not all are written, and many rely on the same character set
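To make the size differences between the Unicode encodings concrete, here is a minimal Python sketch that encodes a few characters with the built-in `str.encode`. The sample characters and the helper name `encoded_sizes` are illustrative choices, not part of the slides.

```python
# Compare how many bytes UTF-8, UTF-16 and UTF-32 need for a few characters.
# The samples are illustrative: 'A' (ASCII), 'ç', a Chinese character, an emoji.
samples = ["A", "\u00e7", "\u4e2d", "\U0001F600"]

def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32 (no byte order mark)."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Any ASCII text is also valid UTF-8: both encodings produce the same bytes.
ascii_text = "Lecture 1"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
```

UTF-8 grows from 1 to 4 bytes as the code point increases, while UTF-32 stays at 4 bytes for every character.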
In other programming languages
[Figure: memory table with byte 33 at address 0x104 and byte 00 at address 0x105]
https://www.mathsisfun.com/numbers/numbers-numerals-digits.html
VS Positional Systems
E.g. in the decimal system (base 10), the numeral
4327 means (4×10^3) + (3×10^2) + (2×10^1) + (7×10^0)
noting that 10^0 = 1.
Byte = 8 bits
▪ Binary: 00000000_2 to 11111111_2
▪ Decimal: 0_10 to 255_10
▪ Hexadecimal: 00_16 to FF_16
▪ Base 16 number representation
▪ Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’
▪ FA1D37B_16 in many programming languages is written as:
– 0xFA1D37B or
– 0xfa1d37b (case insensitive)

Hex  Decimal  Binary
0    0        0000
1    1        0001
2    2        0010
3    3        0011
4    4        0100
5    5        0101
6    6        0110
7    7        0111
8    8        1000
9    9        1001
A    10       1010
B    11       1011
C    12       1100
D    13       1101
E    14       1110
F    15       1111
Example: 13 = 2^3 + 2^2 + 2^0
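The positional rule and the hexadecimal notation above can be checked with a short Python sketch. The helper `positional_value` is a hypothetical name introduced here for illustration; `bin`, `hex`, and the `0x` literal prefix are standard Python.

```python
# Positional notation: each digit is weighted by a power of the base.
thirteen = 2**3 + 2**2 + 2**0          # the decomposition from the example
print(thirteen, bin(thirteen), hex(thirteen))

def positional_value(digits, base):
    """Evaluate a list of digit values, most significant first, in the given base."""
    value = 0
    for d in digits:
        value = value * base + d       # shift previous digits one position left
    return value

decimal_example = positional_value([4, 3, 2, 7], 10)   # the numeral 4327
byte_max = positional_value([15, 15], 16)              # FF in hexadecimal
same_literal = 0xFA1D37B == 0xfa1d37b                  # hex digits are case insensitive
```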
[Wikipedia Radix]
[Sanjay Kulkarni]
✓ The alien cannot understand 4, as the digit 4 does not exist in its base
(4 in base 10 is expressed as ’10’ in base 4)
✓ The number of unique symbols in any base is ’10’ in that base
✓ How should the astronaut explain we are using base 10?
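The observation that every base calls itself ’10’ can be verified with a small conversion routine. This is a minimal sketch; `to_base` is a hypothetical helper and is limited to bases 2 to 10 so that ordinary decimal digits suffice.

```python
def to_base(n, base):
    """Write the non-negative integer n in the given base (2 to 10) as a digit string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))   # least significant digit first
        n //= base
    return "".join(reversed(digits))

# The number of symbols in a base, written in that base, is always '10'.
for base in range(2, 11):
    print(base, to_base(base, base))
```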
REPRESENTATION IN MEMORY
Machine Words
Any given computer has a “Word Size”
▪ Nominal size of integers, memory addresses and operands of most
instructions manipulating integers
Byte Ordering
So, how are the bytes within a multi-byte word ordered in
memory?
Conventions
▪ Big Endian: Sun (obsolete), PPC Mac (obsolete), Internet protocols
▪ Least significant byte has highest address
▪ Little Endian: x86, ARM processors running Android, iOS, and Windows
▪ Least significant byte has lowest address
Bi-endian: “Some architectures (e.g., Intel Itanium - IA-64)
feature a setting which allows for switchable endianness in data
fetches and stores, instruction fetches, or both.”
Big-endian is the most common format in data networking: fields in the
protocols of the Internet protocol suite, such as IPv4, IPv6, TCP, and UDP,
are transmitted in big-endian order.
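The two byte orderings can be observed directly with Python's standard `struct` module. This is a sketch only; the 32-bit value `0x01234567` is an arbitrary illustrative choice.

```python
import struct
import sys

value = 0x01234567  # illustrative 32-bit value

big = struct.pack(">I", value)      # big endian: most significant byte at the lowest address
little = struct.pack("<I", value)   # little endian: least significant byte at the lowest address

print("big   :", big.hex())
print("little:", little.hex())
print("this machine is", sys.byteorder, "endian")

# Network protocols use big-endian order, so this is what would go on the wire.
network = value.to_bytes(4, "big")
```

The same four bytes appear in both cases, only in reverse order.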
ENCODING INTEGERS
A) UNSIGNED ENCODING
B) TWO’S COMPLEMENT FOR SIGNED INTEGERS
Number of bits needed to store an unsigned int?
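One way to explore the question is to compute, for a given non-negative value n, the smallest width w with n ≤ 2^w - 1. The sketch below relies on Python's built-in `int.bit_length`; the helper name `bits_needed` is hypothetical.

```python
def bits_needed(n):
    """Smallest w such that 0 <= n <= 2**w - 1 (zero still occupies one bit)."""
    if n < 0:
        raise ValueError("unsigned encoding only covers non-negative integers")
    return max(1, n.bit_length())

for n in (0, 1, 15, 16, 255, 256, 65535):
    print(n, bits_needed(n))
```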
Encoding Integers
Commonly used lengths for integers are 8, 16, 32, and 64 bits.
There are 2 types of integers:
Unsigned Integers: can represent zero and positive integers.
Signed Integers: can represent zero, positive and negative
integers. Several distinct representation schemes have been
proposed for signed integers, e.g.:
▪ Sign-Magnitude representation
▪ 1's Complement representation
▪ 2's Complement representation
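As a rough comparison of the three schemes listed above, the following sketch prints the w-bit pattern each one assigns to the same value. The function names are hypothetical, and the code assumes the value fits in the chosen width under each scheme.

```python
def sign_magnitude(x, w):
    """Top bit is the sign; the remaining w-1 bits hold the magnitude |x|."""
    sign = 1 if x < 0 else 0
    return format((sign << (w - 1)) | abs(x), f"0{w}b")

def ones_complement(x, w):
    """A negative value is the bitwise inverse of its magnitude."""
    mask = (1 << w) - 1
    bits = x if x >= 0 else (~abs(x)) & mask
    return format(bits, f"0{w}b")

def twos_complement(x, w):
    """A negative value x is stored as 2**w + x."""
    mask = (1 << w) - 1
    return format(x & mask, f"0{w}b")

for x in (5, -5):
    print(x, sign_magnitude(x, 4), ones_complement(x, 4), twos_complement(x, 4))
```

All three agree on non-negative values and differ only in how the negative ones are encoded.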
Notations
We denote by [x_{w-1}, x_{w-2}, ..., x_0] the vector made up of the
individual bits of an integer data type of w bits, written in
binary notation
Function B2U() (binary-to-unsigned) takes this vector as input
and returns its unsigned interpretation
Similarly, function B2T() (binary-to-two’s complement) returns
its signed interpretation
Sign Bit
▪ For 2’s complement, most significant bit indicates sign
▪ 0 for nonnegative
▪ 1 for negative
Let’s consider integers of 4 bits: what is the value of 1111 in unsigned and
two’s complement encodings?
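The B2U and B2T interpretations can be sketched in a few lines of Python and applied to the 4-bit pattern 1111. The lowercase names `b2u` and `b2t` and the list-of-bits representation (most significant bit first) are choices made for this example.

```python
def b2u(bits):
    """Unsigned value of a bit vector, most significant bit first."""
    w = len(bits)
    return sum(bit << (w - 1 - i) for i, bit in enumerate(bits))

def b2t(bits):
    """Two's complement value: the most significant bit has weight -2**(w-1)."""
    w = len(bits)
    return b2u(bits[1:]) - (bits[0] << (w - 1))

x = [1, 1, 1, 1]
print(b2u(x), b2t(x))   # 15 -1
```

The only difference between the two functions is the sign given to the weight of the most significant bit.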
Encoding Integers
[Figure: contribution of each power of two]
[Figure: example binary strings and their value]
Observations
▪ |TMin| = TMax + 1
▪ Asymmetric range: there
is one more negative
value than positive value
▪ UMax = (2 * TMax) + 1
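These observations can be checked numerically for several word sizes. In this sketch, the helper `ranges` is a hypothetical name; UMax, TMin, and TMax are computed from their usual definitions for w-bit integers.

```python
def ranges(w):
    """Return (UMax, TMin, TMax) for w-bit integers."""
    umax = 2**w - 1            # all w bits set, unsigned
    tmin = -(2**(w - 1))       # only the sign bit set, two's complement
    tmax = 2**(w - 1) - 1      # all bits set except the sign bit
    return umax, tmin, tmax

for w in (4, 8, 16, 32, 64):
    umax, tmin, tmax = ranges(w)
    print(w, umax, tmin, tmax, abs(tmin) == tmax + 1, umax == 2 * tmax + 1)
```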
https://hpc-docs.uni.lu/getting-started/
type: the bit field stays identical, but how these bits
are interpreted changes (also called casting)
Let’s consider a variable of type unsigned int
(16 bits) set to 0xFFFF. What is the result of
“casting” it into a 16-bit signed int?
Proper conversion: the bit field may not remain
identical after conversion (e.g., conversion from a
float to an integer)
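The difference between reinterpreting bits and converting a value can be illustrated with Python's `struct` module. This is a sketch under the assumption that the float is stored in IEEE 754 single precision, which is what the `f` format produces in standard mode; the value 3.0 is an arbitrary example.

```python
import struct

# Casting: the same 16 bits, read under a different interpretation.
raw = struct.pack("<H", 0xFFFF)             # unsigned 16-bit value 65535
as_signed = struct.unpack("<h", raw)[0]     # the same two bytes read as signed
print(as_signed)                            # -1

# Conversion: the numeric value is carried over, the bit pattern is not.
float_bits = struct.pack(">f", 3.0).hex()      # 32-bit float holding 3.0
int_bits = struct.pack(">i", int(3.0)).hex()   # 32-bit integer holding 3
print(float_bits, int_bits)
```

The 16-bit pattern FFFF is 65535 when read as unsigned and -1 when read as two's complement, whereas 3.0 and 3 share a value but not a bit pattern.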
Let u and v be two 16-bit unsigned integer variables, both equal to 65535.
What happens when the processor executes u := u + v ?
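Python integers never overflow, so a fixed-width type is needed to reproduce this experiment. The sketch below uses `ctypes.c_uint16` from the standard library, which keeps only the low 16 bits of the value it is given.

```python
import ctypes

u = ctypes.c_uint16(65535)
v = ctypes.c_uint16(65535)

true_sum = u.value + v.value      # exact sum as a Python integer
u = ctypes.c_uint16(true_sum)     # storing it in 16 bits drops the 17th bit
print(true_sum, u.value)          # 131070 65534
```

The stored result is the true sum reduced modulo 2^16.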
NUMERICAL LIMITATIONS OF COMPUTERS – MODULO ARITHMETIC FOR INTEGERS
[http://www.cs.uwm.edu/classes/cs315/Bacon/Lecture/HTML/ch04.html]
Overflow in action
[Wikipedia - https://en.wikipedia.org/wiki/Integer_overflow]
Unsigned Addition
Operands: w-bit quantities u and v,
thus between 0 and 2^w - 1
“True” sum u + v may require w+1 bits
What is done to not exceed w bits? The highest-order bit of
the (w+1)-bit quantity is discarded
→ This results in (u + v) mod 2^w
Unsigned addition implements Modular Arithmetic
s = UAdd_w(u, v) = (u + v) mod 2^w
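Discarding the highest-order bit of the (w+1)-bit sum can be modeled with a bit mask. The following is a minimal sketch; the function name `uadd` and the returned overflow flag are additions made for illustration.

```python
def uadd(u, v, w):
    """w-bit unsigned addition: returns (result, overflowed)."""
    mask = (1 << w) - 1                      # w ones: also the largest w-bit value
    if not (0 <= u <= mask and 0 <= v <= mask):
        raise ValueError("operands must fit in w bits")
    true_sum = u + v                         # may need w+1 bits
    result = true_sum & mask                 # keep the low w bits, i.e. mod 2**w
    return result, true_sum > mask

print(uadd(3, 4, 4))             # fits in 4 bits
print(uadd(9, 12, 4))            # true sum 21 wraps around
print(uadd(65535, 65535, 16))
```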
max value is 15
▪ Compute true sum Add4(u , v)
▪ Values increase linearly with u and v
[Figure: 3D plot of Add4(u, v) over axes u and v (0 to 14), vertical axis from 0 to 32]
[Figure: “True Sum” scale marked 0, 2^w, and 2^(w+1), with an “Overflow” label, next to a 3D plot of the “Modular Sum” over axes u and v (0 to 14)]
[00010101] << 3 ?
[01100011] << 3 ?
Nb: w=8
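The two shift exercises can be worked through with a mask that models the 8-bit width. This is a sketch; `shl` is a hypothetical helper, since Python's own `<<` never drops bits.

```python
W = 8
MASK = (1 << W) - 1

def shl(x, k):
    """Left shift of a W-bit value: bits pushed past position W-1 are dropped."""
    return (x << k) & MASK

for bits in ("00010101", "01100011"):
    x = int(bits, 2)
    y = shl(x, 3)
    print(f"[{bits}] << 3 = [{y:08b}]  ({x} -> {y})")
```

The first result equals 21 × 8 = 168 because nothing is lost. In the second, 99 × 8 = 792 does not fit in 8 bits, so the result is 792 mod 256 = 24.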
Nb: p/q is a fraction if both p and q are positive (with q≠0), otherwise it is a rational
number. All fractions are rational numbers.
0.25 is a fraction, while -0.25 is a rational number.
Decimal numbers
A decimal number (or decimal fraction) is a number in base 10 with
a decimal point, like 0.6, 12.34, noting that 12.34 is a shorthand
way of writing 12 + 3/10 + 4/100.
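Representation
▪ Bits to right of “binary point” are negative powers of 2, down to 2^-j
▪ value: sum for k from -j to i of b_k × 2^k, where b_i … b_0 are the bits to the left of the binary point and b_-1 … b_-j the bits to the right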
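To see how bits to the right of the binary point contribute negative powers of 2, the sketch below evaluates a fractional binary string exactly using `fractions.Fraction`. The function name `fractional_binary` and the sample strings are illustrative.

```python
from fractions import Fraction

def fractional_binary(text):
    """Exact value of a string such as '101.11' read as a base-2 number."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2) if whole else 0)
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2**position)   # weight 2^-position
    return value

print(fractional_binary("101.11"))   # 23/4, i.e. 5.75
print(fractional_binary("0.01"))     # 1/4
```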
Observations
1. Numbers of form 0.111111…_2 are just below 1.0
1/2 + 1/4 + 1/8 + … + 1/2^i + … ➙ 1.0
2. Divide by 2 by shifting the binary point to the left by one position
3. Multiply by 2 by shifting the binary point to the right by one position
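These three observations can be checked with exact arithmetic. The sketch below uses `fractions.Fraction`; the helper `all_ones` and the example value 101.11_2 are illustrative choices.

```python
from fractions import Fraction

def all_ones(n):
    """Value of 0.11...1 in base 2 with n ones after the binary point."""
    return sum(Fraction(1, 2**i) for i in range(1, n + 1))

# Observation 1: the gap to 1.0 is exactly 1/2^n, so it shrinks but never reaches 0.
for n in (1, 2, 4, 8):
    print(n, all_ones(n), 1 - all_ones(n))

# Observations 2 and 3: moving the binary point halves or doubles the value.
x = Fraction(23, 4)        # 101.11 in base 2
half = Fraction(23, 8)     # 10.111 in base 2 (point moved left)
double = Fraction(23, 2)   # 1011.1 in base 2 (point moved right)
print(half == x / 2, double == x * 2)
```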