2-Architecture
Eric Lo
http://faculty.ycp.edu/~dhovemey/spring2006/cs101/lecture/lecture1.html
https://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/1_Introduction.html
Inside a CPU -- Registers
Inside a CPU
• Program Counter:
• A register that contains the memory address of the next instruction
• Instruction Cycle (Fetch-Decode-Execute-MemoryAccess-WriteBack)
– “Fetch”: fetch the next instruction from the memory
– “Decode”: prepares various registers in readiness for the next step.
– E.g., the instruction ADD a, b will
» Load content of a to register 1
» Load content of b to register 2
– “Execute”: carry out the computation / compute the memory address
– “MemoryAccess”: read/write from/to memory
• For LOAD/STORE instruction
– “WriteBack”: write the results back to the register
Sequential vs Scalar (Pipeline)
Ref. lighterra.com. The MemoryAccess phase is missing from the figure because sequential processors are older designs that had no separate MA phase.
Scalar vs Superscalar
• Scalar = Pipelining
• Super-scalar = instruction-level parallelism = multiple instruction cycles in parallel
Processor Design
http://www.markedbyteachers.com
Intel
RF Wireless World
Intel’s Dilemma
• Now almost all compilers generate CISC ISA machine code
• Decode: CISC -> RISC (inside the CPU, CISC instructions are decoded into RISC-like micro-operations)
More inside a modern CPU
edux.pjwstk.edu.pl
More inside a modern CPU
• With speculative execution
– Instructions executed != Instructions retired
• OoO execution (More instruction-level parallelism)
– Examines a sliding window of consecutive instructions
• The “instruction window”
– “Ready” instructions that do not (or no longer) depend on any earlier, unfinished instructions can be executed first
– https://en.wikipedia.org/wiki/Out-of-order_execution
• Can the compiler do the job?
– The compiler doesn’t have the runtime information
• E.g., it cannot know in advance which way a branch will go (and hence which instruction comes next)
Multi-core + NUMA
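• Even more instruction cycles in parallel: each core runs its own pipeline(s); with NUMA, each processor has local memory that is faster to access than remote memory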
Memory Hierarchy
Ref. DZone
Why do we have to know all this?
• Random access vs. sequential access time on HDD
• OSes and all other systems have long been optimized to reduce random access on HDDs
• “Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King.”
• What about
– Random access vs. sequential access time on RAM?
– Random access hurts the CPU’s speculative execution
• => branch misprediction => expensive!
• How to achieve “if” without using “if”?
Wait… how are these hardware things related to OS?
• The kernel is the piece of software that directly deals with the hardware
• OS/kernel programmers should arguably be writing the most efficient programs on earth
– Efficient in memory usage
• Remember: the PDP-7 had only 9 kB of RAM
– Efficient in speed
• If kernel code doesn’t care about speed, who will?
– Different implementations for different architectures!
• E.g., Intel Xeon’s cache is inclusive whereas AMD Opteron’s cache is non-inclusive
– When reading X from memory,
» Inclusive: X will have a copy in each of L1, L2 and L3
» Non-inclusive: X is read into L1 directly (L2 and L3 have no copies)
• Kernel programmers need to know this when designing cache replacement policies
• That also explains why…
Different implementations are needed for different architectures
CPU’s Privileged Instructions
• Some instructions just add two values
• Some instructions are privileged
– E.g., set the segmentation boundary of the memory
• The CPU has a 1-bit register that records whether it is currently in user mode or kernel mode
– If (user-mode & this-instruction-is-privileged) then
• generate an “insufficient privilege access” exception
• On an exception / a hardware interrupt
– The CPU goes to a hardcoded memory address to look up the corresponding handler
Ref. T. Anderson book
Figure: physical memory addresses 000000H to 0003FFH (1024 bytes) are kernel space
Hacking?
• So, could a malicious user write code that accesses other memory regions by using some privileged instruction directly?
– Any normal program must run on top of your OS, and your OS won’t permit your program to set that bit
– So the most powerful attack is to gain physical access to a machine, insert a boot device, and reboot into your own OS…
• But this is not the kind of security we are concerned with here
System Calls
• What if a benign program wants to do something low-level?
– It goes through system calls
• written by OS developers
• exposed for anyone to build programs on
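enum { USER_MODE, KERNEL_MODE } mode_bit;  /* stands in for the CPU's 1-bit register */
int a_sys_call(void) {
    mode_bit = KERNEL_MODE;  /* in practice, switched by the CPU's trap instruction on entry */
    /* … access the kernel memory … */
    mode_bit = USER_MODE;    /* in practice, switched back by the return-from-trap instruction */
    return 0;
}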