Interrupts and Exceptions: COMS W6998 Spring 2010
Overview
The Hardware Part
Interrupts and Exceptions
Exception Types and Handling
Interrupt Request Lines (IRQs)
Programmable Interrupt Controllers (PIC)
Interrupt Descriptor Table (IDT)
Hardware Dispatching of Interrupts
The Software Part
Nested Execution
Kernel Stacks
SoftIRQs, Tasklets
Work Queues
Threaded Interrupts
Simplified Architecture Diagram
[Figure: Central Processing Unit and Main Memory connected by the system bus]
Interrupts
Forcibly change normal flow of control
Similar to context switch (but lighter weight)
Hardware saves some context on stack; Includes
interrupted instruction if restart needed
Enters kernel at a specific point; kernel then
figures out which interrupt handler should run
Execution resumes with special “iret” instruction
Many different types of interrupts
Types of Interrupts
Asynchronous
From external source, such as I/O device
Not related to instruction being executed
Synchronous (also called exceptions)
Processor-detected exceptions:
Faults — correctable; offending instruction is retried
Programmed exceptions:
Requests for kernel intervention (software intr/syscalls)
Faults
Instruction would be illegal to execute
Examples:
Writing to a memory segment marked ‘read-only’
Reading from an unavailable memory segment (on disk)
Executing a ‘privileged’ instruction
Detected before incrementing the IP
The causes of ‘faults’ can often be ‘fixed’
If a ‘problem’ can be remedied, then the CPU can
just resume its execution-cycle
Traps
A CPU might have been programmed to
automatically switch control to a ‘debugger’
program after it has executed an instruction
That type of situation is known as a ‘trap’
It is activated after incrementing the IP
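The difference between a fault and a trap comes down to which instruction pointer is saved. A minimal sketch, assuming a toy model in which each instruction has a known address and length (the names `exc_class` and `saved_ip` are hypothetical, not from any real kernel):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification, for illustration only. */
enum exc_class { EXC_FAULT, EXC_TRAP };

/*
 * Return the instruction pointer that would be saved for an exception
 * raised by the instruction at `ip`, which is `len` bytes long.
 *   Fault: detected before the IP is incremented, so the saved IP still
 *          points at the offending instruction and it can be retried.
 *   Trap:  activated after the IP is incremented, so the saved IP points
 *          at the following instruction.
 */
static uint32_t saved_ip(enum exc_class cls, uint32_t ip, uint32_t len)
{
    assert(len > 0);
    return cls == EXC_FAULT ? ip : ip + len;
}
```

For a 3-byte instruction at 0x1000, a fault would save 0x1000 and a trap would save 0x1003.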
Error Exceptions
Most error exceptions — divide by zero, invalid
operation, illegal memory reference, etc. — translate
directly into signals
This isn’t a coincidence. . .
The kernel’s job is fairly simple: send the
appropriate signal to the current process
force_sig(sig_number, current);
That will probably kill the process, but that’s not the
concern of the exception handler
One important exception: page fault
An exception can (infrequently) happen in the kernel
die(); // kernel oops
Intel-Reserved ID-Numbers
Of the 256 possible interrupt ID numbers, Intel reserves the first 32
for ‘exceptions’
OS’s such as Linux are free to use the remaining 224 available
interrupt ID numbers for their own purposes (e.g., for service requests
from external devices, or for other purposes such as
system-calls)
Examples:
0: divide-overflow fault
6: Undefined Opcode
7: Coprocessor Not Available
11: Segment-Not-Present fault
12: Stack fault
13: General Protection Exception
14: Page-Fault Exception
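Since most error exceptions translate directly into signals, the kernel's work can be pictured as a small lookup from vector number to signal. The sketch below is a user-space illustration only; the function name `vector_to_signal` is hypothetical, and the mapping reflects the conventional Linux/x86 choices as best understood here, not an excerpt from the kernel.

```c
#include <assert.h>
#include <signal.h>

/*
 * Map one of the Intel-reserved exception vectors listed above to the
 * signal conventionally sent to the current process.
 * Returns 0 for vectors that this sketch does not map to a signal
 * (for example vector 7, which the kernel normally services itself).
 */
static int vector_to_signal(int vector)
{
    assert(vector >= 0 && vector < 32);
    switch (vector) {
    case 0:  return SIGFPE;   /* divide error */
    case 6:  return SIGILL;   /* undefined opcode */
    case 11: return SIGBUS;   /* segment not present */
    case 12: return SIGBUS;   /* stack fault */
    case 13: return SIGSEGV;  /* general protection */
    case 14: return SIGSEGV;  /* page fault, only if it cannot be resolved */
    default: return 0;
    }
}
```

The page fault entry is the important special case: usually the kernel fixes the problem and resumes the process, and a signal is sent only when the access was genuinely invalid.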
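Interrupt Hardware
Legacy PC Design (for single-proc systems)
[Figure: device IRQs (Ethernet, SCSI Disk, Real-Time Clock) feed cascaded Slave and Master PICs (8259); the Master PIC drives the INTR pin of the x86 CPU; INT and INTA signals are labelled]
[Figure: SMP design with CPU 0 and CPU 1, each with a Local APIC, and an I/O APIC]
(The legacy PICs are masked when the APICs are enabled)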
APIC, IO-APIC, LAPIC
Advanced PIC (APIC) for SMP systems
Used in all modern systems
Interrupts “routed” to CPU over system bus
IPI: inter-processor interrupt
Local APIC (LAPIC) versus “frontend” IO-APIC
Devices connect to front-end IO-APIC
IO-APIC communicates (over bus) with Local APIC
Interrupt routing
Allows broadcast or selective routing of interrupts
Ability to distribute interrupt handling load
Routes to lowest priority processor
Special register: Task Priority Register (TPR)
Arbitrates (round-robin) if equal priority
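Lowest-priority routing with round-robin arbitration can be sketched as follows. This is a simplified software model, not how the APIC hardware is implemented; the function name `route_lowest_priority` and the `last` bookkeeping are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the CPU that should receive the next interrupt.
 *   tpr[i]  models the Task Priority Register of CPU i (lower = less busy).
 *   *last   is the CPU chosen on the previous call; among CPUs that tie
 *           for the lowest priority, the search starts just after it,
 *           which yields round-robin arbitration.
 */
static size_t route_lowest_priority(const uint8_t *tpr, size_t ncpus,
                                    size_t *last)
{
    assert(ncpus > 0 && *last < ncpus);

    /* Find the lowest task priority in the system. */
    uint8_t min = tpr[0];
    for (size_t i = 1; i < ncpus; i++)
        if (tpr[i] < min)
            min = tpr[i];

    /* Round-robin among the CPUs that share that priority. */
    for (size_t k = 1; k <= ncpus; k++) {
        size_t cpu = (*last + k) % ncpus;
        if (tpr[cpu] == min) {
            *last = cpu;
            return cpu;
        }
    }
    return *last; /* not reached: some CPU always has priority == min */
}
```

With priorities {2, 1, 1, 3} and `last` starting at 0, successive interrupts would go to CPU 1, then CPU 2, then CPU 1 again.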
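Hardware to Software
[Figure: IRQs -> PIC -> INTR -> CPU, alongside the Memory Bus; the CPU uses idtr to locate the IDT (entries 0 to 255), and the entry N selected by the vector points to the handler; mask points are marked along the path]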
Assigning IRQs to Devices
IRQ assignment is hardware-dependent
Sometimes it’s hardwired, sometimes it’s set physically,
sometimes it’s programmable
PCI bus usually assigns IRQs at boot
Some IRQs are fixed by the architecture
IRQ0: Interval timer
IRQ2: Cascade pin for 8259A
Linux device drivers request IRQs when the device is opened
Note: especially useful for dynamically-loaded drivers, such as
for USB or PCMCIA devices
Two devices that aren’t used at the same time can share an IRQ,
even if the hardware doesn’t support simultaneous sharing
Assigning Vectors to IRQs
Vector: index (0-255) into interrupt descriptor table
Vectors usually IRQ# + 32
Below 32 reserved for non-maskable intr & exceptions
Maskable interrupts can be assigned as needed
Vector 128 used for syscall
Vectors 251-255 used for IPI
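The vector layout above can be written down as a pair of small helpers. A sketch under the assumptions stated on this slide (16 legacy IRQ lines, vector = IRQ# + 32); the enum and function names are chosen for illustration.

```c
#include <assert.h>

enum {
    FIRST_EXTERNAL_VECTOR = 32,   /* vectors below this are reserved */
    SYSCALL_VECTOR        = 128,
    FIRST_IPI_VECTOR      = 251,
    NR_VECTORS            = 256
};

enum vector_kind { VEC_EXCEPTION, VEC_DEVICE, VEC_SYSCALL, VEC_IPI };

/* Usual mapping from a legacy IRQ line (0-15) to its IDT vector. */
static int irq_to_vector(int irq)
{
    assert(irq >= 0 && irq < 16);
    return irq + FIRST_EXTERNAL_VECTOR;
}

/* Classify a vector according to the ranges listed on the slide. */
static enum vector_kind classify_vector(int vector)
{
    assert(vector >= 0 && vector < NR_VECTORS);
    if (vector < FIRST_EXTERNAL_VECTOR)
        return VEC_EXCEPTION;     /* exceptions and non-maskable intr */
    if (vector == SYSCALL_VECTOR)
        return VEC_SYSCALL;
    if (vector >= FIRST_IPI_VECTOR)
        return VEC_IPI;
    return VEC_DEVICE;            /* maskable interrupts, assigned as needed */
}
```

For example, IRQ0 (the interval timer) lands on vector 32, which classifies as a device interrupt.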
Interrupt Descriptor Table
The ‘entry-point’ to the interrupt-handler is located
via the Interrupt Descriptor Table (IDT)
IDT: “gate descriptors”
Segment selector + offset for handler
Descriptor Privilege Level (DPL)
Gates (slightly different ways of entering kernel)
Task gate: includes TSS to transfer to (not used by
Linux)
Interrupt gate: disables further interrupts
Trap gate: further interrupts still allowed
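A gate descriptor packs the fields above into 8 bytes. The sketch below follows the 32-bit x86 gate layout (handler offset split into two 16-bit halves, type codes 0xE for an interrupt gate and 0xF for a trap gate); the helper names `make_gate` and `gate_offset` and the example selector value are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

enum gate_type { GATE_INTERRUPT = 0xE, GATE_TRAP = 0xF };

struct gate_desc {
    uint16_t offset_low;   /* handler offset, bits 15..0 */
    uint16_t selector;     /* code segment selector for the handler */
    uint8_t  reserved;     /* must be zero */
    uint8_t  type_attr;    /* Present(1) | DPL(2) | 0 | type(4) */
    uint16_t offset_high;  /* handler offset, bits 31..16 */
};

/* Build a present gate for `handler` with the given type and DPL. */
static struct gate_desc make_gate(uint32_t handler, uint16_t selector,
                                  enum gate_type type, unsigned dpl)
{
    struct gate_desc g;

    assert(dpl <= 3);
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.reserved    = 0;
    g.type_attr   = (uint8_t)(0x80u | (dpl << 5) | (unsigned)type);
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}

/* Reassemble the handler entry point from the two offset halves. */
static uint32_t gate_offset(const struct gate_desc *g)
{
    return ((uint32_t)g->offset_high << 16) | g->offset_low;
}
```

A DPL of 0 keeps user mode from invoking the gate with a software interrupt; a gate meant for system calls would use DPL 3.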
Interrupt Masking
Two different types: global and per-IRQ
Global — delays all interrupts
Selective — individual IRQs can be masked
selectively
Selective masking is usually what’s needed
— interference most common from two
interrupts of the same type
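The two kinds of masking combine as a simple AND: an interrupt is delivered only if the global flag allows it and its own IRQ line is not masked. A minimal model, assuming 16 IRQ lines and a per-line mask register in the style of the 8259 (all names here are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct intr_state {
    bool     if_flag;   /* global enable: false delays all interrupts */
    uint16_t irq_mask;  /* bit n set means IRQ n is masked */
};

/* Selectively mask one IRQ line. */
static void mask_irq(struct intr_state *s, unsigned irq)
{
    assert(irq < 16);
    s->irq_mask |= (uint16_t)(1u << irq);
}

/* Selectively unmask one IRQ line. */
static void unmask_irq(struct intr_state *s, unsigned irq)
{
    assert(irq < 16);
    s->irq_mask &= (uint16_t)~(1u << irq);
}

/* An IRQ reaches the CPU only if neither mask blocks it. */
static bool can_deliver(const struct intr_state *s, unsigned irq)
{
    assert(irq < 16);
    return s->if_flag && !(s->irq_mask & (1u << irq));
}
```

Masking IRQ 3 blocks only that line; clearing `if_flag` blocks every line regardless of the per-IRQ mask.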
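Putting It All Together
[Figure: IRQs -> PIC -> INTR -> CPU, alongside the Memory Bus; the CPU uses idtr to locate the IDT (entries 0 to 255), and the entry N selected by the vector points to the handler; mask points are marked along the path]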
Dispatching Interrupts
Each interrupt has to be handled by a special
device- or trap-specific routine
Interrupt Descriptor Table (IDT) has gate descriptors
for each interrupt vector
Hardware locates the proper gate descriptor for this
interrupt vector, and locates the new context
A new stack pointer, program counter, CPU and
memory state, etc., are loaded
Global interrupt mask set
The old program counter, stack pointer, CPU and
memory state, etc., are saved on the new stack
The specific handler is invoked
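The dispatch sequence above can be mimicked in user space with a table of function pointers. This is only a toy model: the `struct cpu` fields, the `dispatch` function, and the sample `timer_handler` are all hypothetical, and copying a struct stands in for the hardware pushing context onto the new stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VECTORS 256

struct cpu {
    unsigned ip;       /* program counter */
    bool     if_flag;  /* global interrupt enable */
};

typedef void (*handler_fn)(struct cpu *);

struct gate {
    handler_fn handler;
    bool       is_interrupt_gate;  /* true: disable further interrupts */
};

static struct gate idt[NR_VECTORS];

/* Sample handler: counts ticks and records the IF state it ran with. */
static int  ticks;
static bool if_seen_in_handler;
static void timer_handler(struct cpu *c)
{
    ticks++;
    if_seen_in_handler = c->if_flag;
}

/* Model of hardware dispatch; returns false if the gate is empty. */
static bool dispatch(struct cpu *c, unsigned vector)
{
    assert(vector < NR_VECTORS);
    const struct gate *g = &idt[vector];  /* locate the gate descriptor */
    if (g->handler == NULL)
        return false;

    struct cpu saved = *c;                /* save old context */
    if (g->is_interrupt_gate)
        c->if_flag = false;               /* global interrupt mask set */
    g->handler(c);                        /* invoke the specific handler */
    *c = saved;                           /* "iret": restore old context */
    return true;
}
```

Installing `timer_handler` at vector 32 as an interrupt gate and dispatching it shows the handler running with interrupts disabled, after which the interrupted context is restored unchanged.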
Nested Interrupts
What if a second interrupt occurs while an
interrupt routine is executing?
Generally a good thing to permit that — is it
possible?
And why is it a good thing?
Maximizing Parallelism
You want to keep all I/O devices as busy as
possible
In general, an I/O interrupt represents the
end of an operation; another request should
be issued as soon as possible
Most devices don’t interfere with each other’s
data structures; there’s no reason to block
out other devices
Handling Nested Interrupts
As soon as possible, unmask the global
interrupt
As soon as reasonable, re-enable interrupts
from that IRQ
But that isn’t always a great idea, since it
could cause re-entry to the same handler
In Linux, the IRQ being serviced is not re-enabled
during interrupt handling
Nested Execution
Interrupts can be interrupted
By different interrupts; handlers need not be reentrant
No notion of priority in Linux
Small portions execute with interrupts disabled
Interrupts remain pending until acked by CPU
Exceptions can be interrupted
By interrupts (devices needing service)
Exceptions can nest two levels deep
Exceptions indicate coding error
Exception code (kernel code) shouldn’t have bugs
Page fault is possible (trying to touch user data)
Interrupt Handling Philosophy
Do as little as possible in the interrupt handler
Defer non-critical actions till later
Structure: top and bottom halves
Top-half: do minimum work and return (ISR)
Bottom-half: deferred processing (softirqs,
tasklets, workqueues, kernel threads)
[Figure: the top half, followed by the bottom-half mechanisms: tasklet, softirq, workqueue, kernel thread]
Top Half: Do it Now!
Technically is the interrupt handler
Perform minimal, common functions: save registers, unmask other interrupts. Eventually,
undo that: restore registers, return to previous context.
Often written in assembler
IRQ is typically masked for duration of top half
Most important: call proper interrupt handler provided in device drivers (C program)
Don’t want to do too much here
IRQs are masked for part of the time
Don’t want stack to get too big
Typically queue the request and set a flag for deferred processing in a bottom half
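The "queue the request and set a flag" pattern can be sketched with a small ring buffer. This is a single-threaded user-space model with hypothetical names; a real driver would also need the appropriate locking or interrupt masking around the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 8

struct deferred_queue {
    int    items[QUEUE_SIZE];
    size_t head;     /* next slot to read  */
    size_t tail;     /* next slot to write */
    bool   pending;  /* flag: bottom half has work to do */
};

/* Top half: do the minimum, i.e. record the data and flag the work. */
static bool top_half(struct deferred_queue *q, int data)
{
    size_t next = (q->tail + 1) % QUEUE_SIZE;
    if (next == q->head)
        return false;            /* queue full: request is dropped */
    q->items[q->tail] = data;
    q->tail = next;
    q->pending = true;
    return true;
}

/* Bottom half: runs later and drains everything that was queued.
 * Summing the items stands in for the real deferred processing. */
static int bottom_half(struct deferred_queue *q)
{
    int sum = 0;
    while (q->head != q->tail) {
        sum += q->items[q->head];
        q->head = (q->head + 1) % QUEUE_SIZE;
    }
    q->pending = false;
    return sum;
}
```

Two interrupts delivering 10 and 20 leave `pending` set; one later bottom-half pass processes both and clears the flag.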
Top Half: Find the Handler
On modern hardware, multiple I/O devices
can share a single IRQ and hence interrupt
vector
First differentiator is the interrupt vector
Multiple interrupt service routines (ISR) can
be associated with a vector
Each device’s ISR for that IRQ is called
Device determines whether IRQ is for it
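Handling a shared IRQ amounts to walking a list of registered ISRs and letting each one check its own device. The sketch below is a user-space model; the `IRQ_NONE`/`IRQ_HANDLED` return values echo the kernel's convention, but the structures and functions here are simplified stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1 };

/* Hypothetical device: a status bit plus a service counter. */
struct device {
    bool irq_asserted;
    int  serviced;
};

/* One registered ISR on a shared IRQ line. */
struct irqaction {
    enum irqreturn (*handler)(void *dev);
    void *dev;
    struct irqaction *next;
};

/* Sample ISR: the device itself says whether the IRQ was for it. */
static enum irqreturn dev_isr(void *p)
{
    struct device *d = p;
    if (!d->irq_asserted)
        return IRQ_NONE;          /* not ours: let the next ISR look */
    d->irq_asserted = false;      /* acknowledge the device */
    d->serviced++;
    return IRQ_HANDLED;
}

/* Call every ISR registered for the IRQ; return how many claimed it. */
static int handle_shared_irq(struct irqaction *chain)
{
    int handled = 0;
    for (struct irqaction *a = chain; a != NULL; a = a->next)
        if (a->handler(a->dev) == IRQ_HANDLED)
            handled++;
    return handled;
}
```

If a network card and a disk share a line and only the disk raised the interrupt, the card's ISR returns `IRQ_NONE` and the disk's ISR services it.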
Bottom Half: Do it Later!
Mechanisms to defer work to later:
softirqs
tasklets (built on top of softirqs)
work queues
kernel threads
All can be interrupted
[Figure: the top half, followed by the bottom-half mechanisms: tasklet, softirq, workqueue, kernel thread]
Warning: No Process Context
Interrupts (as opposed to exceptions) are not
associated with particular instructions
They’re also not associated with a given
process (user program)
The currently-running process, at the time of
the interrupt, has no relationship whatsoever to
that interrupt
Interrupt handlers cannot sleep!
What Can’t You Do?
You cannot sleep
or call something that might sleep
You cannot refer to current
You cannot allocate memory with GFP_KERNEL
(which can sleep), you must use GFP_ATOMIC
(which can fail)
You cannot call schedule()
You cannot do a down() semaphore call
However, you can do an up()
You cannot transfer data to/from user space
E.g., copy_to_user(), copy_from_user()
Interrupt Stack
When an interrupt occurs, what stack is
used?
Exceptions: The kernel stack of the current
process, whatever it is, is used (There’s always
some process running — the “idle” process, if
nothing else)
Interrupts: hard IRQ stack (1 per processor)
SoftIRQs: soft IRQ stack (1 per processor)
These stacks are configured in the IDT and
TSS at boot time by the kernel
Softirqs
Statically allocated: specified at kernel compile time
Limited number:
Priority  Type
0         High-priority tasklets
1         Timer interrupts
2         Network transmission
3         Network reception
4         Block devices
5         Regular tasklets
When Do Softirqs Run?
Run at various points by the kernel:
After system calls
After exceptions
After interrupts (top halves/IRQs, including the timer intr)
When the scheduler runs ksoftirqd
Softirq routines can be executed simultaneously on
multiple CPUs:
Code must be re-entrant
Code must do its own locking as needed
Hardware interrupts always enabled when softirqs
are running.
Rescheduling Softirqs
A softirq routine can reschedule itself
This could starve user-level processes
Softirq scheduler only runs a limited number
of requests at a time
The rest are executed by a kernel thread,
ksoftirqd, which competes with user
processes for CPU time
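The interplay between a pending bitmask, priority order, and the restart limit can be sketched as below. This is a user-space toy, not the kernel's implementation: the `toy_` names, the limit of 3 passes, and the sample handlers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS 6
#define MAX_RESTART 3   /* hypothetical limit on passes per invocation */

typedef void (*softirq_fn)(void);

static softirq_fn toy_softirq_vec[NR_SOFTIRQS];
static uint32_t   toy_pending;          /* bit n set: softirq n is raised */

static void toy_raise_softirq(unsigned nr)
{
    assert(nr < NR_SOFTIRQS);
    toy_pending |= 1u << nr;
}

/* Sample handlers: a well-behaved one and one that re-raises itself. */
static int timer_runs, net_rx_runs;
static void timer_softirq(void)  { timer_runs++; }
static void greedy_net_rx(void)  { net_rx_runs++; toy_raise_softirq(3); }

/*
 * Run pending softirqs, lowest number (highest priority) first.
 * Returns nonzero if work is still pending after MAX_RESTART passes;
 * in the kernel that remainder would be left to ksoftirqd.
 */
static int toy_do_softirq(void)
{
    int restarts = MAX_RESTART;

    while (toy_pending && restarts-- > 0) {
        uint32_t batch = toy_pending;   /* snapshot, then clear */
        toy_pending = 0;
        for (unsigned nr = 0; nr < NR_SOFTIRQS; nr++)
            if ((batch & (1u << nr)) && toy_softirq_vec[nr])
                toy_softirq_vec[nr]();
    }
    return toy_pending != 0;
}
```

A handler that keeps rescheduling itself runs at most `MAX_RESTART` times per invocation; the leftover work is what competes with user processes once it is handed off.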
Tasklets
Built on top of softirqs
Can be created and destroyed dynamically
Run on the CPU that scheduled it (cache affinity)
Individual tasklets are locked during execution; no
problem about re-entrancy, and no need for locking
by the code
Different tasklets can run in parallel on multiple CPUs
The same tasklet can run on only one CPU at a time
Were once the preferred mechanism for most
deferred activity, now changing
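The guarantee that one tasklet never runs on two CPUs at once can be modelled with two atomic flags, one for "scheduled" and one for "running". The sketch uses C11 atomics in user space; the `toy_` names are hypothetical and the real kernel structures differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_tasklet {
    atomic_bool scheduled;  /* queued and waiting to run */
    atomic_bool running;    /* currently executing on some CPU */
    void (*func)(struct toy_tasklet *);
    int runs;
};

static void toy_tasklet_init(struct toy_tasklet *t,
                             void (*func)(struct toy_tasklet *))
{
    atomic_init(&t->scheduled, false);
    atomic_init(&t->running, false);
    t->func = func;
    t->runs = 0;
}

/* Returns true if this call queued the tasklet, false if it was
 * already queued (scheduling twice before it runs has no extra effect). */
static bool toy_tasklet_schedule(struct toy_tasklet *t)
{
    return !atomic_exchange(&t->scheduled, true);
}

/* Called from a CPU's softirq pass; returns true if the tasklet ran. */
static bool toy_tasklet_run(struct toy_tasklet *t)
{
    if (atomic_exchange(&t->running, true))
        return false;               /* another CPU is running it */
    bool ran = false;
    if (atomic_exchange(&t->scheduled, false)) {
        t->func(t);                 /* no locking needed inside func */
        ran = true;
    }
    atomic_store(&t->running, false);
    return ran;
}

/* Sample tasklet body. */
static void count_run(struct toy_tasklet *t) { t->runs++; }
```

Because the body runs only while `running` is held, the tasklet function itself does not need to be re-entrant.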
The Trouble with Tasklets
Hard to get right
One has to be careful about sleeping
They run at higher priority than other tasks in
the system
Can produce uncontrolled latency if coded
badly
Ongoing discussion about eliminating tasklets
Will likely slowly fade over time
Work Queues
Always run by kernel threads
Are scheduled by the scheduler
Softirqs and tasklets run in an interrupt context; work
queues have a pseudo-process context
i.e., have a kernel context but no user context
Because they have a pseudo-process context, they
can sleep
Work queues are shared by multiple devices
Thus, sleeping will delay other work on the queue
However, they’re kernel-only; there is no user mode
associated with them
Don’t try copying data into/out of user space
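Why sleeping in a shared work queue delays other work can be seen in a tiny simulation: items run strictly in order, so any time one item spends (including sleeping) is added to the completion time of everything behind it. The structure and function names are hypothetical, and time is just a counter.

```c
#include <assert.h>
#include <stddef.h>

struct toy_work {
    int cost;         /* simulated time the item takes, sleep included */
    int finished_at;  /* filled in by the worker */
};

/* Model of a single worker thread draining the queue in FIFO order. */
static void toy_worker_run(struct toy_work *queue, size_t n)
{
    int now = 0;

    assert(queue != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        now += queue[i].cost;        /* later items wait for earlier ones */
        queue[i].finished_at = now;
    }
}
```

If the first item sleeps for 100 time units, two quick items queued behind it finish at 101 and 102 rather than at 1 and 2.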
Kernel Threads
Always operate in kernel mode
Again, no user context
2.6.30 introduced the notion of threaded
interrupt handlers
Imported from the realtime tree
request_threaded_irq()
Now each bottom half has its own context, unlike
work queues
Idea is to eventually replace tasklets and work
queues
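The split between the primary handler and the handler thread can be modelled in user space as below. In the kernel, the primary handler passed to request_threaded_irq() returns IRQ_WAKE_THREAD to hand off to its thread function; everything else here (the `toy_` names, the flag standing in for waking a real thread, the sample device) is a simplification for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_irqreturn { TOY_IRQ_NONE, TOY_IRQ_HANDLED, TOY_IRQ_WAKE_THREAD };

struct toy_threaded_irq {
    enum toy_irqreturn (*primary)(void *dev);  /* runs in interrupt context */
    void (*thread_fn)(void *dev);              /* runs in its own thread */
    void *dev;
    bool  thread_woken;
};

/* Hypothetical device state. */
struct toy_dev { int acked; int processed; };

/* Primary handler: just quiet the device and ask for the thread. */
static enum toy_irqreturn toy_primary(void *p)
{
    struct toy_dev *d = p;
    d->acked++;
    return TOY_IRQ_WAKE_THREAD;
}

/* Thread function: the bulk of the work; it would be allowed to sleep. */
static void toy_thread_fn(void *p)
{
    struct toy_dev *d = p;
    d->processed++;
}

/* What happens at interrupt time. */
static void toy_hard_irq(struct toy_threaded_irq *t)
{
    if (t->primary(t->dev) == TOY_IRQ_WAKE_THREAD)
        t->thread_woken = true;
}

/* One scheduling step of the handler thread; true if it did work. */
static bool toy_irq_thread_step(struct toy_threaded_irq *t)
{
    if (!t->thread_woken)
        return false;
    t->thread_woken = false;
    t->thread_fn(t->dev);
    return true;
}
```

The interrupt-time part only acknowledges the device; the processing happens later, when the scheduler runs the handler thread.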
Comparing Approaches
                                        ISR   SoftIRQ   Tasklet   WorkQueue   KThread
Will be run on same processor as ISR?   N/A   Yes       Yes       Yes         Maybe
Same one can run on multiple CPUs?      Yes   Yes       No        Yes         Yes
[Figure: the TR and GDTR registers, with bit positions 47, 16, 15, and 0 marked]
The CPU knows the layout of fields in the Task-State Segment
IDT Initialization
Initialized once by BIOS in real mode
Linux re-initializes during kernel init
Must not expose kernel to user mode access
start by zeroing all descriptors
Linux lingo:
Interrupt gate (same as Intel; no user access)
Not accessible from user mode
jmp common_interrupt
Common code:
common_interrupt:
SAVE_ALL // save a few more registers than hardware
call do_IRQ
jmp $ret_from_intr