02 classicalCC
Then we consider how semaphores are implemented. This may be within the OS, or at application level in the runtime system of a concurrent programming language.
• Concurrent composite operations in main memory, introducing the notion that a single meaningful, high-level operation may involve several separate low-level operations.
CISC machines had many read-memory, test-result, store-to-memory types of instruction;
RISC (load/store) architectures use only a single memory access per instruction.
read-and-clear will work:
  flag = 0  // shared data is busy
  flag = 1  // shared data is free (initial value)

entry protocol:
  read-and-clear register, flag   // atomically: register := flag; flag := 0
  // if the value in the register is 0, the shared data was busy, so retry
  // if the value in the register is 1, the shared data was free and you have claimed it
// can also be used for condition synchronisation – see later
Multicore machines have atomic instructions, e.g. the x86 LOCK instruction prefix.
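The entry protocol above can be sketched in Java, with AtomicBoolean.getAndSet playing the role of the atomic read-and-clear instruction (the class and method names here are illustrative, not from the slides):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A spin lock built on an atomic read-and-modify instruction.
// free == true corresponds to flag = 1 (shared data free) in the slides;
// getAndSet(false) atomically reads the flag and clears it, like read-and-clear.
public class SpinLock {
    private final AtomicBoolean free = new AtomicBoolean(true);

    public void acquire() {
        // retry until the atomic read-and-clear returns "was free"
        while (!free.getAndSet(false)) {
            Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
        }
    }

    public void release() {
        free.set(true); // flag = 1: shared data is free again
    }

    public boolean isHeld() {
        return !free.get();
    }
}
```

On x86, getAndSet typically compiles to a LOCKed exchange — the kind of atomic instruction referred to above.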
Classical shared memory concurrency control 5
Mutual exclusion without hardware support
This was a hot topic in the 1970s and 80s.
Examples for N-process mutual exclusion are:
Eisenberg M. A. and McGuire M. R., "Further comments on Dijkstra's concurrent programming control problem", CACM 15(11), 1972.
Lamport L., "A new solution of Dijkstra's concurrent programming problem", CACM 17(8), 1974 (his N-process bakery algorithm).
With multi-core instruction reordering it is not proven that such programs are correct: these algorithms assume sequentially consistent memory, which modern hardware does not guarantee without explicit fences.
wait (aSem):
  if aSem > 0 then aSem = aSem - 1
  else suspend the executing process, waiting on aSem

signal (aSem):
  if there are no processes waiting on aSem
  then aSem = aSem + 1
  else free one waiting process – it continues after its wait instruction
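These semantics can be sketched as a Java monitor (a minimal sketch; the name CountingSemaphore and the decision to ignore interrupts while waiting are my simplifications, not from the slides):

```java
// A counting semaphore with the wait/signal semantics described above.
public class CountingSemaphore {
    private int count;

    public CountingSemaphore(int initial) {
        count = initial;
    }

    // wait (aSem): decrement if positive, otherwise suspend the caller.
    public synchronized void semWait() {
        boolean interrupted = false;
        while (count == 0) {
            try {
                wait(); // suspend, queued on this semaphore
            } catch (InterruptedException e) {
                interrupted = true; // simplification: keep waiting if interrupted
            }
        }
        count--;
        if (interrupted) Thread.currentThread().interrupt();
    }

    // signal (aSem): free one waiting process, or record the signal in the count.
    public synchronized void semSignal() {
        count++;
        notify(); // wake one waiting process, if any; it re-tests the count
    }

    public synchronized int available() {
        return count;
    }
}
```

Here signal always increments the count and wakes one waiter, which then decrements it; the net effect matches the case analysis above.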
[Trace: three processes A, B and C share a critical region (CR) protected by aSem, initially 1.
A: wait (aSem) – aSem becomes 0, A enters the CR.
B: wait (aSem) – B is blocked. C: wait (aSem) – B and C are blocked.
A: signal (aSem) – B is freed and enters the CR; C remains blocked.
B: signal (aSem) – C is freed and enters the CR.
C: signal (aSem) – no process is waiting, so aSem returns to 1.]
[Trace: condition synchronisation between A and B, with aSem initially 0.
If A executes wait (aSem) before B signals, A is blocked and B's signal (aSem) wakes it.
If B signals first, aSem becomes 1 – a "wake-up waiting" is recorded – and A's later wait (aSem) does not block.]
[Diagram: N user threads, including A and B, share data protected by aSem; their wait (aSem) and signal (aSem) calls are handled by the runtime system – a user-thread implementation of wait and signal on semaphores.]
Implementation of wait and signal on semaphores
When user threads are mapped to kernel threads, wait and signal must themselves be atomic operations. This is clearly needed on a multiprocessor, and also on a uniprocessor with preemptive scheduling.
Associate a flag with each semaphore, and use an atomic instruction such as read-and-clear to make wait and signal atomic.
This also applies to kernel threads executing the OS and using OS-managed semaphores for mutual exclusion and condition synchronisation.
The need for concurrency control first came from OS design. We now have concurrent
programming languages and OSs support multi-threaded processes.
We note that processes that only read shared data can read simultaneously, whereas a process that writes must have exclusive access to the data. We develop a solution that gives priority to writers over readers, on the assumption that writers are keeping the data up to date.
Producer–consumer variation: allow one producer and one consumer to access the buffer in parallel – left as an exercise.
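For reference, the standard semaphore solution for an N-slot producer–consumer buffer looks like this (a sketch using java.util.concurrent.Semaphore; the class and field names are illustrative, and acquireUninterruptibly is used to keep the sketch free of checked exceptions):

```java
import java.util.concurrent.Semaphore;

// Classic N-slot producer–consumer buffer guarded by semaphores:
// spaces counts free slots, items counts full slots, mutex guards the pointers.
public class BoundedBuffer {
    private final int[] slots;
    private int inptr = 0, outptr = 0;
    private final Semaphore spaces, items;
    private final Semaphore mutex = new Semaphore(1);

    public BoundedBuffer(int n) {
        slots = new int[n];
        spaces = new Semaphore(n);
        items = new Semaphore(0);
    }

    public void insert(int item) {
        spaces.acquireUninterruptibly();  // wait for a free slot
        mutex.acquireUninterruptibly();   // exclusive access to the buffer
        slots[inptr] = item;
        inptr = (inptr + 1) % slots.length;
        mutex.release();
        items.release();                  // signal: one more item available
    }

    public int remove() {
        items.acquireUninterruptibly();   // wait for an item
        mutex.acquireUninterruptibly();
        int item = slots[outptr];
        outptr = (outptr + 1) % slots.length;
        mutex.release();
        spaces.release();                 // signal: one more free slot
        return item;
    }
}
```

Because a single mutex guards both pointers, a producer and a consumer exclude each other; replacing it with one lock per pointer gives the parallel variation left as an exercise.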
counts:
  ar = active readers
  rr = reading readers (active readers who have proceeded to read)
  aw = active writers
  ww = writing writers (active writers who have proceeded to write)
Readers may read simultaneously, but writers must wait to write one at a time.
reader exit protocol:
  ar = ar - 1
  rr = rr - 1
  if rr = 0 then signal waiting writers
  exit

writer exit protocol:
  aw = aw - 1
  ww = ww - 1
  if aw = 0 then signal waiting readers
  exit
Note that a signal unblocks only one blocked process. The values of the counts indicate
how many signals to send. The last writing writer must unblock all blocked readers.
The last reading reader must unblock all waiting writers.
Take care not to wait while holding the semaphore that protects the shared counts.
That would cause deadlock.
But this is only mutual exclusion. Condition synchronisation was added to conditional critical regions (CCRs) by including

  await <some condition on shared data>
The implementation allows the region to be temporarily left/freed if the condition is false, and the process executing await is blocked until the condition becomes true (and its turn comes).
Note that the programmer must leave the data structure in a consistent state
before executing await, as well as before exiting the region.
CCRs are difficult to implement. Programmers may invent any condition on the shared data.
All conditions have to be tested when any process leaves the region.
We now introduce an illustration of CCRs and the subsequent evolution of concurrency control.
(In the illustrations, • marks a potential delay.)
- operations that read and/or write execute under mutual exclusion (semaphore implementation)
- in monitors, condition synchronisation is provided, by wait and signal operations on
condition variables, named by programmers e.g. not-full, free-to-read
- processes must test the data and decide whether they need to block until a condition becomes true
- a process that waits on a condition variable always blocks, first releasing the monitor lock
(the implementation manages this)
- signal has no effect if there are no processes blocked on the condition variable being signalled
- after signal the monitor lock must be re-acquired for an unblocked process after the signalling
process has left the region (the implementation manages this)
Passive object example: monitors and condition variables
producer process:
  produce item
  call insert (item)   •

operation insert (item):            // data: cyclic, N-slot buffer
  if buffer is full then wait (notfull)   •
  insert item
  signal (notempty)

• = potential delay
- condition synchronisation is similar to the pthreads package
- wait blocks the process/thread and releases the exclusion on the object
- notify: the implementation frees an arbitrary process – take care!
- notifyAll: the implementation frees all blocked processes. The first to be scheduled
may resume its execution (under exclusion) but must retest the wait condition.
The implementation must manage re-acquiring the exclusion and resuming each process at the right point so that the wait condition is retested. Note that processes could resume and block repeatedly, e.g. on a multiprocessor.
Java example, buffer for a single integer, Bacon and Harris section 12.2.4, p369
public class Buffer {
    private int value = 0;
    private boolean full = false;
    // ... synchronized insert and remove methods omitted
}
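A completed version of the class might look as follows – a sketch in the spirit of the book's example, using the standard Java guarded-wait pattern rather than the book's exact code:

```java
// Single-integer buffer as a Java monitor: synchronized methods give
// mutual exclusion; wait/notifyAll give condition synchronisation.
public class Buffer {
    private int value = 0;
    private boolean full = false;

    public synchronized void put(int v) throws InterruptedException {
        while (full) {
            wait();      // buffer full: release the monitor lock and block
        }
        value = v;
        full = true;
        notifyAll();     // wake consumers; each retests its condition
    }

    public synchronized int get() throws InterruptedException {
        while (!full) {
            wait();      // buffer empty: release the monitor lock and block
        }
        full = false;
        notifyAll();     // wake producers
        return value;
    }
}
```

The while loops matter: after notifyAll, a woken thread must retest its wait condition under exclusion, as described above.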
- shared data is encapsulated with operations in an active object, called by concurrent processes
- the managing process performs condition testing, and ..
- .. only accepts calls to operations with guards that evaluate to true
- mutual exclusion and condition synchronisation are ensured by the managing process
- note that synchronisation is at the granularity of whole operations (note that path expressions
also have this feature)
- which process (caller or manager) executes the accepted operation is implementation-dependent
Convoy effect - a long lock-hold can hold up a lot of potentially short ones.
- try to program with fine-grained locking (components rather than whole structures)
Library calls - a universal problem! Static analysis of code executed under mutual exclusion
becomes impossible when these operations make extensive use of library calls.
( motivation for Java+Kilim – see later )
Composite operations in main memory - 1
We have studied how to make one operation on shared data atomic in the presence of
concurrency and crashes.
Now suppose a meaningful operation comprises several such operations:
e.g. transfer: subtract a value from one data item and add the same value to another.
e.g. test some integer counts to decide whether you can write some shared data;
proceed to write if there are no existing readers or writers
[Diagram: processes P and Q each invoke_operation ( args ) on shared data protected by semA and semB. P has claimed semA and executes wait (semB); Q has claimed semB and executes wait (semA).]
At this point we have deadlock. Process P holds semA and is blocked, queued on semB
Process Q holds semB and is blocked, queued on semA
Neither process can proceed to use the resources and signal the respective semaphores.
A cycle of processes exists, where each holds one resource and is blocked waiting for
another, held by another process in the cycle.
Deadlock: systems that allocate resources dynamically are subject to deadlock.
We later study the policies and conditions necessary and sufficient for deadlock to exist.
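One standard way to break such cycles is to impose a global order on the locks and make every process claim them in that order. A sketch for the earlier transfer example (the Account class, its id field, and the use of ReentrantLock are my choices, not from the slides):

```java
import java.util.concurrent.locks.ReentrantLock;

// Deadlock avoidance by lock ordering: every thread acquires account locks
// in increasing id order, so a cycle of hold-and-wait processes cannot form.
// Assumes account ids are distinct.
public class Account {
    final long id;                 // position in the global lock order
    private long balance;
    final ReentrantLock lock = new ReentrantLock();

    public Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    public long balance() { return balance; }

    public static void transfer(Account from, Account to, long amount) {
        // lock the account with the smaller id first, regardless of direction
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;   // the composite operation is now atomic
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

With distinct ids, two opposite transfers over the same pair of accounts claim the locks in the same order, so the cycle described above cannot arise.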
Concurrency control: why not lock all data – do all operations – unlock?
But contention may be rare, and "locking all" may impose overhead and slow response (e.g. Python's global interpreter lock).
Crashes?: in main memory everything is lost on a crash – no problem! unless any externally
visible effects have occurred. These could be output, or changes to persistent state.
We’ll consider persistent store later. Assume that output generated by concurrent
composite operations should be deferred until the operation completes successfully.
[Diagram: an ordered linked list with head (H) and tail (T) sentinel nodes and cells 10 and 30. Each node holds a key and a *next pointer, at addresses node.key and node.next.]
Insertion is straightforward. First, the list is traversed until the correct position is found.
Then a new cell is created and inserted using CAS (compare and swap)
[Diagrams: inserting 20 between 10 and 30 – a new cell for 20 is created with its next pointer set to 30's cell, then CAS swings 10's next pointer from 30 to 20.]
CAS (address, old, new) atomically compares the contents of address with the old value
and, if they match, writes the new value to that location.
Note that if the CAS fails, this means that the list has been updated concurrently by other
thread(s) and the traversal must start again to find the correct place to insert.
Contrast this with the “spin lock” approach for claiming a semaphore (mutex lock).
In this case the “read-and-clear” repeats until it succeeds (ref slide 5).
How would you program an ordered list using semaphores? Lock the whole list?
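The CAS-based insertion described above can be sketched with Java's AtomicReference standing in for the hardware CAS (the sentinel keys and class layout are my choices; this sketch supports concurrent insertion only, not deletion):

```java
import java.util.concurrent.atomic.AtomicReference;

// Lock-free insertion into an ordered linked list using compare-and-swap.
// Head and tail sentinels carry MIN_VALUE and MAX_VALUE so every insert
// lands strictly between two existing nodes.
public class OrderedList {
    static final class Node {
        final int key;
        final AtomicReference<Node> next;
        Node(int key, Node next) {
            this.key = key;
            this.next = new AtomicReference<>(next);
        }
    }

    private final Node head;

    public OrderedList() {
        Node tail = new Node(Integer.MAX_VALUE, null);
        head = new Node(Integer.MIN_VALUE, tail);
    }

    public void insert(int key) {
        while (true) {
            // traverse to find pred.key < key <= curr.key
            Node pred = head;
            Node curr = pred.next.get();
            while (curr.key < key) {
                pred = curr;
                curr = curr.next.get();
            }
            Node node = new Node(key, curr);
            // CAS succeeds only if pred.next still points at curr,
            // i.e. no concurrent insert slipped in between
            if (pred.next.compareAndSet(curr, node)) {
                return;
            }
            // CAS failed: the list changed under us – retraverse and retry
        }
    }

    public boolean contains(int key) {
        Node curr = head.next.get();
        while (curr.key < key) curr = curr.next.get();
        return curr.key == key;
    }
}
```

If the compareAndSet fails, another thread changed pred.next since the traversal, so the loop starts again from the head – exactly the retry behaviour described above.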
[Diagram: the list H → 10 → 30 → T, from which node 10 is to be deleted.]
CAS (address, old, new) could be used to change node.next in the head to point to 30, after checking that the old value points to 10 (so there were no concurrent inserts between H and 10). But concurrent threads could have inserted values between 10 and 30 after 30 was selected for the new pointer from H. Those inserts would be lost:
[Diagrams: while H's pointer is swung from 10 to 30, another thread inserts 20 after 10; cell 20 becomes unreachable when 10 is unlinked – a lost update. In the final diagrams the node to be deleted is shown marked with an X before it is unlinked.]
The algorithms are given in C++-like pseudo-code in the paper, as is a proof of correctness.
reader exit protocol:
  ar = ar - 1
  rr = rr - 1
  if rr = 0 then signal waiting writers
  exit

writer exit protocol:
  aw = aw - 1
  ww = ww - 1
  if aw = 0 then signal waiting readers
  exit
reader entry protocol (incorrect):
  wait (CountGuard-sem)
  ar = ar + 1
  if aw = 0 then rr = rr + 1
  else wait (Rsem)        // deadlock! blocking while holding a semaphore
  signal (CountGuard-sem)
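The repair is the one stated earlier: never block on Rsem or Wsem while holding CountGuard-sem – release the guard first, and let a process that may proceed signal its own queue (safe because of "wake-up waiting"). A Java sketch with writer priority, following the counts above (semaphore and method names are illustrative):

```java
import java.util.concurrent.Semaphore;

// Readers–writers with writer priority. The counts ar/rr/aw/ww are guarded
// by countGuard; a process never blocks on rSem/wSem while holding it.
public class ReadersWriters {
    private int ar = 0, rr = 0, aw = 0, ww = 0;
    private final Semaphore countGuard = new Semaphore(1);
    private final Semaphore rSem = new Semaphore(0);       // blocked readers
    private final Semaphore wSem = new Semaphore(0);       // blocked writers
    private final Semaphore writeMutex = new Semaphore(1); // writers write one at a time

    public void startRead() {
        countGuard.acquireUninterruptibly();
        ar++;
        if (aw == 0) {
            rr++;
            rSem.release();          // "wake-up waiting": admit ourselves
        }
        countGuard.release();        // release the guard BEFORE blocking
        rSem.acquireUninterruptibly();
    }

    public void endRead() {
        countGuard.acquireUninterruptibly();
        rr--; ar--;
        if (rr == 0) {
            while (ww < aw) { ww++; wSem.release(); } // last reader frees all waiting writers
        }
        countGuard.release();
    }

    public void startWrite() {
        countGuard.acquireUninterruptibly();
        aw++;
        if (rr == 0) {
            ww++;
            wSem.release();          // no reading readers: admit ourselves
        }
        countGuard.release();
        wSem.acquireUninterruptibly();
        writeMutex.acquireUninterruptibly(); // still write one at a time
    }

    public void endWrite() {
        writeMutex.release();
        countGuard.acquireUninterruptibly();
        ww--; aw--;
        if (aw == 0) {
            while (rr < ar) { rr++; rSem.release(); } // last writer frees all waiting readers
        }
        countGuard.release();
    }

    public int readingReaders() {
        countGuard.acquireUninterruptibly();
        int n = rr;
        countGuard.release();
        return n;
    }
}
```

startRead releases countGuard before blocking on rSem; when the condition already held, the earlier self-release means the acquire does not block, so the deadlock above cannot occur.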