
Consistency vs. coherence

[5.2] In the preceding two lectures, we have studied coherence and consistency. What's the difference? Coherence assures that values written by one processor are read by other processors. However, coherence says nothing about when writes will become visible. Another way of looking at it: coherence ensures that writes to a particular location will be seen in order. Consistency ensures that writes to different locations will be seen in an order that makes sense, given the source code.

Example: Two processors are synchronizing on a variable called flag. Assume A and flag are both initialized to 0.

P1                          P2
A = 1;                      while (flag == 0);  /* spin */
flag = 1;                   print A;

What value does the programmer expect to be printed? What could go wrong? Is this a coherence problem? Is this a consistency problem? Note that even barrier synchronization can't guarantee the right order.
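What can go wrong becomes concrete if we model P1's two writes as possibly becoming visible to P2 in either order. This is a toy sketch, not part of the lecture; the function name and encoding are mine:

```python
def p2_prints(visible_order):
    """Possible values P2 can print, given the order in which P1's two
    writes become visible to P2 (a tuple like ('A', 'flag'))."""
    results = set()
    # P2's spin-loop check may fall after any prefix of the writes.
    for k in range(len(visible_order) + 1):
        mem = {'A': 0, 'flag': 0}
        for var in visible_order[:k]:
            mem[var] = 1
        if mem['flag'] == 1:        # while (flag == 0) exits...
            results.add(mem['A'])   # ...and P2 prints whatever it sees in A
    return results

in_order  = p2_prints(('A', 'flag'))    # program order: only 1 can be printed
reordered = p2_prints(('flag', 'A'))    # flag's write visible first: 0 is possible
```

If the writes reach P2 in program order, only 1 can be printed; if the write of flag becomes visible first, P2 can exit the loop and print 0.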

Lecture 14

Architecture of Parallel Computers

P1                          P2
A = 1;
barrier(b1);                barrier(b1);
                            print A;
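For concreteness, barrier(b1) might itself be built out of ordinary reads and writes to shared variables, a point that matters below. Here is a hedged sketch of a centralized sense-reversing barrier; the class and all names are illustrative, and the lock stands in for the atomicity a real implementation must obtain some other way:

```python
import threading

class CentralBarrier:
    """Centralized sense-reversing barrier built from shared variables.
    Illustrative only: the lock supplies the atomicity and ordering that a
    real shared-memory implementation must obtain by other means."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.sense = False
        self.lock = threading.Lock()

    def wait(self):
        local_sense = not self.sense          # sense for this barrier episode
        with self.lock:
            self.count += 1
            if self.count == self.n:          # last arriver releases everyone
                self.count = 0
                self.sense = local_sense
        while self.sense != local_sense:      # earlier arrivers spin here
            pass

passed = []
barrier = CentralBarrier(3)

def worker(i):
    barrier.wait()                            # nobody proceeds until all arrive
    passed.append(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()
```

Note that the barrier itself is nothing but reads and writes to count and sense, which is exactly why it inherits the visibility problems discussed next.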

There are two problems with using this approach to guarantee that A is printed as 1.

- A barrier just guarantees that processes will wait there until all have arrived. It doesn't guarantee that the writes a process performed before the barrier are visible to the other processes when they leave it.
- A barrier may be implemented with reads and writes to shared variables. Nothing makes all processors wait until these reads and writes are performed everywhere.

Requirements for sequential consistency

Sequential consistency assures that each process appears to issue and complete memory operations one at a time, and in sequential order.
[Figure: the "switch" model of sequential consistency. Processors P1, P2, ..., Pn each issue memory references in program order; a single switch connects one processor at a time to memory, and the switch is randomly set after each memory reference.]
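The switch model can be checked mechanically: enumerate every interleaving that keeps each processor's operations in program order, and see what the flag example can print. This sketch is mine, not the lecture's; it models P2's spin check as a read of flag:

```python
from itertools import combinations

P1_OPS = [('w', 'A'), ('w', 'flag')]   # P1: A = 1; flag = 1;
P2_OPS = [('r', 'flag'), ('r', 'A')]   # P2: read flag (spin check); read A

def sc_outcomes():
    """Values P2 can print under sequential consistency."""
    outcomes = set()
    n = len(P1_OPS) + len(P2_OPS)
    # Choose which positions of the total order P1's operations occupy;
    # within each processor, program order is preserved (the "switch").
    for slots in combinations(range(n), len(P1_OPS)):
        mem = {'A': 0, 'flag': 0}
        i1 = i2 = 0
        reads = {}
        for pos in range(n):
            if pos in slots:
                _, var = P1_OPS[i1]; i1 += 1
                mem[var] = 1                 # both of P1's ops write 1
            else:
                _, var = P2_OPS[i2]; i2 += 1
                reads[var] = mem[var]
        if reads['flag'] == 1:               # the spin loop has exited...
            outcomes.add(reads['A'])         # ...and P2 prints A
    return outcomes
```

Under sequential consistency, every interleaving in which the spin loop exits prints 1, which is exactly what the programmer expects.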

Sequential consistency requires that

- memory operations become visible to all processes in program order, and
- operations appear atomic: one operation must complete before the next one is issued.

It is tricky to make writes seem atomic, especially when multiple copies of a word need to be updated on each write. If writes aren't atomic, errors may result.

P1                  P2                        P3
A = 1;              while (A == 0);           while (B == 0);
                    B = 1;                    print A;

If writes aren't atomic, what might happen with the print statement?

Bus-based serialization of transactions provides not only coherence, but also consistency. With the two-state write-through invalidation protocol discussed in Lecture 12, writes and read misses to all locations are serialized in bus order. If a read obtains the value of write W, W is guaranteed to have completed, since it caused a bus transaction. When write W is performed with respect to any processor, all previous writes in bus order have completed.

Fortunately, not all operations must be performed in order. Why "fortunately"? Of these operations, which must be done in order?

Write A
Write B
Read A
Read B
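To see the danger with the three-processor example, here is a toy model (entirely illustrative) in which each write becomes visible to each processor at its own time; the arrival times below are invented purely to exhibit the failure:

```python
# arrival[(var, proc)] = time at which the single write of var reaches proc.
def value_seen(var, proc, t, arrival):
    """Return 1 if the write of var has reached proc by time t, else 0."""
    return 1 if arrival.get((var, proc), float('inf')) <= t else 0

arrival = {('A', 'P2'): 1,   # A = 1 reaches P2 quickly...
           ('A', 'P3'): 10,  # ...but reaches P3 very late
           ('B', 'P3'): 3}   # B = 1 (written by P2 at time 2) reaches P3 at 3

# P2 exits its spin loop at time 1 (it sees A == 1) and writes B.
# P3 exits its spin loop at time 3 (it sees B == 1) and then reads A:
printed = value_seen('A', 'P3', 3, arrival)   # P3 prints 0, not 1!
```

Because the write of A was not atomic, P3 observes B's new value before A's, and the print statement outputs 0.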


Snooping coherence protocols


[5.3] Snooping protocols are simple to implement. There is no need to change the processor, main memory, or cache; only the cache controller need be extended. They are usable with write-back caches: processors can write to their caches concurrently without any bus transactions. If a block in a cache is dirty, how many caches may that block be cached in?

A cache is said to own a block if it must supply the data upon a request for the block.

We will study three protocols.

- A three-state write-back invalidation protocol (MSI).
- A four-state write-back invalidation protocol (Illinois).
- A four-state write-back update protocol (Dragon).

A three-state invalidation protocol

[5.3.1] Our write-through protocol had two states, valid and invalid. In a write-back cache, we must distinguish clean (unmodified) blocks from dirty (modified) blocks. This means we need at least three states. For each state, ask: is the block in the cache modified? Is main memory up to date? May other caches have copies of the block?

- Modified
- Shared
- Invalid

What has to happen before a shared or invalid block can be written and placed in the modified state?

In this system, five different types of transactions are possible. Two come from the processor:

- Processor read (PrRd).
- Processor write (PrWr).

Either of these might miss; in case of a miss, a block needs to be replaced. In addition to processor transactions, there are three kinds of bus transactions:

- Bus read (BusRd). This occurs when a PrRd misses in the cache. A request goes out over the bus to get the value of the block.
- Bus read exclusive (BusRdX). This occurs when a PrWr needs a block that this cache does not hold in modified state; it asks for an exclusive copy, invalidating any other copies.

- Bus write-back (BusWB). This is generated by the cache controller when it writes a dirty block back to memory, e.g., on replacement.

Which of these transactions is required only because of cache coherence? Let us take a look at the state-transition diagram for this protocol.


There are three states. The states closer to the top of the diagram are bound more tightly to the processor. Each transition is labeled with the transaction that causes it and, if applicable, the transaction that it generates.

[State-transition diagram for MSI, reconstructed here as a table. Each entry gives the generated transaction, if any, and the next state.]

State   PrRd            PrWr            BusRd           BusRdX
M       (hit)           (hit)           Flush, to S     Flush, to I
S       (hit)           BusRdX, to M    --              to I
I       BusRd, to S     BusRdX, to M    --              --

Let us first consider the transitions out of the Invalid state. What happens if the processor reads a word? What happens if this processor writes a word?

How many transitions are there out of the Shared state? One of these transactions is a write transaction. What happens then?

Since the cache already has a copy of the block, the only effect is to invalidate any copies in other caches. The data coming back on the bus can be ignored, since it is already in the cache. (In fact, a BusUpgr transaction could be introduced to reduce bus traffic.)


The rest of the transitions are read transactions. If another processor reads the line (BusRd), the block stays in the Shared state. If another processor writes it (BusRdX), the block is invalidated (moves into the Invalid state).

How many transitions are there out of the Modified state? One of these transactions is a write transaction. What happens then? What is the simplest read transaction? What about bus reads (reads coming from other processors)?

When a block transitions from the Modified to the Shared state, it is said to have been demoted. In this case, the memory and the requesting cache both grab a copy of the block from the bus. Actually, for the sake of completeness, we should have a transition from each state for each observable transaction. Sometimes we specify that nothing happens for these transactions (as with PrRd in the Modified state); sometimes we just omit the transaction (as with BusRd in the Invalid state). Given this diagram from Lecture 12, what would be the action and the state of the data in each cache?


[Figure: processors P1, P2, and P3, each with a cache, connected by a bus to memory and I/O devices. Memory initially holds u = 5. The numbered actions in the figure correspond to the rows of the table below.]

Processor action   State in P1   State in P2   State in P3   Bus action   Data supplied by
P1 reads u
P3 reads u
P3 writes u
P1 reads u
P2 reads u
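One way to check answers to this exercise is to replay the trace through a small table-driven model of MSI for a single block. The sketch below is mine, not the lecture's; it ignores replacement, write-backs (BusWB), and transient states:

```python
# States for a single memory block: 'M' (modified), 'S' (shared), 'I' (invalid).
class MSICache:
    """Minimal model of MSI snooping for one block."""
    def __init__(self, procs, mem_value):
        self.state = {p: 'I' for p in procs}
        self.val = {p: None for p in procs}
        self.mem = mem_value

    def read(self, p):
        if self.state[p] != 'I':              # read hit: no bus transaction
            return self.val[p], None, None
        supplier = 'memory'                   # read miss: BusRd
        for q, s in self.state.items():
            if s == 'M':                      # owner flushes and is demoted to S
                self.mem = self.val[q]
                self.state[q] = 'S'
                supplier = q
        self.state[p] = 'S'
        self.val[p] = self.mem
        return self.val[p], 'BusRd', supplier

    def write(self, p, v):
        bus = None
        if self.state[p] != 'M':              # need exclusive ownership first
            bus = 'BusRdX'
            for q in self.state:
                if q != p:
                    self.state[q] = 'I'       # invalidate all other copies
        self.state[p] = 'M'
        self.val[p] = v
        return bus

sim = MSICache(['P1', 'P2', 'P3'], mem_value=5)
trace = [sim.read('P1'),      # (5, 'BusRd', 'memory')
         sim.read('P3'),      # (5, 'BusRd', 'memory')
         sim.write('P3', 7),  # 'BusRdX': P1's copy is invalidated
         sim.read('P1'),      # (7, 'BusRd', 'P3'): P3 flushes and is demoted
         sim.read('P2')]      # (7, 'BusRd', 'memory'): memory now up to date
```

After the five actions, all three caches hold u = 7 in the Shared state, and memory is up to date.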

To achieve coherence, this protocol must

- propagate all writes so they become visible to other processors, and
- serialize all writes, so they are seen in the same order by all processors.

To achieve sequential consistency, this protocol must


make memory operations visible to all processes in program order, and make operations appear atomic: one operation must complete before the next one is issued. Obviously, bus transactions are totally ordered. Between bus transactions, processors perform reads and writes in program order. This looks like the condition for merged partial orders that we considered in Lecture 12,

[Diagram: the memory operations of P0, P1, and P2 merged into a single total order, with each processor's operations appearing in program order.]

except that writes may be interspersed among the reads between bus transactions.

[Diagram: the same merge, but with writes interspersed among each processor's reads between bus transactions.]

However, this is not a problem, because those writes are not observed by other processors (if they were, a bus transaction would be generated). See CS&G, p. 298 for a more complete discussion.


The four-state invalidation protocol

[5.3.2] The three-state protocol generates more bus transactions than necessary. If data is read, then modified, two bus transactions are generated:

- A BusRd transaction (when the block moves from the Invalid to the Shared state).
- A BusRdX transaction (when the block moves from the Shared to the Modified state).

The second transaction is generated even by serial programs, is needless, and wastes bus bandwidth. How can we avoid it? By adding a state that records that exactly one cache holds the block, and that the block is clean. Let's call this state E, for exclusive. It indicates that a block is in use by a single processor, but has not been modified. Then if the block is modified, no bus transaction is necessary. If the block is read by another processor, main memory can supply the value (the block is unowned). The resulting protocol is used by several modern processors.

Here is a state-transition diagram of this protocol.


When a block is first read by a processor, it enters the E or S state, depending on whether another cache has a copy. When a processor writes a block that is in state E, it immediately transitions to state M without a bus transaction.

[State-transition diagram for the four-state protocol, reconstructed here as a table. BusRd(S) denotes a bus read that finds the shared signal asserted; BusRd(S') denotes one that does not. In the Illinois protocol, a cache holding the block supplies it (Flush) even from the S state.]

State   PrRd                              PrWr             BusRd          BusRdX
M       (hit)                             (hit)            Flush, to S    Flush, to I
E       (hit)                             to M (silent)    Flush, to S    Flush, to I
S       (hit)                             BusRdX, to M     Flush          Flush, to I
I       BusRd(S), to S; BusRd(S'), to E   BusRdX, to M     --             --
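The saving from the E state can be illustrated by counting the bus transactions generated by a private read-then-write. The sketch below is mine; it crudely models the shared-signal check:

```python
def bus_transactions(protocol, other_cache_has_copy=False):
    """Bus transactions generated when one processor reads a block that is
    not in its cache and then writes it."""
    count = 1                                 # read miss: BusRd
    if protocol == 'MESI' and not other_cache_has_copy:
        state = 'E'                           # shared signal not asserted
    else:
        state = 'S'
    if state == 'E':
        state = 'M'                           # silent E -> M upgrade
    else:
        count += 1                            # S -> M requires BusRdX
        state = 'M'
    return count
```

For unshared data, the three-state protocol needs two transactions where the four-state protocol needs one; when another cache really does hold a copy, both protocols pay for the BusRdX.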

What new requirement does this protocol place on the bus?

If a block is in state S, does that mean it is actually being shared? Who should provide the block for a BusRd transaction when both main memory and another cache have a copy of it? Originally, in the Illinois protocol, the cache supplied it. Since caches were constructed out of SRAM, they were assumed to be faster than DRAM. However, in modern machines, it may be more expensive to access another processor's cache.


How does main memory know it should supply data?

If the data resides in multiple caches, one of them must somehow be selected to supply it. But, on a distributed-memory machine, it may still be faster to get the data from a cache. If it is faster to transfer data from the cache, it may be a good idea to add a fifth owned (O) state identifying the owner, which is responsible for supplying requests for the data.

