Superscalar processors and multicore systems
Instruction Level Parallelism
Instruction Level Parallelism (ILP) refers to architectures in which multiple operations of a single process can be performed in parallel, the process having its own set of resources: address space, registers, identifiers, state, and program counter.

It also refers to the compiler design techniques and processor designs that execute operations, such as memory loads and stores, integer addition, and floating-point multiplication, in parallel to improve processor performance.
What is Instruction Level Parallelism?
• Instruction-level parallelism can also appear explicitly in the
instruction set. VLIW (Very Long Instruction Word) machines have
instructions that can issue multiple operations in parallel. The Intel
IA64 is a well-known example of such an architecture. All high-
performance, general-purpose microprocessors also include
instructions that can operate on a vector of data at the same time.
Compiler techniques have been developed to generate code
automatically for such machines from sequential programs.
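As a concrete illustration (a sketch assuming a vectorizing C compiler such as gcc or clang at -O3, not an example from the original text), a loop whose iterations are independent is exactly the kind of sequential code that such compilers turn into vector instructions automatically:

/* Each iteration is independent of the others, so the compiler
   can emit SIMD instructions that add several array elements
   per instruction. */
void vec_add(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];   /* no cross-iteration dependence */
}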
Instruction Level Parallelism (ILP) Architecture
• Instruction Level Parallelism is achieved when multiple operations are performed in a single cycle, either by executing them simultaneously or by filling the gaps between successive operations that arise from operation latencies.
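To make this concrete, here is a minimal C sketch (illustrative, not from the original text): the first three statements have no data dependences on one another, so a superscalar core can issue them in the same cycle, while the final statement must wait for all three results.

int ilp_demo(int a, int b, int c, int d, int e, int f)
{
    int x = a + b;    /* independent of y and z */
    int y = c * d;    /* independent of x and z */
    int z = e - f;    /* independent of x and y */
    return x + y + z; /* depends on x, y, and z */
}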
Classification of ILP Architectures
ILP architectures can be classified in the following ways:

• Sequential Architecture: Here, the program is not expected to explicitly convey any information regarding parallelism to the hardware, as in superscalar architectures.
• Dependence Architecture: Here, the program explicitly conveys information about the dependencies between operations, as in dataflow architectures.
• Independence Architecture: Here, the program conveys information about which operations are independent of each other, so that they can be executed in place of 'no-ops'.
Advantages of Instruction-Level Parallelism
• Improved Performance: ILP can significantly improve the performance of processors by allowing multiple instructions to be executed simultaneously or out-of-order. This can lead to faster program execution and better system throughput.
• Efficient Resource Utilization: ILP can help to efficiently utilize processor resources by allowing multiple instructions to be executed at the same time. This can help to reduce resource wastage and increase efficiency.
• Reduced Instruction Dependency: ILP can help to reduce the number of instruction dependencies, which would otherwise limit the amount of instruction-level parallelism that can be exploited. This can help to improve performance and reduce bottlenecks.
• Increased Throughput: ILP can help to increase the overall throughput of processors by allowing multiple instructions to be executed simultaneously or out-of-order. This can help to improve the performance of multi-threaded applications and other parallel processing tasks.
Disadvantages of Instruction-Level Parallelism
• Increased Complexity: Implementing ILP can be complex and requires additional hardware resources, which can increase the complexity and cost of processors.
• Instruction Overhead: ILP can introduce additional instruction overhead, which can slow down the execution of some instructions and reduce performance.
• Data Dependency: Data dependencies can limit the amount of instruction-level parallelism that can be exploited, leading to lower performance and reduced throughput (see the sketch after this list).
• Reduced Energy Efficiency: ILP can reduce the energy efficiency of processors by requiring additional hardware resources and increasing instruction overhead. This can increase power consumption and result in higher energy costs.
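The sketch below (an illustration written for this text, not part of the original) shows the kind of true data-dependency chain referred to above: each statement consumes the result of the previous one, so no two of them can execute in parallel no matter how wide the processor is.

int dep_chain(int a)
{
    int b = a * 2;  /* needs a */
    int c = b + 7;  /* needs b */
    int d = c * c;  /* needs c */
    return d;       /* a serial chain: roughly one result per step */
}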
SMT (Simultaneous Multithreading)

SMT stands for Simultaneous Multithreading. It is a strategy for increasing the overall efficiency of superscalar CPUs using hardware multithreading. It enables several independent execution threads to use the resources made available by contemporary processor architectures.
SMT (Simultaneous Multithreading, continued)

• The word "multithreading" is confusing since not only may several


threads be processed concurrently on a single CPU core, but also
many jobs (with various page tables, task state segments, protection
rings, I/O permissions, and so on). Despite sharing the same core, they
are entirely distinct operations. Pre-emptive multitasking is
conceptually related to multithreading. Nonetheless, it is implemented
at the thread level in current superscalar CPUs.
SMT (Simultaneous Multithreading, continued)

• Simultaneous multithreading is one of the two significant forms of multithreading, the other being temporal multithreading. In temporal multithreading, only one thread of instructions can run in any given pipeline stage at a time. In simultaneous multithreading, instructions from multiple threads can be executing in a given pipeline stage at the same time. This is accomplished without requiring significant changes to the fundamental processor architecture: the principal enhancements required are the ability to fetch instructions from multiple threads in a single cycle and a larger register file to hold data from several threads. The number of concurrent threads supported is decided by the chip designers. Two concurrent threads per CPU core is most common; some processors, however, support up to eight simultaneous threads per core.
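The following is a minimal, Linux-specific sketch (an assumption on my part, not from the original text) of how software can place two threads on SMT siblings. It assumes logical CPUs 0 and 1 share a physical core, which is common on x86 but topology-dependent; the actual sibling list can be read from /sys/devices/system/cpu/cpu0/topology/thread_siblings_list. Compile with gcc -pthread.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    long cpu = (long)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET((int)cpu, &set);
    /* Pin this thread to one logical CPU (one hardware thread). */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    /* ... compute-heavy work here would share the core's
       execution units with the sibling thread ... */
    printf("worker pinned to logical CPU %ld\n", cpu);
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, worker, (void *)0L);
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}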
Cache Coherence
• In a multiprocessor system, data inconsistency may occur among adjacent levels or within the same level of the memory hierarchy. In a shared-memory multiprocessor with a separate cache memory for each processor, it is possible to have many copies of a single operand: one copy in main memory and one in each cache. When one copy of the operand is changed, the other copies must be changed as well. For example, a cache and the main memory may hold inconsistent copies of the same object.
Cache Coherence
Suppose there are three processors, each with its own cache, and consider the following scenario:

Processor 1 reads X: obtains 24 from memory and caches it.
Processor 2 reads X: obtains 24 from memory and caches it.
Processor 1 then writes X = 64: only its locally cached copy is updated.
Now processor 3 reads X. What value should it get?
Memory and processor 2 think it is 24, while processor 1 thinks it is 64.
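The toy C program below (illustrative only, not from the original text) simulates this scenario by modeling each private cache as a plain variable: because processor 1's write never reaches main memory or the other caches, processor 3 reads the stale value 24. Real hardware prevents this with a coherence protocol such as MESI.

#include <stdio.h>

int main(void)
{
    int memory_X = 24;   /* the copy in main memory          */
    int cache[3];        /* private caches of P1, P2, and P3 */

    cache[0] = memory_X; /* P1 reads X: caches 24            */
    cache[1] = memory_X; /* P2 reads X: caches 24            */
    cache[0] = 64;       /* P1 writes X = 64 locally only    */
    cache[2] = memory_X; /* P3 reads X: gets the stale 24    */

    printf("P1=%d P2=%d P3=%d memory=%d\n",
           cache[0], cache[1], cache[2], memory_X);
    return 0;
}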
Cache Coherence
As multiple processors operate in parallel and independently, multiple caches may hold different copies of the same memory block; this creates the cache coherence problem. Cache coherence is the discipline that ensures that changes to the values of shared operands are propagated throughout the system in a timely fashion. There are three distinct levels of cache coherence:

• Every write operation appears to occur instantaneously.
• All processors see exactly the same sequence of changes of values for each separate operand.
• Different processors may see different sequences of values for an operand; this is known as non-coherent behavior.
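As a final sketch (an illustration using C11 atomics and POSIX threads, not part of the original text), declaring a shared operand _Atomic and accessing it with atomic operations asks the compiler and the coherent hardware for the first two behaviors above: the write below becomes visible to every thread, and all threads agree on the order of writes to X. Compile with gcc -pthread.

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static _Atomic int X = 24;

static void *writer(void *arg)
{
    (void)arg;
    atomic_store(&X, 64); /* this write is propagated to all observers */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);
    pthread_join(t, NULL);
    printf("X = %d\n", atomic_load(&X)); /* prints 64 on a coherent system */
    return 0;
}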
