The Pentium 4 processor utilizes the Intel NetBurst micro-architecture, designed for high performance with features like hyper-pipelined technology, rapid execution engine, and aggressive branch prediction. It includes a three-section pipeline and Hyper-Threading Technology, allowing two logical processors to operate simultaneously on a single core. Additionally, the processor employs a Translation Lookaside Buffer (TLB) to enhance memory access speed, although it can experience performance issues due to cache thrashing.
Q10,12
• The Pentium 4 processor is the first hardware implementation of a
  new micro-architecture, the Intel NetBurst micro-architecture.
• The Intel NetBurst micro-architecture is designed to achieve high
  performance for both integer and floating-point computations at
  very high clock rates.
• It has the following features:
1. Hyper-pipelined technology to enable high clock rates and
   frequency headroom to well above 1 GHz.
2. Rapid execution engine to reduce the latency of basic integer
   instructions.
3. High-performance, quad-pumped bus interface to the 400 MHz
   NetBurst micro-architecture system bus.
4. Execution trace cache to shorten branch delays.
5. Cache line sizes of 64 and 128 bytes.
6. Hardware prefetch.
7. Aggressive branch prediction to minimize pipeline delays.
8. Out-of-order speculative execution to enable parallelism.
9. Superscalar issue to enable parallelism.
10. Hardware register renaming to avoid register name space
    limitations.

Intel® NetBurst™ Micro-architecture
• The pipeline of the Intel NetBurst micro-architecture contains three
  sections:
1. The in-order issue front end
2. The out-of-order superscalar execution core
3. The in-order retirement unit
• The front end of the Intel NetBurst micro-architecture consists of
  two parts:
1. Fetch/decode unit
2. Execution trace cache
• The front end performs several basic functions:
  • Prefetches IA-32 instructions that are likely to be executed.
  • Fetches instructions that have not already been prefetched.
  • Decodes instructions into µops.
  • Generates microcode for complex instructions and
    special-purpose code.
  • Delivers decoded instructions from the execution trace cache.
  • Predicts branches using a highly advanced algorithm.
• The core's ability to execute instructions out of order is a key
  factor in enabling parallelism.
• This feature enables the processor to reorder instructions so that
  if one µop is delayed while waiting for data or a contended resource,
  other µops that appear later in the program order may proceed
  around it.
• The processor employs several buffers to smooth the flow of µops.
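
The reordering described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how the NetBurst scheduler is actually built: the `Uop` class, the `ready_at` field, the instruction names, and the one-µop-per-cycle issue width are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    ready_at: int  # hypothetical cycle at which the operands become available


def issue_order(uops, width=1):
    """Return (cycle, name) pairs in the order the uops start executing.

    Each cycle, up to `width` uops whose operands are ready may start,
    oldest first. A younger ready uop proceeds around an older stalled one.
    """
    waiting = list(uops)  # kept in program order
    started = []
    cycle = 0
    while waiting:
        ready = [u for u in waiting if u.ready_at <= cycle][:width]
        for u in ready:
            waiting.remove(u)
            started.append((cycle, u.name))
        cycle += 1
    return started


program = [
    Uop("load r1", ready_at=0),
    Uop("add r2, r1", ready_at=3),  # stalled: waits for the load's data
    Uop("mul r4, r5", ready_at=0),  # independent, later in program order
    Uop("sub r6, r7", ready_at=0),  # independent, later in program order
]

for cycle, name in issue_order(program):
    print(cycle, name)
```

In this model the `mul` and `sub` start before the stalled `add`, even though they appear later in program order.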
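• The core is designed to facilitate parallel execution.
• Most pipelines can start executing a new µop every cycle, so several
  instructions can be in flight at a time for each pipeline.
• A number of arithmetic logical unit (ALU) instructions can start two
  per cycle, and many floating-point instructions can start one every
  two cycles.
• µops can begin execution, out of order, as soon as their
  data inputs are ready and resources are available.
• The retirement section receives the results of the executed µops
  from the execution core and processes them so that the architectural
  state is updated in the original program order.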
• For semantically correct execution, the results of IA-32 instructions
  must be committed in original program order before they are retired.
• Exceptions may be raised as instructions are retired.
• Thus, exceptions cannot occur speculatively; they occur in the
  correct order, and the machine can be correctly restarted after
  an exception.
• When a µop completes and writes its result to the destination,
  it is retired.
• Up to three µops may be retired per cycle.
• The Reorder Buffer (ROB) is the unit in the processor which buffers
  completed µops, updates the architectural state in order, and
  manages the ordering of exceptions.
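
In-order retirement can be sketched as a queue that releases only its oldest entries. The model below is a simplified illustration: the function name `retire`, the µop labels, and the completion cycles are made up for the example. Only the limit of three retirements per cycle comes from the text above.

```python
from collections import deque

RETIRE_WIDTH = 3  # from the text above: up to three uops retire per cycle


def retire(rob_order, completion_cycle):
    """Simulate in-order retirement from a reorder buffer.

    rob_order: uop names in original program order.
    completion_cycle: dict mapping name -> cycle at which execution finished
                      (hypothetical values; execution may finish out of order).
    Returns a dict mapping name -> retirement cycle.
    """
    rob = deque(rob_order)
    retired_at = {}
    cycle = 0
    while rob:
        retired_this_cycle = 0
        # Only the oldest entry may retire, and only once it has completed.
        while (rob and retired_this_cycle < RETIRE_WIDTH
               and completion_cycle[rob[0]] <= cycle):
            retired_at[rob.popleft()] = cycle
            retired_this_cycle += 1
        cycle += 1
    return retired_at


order = ["u0", "u1", "u2", "u3", "u4"]
done = {"u0": 2, "u1": 0, "u2": 1, "u3": 0, "u4": 2}
print(retire(order, done))
```

Here `u1` and `u3` finish executing first, but nothing retires until the oldest µop `u0` completes. After that, three µops retire in one cycle and the remaining two in the next.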
Hyper-Threading Technology
• Hyper-Threading Technology is a form of simultaneous
  multithreading technology introduced by Intel.
• A processor with Hyper-Threading Technology consists of two
  logical processors per core, each of which has its own processor
  architectural state.
• Each logical processor can be individually halted, interrupted or
  directed to execute a specified thread, independently from the
  other logical processor sharing the same physical core.
• Unlike a traditional dual-processor configuration that uses two
  separate physical processors, the logical processors in a
  hyper-threaded core share the execution resources.
• These resources include the execution engine, caches, and system
  bus interface.
• The sharing of resources allows two logical processors to work with
  each other more efficiently.
• Hyper-threading works by duplicating certain sections of the
  processor (those that store the architectural state) but not
  duplicating the main execution resources.
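
The split between duplicated state and shared execution resources can be pictured with a small model. This is a conceptual sketch only: the class names `ArchState`, `SharedALU`, and `Core` are hypothetical, and real hardware interleaves µops from both logical processors rather than calling methods one at a time.

```python
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """Per-logical-processor state: duplicated in a hyper-threaded core."""
    pc: int = 0
    regs: dict = field(default_factory=dict)


class SharedALU:
    """Execution resource: one instance shared by both logical processors."""

    def __init__(self):
        self.ops_executed = 0

    def add(self, a, b):
        self.ops_executed += 1
        return a + b


class Core:
    def __init__(self):
        self.alu = SharedALU()                     # not duplicated
        self.logical = [ArchState(), ArchState()]  # duplicated

    def step(self, lp, dst, a, b):
        """Execute one add on logical processor `lp` using the shared ALU."""
        state = self.logical[lp]
        state.regs[dst] = self.alu.add(a, b)
        state.pc += 1


core = Core()
core.step(0, "r1", 1, 2)    # thread on logical processor 0
core.step(1, "r1", 10, 20)  # thread on logical processor 1, same register name
core.step(0, "r2", 3, 4)
print(core.logical[0], core.logical[1], core.alu.ops_executed)
```

Both logical processors write a register called `r1` without interfering, because each has its own architectural state, while every operation passes through the single shared ALU.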
• This technology is transparent to operating systems and programs.
• It is possible to optimize operating system behavior on
  multi-processor hyper-threading capable systems.
• Software applications that have been written to use multiple pieces
  of code called "threads" view the Pentium 4 processor at 3.06 GHz
  with HT Technology as two processors.
• HT Technology allows the processor to work on two separate threads
  at the same time rather than one at a time.
• A translation lookaside buffer (TLB) is a memory cache that is used
  to reduce the time taken to access a user memory location.
• A translation lookaside buffer (TLB) is a memory cache that stores
  recent translations of virtual memory to physical addresses for
  faster retrieval.
• A TLB may reside between the CPU and the CPU cache, between the
  CPU cache and the main memory, or between the different levels
  of the multi-level cache.
• The majority of desktop, laptop, and server processors include one
  or more TLBs in the memory-management hardware.
• It is nearly always present in any processor that utilizes paged or
  segmented virtual memory.
• When a virtual memory address is referenced by a program, the
  search starts in the CPU. First, instruction caches are checked.
• If the required memory is not in these very fast caches, the system
  has to look up the memory's physical address.
• At this point, the TLB is checked for a quick reference to the location
  in physical memory.
• When an address is searched in the TLB and not found, the
  page tables in physical memory must be searched with a page walk
  (memory page crawl) operation.
• As virtual memory addresses are translated, the translations are
  added to the TLB.
• When a value can be retrieved from the TLB, speed is enhanced
  because the memory address is stored in the TLB on the processor.
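
The lookup sequence above (check the TLB, fall back to the page table on a miss, then cache the translation) can be sketched as follows. This is a minimal model under assumptions: the 4 KiB page size, the `TLB` class, and the example page table are illustrative, and a real page walk traverses multi-level tables rather than a single dictionary.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class TLB:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.entries = {}             # cached translations
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Translate a virtual address to a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1            # fast path: translation already cached
        else:
            self.misses += 1          # slow path: stands in for the page walk
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3}   # hypothetical mappings
tlb = TLB(page_table)
a = tlb.translate(0x0010)   # miss on page 0
b = tlb.translate(0x0020)   # hit on page 0
c = tlb.translate(0x1004)   # miss on page 1
print(a, b, c, tlb.hits, tlb.misses)
```

The second access to page 0 is served from the cached entry and skips the page-table lookup.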
• TLBs can suffer performance issues from multitasking and code
  errors.
• This performance degradation is called a cache thrash.
• Cache thrash is caused by an ongoing computer activity that fails
  to progress due to excessive use of resources or conflicts in the
  caching system.
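
The effect of thrashing can be demonstrated with a small least-recently-used cache. This is a sketch under assumptions: the four-entry capacity, the LRU replacement policy, and the two task working sets are chosen for illustration and do not describe any specific processor's TLB.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Fraction of page accesses served by an LRU cache of `capacity` entries."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[page] = True
    return hits / len(accesses)


task_a = [0, 1, 2, 3]      # pages touched by one task (hypothetical)
task_b = [10, 11, 12, 13]  # pages touched by a second task (hypothetical)

alone = task_a * 100                  # one task running by itself
interleaved = (task_a + task_b) * 50  # two tasks alternating

print(hit_rate(alone, capacity=4))
print(hit_rate(interleaved, capacity=4))
```

With one task, nearly every access hits. With two tasks alternating, the combined working set exceeds the cache capacity, each task evicts the other's entries before they are reused, and the hit rate in this model drops to zero.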