
Computer Architecture Comprehensive Exam: Expected Questions

1. Explain the three types of hazards that hinder the improvement of CPU performance through the pipeline
technique.

Structural Hazards: Structural hazards arise when two instructions in a pipeline require the same hardware
resource at the same time. For example, if two instructions require the use of the same functional unit, such
as an arithmetic logic unit (ALU), a structural hazard occurs. The pipeline cannot proceed with the second
instruction until the resource is free, which can cause delays and reduce the performance improvement of
the pipeline.

Data Hazards: Data hazards occur when an instruction depends on the result of a previous instruction that has
not yet completed. In other words, a data hazard exists when two instructions in a pipeline refer to the same
register and at least one of them writes to it. For example, if instruction A writes a value to a register and
instruction B reads that value from the same register, but B enters the pipeline before A has completed, a
data hazard occurs. This causes the pipeline to stall, waiting for instruction A to complete before executing
instruction B (a minimal sketch of this stall behavior follows this answer).

Control Hazards: Control hazards arise due to changes in the program flow that affect the instruction being
fetched (when conditional branches interfere with instruction fetches in a pipeline). In other words, a control
hazard arises from the need to make a fetch decision before the branch condition has been completely evaluated. For example,
a branch instruction that changes the program counter (PC) can cause the pipeline to fetch the wrong
instruction. This can lead to wasted cycles as the pipeline must flush instructions and restart from the
correct address, reducing the performance improvement of the pipeline. Branch prediction is a technique
used to mitigate control hazards by predicting the direction of the branch and fetching the correct
instructions in advance.
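
To make the data-hazard stall described above concrete, the following is a minimal Python sketch (not part of the original answer). The instruction encoding, the absence of forwarding, and the fixed write-back distance are simplifying assumptions chosen for illustration.

```python
# Minimal sketch (illustrative assumptions): count stall cycles caused by
# read-after-write (RAW) data hazards in a simple pipeline without forwarding.
# Each instruction is modelled as (destination_register, set_of_source_registers).
def raw_stall_cycles(instructions, use_latency=2):
    """Count extra cycles needed when a source register was written by one of
    the previous `use_latency` instructions (assumed write-back distance)."""
    stalls = 0
    for i, (_, sources) in enumerate(instructions):
        for distance in range(1, use_latency + 1):
            j = i - distance
            if j >= 0 and instructions[j][0] in sources:
                # The producer's result is not yet written back when the
                # consumer needs it, so the pipeline stalls until it is.
                stalls += use_latency - distance + 1
                break
    return stalls

# Example: B reads r1 immediately after A writes it, so the pipeline must stall.
program = [("r1", {"r2", "r3"}),   # A: r1 = r2 + r3
           ("r4", {"r1", "r5"})]   # B: r4 = r1 + r5  (depends on A)
print(raw_stall_cycles(program))   # 2 stall cycles under this simple model
```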
2. Describe five techniques for reducing the cache miss rates in a hierarchical memory system.

1) Larger block size – compulsory miss => Increasing the block size reduces compulsory misses by exploiting
spatial locality, but it also increases the miss penalty, so the block size should not be made too large.

2) Larger cache size – capacity miss => A larger cache can hold more data and therefore suffers fewer capacity
misses. In this case, however, the access time (hit time) and power consumption increase.

3) Cache Pre-fetching: Pre-fetching involves loading data into the cache before it is actually needed by the
processor.

4) Higher associativity – conflict miss => Higher associativity decreases conflict misses and thereby helps reduce
the overall miss rate. Moving from a direct-mapped cache to a set-associative or fully associative cache can
reduce conflict misses.
Associativity places each block in one of n ways within a set, and the higher the associativity, the lower the
conflict miss rate; in practice an 8-way set-associative cache has a near-zero conflict miss rate.

5) Pseudo-associative cache – conflict miss => A pseudo-associative cache reduces conflict misses by treating the
cache as if it had multiple ways but probing the alternative locations sequentially after a miss, rather than
checking every way in parallel, so it keeps the fast hit time of a direct-mapped cache.

6) Compiler optimization – conflict miss => Compiler optimizations that rearrange code and data, such as merging
arrays, loop interchange, loop fusion, and blocking, reduce the fraction of conflict misses (see the
loop-interchange sketch after this list).

7) Victim Caches: A small fully associative cache that sits between the main cache and the rest of the memory
hierarchy can store blocks evicted from the main cache, potentially catching frequently re-accessed data that
would otherwise result in a miss.
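
As a hedged illustration of the compiler-optimization item above (item 6), the sketch below shows loop interchange in Python. In a language with row-major arrays such as C the locality benefit is direct; here the matrix size N and the list-of-lists layout are arbitrary choices, and the point is only the shape of the transformation.

```python
# Minimal sketch: loop interchange, one of the compiler transformations listed above.
# The "before" version walks the matrix column by column, touching a different row
# (and, in a row-major layout, a different cache block) on almost every access;
# the interchanged version walks each row sequentially, so consecutive accesses
# fall in the same block.
N = 1024
matrix = [[0] * N for _ in range(N)]

def column_major_sum(m):          # before: poor spatial locality
    total = 0
    for j in range(N):            # outer loop over columns
        for i in range(N):        # inner loop jumps between rows
            total += m[i][j]
    return total

def row_major_sum(m):             # after loop interchange: good spatial locality
    total = 0
    for i in range(N):            # outer loop over rows
        for j in range(N):        # inner loop stays within one row
            total += m[i][j]
    return total

# Same result either way; only the traversal order (and hence locality) differs.
print(row_major_sum(matrix) == column_major_sum(matrix))   # True
```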
3. Explain the advantages and disadvantages of a correlating branch predictor that operates based on the
principles shown in the following figure.

Advantages:
Improved Accuracy over Static Prediction: A correlating (two-level) predictor achieves better accuracy than a
static or simple single-level predictor by taking past branch behavior into account. The 2-bit global branch
history register keeps track of the two most recent branch outcomes, which the predictor can use to make a
more informed prediction about the next branch.

Context-Aware Prediction: By using a global history register, the predictor can discern patterns across multiple
branch instructions, allowing it to effectively predict branches in loops, recursive functions, and other
common programming constructs where branch behavior is correlated with the history of previous branches.

Adaptability: Correlating predictors can adapt to different types of branch behavior by capturing the relationship
between branch outcomes rather than treating each branch independently. This makes them suitable for a
wide range of applications and workloads.

Disadvantages:
Complexity and Cost: Correlating predictors are more complex than single-level branch predictors. They require
additional hardware to track and store global branch history and to manage multiple sets of branch
predictor tables, which can increase the cost and power consumption.

Increased Latency: The need to index into a predictor table using both the branch address and the global
history can add latency to the branch prediction process. The prediction has to wait until the global history
bits are known, which could delay the fetch stage of the pipeline.

Size and Scalability: As the number of bits in the global history increases, the number of predictor entries
grows exponentially. This exponential growth can lead to scalability issues and can consume a significant
amount of on-chip storage, potentially at the expense of other on-chip structures such as caches.

Warm-up Time: When a processor starts up, or when a new task begins execution, the branch predictor tables
and history registers are not yet trained, which can lead to lower prediction accuracy until enough branch
outcomes have been observed to "warm up" the predictor.
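
The following Python sketch is my own simplification (it is not taken from the figure) of a correlating predictor with a 2-bit global history: the history register is concatenated with the low-order PC bits to select a 2-bit saturating counter. The table size, number of PC bits, and initial counter value are assumptions.

```python
# Minimal sketch (assumed parameters): a (2, 2) correlating branch predictor.
class CorrelatingPredictor:
    def __init__(self, pc_bits=10, history_bits=2):
        self.history_bits = history_bits
        self.history = 0                          # global branch history register
        self.pc_mask = (1 << pc_bits) - 1
        # One 2-bit saturating counter per (PC index, history) combination,
        # initialised to "weakly not taken".
        self.counters = [1] * ((1 << pc_bits) << history_bits)

    def _index(self, pc):
        return ((pc & self.pc_mask) << self.history_bits) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2    # counter 2 or 3 means "taken"

    def update(self, pc, taken):
        i = self._index(pc)
        self.counters[i] = min(3, self.counters[i] + 1) if taken else max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.history_bits) - 1)

# Usage: predict before the branch resolves, then train on the actual outcome.
bp = CorrelatingPredictor()
for outcome in [True, False, True, False, True, False]:
    guess = bp.predict(0x40)
    bp.update(0x40, outcome)
```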
4. Depict the internal structure of Branch Target Buffer (BTB) and explain its operational principles.
The Branch Target Buffer (BTB) is a component within a CPU that helps improve performance by predicting the
target address of a branch instruction.

Internal Structure:
The BTB is typically implemented as a table containing several entries. Each entry has the following fields:
• Index: Used to identify the specific branch instruction (often derived from the program counter - PC).
• Branch Prediction: This field stores the predicted target address (where the program will jump to if
the branch is taken).
• Valid Bit: Indicates whether the entry is valid (contains a prediction for a recently encountered branch)
or invalid (empty).

Operational Principles:
1. Fetch Stage: During the fetch stage of the instruction pipeline, the CPU retrieves the next instruction from
memory.
2. BTB Lookup/Indexing: The CPU uses the PC of the fetched instruction as an index to look up the corresponding
entry in the BTB. Stored at this index is the predicted next program counter.
3. Prediction and Execution:
• Valid Entry and Prediction Match: If the entry is valid and the predicted target address matches the actual
branch target address calculated during decode, the CPU can directly fetch the instruction from that
predicted address.
• Valid Entry and Prediction Mismatch: If the entry is valid but the prediction doesn't match, the CPU fetches
the instruction from the calculated target address and updates the BTB entry with the correct target address.
• Invalid Entry: If the entry is invalid, the CPU fetches the instruction from the calculated target address and
creates a new entry in the BTB with the PC, the predicted target address, and the valid bit set.
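
A minimal sketch of the lookup/update flow described above, assuming a direct-mapped BTB indexed by the low-order PC bits (the class and method names are illustrative, not taken from the text):

```python
# Minimal sketch: a direct-mapped Branch Target Buffer. Each entry holds a valid
# bit and a predicted target address; the entry is selected from the PC.
class BranchTargetBuffer:
    def __init__(self, entries=256):
        self.entries = entries
        self.valid = [False] * entries
        self.target = [0] * entries

    def lookup(self, pc):
        """Fetch-stage lookup: return the predicted target, or None on a miss."""
        i = pc % self.entries
        return self.target[i] if self.valid[i] else None

    def update(self, pc, actual_target):
        """After the branch resolves, install a new entry or correct a mismatch."""
        i = pc % self.entries
        self.valid[i] = True
        self.target[i] = actual_target
```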
5. Let's assume there is a 32-bit microprocessor with the same bus cycle period as a 16-bit microprocessor. On
average, 30% of operands and instructions are 32 bits in length, 30% are 16 bits, and 40% are only 8 bits
long. Considering this scenario, calculate the performance improvement that can be achieved by utilizing the
32-bit microprocessor for fetching instructions and operands, compared to a 16-bit microprocessor.

For a 16-bit microprocessor, fetching:


• A 32-bit operand or instruction takes 2 bus cycles
• A 16-bit operand or instruction takes 1 bus cycle
• An 8-bit operand or instruction also takes 1 bus cycle

For a 32-bit microprocessor, fetching:


• A 32-bit operand or instruction takes 1 bus cycle
• A 16-bit operand or instruction takes 1 bus cycle
• An 8-bit operand or instruction also takes 1 bus cycle

For the 16-bit microprocessor, the average number of bus cycles needed per operation is:
Average cycles for 16-bit = (0.3×2)+(0.3×1)+(0.4×1)
= 0.6+0.3+0.4
= 1.3 cycles/operation

For the 32-bit microprocessor, the average number of bus cycles needed per operation is:
Average cycles for 32-bit = (0.3×1)+(0.3×1)+(0.4×1)
= 0.3+0.3+0.4
= 1.0 cycle/operation

The performance improvement (PI) can be calculated as:


PI=((Bus cycles of 16-bit−Bus cycles of 32-bit)/Bus cycles of 16-bit)×100%
PI=((1.3−1)/1.3)×100%
PI=(0.3/1.3)×100%
PI≈23.08%
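
The arithmetic above can be checked with a short Python snippet (the dictionary of size fractions simply restates the given instruction mix):

```python
# Verify the weighted-average bus-cycle counts and the improvement figure.
mix = {32: 0.30, 16: 0.30, 8: 0.40}      # operand/instruction width -> fraction

# A 16-bit bus needs 2 cycles for a 32-bit item and 1 cycle otherwise;
# a 32-bit bus needs 1 cycle for every item in this mix.
cycles_16bit = sum(frac * (2 if bits == 32 else 1) for bits, frac in mix.items())
cycles_32bit = sum(mix.values())

improvement = (cycles_16bit - cycles_32bit) / cycles_16bit * 100
print(round(cycles_16bit, 2), round(cycles_32bit, 2), round(improvement, 2))
# -> 1.3 1.0 23.08
```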
6. In a 32-bit CPU using a 4-way set-associative cache with a total size of 64KB and a cache block size of 4B,
calculate the number of bits for Tag, Index, and Offset for an address requested during cache access.

To determine the number of bits for Tag, Index, and Offset in the address being accessed, we first calculate
the number of cache blocks and sets in the cache.

1. Calculate the number of bits for the Offset (O):


• Since each cache block size = 4B
• O = log2(Cache block size)
= log2(4)
= 2 bits

2. Calculate the number of bits for the Index (I):


• Total cache size = 64KB = 2^16 bytes (64 * 1024B)
• Number of blocks in cache (B) = Total cache size / Block size
= (2^16) / 4
= 2^14 blocks (16,384)

Since the cache is 4-way set-associative:


• Number of sets (S) = B / Associativity
= 2^14 / 4
= 2^12 sets (4,096)
• I = log2(Number of sets)
= log2(4,096)
= 12 bits
With 2^12 sets, the number of bits required for the index is 12 bits.

3. Calculate the number of bits for the Tag (T):


• The remaining bits from the address belong to the tag.
• Total address size = 32 bits
• T = Total address size - (O + I)
= 32 bits - (2 bits + 12 bits)
= 18 bits

Therefore, the bit sizes are:


Tag (T): 18 bits
Index (I): 12 bits
Offset (O): 2 bits
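
A short Python check of the same bit-field calculation, with the parameters taken from the problem statement:

```python
# Recompute Tag / Index / Offset widths for a 32-bit address, 64 KB cache,
# 4 B blocks, 4-way set associativity.
from math import log2

address_bits  = 32
cache_size    = 64 * 1024          # bytes
block_size    = 4                  # bytes
associativity = 4

offset_bits = int(log2(block_size))                         # 2
num_sets    = (cache_size // block_size) // associativity   # 4,096 sets
index_bits  = int(log2(num_sets))                           # 12
tag_bits    = address_bits - index_bits - offset_bits       # 18

print(tag_bits, index_bits, offset_bits)                    # -> 18 12 2
```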
