Pipe

Single-Cycle versus Pipelined Performance

To make this discussion concrete, let's create a pipeline. In this example, and in the rest of this chapter, we limit our attention to seven instructions: load word (lw), store word (sw), add (add), subtract (sub), AND (and), OR (or), and branch if equal (beq).

Compare the average time between instructions of a single-cycle implementation, in which all instructions take one clock cycle, with that of a pipelined implementation. Assume that the operation times for the major functional units in this example are 200 ps for memory access for instructions or data, 200 ps for ALU operation, and 100 ps for register file read or write. In the single-cycle model, every instruction takes exactly one clock cycle, so the clock cycle must be stretched to accommodate the slowest instruction.

Figure 4.28 shows the time required for each of the seven instructions. The single-cycle design must allow for the slowest instruction (in Figure 4.28 it is lw), so the time required for every instruction is 800 ps. Similarly to Figure 4.27, Figure 4.29 compares nonpipelined and pipelined execution of three load word (lw) instructions. Thus, the time between the first and fourth instructions in the nonpipelined design is 3 × 800 ps or 2400 ps.

All the pipeline stages take a single clock cycle, so the clock cycle must be long enough to accommodate the slowest operation. Just as the single-cycle design must take the worst-case clock cycle of 800 ps, even though some instructions can be as fast as 500 ps, the pipelined execution clock cycle must have the worst-case clock cycle of 200 ps, even though some stages take only 100 ps. Pipelining still offers a fourfold performance improvement: the time between the first and fourth instructions is 3 × 200 ps or 600 ps.
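The arithmetic in this comparison can be checked with a short sketch. It assumes that a load word passes through five steps (instruction fetch, register read, ALU operation, data access, register write), which is how the 800 ps figure arises. The variable and function names are illustrative and do not come from the text.

```python
# Functional-unit latencies in picoseconds, as stated in the text.
MEMORY_PS = 200    # memory access for instructions or data
ALU_PS = 200       # ALU operation
REGISTER_PS = 100  # register file read or write

# Assumed five steps of a load word (lw):
# fetch, register read, ALU, data access, register write.
LW_STEPS_PS = [MEMORY_PS, REGISTER_PS, ALU_PS, MEMORY_PS, REGISTER_PS]

# Single-cycle: the clock must cover the slowest whole instruction (lw).
single_cycle_clock_ps = sum(LW_STEPS_PS)

# Pipelined: the clock must cover the slowest single stage.
pipelined_clock_ps = max(LW_STEPS_PS)


def start_gap_ps(clock_ps: int, instructions_apart: int) -> int:
    """Time between the starts of two instructions issued one per clock."""
    return clock_ps * instructions_apart


# Time between the first and fourth instructions (three clock periods apart).
nonpipelined_gap_ps = start_gap_ps(single_cycle_clock_ps, 3)
pipelined_gap_ps = start_gap_ps(pipelined_clock_ps, 3)
speedup = nonpipelined_gap_ps / pipelined_gap_ps

print(f"single-cycle clock: {single_cycle_clock_ps} ps")
print(f"pipelined clock:    {pipelined_clock_ps} ps")
print(f"first-to-fourth gap: {nonpipelined_gap_ps} ps vs {pipelined_gap_ps} ps")
print(f"speedup: {speedup:.0f}x")
```

The speedup is fourfold rather than fivefold because the stages are not perfectly balanced: the 100 ps register steps are stretched to the 200 ps pipelined clock.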
| Instruction class | Instruction fetch | Register read | ALU operation | Data access | Register write | Total time |
|---|---|---|---|---|---|---|
| Load word (lw) | 200 ps | 100 ps | 200 ps | 200 ps | 100 ps | 800 ps |
| Store word (sw) | 200 ps | 100 ps | 200 ps | 200 ps | | 700 ps |
| R-format (add, sub, and, or) | 200 ps | 100 ps | 200 ps | | 100 ps | 600 ps |
| Branch (beq) | 200 ps | 100 ps | 200 ps | | | 500 ps |

FIGURE 4.28 Total time for each instruction calculated from the time for each component. This calculation assumes that the multiplexors, control unit, PC accesses, and sign extension unit have no delay.
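The totals in Figure 4.28 can be reproduced by summing the latency of each component an instruction class uses. The following is a minimal sketch under that reading of the figure. The component names and the mapping from instruction class to components are assumptions chosen to match the totals, not identifiers from the text.

```python
# Latency of each component in picoseconds (from the stated operation times).
COMPONENT_PS = {
    "instruction_fetch": 200,
    "register_read": 100,
    "alu_operation": 200,
    "data_access": 200,
    "register_write": 100,
}

# Assumed set of components used by each instruction class.
COMPONENTS_USED = {
    "lw": ["instruction_fetch", "register_read", "alu_operation",
           "data_access", "register_write"],
    "sw": ["instruction_fetch", "register_read", "alu_operation",
           "data_access"],
    "R-format": ["instruction_fetch", "register_read", "alu_operation",
                 "register_write"],
    "beq": ["instruction_fetch", "register_read", "alu_operation"],
}

# Total time per class is the sum of the components it passes through.
total_ps = {
    name: sum(COMPONENT_PS[component] for component in components)
    for name, components in COMPONENTS_USED.items()
}

# The single-cycle clock period is set by the slowest instruction class.
slowest = max(total_ps, key=total_ps.get)
single_cycle_clock_ps = total_ps[slowest]

for name, time_ps in total_ps.items():
    print(f"{name:9s} {time_ps} ps")
print(f"single-cycle clock: {single_cycle_clock_ps} ps (set by {slowest})")
```

Like the figure, this ignores any delay in the multiplexors, control unit, PC accesses, and sign extension unit.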
