OsChapter_2
2
Solutions
2.2 f = g+h+i
2.5
Little-Endian        Big-Endian
Address  Data        Address  Data
3        ab          3        12
2        cd          2        ef
1        ef          1        cd
0        12          0        ab
2.6 2882400018
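The layout in 2.5 and the decimal value in 2.6 can be cross-checked with a short Python sketch. It assumes the 32-bit word in question is 0xabcdef12, which is inferred from the bytes in the table rather than stated in this section; `byte_layout` is a hypothetical helper name.

```python
# Assumption: the word under discussion is 0xabcdef12 (inferred from the 2.5 table).
WORD = 0xABCDEF12


def byte_layout(word, byteorder):
    """Return {address: byte as two hex digits} for a 32-bit word."""
    raw = word.to_bytes(4, byteorder)  # raw[0] is the byte stored at address 0
    return {addr: format(b, "02x") for addr, b in enumerate(raw)}


little = byte_layout(WORD, "little")
big = byte_layout(WORD, "big")

# Print the highest address first, in the same order as the table.
for addr in range(3, -1, -1):
    print(addr, little[addr], big[addr])

print(WORD)  # unsigned decimal value of the word
```

Address 0 holds the least significant byte in the little-endian case and the most significant byte in the big-endian case.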
2.8 f = 2*(&A)
addi $t0, $s6, 4
add $t1, $s6, $0
sw $t1, 0($t0)
lw $t0, 0($t0)
add $s0, $t1, $t0
2.9
Instruction          type    opcode  rs  rt  rd  immed
addi $t0, $s6, 4     I-type  8       22  8   –   4
add $t1, $s6, $0     R-type  0       22  0   9   –
sw $t1, 0($t0)       I-type  43      8   9   –   0
lw $t0, 0($t0)       I-type  35      8   8   –   0
add $s0, $t1, $t0    R-type  0       9   8   16  –
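As a rough check, the MIPS field values above can be packed into 32-bit machine words. This is a sketch only: `encode_r` and `encode_i` are hypothetical helper names, and the funct code 0x20 for add is the standard MIPS value, which the table does not list. Both add instructions are treated as R-type.

```python
def encode_r(rs, rt, rd, funct, shamt=0, opcode=0):
    """Pack an R-type word: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """Pack an I-type word: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ADD_FUNCT = 0x20  # assumption: standard funct code for add, not given in the table

words = [
    encode_i(8, 22, 8, 4),          # addi $t0, $s6, 4
    encode_r(22, 0, 9, ADD_FUNCT),  # add  $t1, $s6, $0
    encode_i(43, 8, 9, 0),          # sw   $t1, 0($t0)
    encode_i(35, 8, 8, 0),          # lw   $t0, 0($t0)
    encode_r(9, 8, 16, ADD_FUNCT),  # add  $s0, $t1, $t0
]

for w in words:
    print(f"0x{w:08X}")
```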
Instruction          type    opcode, funct3, funct7  rs1  rs2  rd  imm
addi x30, x10, 8     I-type  0x13, 0x0, –            10   –    30  8
addi x31, x10, 0     I-type  0x13, 0x0, –            10   –    31  0
sd x31, 0(x30)       S-type  0x23, 0x3, –            30   31   –   0
ld x30, 0(x30)       I-type  0x3, 0x3, –             30   –    30  0
add x5, x30, x31     R-type  0x33, 0x0, 0x0          30   31   5   –
2.10
2.10.1 0x50000000
2.10.2 overflow
2.10.3 0xB0000000
2.10.4 no overflow
2.10.5 0xD0000000
2.10.6 overflow
2.11
2.11.1 There is an overflow if 128 + $s1 > 2^31 − 1.
In other words, if $s1 > 2^31 − 129.
There is also an overflow if 128 + $s1 < −2^31.
In other words, if $s1 < −2^31 − 128 (which is impossible given the range
of $s1).
2.11.2 There is an overflow if 128 − $s1 > 2^31 − 1.
In other words, if $s1 < −2^31 + 129.
There is also an overflow if 128 − $s1 < −2^31.
In other words, if $s1 > 2^31 + 128 (which is impossible given the range
of $s1).
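The boundary cases in 2.11.1 and 2.11.2 can be checked with a small sketch that evaluates the exact result using Python's unbounded integers and tests whether it fits in a 32-bit two's complement register. The function names are hypothetical.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 32-bit signed register


def overflows(result):
    """True if an exact integer result does not fit in 32-bit two's complement."""
    return result < INT_MIN or result > INT_MAX


def add_overflows(s1):
    """Models 128 + $s1."""
    return overflows(128 + s1)


def sub_overflows(s1):
    """Models 128 - $s1."""
    return overflows(128 - s1)


# Largest $s1 that is still safe for the add, and the first value that overflows.
print(add_overflows(2**31 - 129), add_overflows(2**31 - 128))
# Smallest $s1 that is still safe for the subtract, and the first value that overflows.
print(sub_overflows(-2**31 + 129), sub_overflows(-2**31 + 128))
```

The "impossible" cases show up as no overflow at the opposite end of the range: `add_overflows(INT_MIN)` and `sub_overflows(INT_MAX)` are both false.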
2.16
2.16.1 The opcode would expand from 7 bits to 9.
The rs1, rs2, and rd fields would increase from 5 bits to 7 bits.
2.16.2 The opcode would expand from 7 bits to 9.
The rs1 and rd fields would increase from 5 bits to 7 bits. This change
does not affect the imm field per se, but it might force the ISA designer to
consider shortening the immediate field to avoid an increase in overall
instruction size.
2.16.3 * Increasing the size of each bit field potentially makes each instruction
longer, potentially increasing the code size overall.
* However, increasing the number of registers could lead to less register
spillage, which would reduce the total number of instructions, possibly
reducing the code size overall.
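To make the size pressure in 2.16 concrete, the following sketch totals the field widths before and after the change. It assumes the standard 32-bit RISC-V R-type and I-type layouts as the starting point and assumes the funct3, funct7, and imm widths stay unchanged; neither assumption is stated in the answers above.

```python
# Field widths in bits. Assumption: the starting point is the standard RISC-V layout.
R_BEFORE = {"opcode": 7, "rd": 5, "funct3": 3, "rs1": 5, "rs2": 5, "funct7": 7}
I_BEFORE = {"opcode": 7, "rd": 5, "funct3": 3, "rs1": 5, "imm": 12}


def widen(fields, opcode_bits=9, reg_bits=7):
    """Apply the changes from 2.16.1 and 2.16.2: wider opcode and register fields."""
    out = dict(fields)
    out["opcode"] = opcode_bits
    for name in ("rd", "rs1", "rs2"):
        if name in out:
            out[name] = reg_bits
    return out


def total_bits(fields):
    return sum(fields.values())


for label, before in (("R-type", R_BEFORE), ("I-type", I_BEFORE)):
    print(label, total_bits(before), "->", total_bits(widen(before)))
```

Under these assumptions the R-type format grows from 32 to 40 bits and the I-type format from 32 to 38 bits, which is why a designer might consider shortening the immediate.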
2.17
2.17.1 0xBABEFEF8
2.17.2 0xAAAAAAA0
2.17.3 0x00005545
2.21 $t2 = 3
2.22
2.22.1 [0x20000000, 0x2FFFFFFC]
2.22.2 [0x1FFE0000, 0x2001FFFC]
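The two ranges can be reproduced with a short sketch. It assumes the PC is 0x20000000, a value inferred from the answers rather than given here. The branch calculation measures the offset from the PC itself so that it matches the figures above; real MIPS hardware measures from PC + 4, which shifts both bounds by 4.

```python
PC = 0x20000000  # assumption: PC value implied by the answers above


def jump_range(pc):
    """Addresses reachable by a MIPS j: the upper 4 bits come from PC + 4,
    the low 28 bits from the 26-bit target field shifted left by 2."""
    region = (pc + 4) & 0xF0000000
    return region, region | 0x0FFFFFFC


def branch_range_approx(pc):
    """Approximate branch reach: a signed 16-bit word offset, about +/- 2^17 bytes.
    Measured from pc (not pc + 4) to match the answer as printed."""
    return pc - 2**17, pc + 2**17 - 4


for lo, hi in (jump_range(PC), branch_range_approx(PC)):
    print(f"[0x{lo:08X}, 0x{hi:08X}]")
```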
2.23
2.23.1 The I-type instruction format would be most appropriate because it would
allow the branch target to be encoded into the immediate field and the
register address to be encoded into the rt field, whose datapath supports
both read and write access to the register file.
2.24
2.24.1 The final value of $s2 is 20.
2.24.2 i = 10;
do {
    B += 2;
    i = i - 1;
} while (i > 0);
2.24.3 5*N instructions.
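The loop in 2.24.2 can be traced with a minimal Python emulation. It assumes B starts at 0, which is consistent with the final value of 20 in 2.24.1 but is not stated in this section; `run_loop` is a hypothetical name.

```python
def run_loop(n=10, b=0):
    """Emulate the do-while loop; return (final B, number of iterations)."""
    i = n
    iterations = 0
    while True:  # Python has no do-while, so run the body and test at the end
        b += 2
        i = i - 1
        iterations += 1
        if not (i > 0):
            break
    return b, iterations


final_b, iterations = run_loop()
print(final_b, iterations)
```

With N = 10 the body runs ten times and adds 2 each time, so the iteration count scales linearly with N, as the 5*N instruction count in 2.24.3 suggests.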
2.26 Answers will vary, but should require approximately 14 MIPS instructions,
and when a = 10 and b = 1, should result in approximately 158 instructions
being executed.
2.32 We can use the tail-call optimization for the second call to func, but then
we must restore $ra, $s0, $s1, and $sp before that call. We save only one
instruction (jr $ra).
2.33 Register $ra is equal to the return address in the caller function, registers
$sp and $s3 have the same values they had when function f was called, and
register $t5 can have an arbitrary value. For register $t5, note that although
our function f does not modify it, function func is allowed to modify it so we
cannot assume anything about the value of $t5 after function func has been called.
2.35
2.35.1 0x11
2.35.2 0x44
2.37 setmax:
ll $t0,0($a0) # $t0 = *shvar
sub $t1,$t0,$a1 # $t1 = *shvar-x
bgez $t1,skip # if the result is zero or positive, then x <= *shvar,
# so keep the original value of *shvar; otherwise replace it with x
add $t0,$0,$a1
skip:
sc $t0,0($a0) # try to store result to shvar
beqz $t0,setmax # repeat if operation was not atomic
2.38 When two processors A and B begin executing this routine at the same
time, at most one of them will execute the store-conditional instruction
successfully, while the other will be forced to retry the operation. If processor
A’s store-conditional succeeds initially, then B will re-enter the try block,
and it will see the new value of shvar written by A when it finally succeeds.
The hardware guarantees that both processors will eventually execute the
code completely.
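The interleaving described in 2.38 can be illustrated with a toy model of load-linked / store-conditional. This is a simplified sketch, not a description of real hardware: the `SharedWord` class, the reservation set, and the values 5, 9, and 7 are all made up for illustration.

```python
class SharedWord:
    """Toy model of one memory word with load-linked / store-conditional."""

    def __init__(self, value):
        self.value = value
        self.reserved = set()  # CPUs that currently hold a valid reservation

    def ll(self, cpu):
        self.reserved.add(cpu)
        return self.value

    def sc(self, cpu, new_value):
        if cpu not in self.reserved:
            return False  # reservation lost: the store does not happen
        self.value = new_value
        self.reserved.clear()  # a successful store breaks every reservation
        return True


def setmax_value(old, x):
    """Value that setmax tries to store: the larger of *shvar and x."""
    return x if old < x else old


shvar = SharedWord(5)

# A and B both execute ll before either executes sc.
a_old = shvar.ll("A")
b_old = shvar.ll("B")
a_ok = shvar.sc("A", setmax_value(a_old, 9))  # A's store-conditional succeeds
b_ok = shvar.sc("B", setmax_value(b_old, 7))  # B's fails, so B must retry

# B retries and now observes the value written by A.
b_old_retry = shvar.ll("B")
b_ok_retry = shvar.sc("B", setmax_value(b_old_retry, 7))

print(a_ok, b_ok, b_ok_retry, shvar.value)
```

Without the failed store-conditional, B would have overwritten A's 9 with 7; the retry is what preserves the maximum.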
2.39
2.39.1 No. The resulting machine would be slower overall.
Current CPU requires (num arithmetic * 1 cycle) + (num load/store
* 10 cycles) + (num branch/jump * 3 cycles) = 500 * 10^6 * 1 + 300 * 10^6
* 10 + 100 * 10^6 * 3 = 3800 * 10^6 cycles.
The new CPU requires (.75 * num arithmetic * 1 cycle) + (num load/store
* 10 cycles) + (num branch/jump * 3 cycles) = 375 * 10^6 * 1 + 300 * 10^6
* 10 + 100 * 10^6 * 3 = 3675 * 10^6 cycles.
However, given that each of the new CPU’s cycles is 10% longer than the
original CPU’s cycles, the new CPU’s 3675 * 10^6 cycles will take as long as
4042.5 * 10^6 cycles on the original CPU.
2.39.2 If we double the performance of arithmetic instructions by reducing their
CPI to 0.5, then the CPU will run the reference program in (500 * 0.5) +
(300 * 10) + (100 * 3) = 3550 million cycles. This represents a speedup of
3800/3550 = 1.07.
If we improve the performance of arithmetic instructions by a factor of
10 (reducing their CPI to 0.1), then the CPU will run the reference
program in (500 * 0.1) + (300 * 10) + (100 * 3) = 3350 million cycles. This
represents a speedup of 3800/3350 = 1.13.
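The cycle counts in 2.39.1 and the speedups in 2.39.2 can be recomputed with a few lines of Python. The dictionary keys are shorthand labels chosen for this sketch; the instruction counts and CPIs are the ones used in the calculations above.

```python
MILLION = 10**6
COUNTS = {"arith": 500 * MILLION, "mem": 300 * MILLION, "branch": 100 * MILLION}
CPI = {"arith": 1, "mem": 10, "branch": 3}


def cycles(counts, cpi):
    """Total cycles = sum over instruction classes of count * CPI."""
    return sum(counts[k] * cpi[k] for k in counts)


base = cycles(COUNTS, CPI)

# 2.39.1: 25% fewer arithmetic instructions, but each cycle is 10% longer.
fewer = dict(COUNTS, arith=COUNTS["arith"] * 3 // 4)
new_cycles = cycles(fewer, CPI)
new_time = new_cycles * 1.10  # expressed in original-CPU cycle times


# 2.39.2: same instruction mix, lower arithmetic CPI.
def speedup(arith_cpi):
    return base / cycles(COUNTS, dict(CPI, arith=arith_cpi))


print(base, new_cycles, new_time)
print(round(speedup(0.5), 2), round(speedup(0.1), 2))
```

Load/store instructions dominate the total, so even a tenfold improvement in arithmetic CPI yields a speedup of only about 1.13.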
2.40
2.40.1 Take the weighted average: 0.7 * 2 + 0.1 * 6 + 0.2 * 3 = 2.6
2.40.2 For a 25% improvement, we must reduce the CPI to 2.6 * 0.75 = 1.95.
Thus, we want 0.7 * x + 0.1 * 6 + 0.2 * 3 <= 1.95. Solving for x shows
that the arithmetic instructions must have a CPI of at most 1.07.
2.40.3 For a 50% improvement, we must reduce the CPI to 2.6 * 0.5 = 1.3. Thus,
we want 0.7 * x + 0.1 * 6 + 0.2 * 3 <= 1.3. Solving for x shows that the
arithmetic instructions must have a CPI of at most 0.14.
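The three parts of 2.40 follow from one weighted-average formula, sketched below. The class labels other than arithmetic are assumptions for readability: the answers above identify only the 0.7 weight as arithmetic.

```python
# Instruction mix and CPI per class, taken from the weighted average in 2.40.1.
# Assumption: the "mem" and "branch" labels are illustrative names only.
FRACTION = {"arith": 0.7, "mem": 0.1, "branch": 0.2}
CPI = {"arith": 2, "mem": 6, "branch": 3}


def average_cpi(cpi):
    """Weighted average CPI over the instruction mix."""
    return sum(FRACTION[k] * cpi[k] for k in FRACTION)


def max_arith_cpi(improvement):
    """Largest arithmetic CPI that achieves the given fractional improvement."""
    target = average_cpi(CPI) * (1 - improvement)
    others = average_cpi(dict(CPI, arith=0))  # contribution of non-arithmetic classes
    return (target - others) / FRACTION["arith"]


print(round(average_cpi(CPI), 2))
print(round(max_arith_cpi(0.25), 2), round(max_arith_cpi(0.50), 2))
```

The non-arithmetic classes contribute a fixed 1.2 to the average, so the 50% target of 1.3 leaves only 0.1 for arithmetic instructions.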