Pipeline Datapath & Performance
CS 2506 Computer Organization II
MIPS 3: Forwarding and Hazard Detection

You may work in pairs for this assignment. If you choose to work with a partner, make sure only one of you submits a solution and that the file lists names and PIDs for both of you as described in the assignment below.

Prepare your answers to the following questions in a plain text file. Submit your file to the Curator system by the posted deadline for this assignment. No late submissions will be accepted. You will submit your answers to the Curator System (www.cs.vt.edu/curator) under the heading MIPS03.

For questions 1 through 3, refer to the pipeline design with forwarding, shown below. It supports execution of any sequence of the following MIPS instructions: add, sub, and, or, slt, and sw (and lw, so long as no stalls are needed to resolve load-use hazards). Remember: this pipeline design does not include (load-use) hazard detection hardware. It can forward operands, but it cannot introduce stalls to deal with situations that forwarding alone will not handle.

[Figure: pipeline datapath with forwarding (not reproduced); the diagram labels items 1, 2, and 3]

1. Consider the following sequence of MIPS32 assembly instructions (which you may have seen in an earlier assignment):

    lw   $t3, 0($t0)      # 1.1
    add  $t0, $t3, $t3    # 1.2
    sub  $t1, $t0, $t3    # 1.3
    sw   $t3, 4($t0)      # 1.4
    lw   $t0, 0($t1)      # 1.5
    add  $t4, $t3, $t1    # 1.6

A data dependency occurs when a later instruction requires an input value that is set by an earlier instruction. A data hazard occurs when one instruction writes a value into a register that a later instruction will use as input, but that value does not actually appear in the register by the cycle on which the later instruction attempts to read it. Note that a data hazard always implies a data dependency, but some data dependencies do not imply a data hazard. Also remember that this pipeline design does include hardware for forwarding operands, but not for inserting stalls.
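For reference, the forwarding decision described above can be sketched in a few lines. The following is a minimal Python sketch that assumes the textbook (Patterson & Hennessy) five-stage MIPS pipeline. The diagram for this assignment may differ in detail, and the function and parameter names are hypothetical. forward_select reports which value should feed one ALU source operand of the instruction in EX. For contrast, load_use_stall shows the textbook stall condition that the hazard-detection design used in questions 4 and 5 adds, which this forwarding-only pipeline lacks.

```python
# Minimal sketch of forwarding-unit and load-use hazard-detection logic,
# assuming the textbook five-stage MIPS pipeline. Registers are given by
# MIPS number ($t0 = 8, $t1 = 9, ..., $t7 = 15); all names are hypothetical.

def forward_select(src_reg: int,
                   ex_mem_reg_write: bool, ex_mem_rd: int,
                   mem_wb_reg_write: bool, mem_wb_rd: int) -> str:
    """Choose the source of one ALU operand for the instruction in EX.

    src_reg is the rs or rt field carried in ID/EX; ex_mem_rd and mem_wb_rd
    are the write-to register numbers carried in those interstage buffers.
    """
    # EX hazard: the instruction one ahead (now in MEM) writes src_reg.
    # Checked first because it holds the most recent value.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    # MEM hazard: the instruction two ahead (now in WB) writes src_reg.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    # No hazard: use the value read from the register file during ID.
    return "ID/EX"


def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """Textbook load-use check (absent from the forwarding-only design).

    Stall when the instruction in EX is a load whose destination (rt) is a
    source register of the instruction currently being decoded.
    """
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


if __name__ == "__main__":
    # add $t5, ... immediately followed by sub $t7, $t5, ...: forward from EX/MEM.
    print(forward_select(13, True, 13, False, 0))   # EX/MEM
    # lw $t5, 0($t6) immediately followed by add $t7, $t5, $t6: must stall.
    print(load_use_stall(True, 13, 13, 14))         # True
```

Checking EX/MEM before MEM/WB implements the textbook rule for the case where both buffers target the same register: the more recent result, from the instruction one ahead, is the one forwarded.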
a) [8 points] Identify the data hazards that would not prevent the given sequence of instructions from executing correctly on the hardware design given above, even though we do not have the necessary hardware to insert stalls. For each such hazard, list the writing instruction, the reading instruction, the register involved, and the interstage buffer from which the forwarded value will be taken. Do not list data dependencies unless they would require forwarding of an operand. Write your answers in the following form (the answer below is NOT correct for this question):

    writer   reader   register   forward from
    ------------------------------------------
    4.1      4.2      $t5        MEM/WB

b) [8 points] Identify the data hazards that would prevent the given sequence of instructions from executing correctly on the hardware design given above, even though we do have the necessary hardware to carry out forwarding. For each such hazard, list the writing instruction, the reading instruction, and the register involved. Write your answers in the following form (the answer below is NOT correct for this question):

    writer   reader   register
    ---------------------------
    4.1      4.2      $t5

2. [10 points] Why doesn't the Forwarding unit need the output from the MUX labelled 2 in the diagram? Be precise.

3. [12 points] The Forwarding unit does receive the write-to register number taken from the MEM/WB interstage buffer (labelled 3 in the diagram). This is the write-to register for the instruction that has just entered the WB stage. Since that instruction will write a value into the appropriate register while it is in the WB stage, why does the Forwarding unit need to see its write-to register number?

For questions 4 and 5, refer to the pipeline design with forwarding and (load-use) hazard detection, shown below. It supports execution of any sequence of the following MIPS instructions: add, sub, and, or, slt, lw, and sw.

4. [15 points] How many stalls would the load-use Hazard Detection unit trigger if we executed each of the following sequences of instructions? (The parts are independent.)

a)  lw   $t1, ($t0)
    add  $t3, $t2, $t1
    add  $t4, $t3, $t1

b)  lw   $t1, ($t0)
    lw   $t2, ($t0)
    add  $t3, $t2, $t1
    add  $t4, $t3, $t1

c)  lw   $t1, ($t0)
    lw   $t2, ($t1)
    add  $t3, $t2, $t1
    add  $t4, $t3, $t1

5. [30 points] Suppose we executed the following sequence of instructions, and suppose that the registers have the indicated initial values:

    lw   $t1, ($t0)       # $t0 initially 0x08004000, $t1 initially 1,
                          # Mem[0x08004000] initially 10
    add  $t3, $t2, $t1    # $t2 initially 2, $t3 initially 3
    sub  $t4, $t4, $t3    # $t4 initially 4

a) Suppose that we executed the instructions above on the pipeline with forwarding and load-use hazard detection. What would be the final values in the registers $t1, $t3, and $t4?

b) Suppose that we executed the instructions above and, when a load-use hazard occurred, we did not prevent updates to the PC register, but we did prevent updates to the IF/ID buffer. What would be the final values in the registers $t1, $t3, and $t4?

c) Suppose that we executed the instructions above and, when a load-use hazard occurred, we did not prevent updates to the IF/ID buffer, but we did prevent updates to the PC register. What would be the final values in the registers $t1, $t3, and $t4?

6. MAD Corporation currently produces three different processors, all executing the same machine language:

    P1 has a 2.4 GHz clock rate and an advertised CPI of 1.2
    P2 has a 3.2 GHz clock rate and an advertised CPI of 1.5
    P3 has a 3.8 GHz clock rate and an advertised CPI of 1.8

a) [5 points] Using IPS (instructions per second) as your criterion, and accepting the information given above, which processor offers the best performance? Justify your conclusion precisely.

b) [6 points] It takes 12 seconds (of CPU time) to execute a certain benchmark on P2. How many machine instructions are executed when that benchmark is run on P2? Justify your conclusion precisely.

c) [6 points] MAD would like to reduce the execution time of that benchmark on P2 by 25%, but the redesign they have come up with would entail increasing the CPI by 15%. What clock rate must they apply in order to achieve their goal? State the clock rate to the nearest hundredth of a GHz. Justify your conclusion precisely.
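The arithmetic in question 6 rests on the classic CPU performance equation, CPU time = instruction count × CPI / clock rate, from which IPS = clock rate / CPI. The Python sketch below encodes those relationships as small helper functions. The helper names are hypothetical, and the sample values are made up for illustration; they are not the figures from the question.

```python
# Minimal sketch of the CPU performance equation:
#     CPU time = instruction count * CPI / clock rate
# Helper names are hypothetical; rates are in Hz, times in seconds.

def ips(clock_hz: float, cpi: float) -> float:
    """Instructions per second = (cycles per second) / (cycles per instruction)."""
    return clock_hz / cpi


def instruction_count(cpu_time_s: float, clock_hz: float, cpi: float) -> float:
    """Solve CPU time = IC * CPI / clock rate for IC."""
    return cpu_time_s * clock_hz / cpi


def required_clock(old_clock_hz: float, time_cut: float, cpi_growth: float) -> float:
    """Clock rate that cuts CPU time by the fraction time_cut when CPI grows by
    the fraction cpi_growth and the instruction count stays the same.

    Since clock = IC * CPI / time,
    new_clock = old_clock * (1 + cpi_growth) / (1 - time_cut).
    """
    return old_clock_hz * (1 + cpi_growth) / (1 - time_cut)


if __name__ == "__main__":
    # Illustrative numbers only (not the values from question 6).
    print(ips(2.0e9, 1.25) / 1e9)                               # 1.6 (billion instructions/s)
    print(instruction_count(10.0, 2.0e9, 2.0))                  # 10000000000.0
    print(round(required_clock(2.0e9, 0.20, 0.10) / 1e9, 2))    # 2.75 (GHz)
```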