```
x30 = 0x5000 0000 0000 0000    overflow!
x30 = 0xB000 0000 0000 0000    [illegible]
x30 = 0xD000 0000 0000 0000    overflow in the first add, none in the second

    x7 = 0x0000000AAAAAAAA0
    x7 = 0x1234567ABABEFEF8
(2) x7 = 0x2345678123456780
(3) x7 = 0x0000000015555555
    x7 = 0x0000000000000545
```

```
      SRLI x5, x5, 11
      SLLI x5, x5, 26
      AND  x6, x6, 03FFFFFFh
      OR   x5, x5, x6

      XORI x5, x6, -1

0.70×2 + 0.10×6 + 0.2×3 = 2.6
×1.25: x = 2.9 ≈ 3
       x = 3.8 ≈ 4

01000110
01000111

2.24
 185 = 1011 1001
 122 = 0111 1010
-122 = 1000 0110
1011 1001 + 1000 0110 = (1) 0011 1111

      LI   x7, 0
FOR1: BGE  x7, x5, END
      ADDI x7, x7, 1
      LI   x29, 0
FOR2: BGE  x29, x6, FOR1
      ADD  x28, x7, x29
      SLLI x27, x29, 2
      ADD  x27, x10, x27
      LW   x28, 0(x27)
      ADDI x29, x29, 1
      J    FOR2
END:
[illegible instruction count] = 109

LOOP: ADDI x11, x11, 1
      OR   x10, x11, x10
      SLLI x10, x10, 4
      OR   x10, x11, x10
      SLLI x10, x10, 4
      BGE  x11, x12, END
      J    LOOP
```

0.70×2 + 0.10×6 + 0.2×3 = 2.6 CPI

10111001 + 01111010 = 00110011 (correct value)

**4.12** Examine the difficulty of adding a proposed swap rs1, rs2 instruction to RISC-V.

Interpretation: Reg[rs2]=Reg[rs1]; Reg[rs1]=Reg[rs2]

**4.12.1** [5] <§4.4> Which new functional blocks (if any) do we need for this

**4.12.2** [10] <§4.4> Which existing functional blocks (if any) require

**4.12.3** [5] <§4.4> What new data paths do we need (if any) to support this

**4.12.4** [5] <§4.4> What new signals do we need (if any) from the control unit to

**4.12.5** [5] <§4.4> Modify Figure 4.21 to demonstrate an implementation of this

4.12.1) None

4.12.2) Modify the register file so that it accepts 2 inputs.

4.12.3) Find a way to pass one of the rs values through without going through the ALU.

4.12.4) Another WE (write enable) signal [illegible].

**4.13** Examine the difficulty of adding a proposed ss rs1, rs2, imm (Store Sum) instruction to RISC-V.
Interpretation: Mem[Reg[rs1]]=Reg[rs2]+immediate

**4.13.1** [10] <§4.4> Which new functional blocks (if any) do we need for this

**4.13.2** [10] <§4.4> Which existing functional blocks (if any) require modification?

**4.13.3** [5] <§4.4> What new data paths do we need (if any) to support this

**4.13.4** [5] <§4.4> What new signals do we need (if any) from the control unit to

4.13.1) 1 mux

4.13.3) Connect the ALU output to the data memory mux: use a mux at the [illegible] input of the data memory and connect [illegible] to that mux.

4.13.4) A control signal for the new mux.

**4.17** [10] <§4.5> What is the minimum number of cycles needed to completely execute n instructions on a CPU with a k stage pipeline? Justify your formula.

k + n - 1 (the first instruction needs k cycles, and each of the remaining n - 1 instructions completes one cycle after the previous one)

    addi x11, x12, 5
    add x13, x11, x12
    addi x14, x11, 15

x11 = 11, x12 = 22; x13 = 33, x14 = 26

**4.19** [10] <§4.5> Assume that x11 is initialized to 11 and x12 is initialized to 22. Suppose you executed the code below on a version of the pipeline from Section 4.5 that does not handle data hazards (i.e., the programmer is responsible for addressing data hazards by inserting NOP instructions where necessary). What would the final values of register x15 be? Assume the register file is written at the beginning of the cycle and read at the end of a cycle. Therefore, an ID stage will return the results of a WB state occurring during the same cycle. See Section 4.7 and Figure 4.51 for details.

    addi x11, x12, 5
    add x13, x11, x12
    addi x14, x11, 15
    add x15, x11, x11

x15 = 22 + 5 + 22 + 5 = 54

**4.20** [5] <§4.5> Add NOP instructions to the code below so that it will run correctly on a pipeline that does not handle data hazards.

    addi x11, x12, 5
    add x13, x11, x12
    addi x14, x11, 15
    add x15, x13, x12

Answer:

    addi x11, x12, 5
    3x NOP (-1 if the register file is transparent or the write is phase-shifted)
    add x13, x11, x12
    addi x14, x11, 15
    2x NOP (-1 if the register file is transparent or the write is phase-shifted)
    add x15, x13, x12

correctly handle data hazards.

**4.21.1**
[5] <§4.5> Suppose that the cycle time of this pipeline without forwarding is 250 ps. Suppose also that adding forwarding hardware will reduce the number of NOPs from .4\*n to .05\*n, but increase the cycle time to 300 ps. What is the speedup of this new pipeline compared to the one without forwarding?

**4.21.2** [10] <§4.5> Different programs will require different amounts of NOPs. How many NOPs (as a percentage of code instructions) can remain in the typical program before that program runs slower on the pipeline with forwarding?

**4.21.4** [10] <§4.5> Can a program with only .075\*n NOPs possibly run faster on the pipeline with forwarding? Explain why or why not.

**4.21.5** [10] <§4.5> At minimum, how many NOPs (as a percentage of code instructions) must a program have before it can possibly run faster on the pipeline

4.21.1) $\frac{1.4 \cdot n \cdot 250}{1.05 \cdot n \cdot 300} = 1.1(1)$

4.21.2) $\frac{(1 + x) \cdot n \cdot 300}{1.4 \cdot n \cdot 250} < 1 \Rightarrow x < 16.7\%$

4.21.3) $300(1 + y)n < 250(1 + x)n \Rightarrow y < \frac{250x - 50}{300}$

4.21.4) $300 \cdot n > 250 \cdot 1.075 \cdot n$, so the program will always run better without forwarding.

4.21.5) $0 < \frac{250x - 50}{300} \Rightarrow 50 < 250x \Rightarrow x > 0.2$ (20%)

**4.24** [10] <§4.7> Which of the two pipeline diagrams below better describes the operation of the pipeline's hazard detection unit? Why?

Choice 1:

    ld  x11, 0(x12):   IF ID EX ME WB
    add x13, x11, x14:    IF ID EX..ME WB
    or  x15, x16, x17:       IF ID..EX ME WB

Choice 2:

    ld  x11, 0(x12):   IF ID EX ME WB
    add x13, x11, x14:    IF ID..EX ME WB
    or  x15, x16, x17:       IF..ID EX ME WB

**4.27** Problems in this exercise refer to the following sequence of instructions, and assume that it is executed on a five-stage pipelined datapath:

    add x15, x12, x11
    ld x13, 4(x15)
    ld x12, 0(x2)
    or x13, x15, x13
    sd x13, 0(x15)

**4.27.1** [5] <§4.7> If there is no forwarding or hazard detection, insert NOPs to

**4.27.2** [10] <§4.7> Now, change and/or rearrange the code to minimize the number of NOPs needed. You can assume register x17 can be used to hold temporary values in your modified code.
**4.27.3** [10] <§4.7> If the processor has forwarding, but we forgot to implement the hazard detection unit, what happens when the original code executes?

**4.27.4** [20] <§4.7> If there is forwarding, for the first seven cycles during the execution of this code, specify which signals are asserted in each cycle by hazard detection and forwarding units in Figure 4.59.

4.27.1)

    add x15, x12, x11
    3x NOP
    ld x13, 4(x15)
    ld x12, 0(x2)
    2x NOP
    or x13, x15, x13
    3x NOP
    sd x13, 0(x15)

4.27.3) Runs correctly; [illegible].

4.27.4) MEM→EX, WB→EX, MEM→EX

**5.2** Caches are important to providing a high-performance memory hierarchy to processors. Below is a list of 64-bit memory address references, given as word addresses.

0x03, 0xb4, 0x2b, 0x02, 0xbf, 0x58, 0xbe, 0x0e, 0xb5, 0x2c, 0xba, 0xfd

**5.2.1** [10] <§5.3> For each of these references, identify the binary word address, the tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list whether each reference is a hit or a miss, assuming the cache is initially empty.

**5.2.2** [10] <§5.3> For each of these references, identify the binary word address, the tag, the index, and the offset given a direct-mapped cache with two-word blocks and a total size of eight blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.

**5.2.3** [20] <§§5.3, 5.4> You are asked to optimize a cache design for the given references. There are three direct-mapped cache designs possible, all with a total of eight words of data:

- C1 has 1-word blocks,
- C2 has 2-word blocks, and
- C3 has 4-word blocks.
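The tag/index/offset split and the hit/miss outcome asked for in 5.2.1 and 5.2.2 can be replayed mechanically. The following is a minimal Python sketch under stated assumptions, not part of the exercise: the function name `simulate` and its parameters are hypothetical, the cache starts empty, and only the tag bits that are non-zero for these 8-bit reference values are printed (the full tag of a 64-bit address is wider).

```python
# Word-address references listed in Exercise 5.2.
REFS = [0x03, 0xB4, 0x2B, 0x02, 0xBF, 0x58, 0xBE, 0x0E, 0xB5, 0x2C, 0xBA, 0xFD]


def simulate(refs, num_blocks, words_per_block):
    """Replay refs on an empty direct-mapped cache (hypothetical helper).

    Returns one (tag, index, offset, hit) tuple per reference.
    """
    cache = {}  # index -> tag currently stored in that block
    rows = []
    for addr in refs:
        offset = addr % words_per_block      # word within the block
        block = addr // words_per_block      # block address
        index = block % num_blocks           # which cache block it maps to
        tag = block // num_blocks            # remaining upper bits
        hit = cache.get(index) == tag
        cache[index] = tag                   # on a miss the block is replaced
        rows.append((tag, index, offset, hit))
    return rows


one_word = simulate(REFS, num_blocks=16, words_per_block=1)  # setup of 5.2.1
two_word = simulate(REFS, num_blocks=8, words_per_block=2)   # setup of 5.2.2

for addr, (tag, index, offset, hit) in zip(REFS, two_word):
    print(f"0x{addr:02x} tag={tag:04b} index={index:03b} "
          f"offset={offset} {'H' if hit else 'M'}")
```

With these inputs the 16-block, one-word configuration misses on every reference, while the two-word configuration hits on 0x02, 0xbe and 0xb5.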
5.2.1) offset = $\log_2(1) = 0$; index = $\log_2(16) = 4$; tag = [illegible]. Every reference will be a MISS.

5.2.2) offset = $\log_2(2) = 1$; index = $\log_2(8) = 3$; tag = [illegible]

| Address | Tag | Index | Offset | H/M |
|---------|------|-------|--------|-----|
| 0x03 | 0000 | 001 | 1 | M |
| 0xb4 | 1011 | 010 | 0 | M |
| 0x2b | 0010 | 101 | 1 | M |
| 0x02 | 0000 | 001 | 0 | H |

5.5.4) Tag = 0h for each of the references 00h, 04h, 10h, 84h, E8h, A0h [remaining entries illegible]

5.2.3) Compute the miss rate of each one.

**5.3** By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid bits. For this exercise, you will examine how a cache's configuration affects the total amount of SRAM needed to implement it as well as the performance of the cache. For all parts, assume that the caches are byte addressable, and that addresses and words are 64 bits.

**5.3.1** [10] <§5.3> Calculate the total number of bits required to implement a 32 KiB cache with two-word blocks.

**5.3.2** [10] <§5.3> Calculate the total number of bits required to implement a 64 KiB cache with 16-word blocks. How much bigger is this cache than the 32 KiB cache described in Exercise 5.3.1? (Notice that, by changing the block size, we doubled the amount of data without doubling the total size of the cache.)

5.3.1) $\text{byte offset} = \log_2(8) = 3$; $\text{index} = \log_2\left(\frac{32\text{K}}{16}\right) = \log_2\left(\frac{2^{15}}{2^{4}}\right) = 11$; $\text{tag} = 64 - 1 - 11 - 3 = 49$ (the 1 is the word-offset bit of a two-word block)

5.3.2) $\text{block offset} = \log_2(16) = 4$; $\text{byte offset} = \log_2(8) = 3$; $\text{tag} = 64 - 4 - 3 - 9 = 48$ (9 index bits, since $2^{16} / 2^{7} = 2^{9}$ blocks)

5.3.3) [illegible]

| Tag | Index | Offset |
|-------|-------|--------|
| 63-10 | 9-5 | 4-0 |

**5.5.1** [5] <§5.3> What is the cache block size (in words)?

**5.5.2** [5] <§5.3> How many blocks does the cache have?

**5.5.3** [5] <§5.3> What is the ratio between total bits required for such a cache implementation over the data storage bits?
Beginning from power on, the following byte-addressed cache references are

| Hex | 00 | 04 | 10 | 84 | E8 | A0 | 400 | 1E | 8C | C1C | B4 | 884 |
|-----|----|----|----|----|----|----|-----|----|----|-----|----|-----|
| Dec | 0 | 4 | 16 | 132 | 232 | 160 | 1024 | 30 | 140 | 3100 | 180 | 2180 |

**5.5.4** [20] <§5.3> For each reference, list (1) its tag, index, and offset, (2) whether it is a hit or a miss, and (3) which bytes were replaced (if any).

**5.5.5** [5] <§5.3> What is the hit ratio?

**5.5.6** [5] <§5.3> List the final state of the cache, with each valid entry as a record of <index, tag, data>. For example, <0, 3, Mem[0xC00]-Mem[0xC1F]>

5.5.1) 5 offset bits: $2^5$ bytes = 4 words of 8 bytes

5.5.2) 5 index bits: $2^5 = 32$ blocks

5.5.3) Data: $32 \times 4 \times 8 = 2^5 \times 2^2 \times 2^3 = 2^{10}$ bytes $= 2^{13}$ bits $= 8192$. Total: $2^{13} + (54 + 1) \times 32 = 9952$. Ratio: $\frac{9952}{8192} = 1.21$

5.5.5) HR = 33%

5.5.6) <0, 3, C00h-C1Fh>, <4, 2, 880h-89Fh>, <5, 0, 0A0h-0BFh>, <7, 0, 0E0h-0FFh>
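The quantities asked for in 5.5.3, 5.5.5 and 5.5.6 can be cross-checked by replaying the reference stream. This is a minimal Python sketch under stated assumptions rather than a definitive solution: the field widths come from the Tag/Index/Offset layout (bits 63-10, 9-5, 4-0), one valid bit per block is assumed as the only metadata besides the tag, the eleventh reference is taken as 0xB4 so that it agrees with its decimal value 180, and all names (`run`, `final_state`, and so on) are hypothetical.

```python
# Byte-address references from Exercise 5.5 (eleventh entry assumed to be 0xB4 = 180).
REFS = [0x00, 0x04, 0x10, 0x84, 0xE8, 0xA0, 0x400, 0x1E, 0x8C, 0xC1C, 0xB4, 0x884]

OFFSET_BITS = 5   # address bits 4-0
INDEX_BITS = 5    # address bits 9-5
TAG_BITS = 54     # address bits 63-10


def run(refs):
    """Replay refs on an initially empty direct-mapped cache (hypothetical helper)."""
    cache = {}  # index -> tag currently stored
    hits = 0
    for addr in refs:
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        if cache.get(index) == tag:
            hits += 1
        cache[index] = tag  # a miss replaces whatever the block held
    return hits, cache


hits, cache = run(REFS)
hit_ratio = hits / len(REFS)

# Final state as (index, tag, first byte address, last byte address) records.
final_state = []
for index, tag in sorted(cache.items()):
    base = (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS)
    final_state.append((index, tag, base, base + (1 << OFFSET_BITS) - 1))

blocks = 1 << INDEX_BITS
data_bits = blocks * (1 << OFFSET_BITS) * 8        # bytes per block times 8 bits
total_bits = data_bits + blocks * (TAG_BITS + 1)   # assumed: tag plus one valid bit
ratio = total_bits / data_bits

print(f"hit ratio = {hits}/{len(REFS)}")
for index, tag, first, last in final_state:
    print(f"<{index}, {tag}, Mem[0x{first:03X}]-Mem[0x{last:03X}]>")
print(f"total bits = {total_bits}, ratio = {ratio:.2f}")
```

Under these assumptions the replay gives 4 hits out of 12 references, four valid entries at indices 0, 4, 5 and 7, and 9952 total bits against 8192 data bits.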