CS 152 Computer Architecture and Engineering Spring 1996 ...
CS 152 Computer Architecture and Engineering Lecture 9 Designing a Multicycle Processor February 26, 2003 John Kubiatowicz (www.cs.berkeley.edu/~kubitron) lecture slides: http://inst.eecs.berkeley.edu/~cs152/ 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Recap: Processor Design is a Process Bottom-up assemble components in target technology to establish critical timing Top-down specify component behavior from high-level requirements Iterative refinement establish partial solution, expand and improve Instruction Set Architecture processor datapath Reg. File Mux ALU control Reg Cells 2/26/03 Mem Decoder Sequencer Gates
UCB Spring 2003 CS152 / Kubiatowicz Recap: A Single Cycle Datapath Instruction<31:0> 1 Mux 0 RegWr 5 5 Rs 5 Rt Rs ALUctr busA 0 1 32 Imm16 MemtoReg Data In 32 ALUSrc 0 32 Clk WrEn Adr 32 Mux 16 Extender imm16
32 Mux 32 Clk Rw Ra Rb 32 32-bit Registers busB 32 Rd Equal MemWr ALU busW Rt <0:15> Clk <11:15> RegDst Rt <21:25> Rd Instruction Fetch Unit <16:20> nPC_sel 1 Data Memory ExtOp 2/26/03 UCB Spring 2003
CS152 / Kubiatowicz Recap: The Truth Table for the Main Control RegDst op 6 Main Control func ALUSrc 6 : ALUop ALU Control (Local) ALUctr 3 3 op RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp ALUop (Symbolic) ALUop <2> ALUop <1> ALUop <0> 2/26/03 00 0000 R-type 1 0 0 1 0 0 0 x
R-type 1 0 0 00 1101 10 0011 10 1011 00 0100 00 0010 ori lw sw beq jump 0 0 x x x 1 1 1 0 x 0 1 x x x 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 x x Or Add Add Subtract xxx
0 0 0 x 0 1 0 0 x 0 0 0 0 x 1 UCB Spring 2003 CS152 / Kubiatowicz Recap: PLA Implementation of the Main Control op<5> . op<5>. op<5>. op<5>. op<5>. op<5>. . . <0> R-type . <0> ori . <0> lw . <0> sw <0> beq
jump . op<0> RegWrite ALUSrc RegDst MemtoReg MemWrite Branch Jump ExtOp ALUop<2> ALUop<1> ALUop<0> 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Recap: Systematic Generation of Control Control Logic / Store (PLA, ROM) microinstruction Conditions Instruction Decode OPcode Control Points Datapath In our single-cycle processor, each instruction is realized by exactly one control command or microinstruction in general, the controller is a finite state machine microinstruction can also control sequencing (see later) 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz
The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Input Control Memory Datapath Output Todays Topic: Designing the Datapath for the Multiple Clock Cycle Datapath 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Abstract View of our single cycle processor Main Control op Result Store MemWr RegDst RegWr Reg. Wrt Data Mem Mem Access Ext ExtOp ALUSrc ALUctr Equal Register Fetch Instruction Fetch
PC nPC_sel Next PC ALU MemRd MemWr ALU control fun looks like a FSM with PC as state 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Whats wrong with our CPI=1 processor? Arithmetic & Logical PC Inst Memory Reg File mux ALU mux setup Load PC Inst Memory ALU Data Mem Store PC mux Reg File
Critical Path Inst Memory Reg File ALU Data Mem Branch PC Inst Memory Reg File mux cmp mux setup mux Long Cycle Time All instructions take as much time as the slowest Real memory is not as nice as our idealized memory cannot always get the job done in one (short) cycle 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Memory Access Time Physics => fast memories are small (large memories are slow) Storage Array selected word line storage cell address bit line address decoder sense amps Processor Cache proc. bus
=> Use a hierarchy of memories L2 Cache 1 time-period 2-3 time-periods 2/26/03 UCB Spring 2003 mem. bus question: register file vs. memory memory 20 - 50 time-periods CS152 / Kubiatowicz Reducing Cycle Time Cut combinational dependency graph and insert register / latch Do same work in two fast cycles, rather than one slow one May be able to short-circuit path and remove some components for some instructions! storage element storage element Acyclic Combinational Logic Acyclic Combinational Logic (A) storage element Acyclic Combinational Logic (B) storage element 2/26/03 UCB Spring 2003 storage element CS152 / Kubiatowicz Worst Case Timing (Load)
Clk PC Old Value Clk-to-Q New Value Instruction Memoey Access Time New Value Rs, Rt, Rd, Op, Func Old Value ALUctr Old Value ExtOp Old Value New Value ALUSrc Old Value New Value MemtoReg Old Value New Value RegWr Old Value New Value busA busB Delay through Control Logic New Value Register Write Occurs Register File Access Time
New Value Old Value Delay through Extender & Mux Old Value New Value ALU Delay Address Old Value New Value Data Memory Access Time busW 2/26/03 Old Value UCB Spring 2003 New CS152 / Kubiatowicz Basic Limits on Cycle Time Next address logic PC <= branch ? PC + offset : PC + 4 Instruction Fetch InstructionReg <= Mem[PC] Register Access A <= R[rs] ALU operation UCB Spring 2003 Result Store MemWr RegDst RegWr Reg. File Data
Mem MemRd MemWr ALUctr Exec Mem Access Operand Fetch Instruction Fetch PC ExtOp nPC_sel Next PC 2/26/03 ALUSrc Control R <= A + B CS152 / Kubiatowicz 2/26/03 UCB Spring 2003 Place enables on all registers Result Store MemWr MemRd MemWr RegDst RegWr Reg. File Data
Mem Exec Mem Access ALUctr ALUSrc ExtOp Operand Fetch Instruction Fetch PC Next PC nPC_sel Equal Partitioning the CPI=1 Datapath Add registers between smallest steps CS152 / Kubiatowicz 2/26/03 ExtOp Equal B UCB Spring 2003 S Reg. File RegDst RegWr MemToReg MemRd MemWr
ALUctr Ext ALUSrc ALU A Result Store Reg File Mem Access IR nPC_sel E Data Mem Operand Fetch Instruction Fetch PC Next PC Example Multicycle Datapath M Critical Path ? CS152 / Kubiatowicz Administrative Issues Read Chapter 5 This lecture and next one slightly different from the book Midterm two weeks from today (Wednesday 3/12): 5:30pm to 8:30pm, location TBA No class on that day: Pencil, calculator, one 8.5 x 11 (both sides) of handwritten notes Sit in every other chair, every other row
Meet at LaVals pizza after the midterm 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Recall: Step-by-step Processor Design Step 1: ISA => Logical Register Transfers Step 2: Components of the Datapath Step 3: RTL + Components => Datapath Step 4: Datapath + Logical RTs => Physical RTs Step 5: Physical RTs => Control 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Step 4: R-rtype (add, sub, . . .) Logical Register Transfer Physical Register Transfers inst Logical Register Transfers ADDU R[rd] < R[rs] + R[rt]; PC < PC + 4 inst Physical Register Transfers Time IR < MEM[pc] ADDU A< R[rs]; B < R[rt] S < A + B R[rd] < S; PC < PC + 4 B UCB Spring 2003
Reg. File S Mem Access Exec Reg File IR Inst. Mem A M Data Mem 2/26/03 PC Next PC E CS152 / Kubiatowicz Step 4: Logical immed Logical Register Transfer inst Logical Register Transfers ORI R[rt] < R[rs] OR ZExt(Im16); PC < PC + 4 Physical Register Transfers inst Physical Register Transfers Time
IR < MEM[pc] ORI A< R[rs]; B < R[rt] S < A or ZExt(Im16) R[rt] < S; PC < PC + 4 B UCB Spring 2003 Reg. File S Mem Access Exec Reg File IR Inst. Mem A M Data Mem 2/26/03 PC Next PC E CS152 / Kubiatowicz Step 4 : Load Logical Register Transfer Physical Register Transfers inst Logical Register Transfers
LW R[rt] < MEM[R[rs] + SExt(Im16)]; PC < PC + 4 inst Physical Register Transfers Time IR < MEM[pc] LW A< R[rs]; B < R[rt] S < A + SExt(Im16) M < MEM[S] R[rd] < M; PC < PC + 4 B UCB Spring 2003 Reg. File S Mem Access Exec Reg File IR Inst. Mem A M Data Mem 2/26/03 PC Next PC
E CS152 / Kubiatowicz Step 4 : Store Logical Register Transfer inst Logical Register Transfers SW MEM[R[rs] + SExt(Im16)] < R[rt]; PC < PC + 4 Physical Register Transfers Time inst Physical Register Transfers IR < MEM[pc] SW A< R[rs]; B < R[rt] S < A + SExt(Im16); MEM[S] < B PC < PC + 4 B UCB Spring 2003 Reg. File S Mem Access Exec Reg File IR Inst. Mem
A M Data Mem 2/26/03 PC Next PC E CS152 / Kubiatowicz Step 4 : Branch Logical Register Transfer inst Logical Register Transfers BEQ if R[rs] == R[rt] then PC <= PC + 4+SExt(Im16) || 00 Physical Register Transfers else PC <= PC + 4 Time inst Physical Register Transfers IR < MEM[pc] BEQ E< (R[rs] = R[rt]) if !E then PC < PC + 4 else PC
Exec Reg File IR Inst. Mem A M Data Mem 2/26/03 PC Next PC E CS152 / Kubiatowicz Alternative datapath (book): Multiple Cycle Datapath Miminizes Hardware: 1 memory, 1 adder PCWr PCSrc RegDst ALUSelA RegWr 1 32 PC 1 WrAdr 32 Din Dout 32 32 32
ALU Control 32 MemtoReg UCB Spring 2003 Zero ALU Out 32 Rt 5 Target ALU Mux RAdr Rs 32 0 0 Mux 0 32 Instruction Reg 32 BrWr Mux PCWrCond Zero IorD MemWr
IRWr ALUOp ALUSelB CS152 / Kubiatowicz Our Control Model State specifies control points for Register Transfer Transfer occurs upon exiting state (same falling edge) inputs (conditions) Next State Logic State X Register Transfer Control Points Control State Depends on Input Output Logic outputs (control points) 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Step 4 Control Specification for multicycle proc IR <= MEM[PC] S <= A fun B ORi S <= A or ZX LW S <= A + SX M <= MEM[S] SW S <= A + SX MEM[S] <= B PC <= PC + 4 R[rd] <= S R[rt] <= S R[rt] <= M
PC <= PC + 4 PC <= PC + 4 PC <= PC + 4 2/26/03 BEQ UCB Spring 2003 PC <= Next(PC,Equal) Write-back Memory Execute decode / operand fetch A <= R[rs] B <= R[rt] R-type instruction fetch CS152 / Kubiatowicz Traditional FSM Controller next state op cond state control points Truth Table 11 Equal next State control points 6 4 State op 2/26/03 datapath State UCB Spring 2003 CS152 / Kubiatowicz Step 5 (datapath + state diagram control) Translate RTs into control points Assign states
Then go build the controller 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Mapping RTs to Control Points IR <= MEM[PC] imem_rd, IRen A <= R[rs] B <= R[rt] S <= A fun B ORi ALUfun, Sen S <= A or ZX LW S <= A + SX M <= MEM[S] R[rd] <= S PC <= PC + 4 RegDst, RegWr, PCen 2/26/03 SW BEQ S <= A + SX MEM[S] <= B PC <= PC + 4 R[rt] <= S R[rt] <= M PC <= PC + 4 PC <= PC + 4 UCB Spring 2003 PC <= Next(PC,Equal) Write-back Memory Execute
decode Aen, Ben, Een R-type instruction fetch CS152 / Kubiatowicz Assigning States instruction fetch IR <= MEM[PC] 0000 decode A <= R[rs] B <= R[rt] R-type S <= A fun B 0100 ORi S <= A or ZX 0110 LW S <= A + SX 1000 M <= MEM[S] 1001 SW BEQ S <= A + SX 1011 MEM[S] <= B PC <= PC + 4 1100
R[rd] <= S R[rt] <= S R[rt] <= M PC <= PC + 4 PC <= PC + 4 PC <= PC + 4 0101 2/26/03 0111 1010 UCB Spring 2003 PC <= Next(PC) 0011 Write-back Memory Execute 0001 CS152 / Kubiatowicz (Mostly) Detailed Control Specification (missing0) State Op field Eq Next IR PC Ops Exec Mem Write-Back en sel A B E Ex Sr ALU S R W M M-R Wr Dst BEQ: R: ORi: LW: SW: 0000 0001 0001 0001 0001 0001 ?????? BEQ R-type ORI LW SW ?
x x x x x 0001 1 0011 0100 0110 1000 1011 0011 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx 0 1 x x x x x x x x x 0000 0000 0101 0000 0111
0000 1001 1010 0000 1100 0000 2/26/03 111 111 111 111 111 1 1 -all same in Moore machine 0 1 x x 0 0 x x 0 1 1 0 1 0 1 1 0 0 1 fun 1 1 0 0 0 or 1 1 0 1 0 add 1 1 0 1
1 0 1 0 add 1 1 0 UCB Spring 2003 0 1 0 CS152 / Kubiatowicz Performance Evaluation What is the average CPI? state diagram gives CPI for each instruction type workload gives frequency of each type Type CPIi for type Frequency CPIi x freqIi Arith/Logic 4 40% 1.6 Load 5 30% 1.5 Store 4 10% 0.4 branch 3
20% 0.6 Average CPI:4.1 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Controller TheDesign state digrams that arise define the controller for an instruction set processor are highly structured Use this structure to construct a simple microsequencer Control reduces to programming this very simple device microprogramming sequencer control datapath control microinstruction micro-PC 2/26/03 sequencer UCB Spring 2003 CS152 / Kubiatowicz Example: Jump-Counter 0000 i i i+1 Map ROM None of above: Do nothing (for wait states) op-code Counter
2/26/03 zero inc load UCB Spring 2003 CS152 / Kubiatowicz Using a Jump Counter instruction fetch IR <= MEM[PC] 0000 inc decode A <= R[rs] B <= R[rt] R-type S <= A fun B 0100 inc LW ORi S <= A or ZX 0110 S <= A + SX 1000 inc inc M <= MEM[S] 1001 inc SW BEQ S <= A + SX
1011 inc MEM[S] <= B PC <= PC + 4 1100 R[rd] <= S R[rt] <= S R[rt] <= M PC <= PC + 4 PC <= PC + 4 PC <= PC + 4 0101 zero 2/26/03 0111 zero 1010 zero UCB Spring 2003 zero PC <= Next(PC) 0011 zero Write-back Memory Execute 0001 load CS152 / Kubiatowicz Our Microsequencer taken ZIL datapath control Micro-PC op-code Map ROM
2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Microprogram Control Specification PC Taken Next IR PC Ops Exec Mem Write-Back en sel A B Ex Sr ALU S R W M M-R Wr Dst 0000 0001 ? 0 inc 1 load 0011 0011 R: 0100 0101 ORi: 0110 0111 LW: 1000 1001 1010 SW: 1011 1100 0 1 x x x x x x x x x zero
zero inc zero inc zero inc inc zero inc zero BEQ 2/26/03 11 1 1 0 1 0 1 fun 1 1 0 0 1 1 0 0 or 1 1 0 0 1 0 1 0 add 1 1 0 1 1 0 1 1 0 1 0 add 1 1 0 UCB Spring 2003 0 1 0
CS152 / Kubiatowicz Adding the Dispatch ROM Sequencer-based control unit from last lecture Called microPC or PC vs. state register Control Value Effect 00 Next address = 0 01 Next address = dispatch ROM 10 Next address = address + 1 1 ROM: Adder R-type BEQ ori LW SW 2/26/03 000000 000100 001101 100011 101011 0100 0011 0110 1000 1011 UCB Spring 2003 Address Select Logic microPC Mux 2 1 0 0 ROM Opcode CS152 / Kubiatowicz Example: Controlling Memory
PC addr Instruction Memory InstMem_rd IM_wait data Inst. Reg 2/26/03 IR_en UCB Spring 2003 CS152 / Kubiatowicz Controller handles non-ideal memory instruction fetch IR <= MEM[PC] wait ~wait R-type S <= A fun B LW ORi S <= A or ZX SW S <= A + SX M <= MEM[S] ~wait wait R[rd] <= S R[rt] <= S R[rt] <= M PC <= PC + 4 PC <= PC + 4 PC <= PC + 4
2/26/03 BEQ S <= A + SX PC <= Next(PC) MEM[S] <= B ~wait PC <= PC + 4 UCB Spring 2003 wait Write-back Memory Execute decode / operand fetch A <= R[rs] B <= R[rt] CS152 / Kubiatowicz IR <= MEM[PC] write-back 2/26/03 wait ~wait A <= R[rs] B <= R[rt] R-type S <= A fun B ORi S <= A or ZX Memory Execute decode instruction fetch Really Simple Time-State Control
LW SW S <= A + SX M <= MEM[S] wait R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 UCB Spring 2003 BEQ S <= A + SX MEM[S] <= B wait PC <= PC + 4 PC <= Next(PC) CS152 / Kubiatowicz Time-state Control Path Local decode and control at each stage 2/26/03 UCB Spring 2003 Reg. File M Data Mem B Mem Access
PC Next PC Equal IRmem WB Ctrl Ex Ctrl Exec S IRwb IRex A Mem Ctrl Dcd Ctrl Reg File IR Inst. Mem Valid CS152 / Kubiatowicz Overview of Control Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique. Initial Representation Sequencing Control Logic Representation Implementation Technique 2/26/03 Finite State Diagram
Microprogram Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Equations Truth Tables PLA ROM hardwired control UCB Spring 2003 microprogrammed control CS152 / Kubiatowicz Summary Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load Multiple Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle Partition datapath into equal size chunks to minimize cycle time ~10 levels of logic between latches Follow same 5-step method for designing real processor 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Summary (contd) Control is specified by finite state digram Specialize state-diagrams easily captured by microsequencer simple increment & branch fields datapath control fields Control design reduces to Microprogramming Control is more complicated with: complex instruction sets restricted datapaths (see the book)
Simple Instruction set and powerful datapath simple control could try to reduce hardware (see the book) rather go for speed => many instructions at once! 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz Where to get more information? Next two lectures: Multiple Cycle Controller: Appendix C of your text book. Microprogramming: Section 5.5 of your text book. D. Patterson, Microprograming, Scientific American, March 1983. D. Patterson and D. Ditzel, The Case for the Reduced Instruction Set Computer, Computer Architecture News 8, 6 (October 15, 1980) 2/26/03 UCB Spring 2003 CS152 / Kubiatowicz
The limbic/emotional brain generally refers to amygdala, hippocampal formation, hypothalamus, thalamus, and nearby "paralimbic" cortex, such as the anterior cingulate cortex, orbitofrontal cortex insula, and temporal poles Given the role of aberrant, intrusive, emotional memory in PTSD symptomatology, the limbic...
Iron is able to 'steal' the sulphate from copper because it is more reactive You are going to carry out some displacement reactions to decide which metal is more reactive. Record your observations and fill in the answers below.
Advanced Modular Incoherent Scatter Radar (AMISR) Student Workshop. July 26 - 30, 2010. MIT Haystack Observatory. Westford, MA. Evan Thomas. The Lecturers. MIT Haystack - Anthea Coster, Asti Bhatt, Bill Rideout, Phil Erickson, and Alan Rogers ... Advanced Modular Incoherent...
The ss-FW had much higher water content, volatile solids, and lower total solid basis. It also contain higher % of carbohydrates, lipids, and proteins on a volatile solids because of the higher calorific value. The ss-FW indicated longer biogas yield...
La investigación actual en ciencias del deporte: una nueva perspectiva. Movimiento Olímpico. Historia, Candidaturas y Sedes Olímpicas. Historia de un no-evento: los JJ.OO. De Tokio 1940 / Sandra Collins (University of Chicago, USA) - 1999 PhD Students Research Programme Grant...
The first place the complete square form is mentioned in the syllabus is LCHL. However, this question was asked in JCHL 2015. Students are expected to have this knowledge for JC transformations. Therefore, when do we teach the complete square...
There were 6 vases of flowers on the table. Each vase held 3 flowers. Talia took 2 flowers out to give to her mother. How many flowers were left? Mrs. Gunn made some cookies. She put 12 cookies on each...
A fixed portion of memory is used for mapping regardless of the number of processes or virtual pages This approach is used by IBM's AS/400 and RISC System/6000 computers Translation Lookaside Buffer Translation Lookaside Buffer (Cont.) Translation lookaside buffer (TLB)...
Ready to download the document? Go ahead and hit continue!