

# Module 5 Central Processing Unit (CPU)

#### **Objectives**:

- To learn the components common to every CPU.
- Be able to explain how each component in CPU contributes to instruction cycle.
- To understand the concept of pipelining in CPU execution.
- Be able to understand micro-operations basis for the design and implementation of the control unit.

# Module 5 Central Processing Unit (CPU)

- 5.1 Processor Organization
- 5.2 Register Organization
- 5.3 Instruction Cycles
- 5.4 Instruction Pipelining
- 5.5 Control Unit Operation
- 5.6 Microprogrammed Control
- 5.7 Summary

# Module 5b Central Processing Unit (CPU)

- 5.1 Processor Organization
- 5.2 Register Organization
- 5.3 Instruction Cycles

#### 5.4 Instruction Pipelining

- 5.5 Control Unit Operation
- 5.6 Microprogrammed Control

- Overview
- Pipeline Strategy
- □ Pipeline Performance
- □ Pipeline Hazards (Limitations)

5.7 Summary



- Organizational enhancements to the processor can improve performance another approach → instruction pipelining.\*
- CPU break fetch-decode-execute cycle into tasks that performed in parallel so that more than one tasks can be executed at a time.

The concept of pipelining is to allow the processing of a new task even though the processing of previous task has not ended.

#### **Analogy:** Pipeline Laundry



Anyone who has done a lot of laundry has intuitively used pipelining.

- The non-pipelined approach to laundry would be as follows:
  - 1. Place one dirty load of clothes in the washer.
  - 2. When the washer is finished, place the wet load in the dryer.
  - 3. When the dryer is finished, place the dry load on a table and fold.
  - 4. When folding is finished, ask your roommate to put the clothes away.



Figure: The laundry analogy (4 loads: A, B, C, D) for pipelining.



Figure: The laundry analogy (816 loads) for pipelining.

#### **Instruction Pipeline**





- A single instruction cycle process is divided into several small independent tasks (or segments).
- Some segments can be performed in <u>parallel</u>.



With the use of pipeline, the average CPU is reduced.

- 1) Data operands pass through all segments (S) in sequence.
- 2) Each segment consists of a circuit that performs sub-operation.
- 3) Segments are separated by registers (R), that hold the intermediate results between stages.



Figure: CPU pipeline structure.





### Pipeline Strategy

- The process is referred to as pipelining when new inputs are accepted at one end before previously accepted inputs appear as outputs at the other end.
- As a simple approach, consider subdividing instruction processing into two stages:
  - fetch instruction and
  - execute instruction.

Instruction prefetch or fetch overlap;

This process will speed up instruction execution.

- There are times during the <u>execution</u> of an instruction when main memory is not being accessed.
- This time could be used to <u>fetch</u> the next instruction in <u>parallel</u> with the execution of the current one.



Figure: Two-stage instruction pipeline: simplified view

Execution will involve reading and storing operands and the performance of some operation. Thus, the fetch stage may have to wait for some time before it can empty its buffer

- This doubling of execution rate is unlikely for two reasons:
  - 1) The <u>execution time</u> will generally be <u>longer</u> than the <u>fetch</u> <u>time</u>.
  - 2) A conditional branch instruction makes the address of the next instruction to be fetched unknown.



Figure: Two-stage instruction pipeline: expanded view

A conditional branch instruction makes the address of the next instruction to be fetched unknown. Thus, the fetch stage must wait until it receives the next instruction address from the execute stage. The execute stage may then have to wait while the next instruction is fetched

- This doubling of execution rate is unlikely for two reasons:
  - 1) The <u>execution time</u> will generally be <u>longer</u> than the <u>fetch</u> <u>time</u>.
  - 2) A conditional branch instruction makes the address of the next instruction to be fetched unknown.



Figure: Two-stage instruction pipeline: expanded view

- To gain further speedup, the pipeline must have more stages.
- Let consider the following <u>decomposition</u> of the instruction processing:
  - □ *Fetch Instruction* (FI): Read next expected instruction into buffer.
  - □ *Decode Instruction* (DI): Determine the opcode and the operand specifiers.
  - □ *Calculate Operands* (CO): Calculate the effective address of each source operand.
  - □ Fetch Operands (FO): Fetch each operand from memory.
  - *Execute Instruction* (EI): Perform the indicated operation and store the result.
  - □ Write Operand (WO): Store the result in memory.



#### Example 10: Case 2

To implement the instruction cycle with multiple stages in simpler form:

 Use multiple execution "functional units" to parallelize the actual execution phase of several instructions.



Figure shows that a 6 stages pipeline can reduce the execution time for 9 instructions from 54 time units to 14 time units.

#### **Example 11:**



Figure: Timing Diagram for Instruction Pipeline Operation

## Pipeline Performance

- Here some simple measures of pipeline performance and relative speedup, S.
- Let the execution time, *T*, required for a non-pipeline and pipeline respectively, and pipeline with *k* stages to execute *n* instructions as:

#### Non-pipeline:

 $t_n \rightarrow$  the cycle time

$$T_n = t_n \times n$$

#### **Pipeline:**

 $t_p \rightarrow$  the cycle time

$$T_p = t_p(k + (n-1))$$

Note: k = 1

■ The speedup, *S*, for the instruction <u>pipeline</u> compared to execution <u>without the pipeline</u> is defined as:

$$S = \frac{T_n}{T_p} = \frac{t_n \times n}{t_p (k + (n-1))}$$

■ If n is too large from k, n >> k, and when  $t_n = k \times t_p$ , thus maximum speedup:

$$S_{\text{max}} = k$$



#### Example 12: Pipeline (Non-ideal case)

Given the 4 segment pipeline whereby each has a delay time as follow:

Segment 1 : 40*ns*

∘ Segment 2 : 25*ns* 

∘ Segment 3 : 45*ns* 

Segment 4: 45*ns*

The delay time for the interface register is 5 ns. Calculate the:

- a) cycle time of the non-pipeline and pipeline.
- b) execution time for 100 tasks.
- c) real speedup.
- d) Maximum speedup.

### 

#### **Solution:**





a) Cycle time of the non-pipeline and pipeline.

$$t_n =$$
(Total executions) + interface delay

$$t_n = (40 + 25 + 45 + 45) + 5 = 160 ns$$

 $t_p$  = (the longest delay for execution) + interface delay

$$t_p = (45 + 5) = 50ns$$

b) Execution time for 100 tasks.

#### Non-Pipeline:

$$T_n = t_n \times n$$
$$= 160 \times 100$$
$$= 16000 ns$$

#### Pipeline:

$$T_p = t_p(k + (n-1))$$
  
=  $50(4 + (100 - 1))$   
=  $50(103) = 5150ns$ 



c) Real speedup.

$$S = \frac{T_n}{T_p}$$

$$= \frac{t_n \times n}{t_p (k + (n - 1))}$$

$$= \frac{16000}{5150}$$

$$= 3.1068$$

d) Maximum speedup.

$$S_{\text{max}} = k = 4$$

#### **Activity 2**

#### Example 13:

- There are 4 segments as follows:
- The interface delay is 3 ns.
- The simple block diagram of the pipeline:



**Segment Name** 

**Segment** 

**Execution** 



Draw the space time diagram and calculate the:

- a) cycle time of the non-pipeline and pipeline.
- b) execution time for 50 tasks.
- c) real speedup.
- d) Maximum speedup.



#### **Solution**:

- Purpose of diagram to show the events of a pipeline
- Assume only 4 instructions.

|         |      | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$ | $t_6$ | $t_7$ | <b>-</b> | time |
|---------|------|-------|-------|-------|-------|-------|-------|-------|----------|------|
|         | IF   |       |       |       |       |       |       |       |          |      |
| segment | OF   |       |       |       |       |       |       |       |          |      |
| = segr  | ΙE   |       |       |       |       |       |       |       |          |      |
| S       | , OS |       |       |       |       |       |       |       |          |      |

## Pipeline Hazards (Limitations)

- A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must stall because conditions do not permit continued execution.
- Such a pipeline stall is also referred to as a pipeline bubble.



#### (a) Resource Hazards

5

(Resource Conflict)

- A resource hazard occurs when two (or more) instructions that are already in the pipeline need the same resource.
- The result is that the instructions must be executed in serial rather than parallel for a portion of the pipeline.

We have a conflict with 2 instructions need the same resource (i.e. memory)

→ data reads / writes can only happen 1 at a time.



(a) Five-stage pipeline, ideal case

Figure: Example of resource hazards

### Resource Hazards Solution:

Fetch Instruction (FI)
Decode Instruction (DI)
Calculate Operands (CO)
Fetch Operands (FO)
Execute Instruction (EI)
Write (store) Operand (WO)

Source operand for instruction I1 is in memory, rather than a register → idle for one cycle (wasted)



(a) Five-stage pipeline, ideal case



(b) I1 source operand in memory

Figure: Example of resource hazards

Instruction

Instruction



- Conflict of resources introduce 'pipeline delay'.
- A delay in any stage can cause pipeline stalls.



(a) Five-stage pipeline, ideal case



(b) I1 source operand in memory

Figure: Example of resource hazards

#### (b) Data Hazards

(Data Dependency)

- A data hazard occurs when there is a conflict in the access of an operand location.
- Two instructions in a program are to be executed in sequence and both access a particular memory or register operand.

#### **Example:**

ADD EAX, EBX /\* EAX=EAX+EBX SUB ECX, EAX /\* ECX=ECX-EAX

Clock cycle

| 1  | 2  | 3  | 4    | 5  | 6  | 7  | 8  | 9  | 10 |
|----|----|----|------|----|----|----|----|----|----|
| FI | DI | FO | EI   | wo |    |    |    |    |    |
|    | FI | DI | Idle |    | FO | EI | wo |    |    |
|    |    | FI |      |    | DI | FO | EI | wo |    |
|    |    |    |      |    | FI | DI | FO | EI | wo |

Figure: Example of data hazards

13

**I4** 

ADD EAX, EBX

SUB ECX, EAX

#### **Data Hazards**

#### **Example:**

ADD EAX, EBX /\* EAX=EAX+EBX SUB ECX, EAX /\* ECX=ECX-EAX

The ADD instruction does not update register EAX until the end of stage 5.



Figure: Example of data hazards



#### Data Hazards: Solutions

#### Hardware interlocks:

An interlock is a circuit that detects instructions with data dependency and inserts required delays to resolve conflicts.

#### Operand forwarding:

• Allowing the result of ALU to be used by other ALU operations in the next instruction cycle.

#### **Delayed load**:

- Compiler is used to detect conflict and reorder instructions to delay loading of conflicting data.
- Using NOP (no operation) instruction.

### I1 has finished WO (t5) then can I2 do EI (t6)

#### Example 14:

Solution of data hazard using Delayed Load → NOP

II: ADD EAX, EBX

**NOP** 

*I2*: SUB ECX, EAX

*I3*:

*I4*:

|    |           |     |            |            |           |            |            |           | <b>&gt;</b> time |
|----|-----------|-----|------------|------------|-----------|------------|------------|-----------|------------------|
|    | 1         | 2   | 3          | 4          | 5         | 6          | 7          | 8         | 9                |
| FI | <i>I1</i> | NOP | <i>I</i> 2 | <i>I3</i>  | <i>I4</i> |            |            |           |                  |
| DI |           | I1  | NOP        | <i>I</i> 2 | <i>I3</i> | <i>I4</i>  |            |           |                  |
| FO |           |     | <i>I1</i>  | NOP        | <i>I2</i> | <i>I3</i>  | <i>I4</i>  |           |                  |
| EI |           |     |            | I1         | NOP       | <i>I</i> 2 | <i>I3</i>  | <i>I4</i> |                  |
| WO |           |     |            |            | <i>I1</i> | NOP        | <i>I</i> 2 | <i>I3</i> | <i>I4</i>        |

#### **Example 15**:

II: ADD AH, BH

*I2*: SUB AH, 2

*13*: CMP AH, 0

Data hazard  $\rightarrow$ 

Consider the instructions given.

Where is/are the problem(s) with the pipeline implementation? Briefly justify your answer.

|    | 1         | 2         | 3          | 4         | 5         | 6          | 7         | Both are about to            |
|----|-----------|-----------|------------|-----------|-----------|------------|-----------|------------------------------|
| FI | <i>I1</i> | <i>I2</i> | <i>I3</i>  |           |           |            |           | access the same register AH. |
| DI |           | I1        | <i>I</i> 2 | <i>I3</i> |           |            |           |                              |
| FO |           |           | <i>I1</i>  | I2        | <i>I3</i> |            |           |                              |
| EI |           |           |            | <i>I1</i> | I2        | <i>I3</i>  |           |                              |
| WO |           |           |            |           | I1        | <i>I</i> 2 | <i>I3</i> |                              |

I1 has finished WO (t5) then can I2 do EI (t6) and I2 finished WO (t7) before I3 do EI (t8)

**Solution**:

II: ADD AH, BH

NOP

*12*: SUB AH, 2

*NOP* 

*13*: CMP AH, 0

Solution of data hazard using Delayed Load → NOP

|    |           |     |            |            |           |           |           |            | <b>▶</b> time |
|----|-----------|-----|------------|------------|-----------|-----------|-----------|------------|---------------|
|    | 1         | 2   | 3          | 4          | 5         | 6         | 7         | 8          | 9             |
| FI | <i>I1</i> | NOP | <i>I</i> 2 | NOP        | <i>I3</i> |           |           |            |               |
| DI |           | I1  | NOP        | <i>I</i> 2 | NOP       | <i>I3</i> |           |            |               |
| FO |           |     | <i>I1</i>  | NOP        | <i>I2</i> | NOP       | <i>I3</i> |            |               |
| EI |           |     |            | <i>I1</i>  | NOP       | I2        | NOP       | <b>I</b> 3 |               |
| WO |           |     |            |            | I1        | NOP       | <i>I2</i> | NOP        | I3            |

# 5

### (c) Control Hazards

(Branch Difficulty)

- Occurs when the pipeline makes the wrong decision on a branch prediction (change the PC value).
- It brings instructions into the pipeline that must subsequently be discarded.
- One of the major problems in designing an instruction pipeline is assuring a <u>steady flow of instructions</u> to the initial stages of the pipeline.



- For the pipeline to have the desired operational speedup, we must "feed it" with <u>long strings of instructions</u>.
- However, 15-20% of instructions in an assembly-level code are (conditional) branches.
- Of these, 60-70% take the branch to a target address.

- Impact of the branch is that pipeline never really operates at its full capacity:
  - → limiting the performance improvement that is derived from the pipeline.



#### **Example 16**:

## Figure shows timing diagram for 9 instruction pipeline operation with 6 stages of pipeline.

|                      |    |    | Time | e  | <b></b> |    |    |    |    |    |    |    |    |    |
|----------------------|----|----|------|----|---------|----|----|----|----|----|----|----|----|----|
|                      | 1  | 2  | 3    | 4  | 5       | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 |
| Instruction 1        | FI | DI | CO   | FO | EI      | wo |    |    |    |    |    |    |    |    |
| <b>Instruction 2</b> |    | FI | DI   | СО | FO      | EI | wo |    |    |    |    |    |    |    |
| Instruction 3        |    |    | FI   | DI | СО      | FO | EI | wo |    |    |    |    |    |    |
| Instruction 4        |    |    |      | FI | DI      | СО | FO | EI | wo |    |    |    |    |    |
| <b>Instruction 5</b> |    |    |      |    | FI      | DI | СО | FO | EI | wo |    |    |    |    |
| Instruction 6        |    |    |      |    |         | FI | DI | СО | FO | EI | wo |    |    |    |
| Instruction 7        |    |    |      |    |         |    | FI | DI | СО | FO | EI | wo |    |    |
| Instruction 8        |    |    |      |    |         |    |    | FI | DI | СО | FO | EI | wo |    |
| Instruction 9        |    |    |      |    |         |    |    |    | FI | DI | СО | FO | EI | wo |





Figure: The Effect of a Conditional Branch on Instruction Pipeline Operation



Figure: The Effect of a Conditional Branch on Instruction Pipeline Operation



Figure: The Effect of a Conditional Branch on Instruction Pipeline Operation



Figure: The Effect of a Conditional Branch on Instruction Pipeline Operation



|    | FI         | DI         | co         | FO         | EI        | wo         |
|----|------------|------------|------------|------------|-----------|------------|
| 1  | I1         |            |            |            |           |            |
| 2  | I2         | I1         |            |            |           |            |
| 3  | <b>I3</b>  | <b>I2</b>  | I1         |            |           |            |
| 4  | <b>I4</b>  | <b>I3</b>  | <b>I2</b>  | I1         |           |            |
| 5  | 15         | <b>I4</b>  | <b>I3</b>  | <b>I2</b>  | I1        |            |
| 6  | <b>I</b> 6 | <b>I</b> 5 | <b>I4</b>  | <b>I3</b>  | <b>I2</b> | <b>I1</b>  |
| 7  | <b>I7</b>  | <b>I6</b>  | 15         | <b>I4</b>  | <b>I3</b> | <b>I2</b>  |
| 8  | I15        |            |            |            |           | 13         |
| 9  | <b>I16</b> | I15        |            |            |           |            |
| 10 |            | <b>I16</b> | I15        |            |           |            |
| 11 |            |            | <b>I16</b> | I15        |           |            |
| 12 |            |            |            | <b>I16</b> | I15       |            |
| 13 |            |            |            |            | I16       | I15        |
| 14 |            |            |            |            |           | <b>I16</b> |

(a) No branches

(b) With conditional branch

Figure: An Alternative Pipeline Depiction

# 5

### **Control Hazards:** Solutions

the logic needed for pipelining to account for branches and interrupts.

Delayed branch (use of NOP instruction).

Rearranging the instructions.

#### **Example 17**: Solution for branching using delay branch (with NOP)

When the compiler detect a branch, it will automatically insert several NOP so that there is no interruptions in the pipeline.

(Assume 4 segments in a pipeline)

MOV
INC
ADD
RET I1
I1:

#### Solution:

When the compiler detect a branch, it will automatically insert several NOP so that there is no interruptions in the pipeline.

(Assume 4 segments in a pipeline)

How many NOP to insert?



= No of segment - 1

= k - 1

= 4 - 1 = 3

MOV
INC
ADD
RET I1
NOP
NOP
NOP
I1:

(Add 3 NOP)



Illustrate with space-time diagram.

Wasted 3 pipeline clock cycles

|    | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11 | 12 | 13 |
|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----|----|----|
| FI | MOV | INC | ADD | RET | NOP | NOP | NOP | I1  | I2  | I3  |    |    |    |
| DI |     | MOV | INC | ADD | RET | NOP | NOP | NOP | I1  | I2  | I3 |    |    |
| EI |     |     | MOV | INC | ADD | RET | NOP | NOP | NOP | I1  | I2 | 13 |    |
| WO |     |     |     | MOV | INC | ADD | RET | NOP | NOP | NOP | I1 | I2 | I3 |

#### **Example 18**: Solution for branching using rearranging instructions.

When the compiler detect a branch, it will rearrange the instructions so that there is no wasted pipeline.

How to arrange, moving the instruction up?



- = No of segment 1
- = k 1
- =4-1=3 (move up 3)





Illustrate with space-time diagram.

No more wasted pipeline!

|    | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8  | 9  | 10 | 11 | 12 | 13 |
|----|-----|-----|-----|-----|-----|-----|-----|----|----|----|----|----|----|
| FI | RET | MOV | INC | ADD | I1  | I2  | I3  |    |    |    |    |    |    |
| DI |     | RET | MOV | INC | ADD | I1  | I2  | I3 |    |    |    |    |    |
| EI |     |     | RET | MOV | INC | ADD | I1  | I2 | I3 |    |    |    |    |
| WO |     |     |     | RET | MOV | INC | ADD | I1 | I2 | I3 |    |    |    |

#### **Activity 3**

#### Example 19:

Consider 4 segments involved in a pipeline: *Fetch Instruction* (FI), *Decode Instruction* (DI), *Execute Instruction* (EI) and *Write Operand* (WO).

Based on the instructions given, solve the branching problem using:

(a) Delay branch (with NOP).

(b) Rearrange instructions.

AX, BX I1: MOV BX, CX I2: ADD INC I3:  $\mathbf{C}\mathbf{X}$ LOSS **I4**: **JMP** MOV DX, 1 I5: I6: INC BXI12: LOSS DEC AX

Answer a) and b) by

- Showing the NOP in the segment code
- Showing the illustration of space-time diagram

#### Solution (a):

Solution for branching using delay branch (with NOP)

Total number of NOP to be inserted:

$$= No of segment - 1$$
  
=  $k - 1$   
=  $4 - 1 = 3$ 

I1: AX, BX MOV BX, CX I2: ADD I3: INC  $\mathbf{C}\mathbf{X}$ LOSS I4: JMP *NOP* NOP*NOP* I5: MOV DX, 1 **I6**: INC BXI12: LOSS DEC AX

(Add 3 NOP)



Illustrate with space-time diagram for (a).

|    | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8   | 9   | 10  | 11  | 12 | 13 |
|----|----|----|----|----|----|----|----|-----|-----|-----|-----|----|----|
| FI | I1 | I2 | I3 | I4 | I5 | I6 |    | I12 |     |     |     |    |    |
| DI |    | I1 | I2 | I3 | I4 | I5 | I6 |     | I12 |     |     |    |    |
| EI |    |    | I1 | I2 | I3 | I4 | I5 | I6  |     | I12 |     |    |    |
| WO |    |    |    | I1 | I2 | I3 | I4 | 15  | I6  |     | I12 |    |    |

Illustrate with space-time diagram for (a).

Create 3 NOP instruction cycle before targeted label.

|    | 1  | 2  | 3  | 4  | 5          | 6   | 7   |     | 9   | 10  | 11  | 12 | 13 |
|----|----|----|----|----|------------|-----|-----|-----|-----|-----|-----|----|----|
| FI | I1 | I2 | I3 | I4 | NOP        | NOP | NOP | I12 |     |     |     |    |    |
| DI |    | I1 | I2 | I3 | <b>I</b> 4 | NOP | NOP | NOP | I12 |     |     |    |    |
| EI |    |    | I1 | I2 | 13         | I4  | NOP | NOP | NOP | I12 |     |    |    |
| WO |    |    |    | I1 | I2         | I3  | I4  | NOP | NOP | NOP | I12 |    |    |

Wasted 3 pipeline clock cycles

- Instruction pipelining is a powerful technique for <u>enhancing</u> <u>performance</u> but requires careful design to achieve optimum results with reasonable complexity.
- Discussed the principle behind instruction pipelining and how it works in practice.
- Compared and contrasted the various forms of pipeline hazards:
  - □ resource conflict,
  - □ data dependency,
  - □ control or branch difficulty.