B.A.(Mod.) Computer Science
Junior Sophister Examination

Trinity Term 1998

3BA4 - Computer Architecture II

Thursday 4th June
Luce Hall
14.00 - 17.00

Dr. J. Jones, Dr. A. Butterfield

Attempt FIVE questions, at least TWO from each section.
Use separate answer books for each section.

SECTION A

1. What is a pipelined processor? What are the benefits of pipelining? Explain the organisation and operation of the DLX five-stage execution pipeline.

What effect would a "simple" implementation of branch instructions have on the DLX pipeline? Show how branch prediction can be used to speed up the execution of branch instructions. Compare the effective cycles per instruction (CPI) of a branch prediction scheme with a "simple" branch implementation (assume 15% branch instructions, 90% probability of hitting the branch target buffer, 90% probability of a correct prediction and a 1 cycle penalty if the branch target buffer needs updating).
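
A minimal sketch of how the CPI comparison in Question 1 might be set up. The 15%, 90%, 90% and 1-cycle figures come from the question. The base CPI of 1, the 1-cycle stall per branch (`stall_cycles`), and the simplification that any BTB miss or wrong prediction costs the stall plus the update penalty are assumptions made here for illustration, not part of the question.

```python
# Hypothetical model: base CPI of 1, every penalty expressed in whole cycles.

def cpi_simple(branch_frac, stall_cycles):
    """Every branch stalls the pipeline for stall_cycles (assumed value)."""
    return 1.0 + branch_frac * stall_cycles


def cpi_predicted(branch_frac, p_hit, p_correct, stall_cycles, update_penalty):
    """Branches that hit the BTB and are predicted correctly cost nothing.

    All other branches are assumed to pay the stall plus the BTB update
    penalty (a simplifying assumption, not stated in the question).
    """
    p_free = p_hit * p_correct
    return 1.0 + branch_frac * (1.0 - p_free) * (stall_cycles + update_penalty)


simple = cpi_simple(0.15, stall_cycles=1)
predicted = cpi_predicted(0.15, 0.9, 0.9, stall_cycles=1, update_penalty=1)
print(f"simple: {simple:.3f}  predicted: {predicted:.3f}")
```

Under these assumed penalties the simple scheme gives a CPI of 1.15 and the predicted scheme about 1.057; different stall assumptions change the numbers but not the method.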

2. What are the differences between "virtual" and "physical" addresses? How are virtual addresses converted to physical addresses by an i486 style memory management unit? Show the advantage(s) of using an n-level page table structure by comparing the amount of physical memory needed for the page tables of a small and a maximum sized process.

What is a TLB? Describe the organisation and operation of the TLB inside an i486 style memory management unit. How and why do TLB entries need to be invalidated? What is the effect of a user process switch on the contents of the TLB? How can the operating system and user processes share the TLB?
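
The page table comparison asked for in Question 2 can be sketched numerically. The sketch uses the i486 parameters of 4 KB pages, 4-byte entries and a two-level structure (one page directory plus page tables of 1024 entries each). The 64 KB "small" process and the assumption that it occupies one contiguous region are illustrative choices, not part of the question.

```python
import math

PAGE_SIZE = 4096                              # bytes per page
ENTRY_SIZE = 4                                # bytes per directory/table entry
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 1024 entries per table


def two_level_bytes(process_bytes):
    """Memory for one page directory plus the page tables actually needed.

    Assumes the process occupies a single contiguous, aligned region.
    """
    pages = math.ceil(process_bytes / PAGE_SIZE)
    tables = math.ceil(pages / ENTRIES_PER_TABLE)
    return PAGE_SIZE * (1 + tables)


def single_level_bytes(address_space_bytes=2**32):
    """A flat table needs an entry for every page of the address space."""
    return (address_space_bytes // PAGE_SIZE) * ENTRY_SIZE


small = two_level_bytes(64 * 1024)   # assumed small process: 64 KB
largest = two_level_bytes(2**32)     # maximum sized process: 4 GB
flat = single_level_bytes()
print(small, largest, flat)
```

With these figures the small process needs 8 KB of page tables, while the maximum sized process needs 4 MB plus 4 KB; a flat single-level table would need 4 MB regardless of process size.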

3. What is a cache? How does a cache reduce the effective memory access time? Explain how a cache organisation can be characterised by the three constants LKN. Explain in detail how a data item is searched for in an LKN cache. What special names are given to cache organisations where (i) \( N=1 \), (ii) \( K=1 \) and (iii) \( K=4 \)?

Would you expect a 2-way cache to *always* outperform a 1-way cache of equal size? Identify a sequence of addresses which produce, for equally sized caches, more misses for a 1-way cache than a 2-way cache (LRU replacement policy) and then a sequence of addresses which produce more misses for the 2-way cache than the 1-way cache. Explain the reasoning behind your address sequences.
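
One way to experiment with the address sequences in Question 3 is a small LRU cache model. The sketch below assumes L is the line size in bytes, K the number of lines per set and N the number of sets; the 64-byte cache size and the two address sequences are illustrative choices, not taken from the question.

```python
def count_misses(addresses, L, K, N):
    """Count misses for an LKN cache with LRU replacement.

    Each set is a list of tags ordered from least to most recently used.
    """
    sets = [[] for _ in range(N)]
    misses = 0
    for addr in addresses:
        line = addr // L        # strip the offset within the line
        index = line % N        # select the set
        tag = line // N         # remaining bits identify the line
        ways = sets[index]
        if tag in ways:
            ways.remove(tag)    # hit: re-appended below as most recent
        else:
            misses += 1
            if len(ways) == K:
                ways.pop(0)     # evict the least recently used tag
        ways.append(tag)
    return misses


# Two 64-byte caches with 16-byte lines: 1-way (K=1, N=4) and 2-way (K=2, N=2).
alternating = [0, 64] * 2       # two lines that collide in the 1-way cache
cyclic = [0, 32, 64] * 3        # three lines that share one 2-way set

one_way_alt = count_misses(alternating, 16, 1, 4)
two_way_alt = count_misses(alternating, 16, 2, 2)
one_way_cyc = count_misses(cyclic, 16, 1, 4)
two_way_cyc = count_misses(cyclic, 16, 2, 2)
print(one_way_alt, two_way_alt, one_way_cyc, two_way_cyc)
```

The alternating sequence thrashes a single line in the 1-way cache but fits in one 2-way set. The cyclic sequence touches three lines that all map to the same 2-way set, so LRU evicts each line just before it is reused, whereas the 1-way cache keeps address 32 in a set of its own.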

4. What is the cache coherency problem? Under what conditions are the caches in a system considered to be coherent?

Explain (i) the meaning of the 4 cache line states used in the write-once cache coherency protocol and (ii) the basic operation of the protocol. Given a 2 CPU+cache multiprocessor system, illustrate the bus traffic and cache state transitions that would occur if the following CPU memory requests are issued:

- CPU 0: read a0
- CPU 1: read a0
- CPU 0: write a0
- CPU 0: write a0
- CPU 0: write a0
- CPU 1: read a0

What advantage does the write-once protocol have over the simpler write-through scheme?

SECTION B

5. Consider the following Switch Circuit:

(i) Explain how it does not conform to the rules for designing standard CMOS switch circuits.
(ii) Produce a Stick Diagram of this circuit, subject to the requirement that both inputs enter from the RIGHT on Polysilicon, and the output exits on the bottom, also on Polysilicon.
(iii) In your opinion, what logic function, if any, does the switch circuit implement?

6. (i) For the following types of design rules: width, separation, overlap and extension, describe the nature of the rules and explain what manufacturing problems they are designed to solve.
(ii) A particular separation rule requires that contact cuts be a certain minimum distance from transistor gate regions. What is the reason for this?
(iii) Some electrical design rules are concerned with keeping current densities at low limits. How do these rules apply to contact cuts, and what does this mean for the layout of contact cuts designed to handle large currents? Illustrate your answer with an example.

7. Consider a process technology where the effective resistance and load capacitance of a minimum-sized inverter are 10kΩ and 10fF respectively.
   (i) Design a chain of inverters to drive a load of 1000fF as fast as possible.
   (ii) Design a 4000μm long, 2μm wide line with inverters added at intervals, to be as fast as possible, given that the line has the following characteristics: sheet resistance: 10Ω/square, Capacitance/Unit Area: 0.1fF/μm².
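
For part (i) of Question 7, a common first-order approach is to scale each inverter in the chain by a constant ratio and pick the number of stages that minimises the total delay. The sketch below assumes the simple RC model in which each stage contributes a delay of ratio × R × C, treats the 10fF figure as the input capacitance of a minimum-sized inverter, and ignores parasitic self-loading and output polarity; these modelling choices are assumptions, not stated in the question.

```python
R_MIN = 10e3        # ohms, effective resistance of a minimum inverter
C_MIN = 10e-15      # farads, capacitance of a minimum inverter
C_LOAD = 1000e-15   # farads, load to be driven


def chain_delay(n_stages, c_load=C_LOAD):
    """Total delay of n equally scaled stages under the simple RC model."""
    ratio = (c_load / C_MIN) ** (1.0 / n_stages)
    return n_stages * ratio * R_MIN * C_MIN


# Try a range of stage counts and keep the fastest.
best_n = min(range(1, 11), key=chain_delay)
best_ratio = (C_LOAD / C_MIN) ** (1.0 / best_n)
print(best_n, round(best_ratio, 3), chain_delay(best_n))
```

Under this model the minimum falls at five stages with a size ratio of about 2.5 per stage, close to the continuous optimum of ln(100) ≈ 4.6 stages with ratio e. Four stages give almost the same delay, so the choice may depend on whether the output must be inverted.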

8. (i) Explain the term "pseudo-nMOS" and show how it is used to produce an n-input NOR gate.
   (ii) Show how pseudo-nMOS can be used to produce regular programmable logic arrays (PLA).
   (iii) Sketch out a PLA to implement the following logic:
       \[ X = A+/B, \quad Y = /A(B+C), \quad Z = /AB+AC+A, \]
       where A, B and C are inputs and X, Y and Z are outputs.
       Your answer should show how the PLA building blocks are put together and the effect of the programming.
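
Before sketching the PLA in part (iii), it can help to tabulate the three functions and check whether any product terms are redundant. The following sketch reads `/` as logical negation, as in the expressions above; the function and variable names are illustrative.

```python
from itertools import product


def outputs(a, b, c):
    """Evaluate X, Y and Z exactly as written in the question."""
    x = a or not b
    y = (not a) and (b or c)
    z = ((not a) and b) or (a and c) or a
    return x, y, z


# Full truth table keyed by (A, B, C).
table = {
    (a, b, c): outputs(a, b, c)
    for a, b, c in product([False, True], repeat=3)
}

# Check whether Z reduces to A + B on every input combination.
z_simplifies = all(z == (a or b) for (a, b, c), (x, y, z) in table.items())

for inputs, outs in table.items():
    print([int(v) for v in inputs], [int(v) for v in outs])
```

The check confirms that Z is equivalent to A + B, since AC is absorbed by A and /AB + A reduces to A + B, which reduces the number of product terms the AND plane has to provide.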

© University of Dublin 1998