VLSI Design Project 2010/2011 - Design Phase

Microprocessor Specification

Overview

Task
This year the task is to produce a novel general purpose microprocessor.
Functionality
To demonstrate the usefulness of your design you will be asked to show that it can execute programs to perform the following tasks:
Strategy
You may find that the resources for this project are rather less than those available to Intel or ARM in terms of:
- Time
- Manpower
- Experience
- Chip area
For these reasons you should aim for the simplest design possible:
- Small number of instructions
- Small number of addressing modes
- Simple instruction decoding
- Simple architecture
- Small number of registers
- Simple instruction cycle
- Minimum bus count
- Highly bitsliced implementation

Constraints

16 bit architecture
Your architecture should support a 16 bit wide datapath and a 2¹⁶ word address space (in this processor 1 word = 16 bits).
System Stack and Subroutines
Your architecture should support subroutine call and return via a system stack. Your factorial program should use a recursive subroutine in order to demonstrate this facility. You should also support the use of the stack for parameter passing and local variable storage. A separate subroutine for multiplication (called by your factorial routine) should help to demonstrate these features.
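As a rough reference for what the factorial program has to compute, the behaviour can be sketched in Python. This is only a behavioural model, not assembly code for any particular instruction set: the function names, the shift-and-add multiplication method and the truncation of results to 16 bits are all assumptions made for illustration.

```python
MASK = 0xFFFF  # results are truncated to one 16-bit word (assumption)

def multiply(a, b):
    """Separate multiplication subroutine: shift-and-add, 16-bit result."""
    a &= MASK
    b &= MASK
    product = 0
    while b:
        if b & 1:                      # add the shifted multiplicand for each set bit
            product = (product + a) & MASK
        a = (a << 1) & MASK
        b >>= 1
    return product

def factorial(n):
    """Recursive factorial which calls multiply() at each level."""
    if n <= 1:
        return 1
    return multiply(n, factorial(n - 1))

print(factorial(5))   # 120
print(factorial(8))   # 40320, the largest factorial that fits in 16 bits
```

On the real processor each recursive call would pass its parameter and keep its return address on the system stack; Python hides that bookkeeping.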
The stack should be a "Full Descending" type. This means that the stack pointer will point to the topmost element on the stack and the stack pointer is decremented in order to add a new element onto the top of the stack.
Note that a dedicated stack pointer is not required in order to support this system stack. A simple RISC processor will typically use one of its general purpose registers as a stack pointer. Nor is it necessary to provide auto-decrement (auto-increment) in order to support stack push (pull). A simple RISC processor will typically use different instructions for store (load) data and decrement (increment) stack pointer.
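The Full Descending behaviour described above can be sketched as a small Python model in which push and pull are each built from two separate steps. The class name, the number of registers, the use of R7 as the stack pointer and the initial stack pointer value are illustrative assumptions, not requirements.

```python
MASK = 0xFFFF  # 16-bit words, 2**16 word address space

class Machine:
    """Toy model: word-addressed memory plus a few general purpose registers."""

    def __init__(self, sp_init=0x8000):
        self.mem = [0] * (1 << 16)
        self.reg = [0] * 8             # register count is an assumption
        self.SP = 7                    # assumption: R7 doubles as stack pointer
        self.reg[self.SP] = sp_init & MASK

    def push(self, rs):
        # Step 1: decrement the stack pointer.
        self.reg[self.SP] = (self.reg[self.SP] - 1) & MASK
        # Step 2: store at the new top of stack.
        self.mem[self.reg[self.SP]] = self.reg[rs] & MASK

    def pull(self, rd):
        # Step 1: load from the top of stack.
        self.reg[rd] = self.mem[self.reg[self.SP]]
        # Step 2: increment the stack pointer.
        self.reg[self.SP] = (self.reg[self.SP] + 1) & MASK

m = Machine(sp_init=0x8000)
m.reg[0] = 0x1234
m.push(0)   # SP is now 0x7FFF and mem[0x7FFF] holds 0x1234
m.pull(1)   # R1 holds 0x1234 and SP is back at 0x8000
```

Because the stack pointer always points at the topmost element, the decrement comes before the store on a push and the increment comes after the load on a pull.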
In order to support local variable storage on the stack, you will need to include a simple indexed addressing mode. The operation of a load instruction using such an addressing mode is described in register transfer language as:
R_data ← mem( R_address + offset )
where R_data is the register to be loaded, R_address is the base register used for indexing and offset is a literal/immediate value to determine the offset from the base address.
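The register transfer above can be sketched in Python as follows. The function name and argument layout are hypothetical, and wrapping the effective address modulo 2¹⁶ is an assumption; the specification does not say how an address overflow should behave.

```python
MASK = 0xFFFF  # 16-bit words, 2**16 word address space

def load_indexed(mem, regs, r_data, r_address, offset):
    """R_data <- mem( R_address + offset ); returns the effective address."""
    effective = (regs[r_address] + offset) & MASK   # wrap-around is an assumption
    regs[r_data] = mem[effective] & MASK
    return effective

# Example: read a local variable held two words above the stack pointer.
mem = [0] * (1 << 16)
regs = [0] * 8                 # register count is an assumption
regs[7] = 0x7FF0               # assumption: R7 is used as the stack pointer
mem[0x7FF2] = 0xBEEF
load_indexed(mem, regs, r_data=1, r_address=7, offset=2)
print(hex(regs[1]))            # 0xbeef
```

With a Full Descending stack, local variables sit at non-negative offsets from the stack pointer, so a short unsigned offset field may be enough for this purpose.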
Number of 16-bit Registers
All other things being equal, having more data registers available for use in arithmetic and logic operations and more address registers available for use in address calculation will result in fewer memory accesses and higher processor performance. In addition to data registers and address registers (or general purpose registers which can store either data or addresses), your processor will need special purpose registers such as a program counter (PC) and an instruction register (IR).
Your processor should employ no more than eleven 16-bit registers in total, including data/address/general purpose registers and special purpose registers. Solutions with between four and eleven registers are possible. The limit is imposed primarily to avoid designs with an excessive number of special purpose registers rather than to deter inclusion of data/address/general purpose registers.
The following are not counted against this limit:
- dummy registers such as used in many RISC machines (which always contain zero and do not need storage)
- individual flag registers such as C, N, V, Z
Some implementations of interrupt context save will require significant numbers of additional registers. Processors which support interrupts may use up to sixteen 16-bit registers provided that no more than eleven of these registers are for normal (non-interrupt) operation.
Coding Efficiency
One of the main attributes of a good small microprocessor is its coding efficiency. This is determined by the number of bytes or words used to code a program.
For this exercise your processor should be able to load any 16-bit immediate value into a register within two instruction words (either one instruction that is two words long or two single word instructions). Similarly you should be able to add any 16-bit immediate value to the contents of a register within two instruction words.
Ideally the loading or adding of shorter immediate values will be possible with just one single word instruction (this will depend on the approach that you take to the coding).
This constraint is difficult to meet. None of the example processors meet all of these measures for coding efficiency. If you have to compromise on any of these coding efficiency constraints, you should ensure that you are able to load any 16-bit number into a register within two instruction words.
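One possible way of meeting the two-word load constraint with single word instructions is to give each instruction an 8-bit immediate field and to build the value one byte at a time. The Python sketch below checks that such a pair can produce every 16-bit value. The two instructions modelled here ("load low byte" and "insert high byte") and the 8-bit field width are hypothetical; your own coding may take a different approach.

```python
def split_immediate(value):
    """Split a 16-bit value into (high byte, low byte)."""
    value &= 0xFFFF
    return (value >> 8) & 0xFF, value & 0xFF

def load_low(imm8):
    # Hypothetical instruction 1: Rd <- 0x00 : imm8
    return imm8 & 0xFF

def insert_high(rd, imm8):
    # Hypothetical instruction 2: Rd <- imm8 : Rd[7:0]
    return ((imm8 & 0xFF) << 8) | (rd & 0xFF)

def load_imm16(value):
    """Load any 16-bit value using two single word instructions."""
    hi, lo = split_immediate(value)
    rd = load_low(lo)
    rd = insert_high(rd, hi)
    return rd

# Exhaustive check over the whole 16-bit range.
print(all(load_imm16(v) == v for v in range(1 << 16)))   # True
```

With this scheme, any value that fits in 8 bits needs only the first instruction. Adding a 16-bit immediate within two words needs more care, since a carry out of the low byte must reach the high byte.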
System compliance
The system will consist of a central microprocessor (to be designed by you), connected via a single set of buses and some glue logic to external memory (e.g. ROM, RAM) and memory mapped I/O.

Chip Area
There is no restriction on the area of your design. If your chip core plus pad wiring is smaller than about 770um by 680um then it will fit into the default pad ring without modification. If your chip core is larger you can easily extend the pad ring with the addition of spacer cells between the pads.
Typically one design from each year's course is fabricated. In selecting the design to be fabricated, both the quality of the design and the efficiency of its implementation (i.e. its area) will be considered.

As much as possible you are left to make your own decisions on the form of your design.

Hopefully the research phase of this project will give you a good starting point for attempting this design. Further research is likely to be required as your ideas develop. Although your design may have more in common with microprocessors of the late 1970s and 1980s than with modern microprocessors, the same principles of good design apply. Remember that you are not competing with a multi-core, superscalar, super-pipelined microprocessor; you are only competing with the designs put forward by other teams undertaking this course.

Memory Access

All memory accesses take four cycles of the system clock:
- Address Setup
- Address Hold
- Data Setup
- Data Hold

ALE controls the address latch so that the address remains available through all four cycles.
nME is gated in the Address decoder to provide separate nCE lines to each memory device. The nCE lines control the timing of data access operations.
Common RnW and nOE lines control the direction of data transfer.

Note that it is this four cycle memory access that precludes the use of pipelining in your design. No benefit can be gained from pipelining here without the use of an on-chip memory cache, which is not within the scope of this project.

Accessing Slow Memory

The nWait signal is used to lengthen the memory cycle when accessing slow peripherals. This input signal is normally high. It may be taken low by a slow peripheral during the Data Setup cycle. Should this occur, the microprocessor should remain in the Data Setup cycle until nWait becomes high again.
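The four cycle access and its extension by nWait can be sketched as a small Python model that lists which phase the processor occupies on each clock cycle. The function name is hypothetical, and sampling nWait once per clock cycle (with any missing samples treated as high) is an assumption about timing that the specification leaves open.

```python
PHASES = ["Address Setup", "Address Hold", "Data Setup", "Data Hold"]

def memory_access(nwait_samples):
    """Return the phase occupied on each clock cycle of one memory access.

    nwait_samples gives the value of nWait (1 = high, 0 = low) seen on each
    successive clock cycle; once it runs out, nWait is treated as high.
    """
    samples = iter(nwait_samples)
    trace = []
    phase = 0
    while phase < len(PHASES):
        nwait = next(samples, 1)
        trace.append(PHASES[phase])
        if PHASES[phase] == "Data Setup" and nwait == 0:
            continue                   # remain in Data Setup while nWait is low
        phase += 1
    return trace

print(len(memory_access([1, 1, 1, 1])))        # 4 (normal access)
print(len(memory_access([1, 1, 0, 0, 1, 1])))  # 6 (two wait cycles inserted)
```

In the second call nWait is low for two cycles, so the trace contains three Data Setup cycles before Data Hold.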

Interrupt

The nIRQ signal is an active low, level sensitive, interrupt request input. Support for this interrupt facility is an optional extra, allowing you to show off your talents.

Notes

You may choose to implement nIRQ as a falling edge triggered interrupt to simplify your processor design. This will mean that your microprocessor will be able to cope with just one interrupt source.
Since the nIRQ signal is an asynchronous input you should retime the signal through two D-types, before using it within your control unit, in order to reduce susceptibility to the problems of metastability.
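The retiming through two D-types can be sketched behaviourally in Python. A model like this cannot show metastability itself; it only illustrates the structure and the resulting delay of two clock edges before a change on nIRQ reaches the control unit. The class name and the reset value of 1 (nIRQ inactive) are assumptions.

```python
class TwoStageSynchroniser:
    """Two D-types in series, both clocked by the system clock."""

    def __init__(self, reset_value=1):
        self.q1 = reset_value          # output of the first D-type
        self.q2 = reset_value          # output of the second D-type

    def clock(self, async_in):
        # Both D-types update on the same clock edge: the second captures
        # the old output of the first, the first captures the raw input.
        self.q2, self.q1 = self.q1, async_in
        return self.q2

sync = TwoStageSynchroniser()
outputs = [sync.clock(n_irq) for n_irq in [1, 0, 0, 0]]
print(outputs)   # [1, 1, 0, 0]
```

Only the output of the second D-type should be used within the control unit.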

I/O information

Pad cells for this design will be taken from On Semiconductor's amis350ucapta library for their 0.35um CMOS process. On Semiconductor provide us with empty abstract representations of these cells since the contents are considered company confidential. You will be provided with dummy versions of the cells which should simulate correctly using Magic and Verilog. To avoid design rule violations with the real pad cells you should keep routing wires one full design rule away from the edge of the pad cells except where connection is required.

The following types of pad are available:

The following pads will exist on the finished chip:

Core power pads; 0Vcore, 3.3Vcore
Pad ring power pads; 0Vpads, 3.3Vpads
These pads make no direct connection to the core circuit. This will reduce switching noise on the core power rails.
Input pads; Clock, nReset, Test, SDI, nIRQ, nWait
These pads include a non-inverting buffer and static protection circuit.
Output pads; RnW, nOE, ALE, nME, SDO
These pads include a tri-state non-inverting buffer pad driver. Although these pads support tri-state operation via an active low enable signal, this feature is not needed and the relevant enable signals will be held low.
Bi-directional pads; Data[15:0]
These pads include a tri-state non-inverting buffer (with inverted enable) on output and a non-inverting buffer on input. Three connections are made to the core of the cell in order to support a bi-directional pad. Although the pad connection is tri-state, the core connections are not, thus the QC and A connections cannot be directly connected to the same internal bus.

Pad Arrangement and DIL40 Pinout

Iain McNally

21-1-2011