This year the task is to produce a novel general purpose microprocessor.
To demonstrate the usefulness of your design you will be asked to show that it can execute programs to perform the following tasks:
You may find that the resources for this project are rather less than those available to Intel or ARM in terms of:
For these reasons you should aim for the simplest design possible:
Your architecture should support a 16 bit wide datapath and a 2^16 word address space (in this processor 1 word = 16 bits).
Your architecture should support subroutine call and return via a system stack. Your factorial program should use a recursive subroutine in order to demonstrate this facility. You should also support the use of the stack for parameter passing and local variable storage. A separate subroutine for multiplication (called by your factorial routine) should help to demonstrate these features.
The stack should be a "Full Descending" type. This means that the stack pointer will point to the topmost element on the stack and the stack pointer is decremented in order to add a new element onto the top of the stack.
Note that a dedicated stack pointer is not required in order to support this system stack. A simple RISC processor will typically use one of its general purpose registers as a stack pointer. Nor is it necessary to provide auto-decrement (auto-increment) in order to support stack push (pull). A simple RISC processor will typically use separate instructions to store (load) data and to decrement (increment) the stack pointer.
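The full descending behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the specification: the class name `Machine`, the sparse dictionary used for memory and the initial stack pointer value 0x8000 are all assumptions made for the example. Push and pull are each written as two separate steps, mirroring a simple RISC processor without auto-decrement or auto-increment.

```python
MASK = 0xFFFF  # 16-bit words and a 2^16 word address space


class Machine:
    """Hypothetical model of a processor using a full descending stack."""

    def __init__(self, sp_init):
        self.mem = {}      # sparse word-addressed memory (assumption)
        self.sp = sp_init  # a general purpose register acting as stack pointer

    def push(self, value):
        # Step 1: decrement the stack pointer to make room.
        self.sp = (self.sp - 1) & MASK
        # Step 2: store the value; SP now points at the topmost element.
        self.mem[self.sp] = value & MASK

    def pop(self):
        # Step 1: load the topmost element.
        value = self.mem[self.sp]
        # Step 2: increment the stack pointer to discard it.
        self.sp = (self.sp + 1) & MASK
        return value


m = Machine(sp_init=0x8000)  # initial value chosen arbitrarily
m.push(3)
m.push(7)
```

After the two pushes the stack pointer has moved down by two words and points at the value pushed last, which is what "Full Descending" requires.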
In order to support local variable storage on the stack, you will need to include a simple indexed addressing mode. The operation of a load instruction using such an addressing mode is described in register transfer language as:
    Rdata ← mem( Raddress + offset )
where Rdata is the register to be loaded, Raddress is the base register used for indexing and offset is a literal/immediate value that determines the offset from the base address.
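As a rough illustration of the register transfer above, the following Python sketch loads a value from a stack frame using a base register plus offset. The function name `load_indexed`, the register names, the frame layout and the wrap-around of the address to 16 bits are assumptions made for the example; whether the offset is signed, and how wide it is, remain design decisions.

```python
MASK = 0xFFFF  # addresses are 16 bits wide


def load_indexed(mem, regs, rdata, raddress, offset):
    """Model of: Rdata <- mem( Raddress + offset )."""
    # Wrapping the sum to 16 bits is an assumption of this sketch.
    address = (regs[raddress] + offset) & MASK
    regs[rdata] = mem[address]


# Hypothetical frame on a full descending stack: SP points at the topmost
# element, so items pushed earlier sit at small positive offsets from SP.
regs = {"R0": 0, "SP": 0x7FFD}
mem = {0x7FFD: 11, 0x7FFE: 22, 0x7FFF: 33}

load_indexed(mem, regs, "R0", "SP", 2)  # fetch the item two words above SP
```

Here R0 receives the word stored at SP + 2, without the stack pointer itself being modified.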
All other things being equal, having more data registers available for use in arithmetic and logic operations and more address registers available for use in address calculation will result in fewer memory accesses and higher processor performance. In addition to data registers and address registers (or general purpose registers which can store either data or addresses), your processor will need special purpose registers such as a program counter (PC) and an instruction register (IR).
Your processor should employ no more than eleven 16-bit registers in total, including data/address/general purpose registers and special purpose registers. Solutions with between four and eleven registers are possible. The limit is imposed primarily to avoid designs with an excessive number of special purpose registers rather than to deter inclusion of data/address/general purpose registers.
The following are not counted against this limit:
Some implementations of interrupt context save will require significant numbers of additional registers. Processors which support interrupts may use up to sixteen 16-bit registers provided that no more than eleven of these registers are for normal (non-interrupt) operation.
One of the main attributes of a good small microprocessor is coding efficiency. This is determined by the number of bytes or words used to code a program.
For this exercise your processor should be able to load any 16-bit immediate value into a register within two instruction words (either one instruction that is two words long or two single word instructions). Similarly you should be able to add any 16-bit immediate value to the contents of a register within two instruction words.
Ideally the loading or adding of shorter immediate values will be possible with just one single word instruction (this will depend on the approach that you take to the coding).
This constraint is difficult to meet. None of the example processors meet all of these measures for coding efficiency. If you have to compromise on any of these coding efficiency constraints, you should ensure that you are able to load any 16-bit number into a register within two instruction words.
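One possible way of meeting the two-word load requirement is to use two single word instructions, each carrying an 8-bit immediate field. The Python sketch below models that approach only; the instruction behaviours `load_low` and `load_high` are hypothetical and are not taken from any of the example processors. A single two-word instruction carrying the full 16-bit value would be an equally valid choice.

```python
def load_low(imm8):
    """Hypothetical instruction: Rd <- 0x00 : imm8 (upper byte cleared)."""
    return imm8 & 0xFF


def load_high(reg, imm8):
    """Hypothetical instruction: Rd <- imm8 : Rd[7:0] (lower byte kept)."""
    return ((imm8 & 0xFF) << 8) | (reg & 0xFF)


def load_imm16(value):
    """Load any 16-bit value using two single word instructions."""
    reg = load_low(value & 0xFF)      # first instruction word
    reg = load_high(reg, value >> 8)  # second instruction word
    return reg


result = load_imm16(0xBEEF)
```

With this scheme a value that fits in 8 bits needs only the first instruction, which is in line with the aim of loading shorter immediate values in a single word.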
The system will consist of a central microprocessor (to be designed by you), connected via a single set of buses and some glue logic to external memory (e.g. ROM, RAM) and memory mapped I/O.
There is no restriction on the area of your design. If your chip core plus pad wiring is smaller than about 770um by 680um then it will fit into the default pad ring without modification. If your chip core is larger you can easily extend the pad ring with the addition of spacer cells between the pads.
Typically one design from each year's course is fabricated. In selecting the design to be fabricated, both the quality of the design and the efficiency of its implementation (i.e. its area) will be considered.
As much as possible you are left to make your own decisions on the form of your design.
Hopefully the research phase of this project will give you a good starting point for attempting this design. Further research is likely to be required as your ideas develop. Although your design may have more in common with microprocessors of the late 1970s and 1980s than with modern microprocessors, the same principles of good design apply. Remember that you are not competing with a multi-core, superscalar, super-pipelined microprocessor; you are only competing with the designs put forward by other teams undertaking this course.
All memory accesses take four cycles of the system clock.
Note that it is this four cycle memory access that precludes the use of pipelining in your design. No benefit can be gained from pipelining here without the use of an on-chip memory cache which is not within the scope of this project.
The nWait signal is used to lengthen the memory cycle when accessing slow peripherals. This input signal is normally high. It may be taken low by a slow peripheral during the Data Setup cycle. Should this occur, the microprocessor should remain in the Data Setup cycle until nWait becomes high again.
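The effect of nWait on the length of a memory access can be sketched as a small Python model. Only the Data Setup cycle is named in the text, so the other three cycle labels and the position of Data Setup within the four cycles are placeholders assumed for this example, as is the function name `run_access`.

```python
# Four cycles per access. Only "data_setup" comes from the text; the other
# labels and the position of "data_setup" in the sequence are placeholders.
CYCLES = ["cycle_a", "data_setup", "cycle_c", "cycle_d"]


def run_access(nwait_trace):
    """Return the cycle occupied on each clock of one memory access.

    nwait_trace gives the value of nWait sampled on each clock
    (True = high = no wait requested). Once the trace is exhausted,
    nWait is taken to be high, which is its normal level.
    """
    samples = iter(nwait_trace)
    occupied = []
    index = 0
    while index < len(CYCLES):
        nwait = next(samples, True)
        occupied.append(CYCLES[index])
        if CYCLES[index] == "data_setup" and not nwait:
            continue  # nWait is low: remain in the Data Setup cycle
        index += 1
    return occupied


fast = run_access([])                          # nWait stays high
slow = run_access([True, False, False, True])  # held low for two clocks
```

In this model a fast access occupies four clocks, while the slow access repeats the Data Setup cycle until nWait returns high and so occupies six.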
The nIRQ signal is an active low, level sensitive, interrupt request input. Support for this interrupt facility is an optional extra, allowing you to show off your talents.
Notes
Pad cells for this design will be taken from On Semiconductor's amis350ucapta library for their 0.35um CMOS process. On Semiconductor provide us with empty abstract representations of these cells since the contents are considered company confidential. You will be provided with dummy versions of the cells which should simulate correctly using Magic and Verilog. To avoid design rule violations with the real pad cells you should keep routing wires one full design rule away from the edge of the pad cells except where connection is required.
The following types of pad are available:
The following pads will exist on the finished chip:
These pads make no direct connection to the core circuit. This will reduce switching noise on the core power rails.
These pads include a non-inverting buffer and static protection circuit.
These pads include a tri-state non-inverting buffer pad driver. Although these pads support tri-state operation via an active low enable signal, this feature is not needed and the relevant enable signals will be held low.
These pads include a tri-state non-inverting buffer (with inverted enable) on output and a non-inverting buffer on input. Three connections are made to the core of the cell in order to support a bi-directional pad. Although the pad connection is tri-state, the core connections are not, thus the QC and A connections cannot be directly connected to the same internal bus.
Iain McNally
21-1-2011