This year the task is to produce a novel general purpose microprocessor.
To demonstrate the usefulness of your design you will be asked to show that it can execute programs to perform the following tasks:
You may find that the resources for this project are rather less than those available to Intel or ARM in terms of:
For these reasons you should aim for a simple design:
Your architecture should support a 16 bit wide datapath and a 2^16 word address space (in this processor 1 word = 16 bits).
Your architecture should support subroutine call and return via a system stack. Your factorial program should use a recursive subroutine in order to demonstrate this facility. You should also support the use of the stack for parameter passing and local variable storage. A separate subroutine for multiplication (called by your factorial routine) should help to demonstrate these features.
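As an illustration only, the call structure described above can be modelled in software before committing to an instruction set. The Python sketch below is not assembly code for any particular design: the names `multiply16` and `factorial16` are hypothetical, the multiply uses shift-and-add (one common choice when no hardware multiplier is available, not a requirement of this specification), and all results are truncated to 16 bits to mimic the datapath width.

```python
MASK16 = 0xFFFF  # 16 bit datapath: every result wraps modulo 2**16


def multiply16(a, b):
    """Shift-and-add multiply (hypothetical helper), truncated to 16 bits."""
    a &= MASK16
    b &= MASK16
    product = 0
    while b:
        if b & 1:                      # add the shifted multiplicand when the low bit is set
            product = (product + a) & MASK16
        a = (a << 1) & MASK16          # next partial product
        b >>= 1
    return product


def factorial16(n):
    """Recursive factorial that calls the separate multiply subroutine."""
    if n <= 1:
        return 1
    return multiply16(n, factorial16(n - 1))
```

In this model 8! = 40320 is the largest factorial that fits in one 16 bit word; larger arguments wrap around.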
The stack should be a "Full Descending" type. This means that the stack pointer will point to the topmost element on the stack and the stack pointer is decremented in order to add a new element onto the top of the stack.
Note that a dedicated stack pointer is not required in order to support this system stack. A simple RISC processor will typically use one of its general purpose registers as a stack pointer. Nor is it necessary to provide auto-decrement (auto-increment) in order to support stack push (pull). A simple RISC processor will typically use separate instructions to store (load) data and to decrement (increment) the stack pointer.
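The behaviour of a Full Descending stack built from separate decrement and store steps can be sketched as a small Python model. The class name `Machine`, the sparse dictionary memory and the initial stack pointer value 0x8000 are assumptions made for this illustration only; they are not part of the specification.

```python
MASK16 = 0xFFFF


class Machine:
    """Toy model of a Full Descending stack (names are hypothetical)."""

    def __init__(self, stack_top=0x8000):
        self.mem = {}        # sparse word-addressed memory
        self.sp = stack_top  # assumed initial stack pointer value

    def push(self, value):
        # Two separate steps, as on a simple RISC processor:
        self.sp = (self.sp - 1) & MASK16     # 1. decrement the stack pointer
        self.mem[self.sp] = value & MASK16   # 2. store at the new top of stack

    def pull(self):
        value = self.mem[self.sp]            # 1. load from the top of stack
        self.sp = (self.sp + 1) & MASK16     # 2. increment the stack pointer
        return value
```

After each push the stack pointer addresses the element just stored, which is what "Full" means here; "Descending" refers to the stack growing towards lower addresses.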
In order to support local variable storage on the stack, you will need to include a simple indexed addressing mode. The operation of a load instruction using such an addressing mode is described in register transfer language as:
Rdata ← mem( Raddress + offset )
where Rdata is the register to be loaded, Raddress is the base register used for indexing and offset is a literal/immediate value to determine the offset from the base address.
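The register transfer above can be sketched in Python to show how locals and parameters on the stack are reached through the stack pointer. The function name `load_indexed`, the register names, the memory contents and the choice to read unwritten memory as 0 are all assumptions made for this example.

```python
MASK16 = 0xFFFF


def load_indexed(regs, mem, rdata, raddress, offset):
    """Model of Rdata <- mem(Raddress + offset) with a 16 bit address."""
    address = (regs[raddress] + offset) & MASK16
    regs[rdata] = mem.get(address, 0)  # assumption: unwritten memory reads as 0
    return regs[rdata]


# Hypothetical stack frame: SP points at the topmost element (Full Descending),
# so items pushed earlier sit at SP+1, SP+2, ...
regs = {"R0": 0, "SP": 0x7FFD}
mem = {0x7FFD: 11, 0x7FFE: 22, 0x7FFF: 33}
load_indexed(regs, mem, "R0", "SP", 2)  # R0 now holds the word at SP+2
```

With the stack pointer as the base register, an offset of 0 reads the top of stack and larger offsets read items pushed earlier.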
A typical processor will have a number of data registers available for data manipulation (arithmetic and logic operations) and a number of address registers available for address generation (as index register or base register). In some processors the same registers are used for both data and addresses (these are truly general purpose registers) while in other processors advantages can be gained by having some registers which can only act as data registers and others which can only act as address registers.
All other things being equal, having more data registers available for use in arithmetic and logic operations and more address registers available for use in address generation will result in fewer memory accesses and higher processor performance. In addition to data registers and address registers (or general purpose registers which can store either data or addresses), your processor will need special purpose registers such as a program counter (PC) and an instruction register (IR).
Your processor should employ no more than eleven 16-bit registers in total, including data/address/general purpose registers and special purpose registers. Solutions with between six and eleven registers are possible. The limit is imposed primarily to avoid designs with an excessive number of special purpose registers rather than to deter inclusion of data/address/general purpose registers.
The following are not counted against this limit:
Some implementations of interrupt context save will require significant numbers of additional registers. Processors which support interrupts may use up to seventeen 16-bit registers provided that no more than eleven of these registers are for normal (non-interrupt) operation.
In order to keep the number of memory accesses to a minimum, arithmetic and logic operations should be able to act on at least 3 different 16 bit data registers (the register that will act as the stack pointer cannot be counted as one of the three). If you choose a pure general purpose register model where there is no distinction between data registers and address registers, arithmetic and logic operations should be able to act on at least 5 different general purpose registers, not counting any dummy register.
Similarly you should support at least 2 different 16 bit address registers to act as the base register, Raddress, for indexed addressing (one of the supported address registers must be the stack pointer). If you choose a pure general purpose register model, you should support at least 5 different general purpose registers to act as the base register, Raddress, for indexed addressing (again not counting any dummy register).
This constraint is difficult to meet. You may wish to use a smaller number of registers in order to keep your design and instruction coding simple.
One of the main attributes for a good small microprocessor is the coding efficiency. This is determined by the number of bytes or words used to code a program.
For this exercise your processor should be able to perform any supported arithmetic or logic operation on a 16-bit immediate value and the contents of one register and place the result in a second register within three instruction words.
e.g. Rdata1 ← Rdata2 & imm16
notes:
This constraint should not be difficult to meet. A well designed processor is likely to do better than this (especially for common operations such as addition or subtraction of a short immediate: Rdata1 ← Rdata2 - imm_short or when the operation is a simple load immediate: Rdata1 ← imm16).
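One way the three-word limit might be met is sketched below in Python. The encoding is entirely hypothetical: it assumes each 16 bit instruction word can carry an 8 bit immediate, that one instruction loads the low byte of a temporary register (clearing the high byte), that a second loads the high byte, and that a third performs the register-to-register operation. None of these instructions is prescribed by this specification, and other schemes (for example an instruction followed by a full immediate word) are equally valid.

```python
def load_low(imm8):
    """Hypothetical instruction: Rtmp <- zero-extended 8 bit immediate."""
    return imm8 & 0xFF


def load_high(reg, imm8):
    """Hypothetical instruction: replace the high byte of Rtmp, keep the low byte."""
    return ((imm8 & 0xFF) << 8) | (reg & 0xFF)


def and_imm16(rdata2, imm16):
    """Rdata1 <- Rdata2 & imm16 using three instruction words."""
    tmp = load_low(imm16 & 0xFF)       # word 1: low byte of the immediate
    tmp = load_high(tmp, imm16 >> 8)   # word 2: high byte of the immediate
    return rdata2 & tmp                # word 3: the logic operation itself
```

Note that this particular scheme uses a temporary register, which has to come out of the register budget.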
The system will consist of a central microprocessor (to be designed by you), connected via a single set of buses and some glue logic to external memory (e.g. ROM, RAM) and memory mapped I/O.
There is no restriction on the area of your design. If your chip core plus pad wiring is smaller than about 770um by 680um then it will fit into the default pad ring without modification. If your chip core is larger you can easily extend the pad ring with the addition of spacer cells between the pads.
Typically one design from each year's course is fabricated. In selecting the design to be fabricated, both the quality of the design and the efficiency of its implementation (i.e. its area) will be considered.
As much as possible you are left to make your own decisions on the form of your design.
Hopefully the research phase of this project will give you a good starting point for attempting this design. Further research is likely to be required as your ideas develop. Although your design may have more in common with microprocessors of the late 1970s and 1980s than with modern microprocessors, the same principles of good design apply. Remember that you are not competing with a multi-threaded, multi-core, superscalar, super-pipelined microprocessor; you are only competing with the designs put forward by other teams undertaking this course.
All memory accesses take four cycles of the system clock.
Note that it is this four cycle memory access that precludes the use of pipelining in your design. No benefit can be gained from pipelining here without the use of an on-chip memory cache which is not within the scope of this project.
The nWait signal is used to lengthen the memory cycle when accessing slow peripherals. This input signal is normally high. It may be taken low by a slow peripheral during the Data Setup cycle. Should this occur, the microprocessor should remain in the Data Setup cycle until nWait becomes high again.
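The effect of nWait on the length of a memory access can be sketched as a small Python model. Only the name "Data Setup" comes from the text above; the other three phase names and the position of Data Setup within the four cycles are placeholders assumed for this illustration, as is the convention that nWait is sampled once per clock cycle spent in Data Setup.

```python
# Placeholder phase list: only "data_setup" is named in the text above.
PHASES = ("cycle_a", "cycle_b", "data_setup", "cycle_d")


def access_cycles(nwait_in_data_setup):
    """Count system clock cycles for one memory access.

    nwait_in_data_setup: nWait levels (1 = high, 0 = low), one sample per
    clock cycle spent in Data Setup. Missing samples are treated as high.
    """
    samples = iter(nwait_in_data_setup)
    cycles = 0
    for phase in PHASES:
        cycles += 1
        if phase == "data_setup":
            # Remain in Data Setup for as long as nWait is sampled low.
            while next(samples, 1) == 0:
                cycles += 1
    return cycles
```

With nWait held high the access takes the normal four cycles; each cycle in which nWait is sampled low adds one cycle to the access.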
The nIRQ signal is an active low, level sensitive, interrupt request input. Support for this interrupt facility is an optional extra, allowing you to show off your talents.
Notes
Support for interrupts is technically and conceptually difficult. Although any team may opt to attempt to implement interrupt support, if you fail to keep up with the design milestones for the interrupts you will have to remove interrupt support from your design.
Pad cells for this design will be taken from On Semiconductor's amis350ucapta library for their 0.35um CMOS process. On Semiconductor provide us with empty abstract representations of these cells since the contents are considered company confidential. You will be provided with dummy versions of the cells which should simulate correctly using Magic and Verilog. To avoid design rule violations with the real pad cells you should keep routing wires one full design rule away from the edge of the pad cells except where connection is required.
The following types of pad are available:
The following pads will exist on the finished chip:
These pads make no direct connection to the core circuit. This will reduce switching noise on the core power rails.
These pads include a non-inverting buffer and static protection circuit.
These pads include a tri-state non-inverting buffer pad driver. Although these pads support tri-state operation via an active low enable signal, this feature is not needed and the relevant enable signals will be held low.
These pads include a tri-state non-inverting buffer (with inverted enable) on output and a non-inverting buffer on input. Three connections are made to the core of the cell in order to support a bi-directional pad. Although the pad connection is tri-state, the core connections are not, thus the QC and A connections cannot be directly connected to the same internal bus.
Iain McNally
29-1-2014