This walkthrough is one of a number which help to illustrate the principles of ARM SoC design:
Standard: | Simplified: |
---|---|
ASIC example with software customisation <= | ASIC example with hardware customisation |
FPGA example | FPGA example with hardware customisation |
Altera FPGA example | Altera FPGA example with hardware customisation |
While the three standard versions include examples of custom interface hardware typical of a System-on-Chip design, the three simplified versions include non-specific input and output ports in the manner of a microcontroller system. Custom interface hardware is likely to yield better results (if it doesn't, you will probably be better off using a microcontroller rather than an SoC), but non-specific input and output ports can aid understanding and are used here as a starting point for the design of custom interface hardware.
There is also a RISC-V version for those interested in
alternative architectures.
A very simple ARM System on Chip has been designed:
The secondary images are an artefact of the partial address decoding which only uses
the top 4 address bits (HADDR[31:28]) in order to decide which slave is being accessed.
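The effect of partial address decoding can be sketched in C. This is only an illustrative model: the function names are hypothetical, and it assumes that a slave ignores every address bit between HADDR[27] and the top of its own (power of two) size, which is what gives rise to the secondary images.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the slave selector is simply HADDR[31:28]. */
static unsigned slave_select(uint32_t haddr)
{
    return (unsigned)(haddr >> 28);   /* keep only the top 4 address bits */
}

/* Two addresses alias (one is a secondary image of the other) when they
 * select the same slave and have the same offset within that slave.
 * size_bytes is the slave's real size and must be a power of two. */
static int is_alias(uint32_t a, uint32_t b, uint32_t size_bytes)
{
    assert(size_bytes != 0 && (size_bytes & (size_bytes - 1)) == 0);
    return slave_select(a) == slave_select(b)
        && ((a ^ b) & (size_bytes - 1)) == 0;
}
```

Under these assumptions, a 256 byte RAM at 0x20000000 would also appear at 0x20000100, 0x20000200 and so on, up to the end of the 0x2xxxxxxx region.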
In order to build the ARM SoC, we need SystemVerilog files to model the hardware plus
'C' program files and other support files to build the software.
Further files are required to support simulation:
SoC design is a blend of hardware design and software design
including third-party components such as microprocessor cores
and library software and custom components such as
custom interfaces and application software.
When looking at the custom components, there is often a trade-off
to be made between undertaking a particular task in hardware or software.
Some tasks may be easily carried out in hardware leading to simplified
software while other tasks may be more easily carried out in software
leading to simplified hardware.
In this example we have two custom interface modules which have been designed
to integrate with the external world (in this case just switches, buttons and LEDs)
and simplify the software:
The switch interface
(ahb_switches.sv)
allows numbers to be entered via 16 switches and two buttons.
In order to enter a number, the user will set up the binary number on the switches
and then press one of the buttons.
The aim of the interface is to avoid the processing of unwanted/false data -
for example, changing the switch input from 3 to 4
may involve the following transitions:
By introducing a button for number entry, we can ensure that only valid numbers are
entered.
Introducing a second button allows us to enter two different 16-bit numbers
using only 16 switches.
Based on this requirement, we might design an interface containing two registers:
In this case we would have no way to distinguish a series of numbers: 5,7,3
entered via Button[0] from a longer sequence including repeats: 5,7,7,7,3,3.
The more sophisticated approach used here includes a third register containing status information
combined with a handshaking protocol which is typical of a lot of custom interface hardware:
This allows the processor to see if there is new data available, simply by checking
one of the bits in the Status register.
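The handshake can be illustrated with a small software model. This is a sketch only: the structure, the function names and the choice of one status bit per button are assumptions made for illustration and are not taken from ahb_switches.sv.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of the switch interface (names and bit positions assumed). */
typedef struct {
    uint16_t data[2];   /* one data register per button */
    uint32_t status;    /* bit n set: new data is waiting from button n */
} switch_if_t;

/* Hardware side: a button press captures the switches and flags new data. */
static void button_press(switch_if_t *sw, unsigned button, uint16_t switches)
{
    assert(button < 2);
    sw->data[button] = switches;
    sw->status |= 1u << button;
}

/* Processor side: new data is detected by checking a single status bit. */
static int data_ready(const switch_if_t *sw, unsigned button)
{
    assert(button < 2);
    return (int)((sw->status >> button) & 1u);
}

/* Processor side: reading the data clears the flag as a side effect. */
static uint16_t read_data(switch_if_t *sw, unsigned button)
{
    assert(button < 2);
    sw->status &= ~(1u << button);
    return sw->data[button];
}
```

Because the flag is set by each button press and cleared by each read, the sequence 7,7 appears to the software as two separate entries, provided that each value is read before the next press.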
Note that this interface does not deal with switch bounce.
If switch bounce is perceived to be a problem, we could include software
which might ignore duplicate data entered in quick succession.
Alternatively debouncing hardware could be added to the switch interface.
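The software option might look something like the following sketch. The names, the tick-based timestamp and the min_gap threshold are all assumptions; the example program itself contains no such code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State kept between entries (hypothetical names). */
typedef struct {
    uint16_t last_value;   /* most recent value seen */
    uint32_t last_time;    /* time (in ticks) at which it was seen */
    int      have_last;    /* 0 until the first value has been seen */
} debounce_t;

/* Returns 1 if the value should be accepted, or 0 if it repeats the
 * previous value within min_gap ticks and is treated as switch bounce.
 * The timestamp is refreshed on every call, so a long burst of bounces
 * keeps extending the window. */
static int debounce_accept(debounce_t *d, uint16_t value,
                           uint32_t now, uint32_t min_gap)
{
    int duplicate;

    assert(d != NULL);
    duplicate = d->have_last
             && value == d->last_value
             && (uint32_t)(now - d->last_time) < min_gap;

    d->last_value = value;
    d->last_time  = now;
    d->have_last  = 1;
    return !duplicate;
}
```

The drawback of this approach is that a deliberate repeat entered very quickly would also be discarded, so the threshold needs to be chosen with care.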
The output interface
(ahb_out.sv)
allows data values and error status (data validity) information to be displayed via LEDs.
The aim of the interface is to ensure that the data and the data validity information are always in sync.
Based on this requirement, we might design an interface containing two registers:
With this simple approach we have a problem that, whichever register is updated first,
we will have a few clock cycles where the DataValid value doesn't correspond
to the DataOut value
(e.g. if the DataOut register is updated first, there will be a few cycles
when the old DataValid value and the new DataOut value exist together).
While this is not a problem if the output is going to some LEDs
it will often cause a problem if the output is driving some other circuit.
For this reason this interface is designed such that DataOut and DataValid
change on the same rising clock edge.
To support this operation, we will have two addressable registers
(these are the registers that are visible to the programmer):
and a third register which is not addressable:
In order to output a data value, the programmer will write a bit to the NextDataValid register
to specify the validity of the next data and then write the data itself to the DataOut register.
The writing of the DataOut register will trigger a copy of the NextDataValid register
to the DataValid register thus ensuring that the two outputs are always in sync:
In common with many custom interface circuits, this interface also supports a Status
register which allows the programmer to see the current status of both the DataValid
and NextDataValid bits.
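The behaviour of the output interface can be modelled in C as shown below. This is a sketch rather than a description of ahb_out.sv: the function names are hypothetical and the layout of the Status value (DataValid in bit 0, NextDataValid in bit 1) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the output interface registers. */
typedef struct {
    uint32_t data_out;          /* drives the data output */
    unsigned data_valid;        /* drives the validity output */
    unsigned next_data_valid;   /* staged validity, not yet visible */
} out_if_t;

/* Writing NextDataValid only stages the bit; the outputs do not change. */
static void write_next_data_valid(out_if_t *o, unsigned valid)
{
    assert(o != NULL);
    o->next_data_valid = valid & 1u;
}

/* Writing DataOut updates both outputs together, which models both
 * registers changing on the same clock edge. */
static void write_data_out(out_if_t *o, uint32_t hwdata)
{
    assert(o != NULL);
    o->data_out   = hwdata;
    o->data_valid = o->next_data_valid;
}

/* Assumed Status layout: bit 0 = DataValid, bit 1 = NextDataValid. */
static uint32_t read_status(const out_if_t *o)
{
    assert(o != NULL);
    return (uint32_t)((o->next_data_valid << 1) | o->data_valid);
}
```

In this model there is no point at which a new DataOut value is visible alongside an old DataValid value, since the only operation that changes either output changes both.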
The memory-mapped registers in these custom interface circuits have a variety of features:
The DataOut register in the output interface can be read or written.
The SwitchData registers in the input interface can be read but not written.
The same goes for the Status registers in either interface.
The NextDataValid register in the output interface can be written but not read.
Writing to the DataOut register in the output interface has the side effect
of changing the DataValid register and the Status register.
Similarly reading from a SwitchData register in the input interface has the side effect
of changing the Status register.
When you design your own custom interface circuits you should think about the access
requirements and access side effects for the memory-mapped registers.
Running the appropriate make commands in the software directory:
Note that you can generate "code.hex", "code.vmem" and "code.lst" files at the
same time using the following make command:
(The xmverilog command to run the simulation is:
You are looking for three regions:
Although the stack may be sparsely used (there are likely to be unused XXXXXXXX locations in the stack), it should still be possible to identify the larger area of unused memory between the globals and the stack.
The following actions should be possible without closing the simulation.
RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 256
(which tells the linker how much memory is provided)
with this line:
At this stage you can experiment with other changes to the
system but you should try to make only small changes between
simulations to increase the chances of being able to debug
the system.
Iain McNally
Overview of a Simple ARM SoC
16K bytes for program memory
256 bytes for data memory (including stack)
Occupying three 32-bit memory locations
Occupying two 32-bit memory locations
Memory Map
Files
Operation of the custom interface modules
0000 0000 0000 0011 (3)
0000 0000 0000 0111 (7)
0000 0000 0000 0101 (5)
0000 0000 0000 0100 (4)
including two unwanted/false data values (7 and 5).
DataOut <= HWDATA;
DataValid <= NextDataValid;
Preparation
mkdir -p ~/design/system_on_chip/example_arm_asic
cd ~/design/system_on_chip/example_arm_asic
init_arm_soc_asic_example -here
Compile C Program
cd software
make clean
make code.vmem
should create a Verilog hex format file named "code.hex"
which could be used in simulation but not in ASIC synthesis and a "code.vmem" version of the same hex data which is better suited to ASIC synthesis.
make code.lst
Examining the "code.lst" file in a text editor can be very helpful when trying to debug a system-on-chip
design in simulation.
make all
Simulate ARM SoC
cd ..
./simulate &
xmverilog +gui +access+r \
+tcl+testbench/soc.tcl \
-y behavioural +libext+.sv \
+define+prog_file_vmem=software/code.vmem \
testbench/soc_stim.sv
but it's easier to use the simulate script)
Note that because the stack grows downwards, the higher memory addresses
are used before the lower ones.
Change the program
if ( switch_temp < 8 ) {
(which limits the range of numbers for which a factorial is calculated)
with this line:
if ( switch_temp < 16 ) {
and re-run the make all command.
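When widening the range it is worth knowing how large a factorial can become. The sketch below is not the example program's own code; it is a hypothetical helper which assumes that the result is held in a 32-bit unsigned integer, and it reports when a factorial no longer fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Computes n! into *result and returns 1, or returns 0 (leaving *result
 * untouched) if the value would not fit in 32 bits. */
static int factorial_u32(unsigned n, uint32_t *result)
{
    uint32_t f = 1;
    unsigned i;

    assert(result != NULL);
    for (i = 2; i <= n; i++) {
        if (f > UINT32_MAX / i) {
            return 0;           /* the next multiply would overflow */
        }
        f *= i;
    }
    *result = f;
    return 1;
}
```

With a 32-bit result, 12! (479001600) is the largest factorial that fits, so under this assumption inputs of 13 to 15 would produce truncated values.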
Simulation -> Reinvoke Simulator...
dialogue to re-run the simulation using the newly updated code.hex file.
Change the amount of RAM
Update the linker script
RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 1024
make clean
make all
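The address range described by a linker script RAM line follows directly from ORIGIN and LENGTH. The helpers below are hypothetical and simply perform that arithmetic; placing the initial stack pointer at ORIGIN + LENGTH is a common convention in Cortex-M linker scripts and is assumed here rather than taken from this example's script.

```c
#include <assert.h>
#include <stdint.h>

/* Last byte address covered by a region of the given length. */
static uint32_t ram_last_address(uint32_t origin, uint32_t length)
{
    assert(length > 0);
    return origin + length - 1u;
}

/* One past the end of the region: a typical (assumed) initial stack
 * pointer, from which the stack then grows downwards. */
static uint32_t ram_top(uint32_t origin, uint32_t length)
{
    assert(length > 0);
    return origin + length;
}
```

For ORIGIN = 0x20000000, a LENGTH of 256 covers 0x20000000 to 0x200000FF while a LENGTH of 1024 covers 0x20000000 to 0x200003FF.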
10-8-2023