Intro from Jan 2023

This post was first drafted in March 2020, before the COVID-19 pandemic started. At that time I was developing the Parachute emulator and toolchain during my commute to and from work. Working from home changed this regular development opportunity, and I put the project on hold, while I got up to speed learning a new language, Rust, and started the digimorse project. I’m quite close to the first major milestone of digimorse, and have picked Parachute up again. I’m in the middle of moving it to a different build server, and continuing where I left off.

In addition to the emulator, I was also considering building my own Parachute hardware in FPGA, and this post – more a series of rough notes than a proper article – detailed my investigations at that time. I’ve added some comments from Jan 2023.

One effect of the pandemic was the chip/hardware shortage and general supply-chain problems: The components I was considering have now become quite difficult to source, and the cost has risen. The situation is improving though, and I hope to obtain the development board I was considering – the CMOD-A7-35T – in the near future. In the meantime, I’ve obtained a cheap-but-small board – the iCEstick – in order to get acquainted with the technology. (I have another hardware variant in mind, but I’ll leave that for another post…)

Article from 2020

The next few steps along the software side of Parachute are planned out, but my intention is to evolve it to incorporate hardware too. Running a Transputer simulator is fine, but I’d like real hardware to run parallel code on. Actual Transputer chips and boards are available on eBay, but they are quite costly. So the alternative is to build a Transputer in a hardware description language (HDL) such as VHDL or Verilog, and synthesise it into a field programmable gate array (FPGA).

So I’m off on another steep learning curve. Actually, several:

  • How microprocessors work internally; revisiting digital electronics and extending my understanding considerably. Learning about building datapaths and control units. Ben Eater has a superb series of videos on his YouTube channel that explain his 8-bit TTL microprocessor. He also recommended the first of these books:
    • Malvino and Brown’s “Digital Computer Electronics” 3rd ed.
    • I’m also learning from… Tanenbaum’s “Structured Computer Organization” 3rd ed.,
    • Zaks’ “From Chips To Systems”
    • Heuring and Jordan’s “Computer Systems Design and Architecture” 2nd ed.
    • East’s “Computer Architecture and Organization” which features the Transputer; Ian East has also written other books on it, such as “Parallel Processing with Communicating Process Architecture”
  • Microprogramming – early processors were built with ‘random logic’, for example: Ben Eater’s processor, the Gigatron TTL processor, etc. Microprogramming is a “… technique for designing digital systems which … [is] … suitable for designing systems of any degree of complexity. … If we take an ordinary microprocessor we can program it in a high level language such as BASIC, we can also program it in assembler which is, in a sense, a lower level language. However, if we were able to look inside the chip itself we would usually find that each instruction, at the assembler level, is itself executed as a sequence of instructions at an even lower level. These lower level instructions are microinstructions. A sequence of microinstructions will form a microprogram. … In the same way that a microprocessor can be used to replace a logic design by a program, microprogramming can also be seen as replacing a logic design by a type of program.” – From “Microprogrammed Systems Design” by J.S. Florentin. According to the Wikipedia page for the Transputer, “the transputer’s core logic was simpler than most CPUs. While some have called it reduced instruction set computer (RISC) due to its rather sparse nature, and because that was then a desirable marketing buzzword, it was heavily microcoded.”
  • Hardware description languages such as VHDL or Verilog. However, I have been recommended to look at SpinalHDL by another Transputer enthusiast. SpinalHDL is an internal DSL using Scala as its host, producing VHDL or Verilog as its output. I’m hoping to use this, and that it can feed into the toolchains of the FPGA vendors I choose. I have done an initial recce into whether it can be used from Maven rather than SBT (yes, it can, so far). I’m also working through Dally, Harting and Aamodt’s Digital Design Using VHDL: A Systems Approach, and Blaine C. Readler’s Verilog by Example, after the recommendation in Bruno Levy’s learn-fpga tutorial.
  • FPGAs, their structure, use and limitations.
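
As a rough illustration of the microprogramming idea quoted above, here is a minimal Python sketch of a toy accumulator machine in which each assembler-level instruction is carried out as a short sequence of micro-operations looked up in a control store. The instruction names (LOAD, ADD), the micro-operations and the machine itself are all invented for illustration; none of this is taken from the Transputer’s actual microcode or from the books listed above.

```python
# Toy illustration of microprogramming. Every name here is hypothetical:
# this is not Transputer microcode, just the general shape of the idea.
from dataclasses import dataclass, field


@dataclass
class Machine:
    pc: int = 0        # program counter
    ir: str = ""       # instruction register (current opcode)
    operand: int = 0   # operand latched alongside the opcode
    acc: int = 0       # accumulator
    memory: list = field(default_factory=list)  # list of (opcode, operand)


# Micro-operations: the smallest steps this imaginary datapath can perform.
def fetch(m):
    m.ir, m.operand = m.memory[m.pc]


def inc_pc(m):
    m.pc += 1


def load_operand(m):
    m.acc = m.operand


def add_operand(m):
    m.acc += m.operand


# Control store: each assembler-level instruction maps to a microprogram,
# i.e. a sequence of micro-operations.
MICROCODE = {
    "LOAD": [load_operand, inc_pc],
    "ADD": [add_operand, inc_pc],
}


def run(program):
    """Execute a program by stepping through each instruction's microprogram."""
    m = Machine(memory=list(program))
    while m.pc < len(m.memory):
        fetch(m)
        for micro_op in MICROCODE[m.ir]:
            micro_op(m)
    return m.acc
```

Calling `run([("LOAD", 2), ("ADD", 3)])` walks two microprograms and leaves 5 in the accumulator. Adding a new instruction means adding an entry to the control store rather than new wiring, which is the sense in which a microprogram replaces a logic design.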

FPGA Choices

Which FPGA to choose? There are many, with a range of facilities and prices. All prices quoted here are current as of March 2020.

ICE40

I’m tempted by the ice40 variants, such as the BlackIce-mx, so that I can use an open source toolchain, YoSys by Claire Wolf. I read Øyvind Teig’s extensive articles My FPGA Notes and My IceCore Notes, which are persuading me that this would be a good choice. The BlackIce-mx boards are £48.49.

In Jan 2023, the small iCEstick ICE40HX1K-STICK-EVN boards are available and cheap. I have ordered one from DigiKey, for £41.04. This device is probably too small to host a full Transputer (it has 1280 LUTs), but is cheap and available, has a Pmod socket, will work with the open source toolchains, and will get me started. See below.

There’s also the icoBoard, based on the ICE40 with 8K LUTs at £90.45 (in Jan 2023).

Lattice MachXO2

I read an article by George Smart M1GEO, IV-16 Numitron Clock, in which he uses a small, inexpensive FPGA to control a clock that uses Numitron tubes as its display. The circuit to add JTAG and power to the FPGA is simple. George told me that the chip in question is the Lattice MachXO2 which has a breakout/development board that’s £23.45, with the chips themselves (the Lattice LCMXO2-7000HE-4TG144C) at £11.23 each. One disadvantage of this development board is that it doesn’t support Pmod boards. The 7000HE has 6,864 LUTs, 54kbits of distributed RAM, 250kbits of embedded block RAM, 256kbits of user flash memory, 114 I/O pins (108 available on the breakout board headers), and a maximum operating frequency of 269 MHz (50MHz oscillator on the breakout board).

For programming in-circuit, George says that a generic AliExpress JTAG programmer unit works, although he uses the FTDI FT2232H cable that supports JTAG, and costs around £30. I found some useful information on these. I also found a forum post that suggested that the FTDI C232HM-DDHSL cable would be suitable: “However for some devices you may need a tiny bit of capacitance, around 18pF, added between the TCK pin and ground.” George gave similar advice: “Put a 22pF cap on TCK, as this helps a lot!”. According to this cable’s data sheet, there are two variants, one with 5V @ 450mA on VCC (the C232HM-EDHSL-0), the other with 3.3V @ 250mA (the C232HM-DDHSL-0). The 7000HE variant FPGA “only accepts 1.2V as the external VCC supply voltage”, has a static supply current requirement of 12.87mA and a programming/erase flash supply current requirement of 33.2mA. I’d have to regulate the 3.3V from the C232HM-DDHSL-0 cable down to 1.2V to feed all the VCC pins. (George’s circuit uses a 3.3V HC variant of the FPGA, with the 3.3V supply derived from the USB +5V via an AZ1117E-3V3 regulator – and there’s a 1.2V variant of this also.) The C232HM-DDHSL-0 costs £30.69.
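
To sanity-check the power side of this, here is a small Python sketch of the arithmetic, using only the figures quoted above. It assumes a simple linear regulator (so the current drawn from the cable roughly equals the current delivered to the FPGA core) and ignores the regulator’s own quiescent current and any I/O bank supplies; the function and constant names are mine.

```python
# Rough power budget for feeding the 1.2V FPGA core from the 3.3V cable.
# Assumes a linear regulator: input current ~= output current.

CABLE_VCC_V = 3.3            # C232HM-DDHSL-0 VCC output
CABLE_MAX_CURRENT_A = 0.250  # C232HM-DDHSL-0 current limit
FPGA_CORE_V = 1.2            # 7000HE external VCC
FPGA_STATIC_A = 0.01287      # static supply current
FPGA_PROGRAM_A = 0.0332      # programming/erase flash supply current


def linear_regulator_dissipation_w(v_in, v_out, load_current_a):
    """Heat dissipated in a linear regulator dropping v_in to v_out."""
    return (v_in - v_out) * load_current_a


def current_headroom_a(supply_limit_a, load_current_a):
    """Current left over on the supply once the load is accounted for."""
    return supply_limit_a - load_current_a


if __name__ == "__main__":
    heat = linear_regulator_dissipation_w(CABLE_VCC_V, FPGA_CORE_V, FPGA_PROGRAM_A)
    spare = current_headroom_a(CABLE_MAX_CURRENT_A, FPGA_PROGRAM_A)
    print(f"Regulator dissipation while programming: {heat * 1000:.1f} mW")
    print(f"Cable current headroom while programming: {spare * 1000:.1f} mA")
```

On these numbers the regulator only has to shed a few tens of milliwatts, and the cable’s 250mA limit is well above the quoted core currents, though a real design would also need to account for the I/O supplies.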

Altera MAX10

There is this development board which currently costs £32.86 from Tindie. The MAX10 on here has 4,000 Logic Elements and 200Kbits of Memory.

ARTY

The Digilent ARTY-A7-100T is a very feature complete board, and for $249, it had better be! Out of my price range, but has many features I’d like: over 100,000 logic elements, many LEDs, switches, Pmod connectors, Arduino headers, internal clock speeds in excess of 450MHz, 256MB of DDR3L RAM… you name it, it’s got it. This board has the capacity to accommodate a processor, and to prove it, there’s an article that ports the SiFive Freedom E310 RISC-V core to run on the board. This board also has a lot of good training material available.

In Jan 2023, it’s £233.66 from DigiKey, and is available.

ARTY CMOD-A7

The Digilent ARTY CMOD-A7 is much cheaper, around £70 in Sep 2021, and with the A7-35T having 20,800 logic elements, should be capable of implementing a simple processor. Even if it could not contain a complete T800 with 4 links, it would be a good starter board.

In mid-Jan 2023, these are available from DigiKey at £82.51 and available from Mouser at £90.60.

Basys 3

From the same stable, the Basys 3 board is quite feature complete, but at $149, still a bit more expensive than I’d like. It has around 33,000 logic elements.

ZedBoard

Another Digilent board, the ZedBoard is $449. It was used by the OpenTransputer project, and also for implementing a Transputer using the SME HDL by Johnsen, Skovhede et al, see below.

Significant Factors

The major factors influencing my choice are a) ease of use, b) quantity of LUTs, c) other provided features and d) low cost. There are quite a few boards available that provide many features, connectors and interfacing circuitry – but at considerably higher cost. Since I’m a beginner with FPGAs, and only an amateur with electronics, I need something that won’t break the bank initially, and won’t be too expensive when I inadvertently release its magic smoke by miswiring it.

How many LUTs will I need?

I don’t know. There are several related earlier projects I’m looking at to get an idea of what I should aim for:

  • Catherine Keane and David May’s Compiling occam into silicon, in which a transputerette is developed, containing an overview of the microarchitecture needed to support sequential occam programs, and a minimal interpreter of the machine code for this.
  • Barry Cook and Roger Peel’s Occam on FPGAs – Steps towards the Para-PC – although this did not realise a Transputer in FPGA, rather using occam as an HDL targeting FPGAs.
  • Andrés Amaya García, David Keller and David May, OpenTransputer – Reinventing a Parallel Machine from the Past. From Communicating Process Architectures 2015. The paper does not state how many LUTs they needed; however, they targeted the XC7Z020-CLG484 FPGA, which has 85K programmable logic cells and 53,200 LUTs (from its datasheet).
  • Uwe Mielke, Martin Zabel and Michael Bruestle’s T42 – Transputer in FPGA – presented at Communicating Process Architectures 2018 – see presentation PDF. [Note from 2023, these links to CPA 2018 seem to be missing now] An open source FPGA transputer in VHDL – although I haven’t found the open location of this yet. They target the Artix ARTY 7 and require 12,000 LUTs. Towards the end of the paper, the authors summarise other attempts (some of which I cite here) – and also other processors available in FPGA soft-cores, and their LUT requirements. For the T42 – a T425 implementation – they quote 1926-2147 LUTs for the CPU and 1600-1700 LUTs for four links.
  • Carl-Johannes Johnsen, Kenneth Skovhede, Brian Vinter, Lindsay Brian O’Quarrie, Lawrence J. Dickson – Implementing a Transputer in FPGA in less than 800 lines of code, also presented at Communicating Process Architectures 2018 – see presentation PDF. Using Synchronous Message Exchange (SME) as an HDL – positing the view that this is a far better match for a design such as the Transputer than VHDL or Verilog. Of T42, the authors claim “going from design to implementation and the verification of the implementation is a lengthy and complex process”. They also cite the OpenTransputer project, in Verilog “which is why it has the same challenges as the T42”. What are the qualities of SME that relieve it of this difficulty? Are they suggesting that verification is unnecessary, or could be made much simpler? Barry Cook and Roger Peel’s paper above claims that “Our experience shows that correct occam programs execute first time in hardware.” For this project, I’m investigating SpinalHDL as this seems easier to work with than raw VHDL or Verilog; it seems to provide a more direct description of the logic it implements. It remains to be seen whether this removes or mitigates the flaws this paper’s authors see in the implementation of other projects. I have insufficient experience of hardware design, in any HDL, or occam, to evaluate this, at this time. [Added in Jan 2023] Some of the authors of this paper have also written SME: A High Productivity FPGA Tool for Software Programmers – summarising, SME is based on CSP, and so provides a substrate similar to occam, as favoured by Barry Cook and Roger Peel’s paper.
  • The aforementioned port of the SiFive Freedom E310 core – This is quite a different architecture to the Transputer, so does knowledge of its LUT count give any indication of how many LUTs are required for a Transputer (even one of the earlier ones)?
  • [Added in Jan 2023] The learn-fpga project by Bruno Levy contains FemtoRV, “a minimalistic RISC-V CPU” that fits onto the iCEstick at < 1280 LUTs. This gives me hope that I might be able to build something approaching a transputerette on the iCEstick.
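
To put the numbers from the list above side by side with the boards discussed earlier, here is a small Python sketch of the arithmetic. It uses only figures quoted in this post, treats “logic elements” as if they were LUTs (a rough approximation), takes the icoBoard’s “8K” as 8,000, and ignores routing, block RAM and everything else that determines whether a design really fits. The names are mine.

```python
# Back-of-envelope LUT budget using figures quoted in this post.
# Logic-element counts are treated as LUT counts: a rough approximation only.

T42_CPU_LUTS = (1926, 2147)         # quoted range for the T42 CPU
T42_FOUR_LINKS_LUTS = (1600, 1700)  # quoted range for four links

BOARD_CAPACITY = {
    "iCEstick (iCE40HX1K)": 1280,
    "MAX10 board": 4000,
    "MachXO2-7000HE": 6864,
    "icoBoard (iCE40, 8K)": 8000,   # "8K" taken as 8,000
    "CMOD-A7-35T": 20800,
    "Basys 3": 33000,
    "ZedBoard (XC7Z020)": 53200,
}


def t42_estimate(with_links=True):
    """Return the (low, high) LUT estimate for a T42-style core."""
    low, high = T42_CPU_LUTS
    if with_links:
        low += T42_FOUR_LINKS_LUTS[0]
        high += T42_FOUR_LINKS_LUTS[1]
    return low, high


def boards_that_fit(required_luts):
    """List boards whose quoted capacity is at least required_luts."""
    return [name for name, cap in BOARD_CAPACITY.items() if cap >= required_luts]


if __name__ == "__main__":
    low, high = t42_estimate()
    print(f"T42 CPU plus four links: {low}-{high} LUTs")
    print("Boards with enough LUTs:", ", ".join(boards_that_fit(high)))
```

On the quoted CPU-plus-links figures, everything except the iCEstick has enough LUTs on paper. Using the 12,000 LUT figure instead, `boards_that_fit(12000)` leaves only the CMOD-A7-35T, Basys 3 and ZedBoard.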

The iCEstick Evaluation Board

Notes on the iCEstick evaluation board: [user manual] [product highlight] [Pmod™Peripheral Modules] [Lattice Semiconductor iCEstick page] [Lattice iCE40 FPGA page] [page detailing core components].

DigiKey has a nice series of introductory videos on FPGAs, by Shawn Hymel, which I’m working through.

There’s also a non-free, but quite inexpensive course from VHDLwhiz.

Development Tools – Closed Source

With the closed source tools, development is done using iCEcube2™ and Diamond Programmer. The software is zero-cost, but not Free. An account is required on Lattice’s website, and a zero-cost license is sent that’s tied to the MAC address of your development system. The latest version runs on Linux and Windows. Unfortunately, it does not run on macOS, which is my primary environment. Also, the software doesn’t run on recent Ubuntu 20.04 without some fuss, but it looks like it can be made to work.

I’m using it on Windows (for my sins). After some initial frustration, Lattice’s iCEcube2 software finds the board, but it seems sensitive to the USB port that the board is plugged into. When it states “multiple cables found”, I think this possibly means “there are multiple USB ports on which it could search for a board, which one have you plugged the iCEstick into, here are a set of non-obvious choices”.

I’ve now got its settings and port right. I’m going to continue with this system for now.

Development Tools – Open Source

[Project IceStorm] [APIO]

I haven’t managed to get the open source APIO/YoSys/IceStorm toolchain working fully yet on latest Ubuntu. APIO is a “wrapper” that abstracts a large number of board types, and makes the command line interface much more straightforward by calling the low-level underlying tools for you. However, it doesn’t find the board, and complains with “Error: board icestick not found”. The underlying programming tool iceprog can be directly called – I’ve successfully had it detect the board and display its flash ID. So this may be a bug in the APIO wrapper. I’ll investigate this and submit a pull request if I find a fix.
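
When debugging this sort of problem it helps to separate “the tools aren’t installed” from “the wrapper can’t find the board”. The following Python sketch checks whether the underlying command line tools are on the PATH and builds the iceprog command that reads the flash ID (iceprog’s -t test mode). The list of tools is my assumption of a typical IceStorm flow, not necessarily what APIO invokes, and the function names are hypothetical.

```python
# Quick checks for an iCE40 open source toolchain, bypassing any wrapper.
# The tool list is an assumption about a typical IceStorm flow.
import shutil
import subprocess

ICE40_TOOLS = ("yosys", "nextpnr-ice40", "icepack", "iceprog")


def find_tools(names=ICE40_TOOLS):
    """Map each tool name to its path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


def flash_id_command():
    """Command line for iceprog's test mode, which reads the flash ID."""
    return ["iceprog", "-t"]


def read_flash_id():
    """Run iceprog -t and return its combined output (needs a board attached)."""
    result = subprocess.run(
        flash_id_command(), capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr
```

If `find_tools()` reports every tool present and `read_flash_id()` shows the flash ID, the low-level tools and the USB connection are fine, which points the finger at the wrapper’s board lookup rather than the hardware.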

Next steps…

Working through Shawn Hymel’s course, getting APIO working, learning VHDL/Verilog, re-reading Structured Computer Organization. Plenty to keep me busy!
