Thursday, 27 June 2013

What's Bad about Bluespec System Verilog (part 1)

My previous post dealt with some of the good points about Bluespec System Verilog (BSV). Now it is time to describe some aspects of the language that I do not like. In this post, I'm writing about implicit conditions in BSV.

As I mentioned, I have used BSV for the past year. I used it to create components for a project called Blueshell, which is a toolkit for building networks-on-chip (NoCs). If I went into great detail about Blueshell, I'd be wandering far from my topic, but I will say that the work involved integration of IP cores such as the Microblaze CPU into a BSV framework. I have designed L1 caches and co-processors in BSV, I built a CPU debug/trace unit, I have linked Xilinx Platform Studio (XPS) projects to BSV, and at one point I even connected up an ARM Cortex-M0 CPU. So the following is not entirely based upon ignorance.

BSV has been an excellent way to build NoCs. In a single top-level file, you can write BSV code to generate the entire design procedurally, with "for" loops and functions creating the network structure, bringing in subcomponents as necessary. I shudder to think of how it would have to be done in VHDL/Verilog. I suppose that there would have to be a frontend to generate the "toplevel" VHDL from some other "NoC description language". Blegh.

But what is wrong with BSV? What doesn't it do? I will begin today's explanation with a "use case".

Blueshell includes a subsystem called Bluetree for connecting external memory (e.g. DDR3) to CPUs. The structure is (notionally) a tree, like this:

At each branch, connections from N CPUs (or subbranches) are merged to a parent - perhaps the external memory, or perhaps another branch. It's a little like how small rivers combine to form larger rivers on the way to the sea. In this case, the external memory is the final destination.

Each branch must include a multiplexer to route one of N input data streams to an output. For the sake of simplicity, suppose that N = 2. In this case, the multiplexer must work correctly in four situations:
  1. No data from either input.
  2. Data from input A; nothing from input B.
  3. Data from input B; nothing from input A.
  4. Data from both inputs.
What sort of interface should the multiplexer have? In Bluetree, the CPUs act as clients (initiating requests) while the external memory acts as a server. So, both inputs A and B can be BSV "Put" interfaces, like this:
Notice that a BSV interface is a container for methods (and subinterfaces). Each Put interface contains a put() method. Each put() method (input A or input B) is "called" from a CPU (A or B) whenever some message is ready to be sent to the external memory. put() should pass that message to the output. Here is a possible implementation:

The output is declared as a module parameter so that its methods can be "called" from mkMux2. Methods are always "called" from outside of a module; they are not a mechanism for making calls, just receiving them. So the output cannot be an interface - it must be passed in as a parameter.

This neat implementation looks like it might work, but in fact it doesn't, because it has no means for handling situation 4 (data from both inputs). In that case, which input is routed to the output? In the absence of more information, the BSV compiler decides for us, and prints a warning such as:
The compiler prioritised one input over the other (we don't know which). How do we force it to prioritise input A over input B?

Well... we can't.

It is possible to implement a scheme in which each input is periodically granted access to memory (this is called TDMA), because each put() method can involve an implicit condition, such as:
In this implementation, "a" is a register. If "a" is True, then "inA.put()" has priority. If "a" is False, then "inB.put()" has priority. The implicit conditions appear on the first line of each method definition, as "if (a)" and "if (!a)" respectively.

The value of "a" is inverted on each clock cycle (by "rule tdma"). But this means that each input gets only 50% of the available bandwidth, even when the other input is not in use.

It would be better to prioritise an input (say, input A) when both inputs are ready. However, that would require us to write an implicit condition which effectively said:
And that isn't possible, because implicit conditions cannot see whether other methods are ready to execute or not.

Why not? Well, for one thing, that information is not available within the scope. There is no accessible register or signal containing the information, though it does exist in the generated Verilog code. Here are some reasons why it may not be available in BSV:
  • The implicit conditions for one method would rely on the conditions for another, and highly complex dependencies might exist as a result.
  • Interfaces between modules would be more complex, because they would need to carry the information needed to evaluate the conditions.
  • A method might be "called" from many different places - all of the conditions for every call site would have to be brought together to determine some other method's conditions.
  • There might be cyclic dependencies! The compiler would have to check for them, which would be particularly difficult if they ran from one module to another.


So, for better or worse, the BSV designers have omitted the feature, and consequently there is no way to prioritise one method over another.
I think that variants of this issue are a big problem with BSV - to the extent that hacks have been added to the language to work around some instances of it. If two rules conflict, creating a warning like this:

then the designer can specify what should happen using a statement such as: (* descending_urgency = "gen, gen_1" *). That statement grants "gen" priority over "gen_1", if both can fire.

I greatly dislike this formulation. It's ugly. Identifiers from the code ("gen", "gen_1") have become strings, in something that looks suspiciously like a pragma or (worse) a comment. The smell of kludge fills the room.

And it's an extremely limited solution. It's no help for the use case described above, because the conflict does not occur between two rules in the same module.

For Blueshell, my solution was to bypass the multiplexer problem rather than solve it. I did it by adding a buffer. Data is delayed by one clock cycle as it moves through the multiplexer. The implicit conditions for each put() require that the buffer is empty, or will become empty at the end of the clock cycle, and separate rules send it onwards. Arguably, this is a better design anyway. Here's part of the implementation:
Note that two different rules ("outA" and "outB") move data to the output. I can't do this with a single rule, because that rule would then depend on both FIFOs being full, which is incorrect. (This is a problem even if I add another ugly BSV statement, "(* split *)".)


Furthemore, note the implicit condition on the "outB" rule. It makes the two rules mutually exclusive, prioritising "outA" over "outB". If I hadn't added this, I'd need to use the hated (* descending_urgency *) statement to prioritise A over B. With this implicit condition, I specify exactly what I want, and I can easily change the priority of the two inputs. For instance, I can extend the code to introduce a dynamic scheme where the least-recently-used input gets priority.

The FIFO buffers provide a lot more flexibility about the scheduling rules, but latency is increased, and additional FPGA space is consumed. Adding FIFOs might not always be appropriate, and in this case I would have preferred to avoid it. (I subsequently used a register rather than a FIFO; same latency and trickier preconditions, but less space is used.)

In conclusion, I would hope that future versions of BSV (or other languages inspired by it) would improve the implicit conditions, allowing them to do more. They are powerful, but not powerful enough. I would like to be able to refer to the conditions of other methods, so that priority ordering could be specified cleanly and in relation to the state of the system. I'd like them to be able to span module boundaries. And so on.

It is the elegance of the language that makes it so jarring to hit a limit and discover that something cannot be done elegantly. I believe that a better language design is possible, but I'm unsure exactly what it would look like. And all of this is far beyond VHDL and Verilog, where there is no equivalent of an "implicit condition" at all.

In the next part, I am going to write about BSV's formal system of "type classes" for polymorphic modules.