Optimal Program Partitioning for Predictable Performance

Monday, 16 July 2012

Last week I travelled to Pisa for the ECRTS 2012 conference, where I presented a paper on partitioning programs for use with scratchpad memory (SPM).

Here are the relevant materials: the PDF, the slides, and the software.

Not the Leaning Tower, which I did not bother to climb (€15 hardly
seemed worth it). This is the Medici Aqueduct of Asciano, a major
achievement of Renaissance engineering, which supplied
clean water to Pisa for hundreds of years.

After feedback from a practice presentation for the RTS group in York, I decided to refocus the presentation on the reason for using SPM, namely that pipelined memory accesses mean it is faster to fetch a block of code (e.g. a method) from external memory if you know exactly how much code will need to be fetched.

Using a cache, code is fetched "on demand", one line at a time. Every fetch incurs an overhead (the bus latency). Whereas using an SPM, code is fetched on a region transition between one SPM allocation and another. A single fetch gets everything. So the overhead occurs once, instead of once per fetch. If you have suitable hardware, and a good quality SPM allocation algorithm, you can make use of this to improve the performance of programs.

I thought the conference was valuable because I got to discuss the work with a number of people, and I'm grateful for their feedback. One significant subject of discussion was the importance of somehow supporting multiple tasks, which is the subject of my other work. The two parts go together - one could be used within a task to allocate SPM space; the otherto share SPM space between tasks.