Relative speed of Raspberry Pi, Pi 2, and desktop PC (x86 and AMD64)
Sunday, 26 April 2015

The last two posts described my attempts to build a benchmark program from the Quake 2 source code, to complement a similar program I made from the Doom source code. The main issues have been related to floating-point numbers; much of the effort has gone into discovering why the rendered output from Quake 2 was slightly different with different GCC configurations, different CPUs and different math libraries.
However, I have now completed the work, and uploaded it here. The floating-point problems are resolved by (1) forcing the use of SSE on x86 platforms, and (2) bundling a math library with the benchmark. I chose openlibm for this purpose. The approach has the additional benefit that SSE will be used within the math library. x86 Linux math libraries typically use the x87 FPU, with all of its associated issues.
The benchmark allows me to make a speed comparison of a few different platforms that I have access to. None of these results are particularly surprising, but I think it is interesting to quantify the difference between these platforms.
Of course, this isn't SPEC or EEMBC. The experiments aren't rigorous. These figures are only a rough guide to the relative speed of those platforms when running integer-only code (Doom) or mixed integer/floating-point code (Quake 2). Here's the data:
|Arch|Model|GCC version|Quake 2 time (s)|Doom time (s)|Quake 2 relative speed (RPi = 1.0)|Doom relative speed (RPi = 1.0)|
|---|---|---|---|---|---|---|
|x86|Core i3 3220|4.7.4|25.4|7.5|19.0|28.9|
|x64|Core i3 3220|4.1.2|20.9|6.9|23.1|31.5|
Both benchmarks operate as follows: the game is started in a non-interactive "headless" mode in which the graphical output is rendered to a memory buffer only. The game plays through a demo file that completes the game: Doom Done Quick, or Quake 2 Done Quick 2. These demo files would take about 20 minutes to play in real time; in headless mode, they complete in under a minute on the PC platforms. The benchmark renders every frame at a fixed frame rate (12.5 fps for Quake 2, 35 fps for Doom). Correct rendering is verified with a CRC-32 check of the data in the buffer; this check does not affect benchmark timing, because it is enabled only while testing the benchmark.
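The structure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the benchmark's actual code (which is built from the C game sources): `render_frame`, `run_benchmark`, `frame_count_for` and the 320x240 buffer size are all hypothetical names and values chosen for the example.

```python
import time
import zlib

FRAME_WIDTH, FRAME_HEIGHT = 320, 240  # assumed buffer size, not taken from the post


def frame_count_for(duration_s: float, fps: float) -> int:
    """Number of frames rendered for a demo of the given length at a fixed rate."""
    return round(duration_s * fps)


def render_frame(frame_number: int) -> bytes:
    """Hypothetical stand-in for the game renderer: fills a memory buffer."""
    value = frame_number % 256
    return bytes([value]) * (FRAME_WIDTH * FRAME_HEIGHT)


def run_benchmark(frame_count: int, verify: bool = False):
    """Render every frame to memory; return (elapsed seconds, CRC-32 or None)."""
    crc = 0
    start = time.perf_counter()
    for frame_number in range(frame_count):
        buffer = render_frame(frame_number)
        if verify:
            # Running CRC-32 over all rendered frames; only done in test mode,
            # so timed runs do not pay for the check.
            crc = zlib.crc32(buffer, crc)
    elapsed = time.perf_counter() - start
    return elapsed, (crc if verify else None)


if __name__ == "__main__":
    # A demo of roughly 20 minutes at Doom's 35 fps is about 42,000 frames.
    frames = frame_count_for(20 * 60, 35)
    elapsed, _ = run_benchmark(frames)
    print(f"{frames} frames in {elapsed:.2f} s")
```

In a test run, the CRC returned with `verify=True` would be compared against a known-good value; a timed run leaves `verify` off.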
The output time is shown in seconds. We are looking at four different systems. Of these, the original Raspberry Pi (RPi) is slowest, taking 483.2s to run Quake 2 and 217.1s to run Doom. These timings are used as a baseline for relative comparisons.
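As a quick check on how the relative figures are derived, the following sketch divides the RPi baseline time by a platform's time. The timings are the ones quoted in this post; the function and variable names are just for illustration.

```python
# Original Raspberry Pi timings in seconds, used as the baseline (RPi = 1.0).
BASELINE_S = {"Quake 2": 483.2, "Doom": 217.1}


def relative_speed(benchmark: str, seconds: float) -> float:
    """How many times faster than the original RPi a platform is."""
    return BASELINE_S[benchmark] / seconds


# Core i3 3220, AMD64 build, from the table above.
i3_x64 = {"Quake 2": 20.9, "Doom": 6.9}

for name, seconds in i3_x64.items():
    print(f"{name}: {relative_speed(name, seconds):.1f}x")
# Quake 2: 23.1x
# Doom: 31.5x
```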
Here is the relative data for Quake 2 (with RPi = 1.0):
For this mixed floating-point/integer benchmark, the RPi 2 is roughly twice as fast as the original RPi. An older desktop platform (Core2 E8600) is 16 times faster when running AMD64 code. A recent desktop (i3 3220) is 23 times faster.
In general, AMD64 builds are faster than x86 builds: approximately 25% faster for FP/integer code, and 10% faster for integer-only code.
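For one concrete data point, the percentage can be worked out from the two Core i3 3220 rows in the table. This is a rough sketch with illustrative names; note that the two builds in the table also use different GCC versions (4.7.4 and 4.1.2), so the difference is not purely down to the instruction set.

```python
def percent_faster(slow_s: float, fast_s: float) -> float:
    """Speed advantage of the faster run over the slower one, in percent."""
    return (slow_s / fast_s - 1.0) * 100.0


# Core i3 3220 timings in seconds, taken from the table above.
i3_times = {
    "Quake 2": {"x86": 25.4, "x64": 20.9},
    "Doom": {"x86": 7.5, "x64": 6.9},
}

for name, t in i3_times.items():
    print(f"{name}: AMD64 is {percent_faster(t['x86'], t['x64']):.1f}% faster")
# Quake 2: AMD64 is 21.5% faster
# Doom: AMD64 is 8.7% faster
```

On this machine the gap comes out slightly below the rounded figures quoted above, but in the same range.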
The data for Doom shows similar results for integer-only code: