Review: UndoDB, a reversible debugger

Sunday, 3 May 2015

This is a review of a reversible debugger named UndoDB. It's a debugger for Linux, with GDB compatibility, which means it is a drop-in replacement for debugger frontends on the Linux platform (e.g. Eclipse, DDD, and also the IDE I use at work, which is GNAT Programming Studio).

Like recent versions of GDB, it is able to "run" code backwards, rewinding through whatever the program did. But unlike GDB's reverse debugging mode, it is fast and usable. GDB implements the reverse feature by recording the result of each instruction, whereas UndoDB reconstructs earlier state by using snapshots and the stored results of system calls.

As this review is positive, I must say that it is an independent review. I have no connection to the company making the tool beyond being a customer.

I am always interested in improved development tools. Though I can work with "traditional" tools such as vim, I still appreciate the value of cool new things that speed up development, like the Intellisense-like features provided by IDEs, or the invaluable error-detection capabilities of Valgrind, or powerful profiling technologies like perf, particularly when coupled with nice visualisations like Flame Graphs. For many years now, Valgrind has been one of the "killer apps" that makes a complete switch to Windows totally unthinkable. And now, with UndoDB, there is another.

I first saw a reversible debugger at a trade show. Based on Jakob Engblom's history of reversible debuggers, I think it must have been Green Hills' Time Machine. I immediately wanted one that I could use on my own PC in place of GDB. As a result, when reversible debugging was added to GDB, I was keen to try it out - but was quickly disappointed by the poor performance.

Reversible debugging helps me at work, but I can't go into detail about that, so I'll instead explain how I used UndoDB's 30 day free trial on my home computer. As a hobby project, I have been working on a benchmark based on Quake 2; earlier posts describe it. During this work I hit a number of issues related to floating-point numbers. For instance, a debug (-O0) build would produce one result for a certain floating-point function, and a release (-O2) build would produce a different result. The difference could be seen in the 3D graphics rendered by the game. Understanding and resolving these problems was the main issue in developing the benchmark.

I could only work on it in my spare time, and I don't have much of that. So I needed a way to quickly go from an observed difference in some rendered pixel back to the cause of that difference.

It is possible to do this with a traditional debugger, but the process is tiresome. You can use a watchpoint to find where a certain variable is assigned, but if it's assigned many times, then you have to find the relevant assignment within a haystack of possibilities. So you have to use counters and conditional breakpoints, and each time you want to move backwards, you have to add new counters and new breakpoints and rerun the program. It is error-prone and slow. If only the debugger could automate it!

That's where a reversible debugger helps. You can set breakpoints and watchpoints in the past, and then rewind to them.

It seems this works for any Linux application; it doesn't have to be compiled in a special way, and you don't have to modify the Linux kernel - or even run anything as root.

I think the best way to illustrate the potential is using a transcript of a debugging session. In the following example, I am trying to find why a debug build of Quake 2 using the GNU math library (libm) has produced a different output to a debug build using the openlibm math library. The code is the same; all that differs is that library. The session shown here uses GNU libm. In parallel, I ran another session using openlibm, and entered the same commands to look for differences.

I begin at the point where a difference between the openlibm and GNU libm versions of Quake 2 is detected by some test code. The difference is seen in the rendered video output. The test code found a difference at pixel number 3559:

    (undodb-gdb) print pos
    $2 = 3559       // location of the difference in the video buffer
    (undodb-gdb) print curframe
    $3 = 114        // frame number that is different
    (undodb-gdb) print vid.buffer[3559]
    $4 = 30 '\036'  // Here's the value of that pixel with GNU libm
                    // The value is 159 with openlibm

Now I find the location of that different byte in memory, set a watchpoint at that location, and then run backwards to find where the value came from.

    (undodb-gdb) print vid.buffer[3559]
    $5 = (pixel_t *) 0x20c7987 "\036...  // location of the difference
    (undodb-gdb) watch *(char*) 0x20c7987
    Hardware watchpoint 2: *(char*) 0x20c7987
    (undodb-gdb) bcont                   // Run backwards with "bcont"

Execution stops at the point where it was last written. I look at the expression that generated the value, to see where it got its inputs from.

    Hardware watchpoint 2: *(char*) 0x20c7987

    Old value = 30 '\036' // undodb stops at the point
    New value = 46 '.'    // where the byte is written
    0x00007fb100377826 in D_DrawSpans16 (pspan=0x7ffd0334a030) at
        /home/jwhitham/Quake-2/headless_quake2/linux/../ref_soft/r_scan.c:514
    514 *pdest++ = *(pbase + (s >> 16) + (t >> 16) * cachewidth);
    (undodb-gdb) print s
    $6 = 4915200 // Different! With openlibm I get 4915199
    (undodb-gdb) print t
    $7 = 1188545 // Also different. With openlibm I get 1188546

Having found two differences, I look through the source code to see how s and t are calculated. I find that s is derived from a global variable, d_sdivzstepu. I set a watchpoint to find out where that value came from, and again, I run backwards:

    (undodb-gdb) print d_sdivzstepu
    $10 = 0.00160917651 // Different. With openlibm I get 0.00160917663
    (undodb-gdb) delete
    Delete all breakpoints? (y or n) y
    (undodb-gdb) watch d_sdivzstepu
    Hardware watchpoint 4: d_sdivzstepu
    (undodb-gdb) bcont // run backwards (2nd time)
    Hardware watchpoint 4: d_sdivzstepu

    Old value = 0.00160917651 // Here's where d_sdivzstepu was assigned
    New value = 0.00191315648
    0x00007fb100361fa6 in D_CalcGradients (pface=0x7fb0ff0cd204) at
        /home/jwhitham/Quake-2/headless_quake2/linux/../ref_soft/r_edge.c:817
    817 d_sdivzstepu = p_saxis[0] * t;

Once again I look at the inputs to the expression, looking for a difference between the two runs:

    (undodb-gdb) print t
    $16 = 0.00249999994           // This is the same
    (undodb-gdb) print p_saxis[0]
    $17 = 0.643670619             // With openlibm I get 0.643670678
    (undodb-gdb) delete
    Delete all breakpoints? (y or n) y
    (undodb-gdb) watch p_saxis[0] // Run backwards again to find why...
    Hardware watchpoint 5: p_saxis[0]
    (undodb-gdb) bcont            // 3rd call to bcont
    Hardware watchpoint 5: p_saxis[0]
    
    Old value = 0.643670619       // Here's the source of p_saxis[0]
    New value = 5.30636976e-37
    
    0x00007fb100369025 in TransformVector (in=0x7fb0ff01d724, out=0x7ffd033490c0)
             at /home/jwhitham/Quake-2/headless_quake2/linux/../ref_soft/r_misc.c:219
    219 out[0] = DotProduct(in,vright);
    (undodb-gdb) print in[0]
    $18 = 1                       // Look at the inputs:
    (undodb-gdb) print in[1]      // in[0], in[1], in[2] are same...
    $19 = 0
    (undodb-gdb) print in[2]
    $20 = 0
    (undodb-gdb) print vright     // but vright is different.
    // with GNU libm {0.643670678, 0.765262604, 0.00783333182}
    $21 = {0.643670619, 0.765262604, 0.00783333182} // with openlibm

vright[0] is different, so yet again, I run backwards to find out why:

    (undodb-gdb) delete
    Delete all breakpoints? (y or n) y
    (undodb-gdb) watch vright[0]
    Hardware watchpoint 6: vright[0]
    (undodb-gdb) bcont      // 4th time
    Hardware watchpoint 6: vright[0]
    
    Old value = 0.643670619
    New value = 0.643877447
    0x00007fb100379afc in AngleVectors (angles=0x7fb1005cb4c4, forward=0x7fb1005b7ad8,
             right=0x7fb1005cadb4, up=0x7fb1005cb418) at 
             /home/jwhitham/Quake-2/headless_quake2/linux/../game/q_shared.c:117
    117 right[0] = (-1*sr*sp*cy+-1*cr*-sy);
    (undodb-gdb) print sr
    $22 = -0.00785390008 // same as openlibm
    (undodb-gdb) print sp
    $23 = 0.0723257735 // same as openlibm
    (undodb-gdb) print cy
    $24 = -0.764920235 // same as openlibm
    (undodb-gdb) print cr
    $25 = 0.999969184 // same as openlibm
    (undodb-gdb) print sy
    $26 = 0.644124985 // different! openlibm had 0.644125044
    (undodb-gdb) delete
    Delete all breakpoints? (y or n) y
    (undodb-gdb) watch sy
    Hardware watchpoint 7: sy
    (undodb-gdb) bcont // run backwards again (5th time)
    Hardware watchpoint 7: sy
    
    Old value = 0.644124985
    New value = 0
    0x00007fb10037995a in AngleVectors (angles=0x7fb1005cb4c4, forward=0x7fb1005b7ad8,
             right=0x7fb1005cadb4, up=0x7fb1005cb418) at
             /home/jwhitham/Quake-2/headless_quake2/linux/../game/q_shared.c:100
    100 sy = sinf(angle);
    (undodb-gdb) print angle
    $27 = 2.44171381 // same as openlibm

And here's the reason for the difference. The "sinf" function is implemented differently in the two math libraries. With GNU libm, sinf(2.44171381) = 0.644124985. With openlibm, sinf(2.44171381) = 0.644125044. It's a difference in rounding - and though it is a very small difference, it still affects the output drawn on the screen. This is why I bundled openlibm with my Quake 2 benchmark. Both answers are valid - the difference is only in the least-significant bit of the floating-point number returned by sinf - but I need the behaviour of the program to be completely consistent, and unaffected by system library versions.

As the debugger transcript shows, in order to go from a visibly different pixel to the subtly different output from sinf, I had to run backwards five times. It would have taken at least five runs of Quake 2, and probably more, to find the source of the difference by hand. I would have needed to add counters to the program in order to stop in the correct places, since all of this code is executed repeatedly during each frame. The process would be error-prone and slow. I would need to write plenty of notes in order to avoid becoming lost in the program. I would expect the whole process to take several hours, assuming no mistakes are made.

As it is, I was able to do all of this in a few minutes.

Since I've been building a benchmark program, I will add something about tool performance.

On an Intel i5-2500 PC, when compiled for AMD64 with GCC 4.7.2 and -O2, the Quake 2 benchmark running time is 20.3s, or 1.19ms per frame*. If I run the benchmark inside UndoDB (version 4.1.3840), the running time increases to 106.9s, or 6.29ms per frame. This is a slowdown, but put it into perspective by comparing with other development tools! For instance, valgrind (version 3.7) takes 590.4s (34.7ms per frame). And with GDB version 7.6, when running in the "record" mode required for reversible debugging, the same PC will take an astonishing 34 minutesto record a single frame. The overall running time will be approximately 400 days.

This great difference in performance does come with a caveat. As you'd expect, UndoDB's performance when running forwards comes at the cost of speed when running backwards, and a search for an event that occurred a long time ago can be rather slow. But I am talking about slowness on the "valgrind" scale rather than the "GDB" scale. You may have to wait a minute; you won't have to wait a day. As with forwards-only debugging, you can speed up the search by avoiding conditional breakpoints.

I use UndoDB at work, and I have considered buying a licence for use at home, but at this time, it's sold only as a tool for business use and the licencing fees reflect that. However, at work, the tool has already paid for itself. If you write software for a living, and you can run that software on Linux, then it's worth having a look at the free trial.

Footnotes

* Overall running times are a median of 19 executions. Frame timings are overall running times divided by 17000 frames.