Saturday, 25 October 2014

It's hard to test software: even simple software!

Tetris is one of the best-known computer games ever made. It's easy to play but hard to master, and it's based on an NP-hard problem.

But that's not all that's difficult about it. Though it's a simple game that can be implemented in one line of BBC BASIC, it's complex enough to be really hard to thoroughly test.

Ideally, a game tester has to try every possible action, in order to be sure that the game works correctly whatever the player does. But even in a simple game, there is so much to test!

Recently my employer Rapita Systems released a tool demo in the form of a modified game of Tetris. Unlike "normal" Tetris, the goal is not to get a high score by clearing blocks, but rather to get a high code coverage score. To get the perfect score, you have to cause every part of the game's source code to execute. When a statement or a function executes during a test, we say it is "covered" by that test.
Trying to test the game of Tetris is not as easy as it seems.

About 70% of the code statements within this version of Tetris run all the time. Play for a few seconds, and you've tested most of the code. It's the remaining 30% that's hard to find.

Unless you use a coverage tool, you may not even be aware that this code hasn't been tested. You'll know that some parts probably haven't been tested very well - that's common sense. But until the coverage tool tells you, you don't know.

With a coverage tool, you can solve the puzzle. It tells you where coverage is missing, so you can figure out what more is needed. It's not an easy puzzle. It combines programming and debugging skills with knowledge of the game.
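To make the idea of "covered" statements concrete, here is a minimal sketch in Python of how a statement-coverage measurement can work. It is not how the Rapita demo is implemented. The function `drop_piece` and its scoring values are hypothetical stand-ins for game code, and `lines_executed` is an illustrative helper built on the standard `sys.settrace` hook.

```python
import sys


def lines_executed(func, *args):
    """Run func(*args) and return the line offsets executed inside func.

    Offset 0 is the "def" line, offset 1 the first line of the body, etc.
    """
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Record only "line" events that belong to the function under test.
        if event == "line" and frame.f_code is code:
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return hit


def drop_piece(rows_cleared):  # hypothetical game logic, for illustration only
    score = 10
    if rows_cleared == 4:
        score += 100  # the rarely reached statement (offset 3)
    return score
```

Calling `lines_executed(drop_piece, 1)` exercises the common path and leaves offset 3 unvisited; only a call with `rows_cleared == 4` reaches it. A coverage report is essentially this information, gathered for the whole program and shown against the source.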

A few minutes playing this Tetris should be enough to convince you that it's non-trivial to get full coverage. Intuitively, you'd expect to get it just by playing for long enough, but you'd have to be very lucky, because some of the cases you need to test are extremely rare.

For example, here's one case you're unlikely to hit by accident. Here is part of the coverage report (produced when you press F4):

This procedure, updateScore4, is called when a block of 4 lines is cleared at once. Your score increases in proportion to the game's level (line 325). That's not uncommon, but there's something else here. If you clear 4 lines twice in a row, then the statement on line 327 is also executed, and your score is increased by a big bonus. It's a combo. There are equivalent procedures for 3 lines, 2 lines and 1 line.
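The logic described above can be sketched roughly as follows. This is a Python illustration of the behaviour, not the demo's actual source: the `Game` class, the field names, and both point constants are assumptions made up for the example, and the real bonus may be calculated differently.

```python
from dataclasses import dataclass

# Hypothetical values; the demo's real constants are not given here.
POINTS_PER_LEVEL = 800
COMBO_BONUS = 1200


@dataclass
class Game:
    level: int = 1
    score: int = 0
    last_clear: int = 0  # lines removed by the previous clear (0 = none yet)


def update_score4(game: Game) -> None:
    """Score a four-line clear, adding a bonus for two in a row."""
    # Always executed when four lines are cleared.
    game.score += POINTS_PER_LEVEL * game.level
    # Executed only when the previous clear was also four lines.
    if game.last_clear == 4:
        game.score += COMBO_BONUS
    game.last_clear = 4
```

A single four-line clear covers everything except the bonus statement. To cover that one, the test has to set up two four-line clears back to back, which is exactly the kind of sequence that rarely happens in casual play.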

This is not the only unusual special case in Tetris. For instance, there are special cases for rotating a block when it's next to a wall ("wall kick") or next to other blocks.

Full coverage means executing all of these cases.

If you're testing a game, you may have played it for hours, days, or even weeks, and still not have covered all of the unusual, rarely-reached special cases. What if line 327 has a critical bug that crashes the game, or resets the score? Perhaps one player in a hundred will occasionally notice it, and may not even know what happened.

The coverage report tells you what's left to test, so it leads you straight there.

At Rapita, we were thinking about nice ways to show software developers why code coverage is a hard problem that requires good tools. We thought it had to be a game of some sort, preferably one that anyone would recognise and could pick up and play. We ran through some possibilities, and chose Tetris because it's simple and very well-known, and yet it's very hard to test.

Coverage testing is not usually applied to games. Our customers tend to be makers of aircraft or car parts. Both industries have strict safety standards which involve coverage testing, and our tools help you produce the reports needed for certification against standards such as DO-178B in the aviation industry.

However, all software can benefit from coverage testing, safety critical or not. How else do you know what remains to be tested? How else do you know where bugs may be lurking, in rarely-used code? As a programmer I would like to think my code is bug free, but I know it often isn't, and I want to know about bugs now, rather than next week.

There are quick-and-dirty ways to tell if particular functions or statements are covered. If you are a programmer, you can probably think of a few. But a real debugger is preferable to filling your program with "printf" statements; and likewise, a real coverage tool is preferable to filling a program with other ways to detect coverage, such as breakpoints and log messages ("Tested function main()!"). A proper coverage tool saves you lots of time, and also handles the kinds of coverage that are non-trivial to implement yourself, such as MC/DC (modified condition/decision coverage).
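For illustration, here is what one of those quick-and-dirty approaches might look like in Python: a decorator that notes which functions have been called. The names `track`, `untested`, `move_left` and `wall_kick` are all made up for this sketch, and the two game functions are trivial placeholders.

```python
import functools

tracked = set()  # every function we asked to watch
covered = set()  # those that have run at least once


def track(func):
    """Mark func as watched and record when it is first called."""
    tracked.add(func.__name__)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        covered.add(func.__name__)
        return func(*args, **kwargs)

    return wrapper


@track
def move_left(x):
    return max(x - 1, 0)


@track
def wall_kick(x, width):
    return min(x, width - 1)


def untested():
    """Return the names of watched functions that have never run."""
    return sorted(tracked - covered)
```

After a test that only calls `move_left`, `untested()` still lists `wall_kick`. This works, but only at function granularity: it says nothing about which statements or branches inside a function ran, every function has to be annotated by hand, and the bookkeeping lives in the program being tested. Those are the gaps a real coverage tool fills.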

I had thought that it would be easy to get complete coverage of Tetris. I was quite wrong. The example demonstrates an old lesson about programming: simplicity is an illusion, and you typically spend more time debugging and testing than actually writing code. Try it out for yourself! Download the demo here, and tell us if you get a perfect score.