Profiling versus tracing

Sunday, 28 February 2016

Profiling and tracing may be used to analyse the dynamic behaviour of a program. A profile (or trace) reveals performance issues, telling you which parts of a program require the most time. You can then focus your efforts on investigating those parts and improving them. If your program takes longer than you expect, then profiling is one of the first things to try in order to understand why.

Profilers generally operate by sampling ("statistical profiling"). Your program is repeatedly interrupted as it runs, with a fixed interval between interruptions (e.g. every millisecond). The purpose of each interruption is to take a sample, which is done by visiting each running thread, and then examining the stack to discover which functions are running. The effect is similar to stopping execution in a debugger and asking for a backtrace, e.g. using the "bt" command in GDB. The profiler records each sample and all of the samples are aggregated into a report or turned into a graph: flame graphs are a nice example of that. On Linux platforms, the best profiler is perf: do not waste time with the archaic gprof tool.
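To make the sampling idea concrete, here is a minimal sketch of a statistical profiler written in Python. It is only an illustration and not how perf works internally: a background thread wakes up at a fixed interval, looks at the stack of the thread being profiled, and counts how often each stack is seen. The names sample_stacks, profile and busy_work are made up for this example, and sys._current_frames() is a CPython-specific function.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, interval, stop_event, counts):
    """Periodically record the call stack of one thread (hypothetical helper)."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        if names:
            # Outermost function first, like the folded stacks fed to flame graphs.
            counts[";".join(reversed(names))] += 1
        time.sleep(interval)


def profile(func, interval=0.001):
    """Run func() while a background thread samples the calling thread."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), interval, stop_event, counts),
    )
    sampler.start()
    try:
        func()
    finally:
        stop_event.set()
        sampler.join()
    return counts


def busy_work(duration=0.3):
    """Hypothetical workload: spin for roughly `duration` seconds."""
    deadline = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


if __name__ == "__main__":
    for stack, n in profile(busy_work).most_common(5):
        print(n, stack)
```

The report is just a table of stacks and sample counts. Anything that starts and finishes between two samples never appears in it.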

Tracers do not operate by sampling. Instead, the trace is a log of events within the program. This log may be detailed enough to report function calls and returns, and execution of other statements. Tracing may require the program to be instrumented, i.e. modified to include code to log events. This instrumentation may be added to the source code, creating "instrumented code" as a pre-compilation step, or it may be added dynamically to the machine code (Intel's Pin tool works like that). Alternatively, tracing may rely on hardware such as the branch trace store in recent Intel CPUs, or the trace may be generated by an instruction set simulator. There are also less detailed methods of tracing code, for instance strace or Process Monitor, which only record certain types of event, e.g. system calls.
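As a rough illustration of source-level instrumentation, the Python sketch below wraps functions in a decorator that appends a timestamped "call" or "return" event to a log. The names TRACE, traced, leaf and root are invented for this example. Real tracing tools instrument the code automatically and write to a much more compact format.

```python
import functools
import time

# The trace: a list of (timestamp, event kind, function name) tuples.
TRACE = []


def traced(func):
    """Instrument func so that every call and return is logged (hypothetical)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append((time.perf_counter(), "call", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # Logged even if func raises, so calls and returns stay balanced.
            TRACE.append((time.perf_counter(), "return", func.__name__))
    return wrapper


@traced
def leaf(n):
    return n * 2


@traced
def root():
    return leaf(1) + leaf(2)


if __name__ == "__main__":
    root()
    for timestamp, kind, name in TRACE:
        print(f"{timestamp:.6f} {kind} {name}")
```

Unlike a sampled profile, the log contains every call and return of the instrumented functions, in order, however short they are.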

A detailed trace may be used to reconstruct a profile. As the trace is a timestamped record of calls and returns, a trace parsing tool can track the state of the stack at each point in time. If the state of the stack is sampled periodically, the result is the same as a profile.
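The reconstruction described above can be sketched in a few lines of Python. This assumes a made-up trace format, a time-sorted list of (timestamp, kind, name) tuples with integer timestamps, and the function name profile_from_trace is likewise hypothetical. The parser replays the calls and returns to track the stack, and takes a sample of that stack at every interval.

```python
import collections


def profile_from_trace(events, interval):
    """Sample the reconstructed stack every `interval` time units.

    `events` is a time-sorted list of (timestamp, kind, name) tuples,
    where kind is "call" or "return" (an assumed format for this sketch).
    """
    counts = collections.Counter()
    stack = []
    if not events:
        return counts
    next_sample = events[0][0]
    for timestamp, kind, name in events:
        # Take every sample that falls strictly before this event, while the
        # stack has not yet been changed by it.
        while next_sample < timestamp:
            if stack:
                counts[";".join(stack)] += 1
            next_sample += interval
        if kind == "call":
            stack.append(name)
        else:
            assert stack and stack[-1] == name, "unbalanced trace"
            stack.pop()
    return counts


if __name__ == "__main__":
    example = [
        (0, "call", "main"),
        (2, "call", "parse"),
        (5, "return", "parse"),
        (5, "call", "render"),
        (9, "return", "render"),
        (10, "return", "main"),
    ]
    for stack, n in sorted(profile_from_trace(example, 1).items()):
        print(n, stack)
```

With the example trace and an interval of 1, the ten samples are split into 3 for main, 3 for main;parse and 4 for main;render, which is the same kind of report a sampling profiler would have produced.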

However, a profile cannot be used to reconstruct a trace. This is because the profile omits information about everything that happened between two samples. Many calls and returns may occur between samples: these are invisible in the profile report. In many cases, this missing information is not necessary in order to see where the program is spending most of its time, and it's possible to understand the performance problem without it.

Traces are necessary for other forms of timing analysis. Sometimes it isn't enough to rely on sampling to obtain the timing information required, because you need all of the information, not just the cases that are most likely to occur. Worst-case execution time (WCET) analysis is one example: this analysis indicates the longest possible execution time required for part of the program. WCET analysis requires very detailed information about the program and the hardware it runs on, usually in order to certify a safety-critical system such as engine control software for an aircraft. A profile is neither reliable enough nor detailed enough for this. But that sort of analysis is only required for embedded real-time software.

Traces can also be useful for discovering the chain of events that led to a problem: in this sense, a tracing tool is a bit like a reversible debugger such as UndoDB or rr, but with the advantage of running on platforms that reversible debugging can't support, like exotic embedded systems hardware or Microsoft Windows. Traces don't let you see absolutely all information about the program state, but you can at least see the path taken to reach a problem, which may provide enough clues. (I have used this debugging method on Windows for a program that crashed during startup on some systems and not others, and could not be debugged in any other way.)

The unfortunate fact is that tracing is often more difficult to set up than profiling, so in many cases it does not make sense to use a tracing method when you can use a profiler. Here are some of the problems:

  • Compiling instrumented code is never quite as easy as compiling the original code,
  • Detailed traces are very large,
  • The overhead of writing detailed trace information to disk may well be greater than the execution time of your program,
  • If any information is missing from the trace, it may be impossible to accurately parse it.

This last point requires a little extra information. If the trace is generated by instrumenting the code, you run into a problem with code that can't be instrumented, such as code that's already been compiled into the C library. Ignoring calls to uninstrumented code is risky, because uninstrumented code may call back into instrumented code. For instance, the C library function "qsort" is passed a function pointer for a comparison function. If "qsort" is not instrumented, but the comparison function is, then parsing the trace poses a problem: the trace shows the comparison function being entered directly from the caller of qsort, which is a transition the parser does not expect, and the trace parser cannot know whether the trace is a correct representation of program execution or not. Signal handlers and interrupt handlers are also an issue for trace parsing for a similar reason.

There are ways of dealing with all of these problems, and they're pretty straightforward, e.g. marking the functions that call back, indicating which functions represent signal handlers, or excluding certain functions from instrumentation. But this is extra effort for the programmer, and none of it is needed to set up statistical profiling.
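Here is a small Python sketch of the qsort problem and of the "marking the functions that call back" fix. It assumes the trace parser has a static call graph to check the trace against, which is one possible design rather than a description of any particular tool. The names unexplained_transitions, call_graph and callbacks are invented for this example.

```python
def unexplained_transitions(events, call_graph, callbacks=None):
    """Return the (caller, callee) transitions the parser cannot explain.

    call_graph maps each instrumented function to the functions it calls
    directly (instrumented or not). callbacks maps an uninstrumented function
    to the instrumented functions it may call back into. Both structures are
    assumptions made for this sketch.
    """
    callbacks = callbacks or {}
    stack = []
    problems = []
    for kind, name in events:
        if kind == "call":
            if stack:
                caller = stack[-1]
                direct = call_graph.get(caller, set())
                # Functions reachable through a marked, uninstrumented callee.
                via_callback = set()
                for callee in direct:
                    via_callback |= callbacks.get(callee, set())
                if name not in direct and name not in via_callback:
                    problems.append((caller, name))
            stack.append(name)
        else:
            stack.pop()
    return problems


# qsort is uninstrumented, so the trace jumps straight from main to compare.
TRACE_EVENTS = [
    ("call", "main"),
    ("call", "compare"),
    ("return", "compare"),
    ("call", "compare"),
    ("return", "compare"),
    ("return", "main"),
]
CALL_GRAPH = {"main": {"qsort"}, "compare": set()}

if __name__ == "__main__":
    print(unexplained_transitions(TRACE_EVENTS, CALL_GRAPH))
    print(unexplained_transitions(TRACE_EVENTS, CALL_GRAPH, {"qsort": {"compare"}}))
```

Without the annotation, both entries into compare are reported as unexplained. Once qsort is marked as calling back into compare, the parser accepts the same trace.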

Tracing can be useful, but it brings additional difficulties. In many cases a profile will be sufficient for analysing the performance of a program, understanding which parts are slow, and figuring out how to resolve bottlenecks, and these are the problems of interest to most programmers. However, statistical profiles are not suitable for certain problems, such as those in safety-critical systems, and in those cases, tracing may be required.