Statement coverage for Python: design and analysis

1. Introduction

This document lists the requirements for a statement coverage tool for Python, describes some issues in design and implementation, and compares coverage.py with other statement coverage implementations.

2. Requirements

  1. You can run many tests and perform coverage analysis based on all the tests.

  2. You can get a summary report showing coverage for a set of modules and the total.

  3. You can annotate Python source code to show which statements are covered.

  4. The coverage analysis deals correctly with all Python features.

  5. You can do coverage analysis while testing interactively.

  6. Recording of coverage information doesn’t slow down a test case more than necessary.

  7. You can do coverage analysis for multi-threaded programs.

  8. The coverage tool is portable to all Python versions from 1.5.2 onwards and all operating systems where Python runs.

3. Design

3.1. Interfaces

Requirement 1 means that coverage information needs to be accumulated in a file during a sequence of tests. See [GDR 2001-12-04, 2] for the command-line interface which achieves this. Requirement 5 means that there needs to be a documented programmatic interface. See [GDR 2001-12-04, 3] for the documented interface.
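
As an illustration of accumulating results across a sequence of test runs, here is a minimal sketch that merges each run's executed lines into a file on disk. The JSON layout and the name merge_coverage are assumptions made for this example; they are not the file format or interface of coverage.py.

```python
import json
import os


def merge_coverage(path, new_data):
    """Merge {filename: iterable of line numbers} into the JSON file at `path`.

    Hypothetical helper: lines already stored from earlier runs are kept, so
    the file ends up holding the union of all runs.
    """
    merged = {}
    if os.path.exists(path):
        with open(path) as f:
            merged = {name: set(lines) for name, lines in json.load(f).items()}
    for name, lines in new_data.items():
        merged.setdefault(name, set()).update(lines)
    with open(path, "w") as f:
        # JSON has no set type, so store each set as a sorted list.
        json.dump({name: sorted(lines) for name, lines in merged.items()}, f)
    return merged
```

Calling the helper once at the end of every test run leaves a file from which a combined report can be produced later.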

3.2. Python language features

To meet requirement 4 we need to know which source lines represent statements. Looking for non-blank, non-comment lines isn’t good enough, because some statements extend across many lines but Python reports only the first line as the statement’s line number. So we use the Python parser [van Rossum 2001-07-20, 17.1] to parse the module sources and walk the parse tree looking for the first line of each statement (when tree[0] == symbol.stmt we descend tree[1] until we get to a terminal token, whose line we record).
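
The parser and symbol modules used in this design were removed in Python 3.10. The sketch below shows the same idea (record the first line of each statement) using the standard ast module instead. This is a substitute technique for illustration, not the method described above, and the name statement_lines is hypothetical.

```python
import ast


def statement_lines(source):
    """Return the sorted first-line numbers of every statement in `source`."""
    tree = ast.parse(source)
    # ast.stmt is the base class of all statement nodes; lineno is the line
    # on which the statement starts.
    return sorted({node.lineno for node in ast.walk(tree)
                   if isinstance(node, ast.stmt)})


source = "x = [\n    1,\n    2,\n]\n# comment\n\nif x:\n    y = 1\n"
print(statement_lines(source))  # [1, 7, 8]
```

The list literal spans lines 1 to 4 but contributes only line 1, and the comment and blank line contribute nothing.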

Code on the second and subsequent lines of multi-line simple statements is reported by the tracing interface as appearing on the first line of the statement, so recording the first line of each statement captures all the executed lines except for elif, except and finally lines in compound statements. So we record the lines with these tokens separately. See the find_statements() method.
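
The tracing interface referred to here is available in Python through sys.settrace. The following is a minimal sketch of recording executed lines with it; the name run_with_line_trace is hypothetical, and which lines are reported for multi-line statements may differ between Python versions.

```python
import sys


def run_with_line_trace(func):
    """Call func() and return the set of (filename, line number) pairs executed."""
    executed = set()

    def tracer(frame, event, arg):
        if event == "line":
            executed.add((frame.f_code.co_filename, frame.f_lineno))
        # Returning the tracer keeps line tracing active inside this frame.
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        # Restore whatever trace function was installed before.
        sys.settrace(previous)
    return executed
```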

No execution takes place on a line containing only the else token of if, while and try compound statements, so we don’t record lines containing only else. But when we annotate a listing, such a line should be marked as covered if and only if the following statement is covered. So the annotate() method has special logic for this case.
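
The special case for else lines can be sketched as follows. This is an illustration under assumptions, not the annotate() method itself: the name annotate_lines, the regular expression and the marker characters ('>' for covered, '!' for not covered) are chosen for the example.

```python
import re

# A line holding only the else token, optionally followed by a comment.
ELSE_ONLY = re.compile(r"\s*else\s*:\s*(#.*)?$")


def annotate_lines(source_lines, statements, executed):
    """Prefix each line with '>' (covered), '!' (not covered) or ' ' (no statement)."""
    ordered = sorted(statements)
    annotated = []
    for number, text in enumerate(source_lines, start=1):
        if number in statements:
            mark = ">" if number in executed else "!"
        elif ELSE_ONLY.match(text):
            # An else-only line takes the status of the next statement.
            following = next((s for s in ordered if s > number), None)
            mark = ">" if following in executed else "!"
        else:
            mark = " "
        annotated.append(mark + " " + text)
    return annotated
```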

3.3. Filenames

The filename in a Python code object rarely matches the __file__ attribute of the module to which the code belongs. There are three kinds of difference:

  1. module.__file__ is the compiled byte code (.pyc), but the filename for the code is the source file. We work around this by turning .pyc to .py when we find it.

  2. The file names differ as to directory. For example, we may have:

         module.__file__ == '/dev/project/foo/module.py'

     but:

         code.co_filename == 'module.py'

     We work around this by searching sys.path to find the file.

  3. The code might have been compiled somewhere temporary. For example:

         module.__file__ == '/usr/lib/python1.5/getopt.py'

     but:

         code.co_filename == '/var/tmp/python/usr/lib/python1.5/getopt.py'

     We work around this as a last resort by stripping the directory part and then looking for the file in sys.path.

See the canonical_filename() method.
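
The three workarounds can be combined roughly as in the sketch below. It is an illustration rather than the canonical_filename() method: the name resolve_source_file and the search_path parameter are hypothetical, and it assumes the older layout in which a .pyc file sits next to its .py source.

```python
import os
import sys


def resolve_source_file(filename, search_path=None):
    """Map a code object's filename to a source file that exists on disk."""
    if search_path is None:
        search_path = sys.path
    # Case 1: byte code file name; switch to the source file name.
    if filename.endswith(".pyc"):
        filename = filename[:-1]
    if os.path.isabs(filename) and os.path.exists(filename):
        return os.path.abspath(filename)
    # Case 2: try a relative name as given. Case 3 (last resort): try the
    # bare file name with its directory part stripped.
    candidates = []
    if not os.path.isabs(filename):
        candidates.append(filename)
    candidates.append(os.path.basename(filename))
    for candidate in candidates:
        for directory in search_path:
            path = os.path.join(directory or os.curdir, candidate)
            if os.path.exists(path):
                return os.path.abspath(path)
    # Nothing found: return the name unchanged.
    return filename
```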

3.4. Threads

How should the coverage module handle programs with multiple threads (requirement 7)? When run from the command line as

    python coverage.py -x script.py

we must collect coverage data in all threads. When testing interactively, (at least) three approaches make sense:

  1. Each thread can turn collection on and off independently.

  2. All threads started from a thread inherit the collection status of that thread at the point where the thread is started.

  3. Any thread can turn collection on or off for all threads.

I think it’s unlikely that testers will need fine control over when collection takes place. Most of the time you just turn collection on at the start and collect everything thereafter. The second and third approaches both suit this pattern, but any implementation of the third approach penalises execution speed when collection isn’t taking place. So the second approach is the best of the three.
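
One way to picture the second approach is the sketch below, which relies on the fact that sys.settrace and sys.gettrace act on the calling thread only. The class name InheritingThread is hypothetical and this is not how coverage.py is implemented: the new thread simply installs whatever trace function its parent had at the moment start() was called.

```python
import sys
import threading


class InheritingThread(threading.Thread):
    """Thread that begins with the trace function its parent had at start()."""

    def start(self):
        # Runs in the parent thread: remember its current collection status.
        self._inherited_trace = sys.gettrace()
        super().start()

    def run(self):
        # Runs in the new thread: install the inherited trace function.
        sys.settrace(self._inherited_trace)
        try:
            super().run()
        finally:
            sys.settrace(None)
```

A thread started while collection is off in its parent pays no tracing cost, which is the speed advantage of this approach.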

4. Comparison with other implementations

I looked at two other statement coverage testing tools for Python: trace.py [Dalke 1999] and pycover 0.2 [Csillag 1999-07-01]. Neither met all my requirements. Both have these problems:

trace.py also has these problems:

Table 1. Performance of coverage implementations

                         Execution time (s)
Test                     No coverage  coverage.py  trace.py  pycover.py
10 × test_message.py              15           59       120          65
test_xhtml.py                     21          180       306         184
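
To make the figures in Table 1 easier to compare, the short sketch below divides each time by the time with no coverage. The dictionary keys and the name slowdown_factors are chosen for this example.

```python
# Execution times in seconds, copied from Table 1.
TIMES = {
    "10 x test_message.py": {"none": 15, "coverage.py": 59, "trace.py": 120, "pycover.py": 65},
    "test_xhtml.py": {"none": 21, "coverage.py": 180, "trace.py": 306, "pycover.py": 184},
}


def slowdown_factors(times):
    """Return {test: {tool: time with tool / time with no coverage}}."""
    return {
        test: {tool: t / row["none"] for tool, t in row.items() if tool != "none"}
        for test, row in times.items()
    }


for test, factors in slowdown_factors(TIMES).items():
    print(test, {tool: round(f, 1) for tool, f in factors.items()})
```

On these measurements coverage.py slows the tests by factors of about 3.9 and 8.6, pycover.py by about 4.3 and 8.8, and trace.py by 8.0 and about 14.6.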

A. References

[Csillag 1999-07-01] “pycover 0.2”; Andrew Csillag.
[Dalke 1999] “trace.py”; Andrew Dalke; 1999.
[GDR 2001-12-04] “Statement coverage for Python”; Gareth Rees; Ravenbrook Limited.
[van Rossum 2001-07-20] “Python Reference Manual (release 2.1.1)”; Guido van Rossum.