Can we ship yet? Using Perforce fixes to measure product quality


1. Introduction

This paper explains how to use the Perforce software configuration management system [Perforce] to efficiently measure the quality of a product in a software development environment where there are many branches, customers, and product versions.

Gareth Rees <gdr@ravenbrook.com> works as a consultant at Ravenbrook Limited <http://www.ravenbrook.com/>, an independent software engineering consultancy company, specializing in memory management, software process improvement, and software configuration management.

2. The problem

In a development environment where many product versions are managed using configuration management branches, it is necessary to track defects from discovery to resolution in all the branches.

Figure 1 illustrates the software configuration model that I’ll be using to give the examples in this paper. The master branch for the product is used for developing features that are common to all customers. For each customer there’s a development branch where features specific to that customer are developed. When all the features for a release to a customer are present, a release branch is made. Release candidates and releases are made from the release branch; typically the only changes on a release branch are resolutions for defects.

Figure 1. Organization of branches for two customers

Any organization needs to know which defects are known to be present in each release. The release manager needs this information to assess the quality of the release before making it public (hence the title of this paper). Quality assurance need to know which defects are thought to be resolved so that they can direct their testing resources to the most productive areas. Support need to know which defects are thought to be present in each release so that they can give the best advice to customers.

Table 1 shows the form this knowledge takes. Each defect is either present or absent in each release.

Table 1. Which defects are present in which release?

  Defect  A.1.1  A.1.2  A.2.1  A.2.2  B.1.1  B.1.2  B.2.1  B.2.2
  1       Yes    Yes    Yes    No     Yes    Yes    Yes    Yes
  2       No     No     Yes    Yes    Yes    Yes    Yes    No
  3       Yes    Yes    No     No     Yes    Yes    Yes    No
  etc.

Most defect trackers have a substantially simpler model for the relationship between defects and releases: typically they record the release in which the defect was introduced, the releases in which the defect was observed, and the release in which the defect was resolved. This model applies only when releases are made in sequence from a single set of sources.

The only reliable way to generate the information in this table is to test each release for the presence of each defect. But the testing effort required to do this is proportional to the product of the number of releases and the number of defects. This cost is prohibitive.

This paper will describe how to build and maintain reasonable approximations to the information in table 1 using testing effort proportional to the number of defects.

3. Simple use of fixes

The simplest approach requires you to adopt three procedures:

  1. To record the set of revisions of files that are used to build each release, either using a label, or more simply by using a combination of filespec and changelevel. For example, release A.1.1 would be built from the files given by a specification like //...@rel-A.1.1 (if using labels), or from a specification like //depot/spong/rel/A.1/...@12345 (if using changelevels).

  2. To represent each defect in your product as a job in Perforce. This can be done by using Perforce’s job feature as the only or main defect tracking system, or by using the Perforce Defect Tracking Integration [P4DTI] to represent each defect from your defect tracking system as a job in Perforce.

  3. To make a Perforce fix linking each job with the changelist that resolves it. For example, p4 fix -c 12345 job000123.

Now we can compute the relationship between releases and defects if we make the simplifying assumption that every defect has been present since the beginning of the project; that is, a defect is present in a set of sources if there is no fix for the defect in that set of sources. Then we can compute the set of defects that are absent in release A.1.1 by running a command like

p4 fixes -i //depot/spong/rel/A.1/...@12345

The set of defects present in a release is just the complement of the set of defects absent in the release.
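This computation is easy to script. The following is a minimal sketch in Python; the filespec is the hypothetical one from above, the helper names are mine, and it assumes the p4 command-line client is configured for your server.

import subprocess

def p4_lines(*args):
    # Run a p4 command and return its output as a list of lines.
    result = subprocess.run(["p4", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()

def absent_defects(filespec):
    # Jobs with a fix in the sources contributing to the release;
    # -i includes fixes carried in by integrations. The job name is
    # the first word of each line of "p4 fixes" output.
    return {line.split()[0] for line in p4_lines("fixes", "-i", filespec)}

def present_defects(filespec):
    # Under the simplifying assumption, the present defects are the
    # complement: every job minus those with a fix in the sources.
    all_jobs = {line.split()[0] for line in p4_lines("jobs")}
    return all_jobs - absent_defects(filespec)

print(sorted(present_defects("//depot/spong/rel/A.1/...@12345")))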

4. Examples

Figures 2 and 3 give two examples of how this works in practice. Figure 2 shows a defect being resolved on a release branch. The changelist that resolves the defect is integrated back to the master sources, from which it is propagated forward to the development branches and then to new release branches. The green lines show sources for which the fix is included in the output of p4 fixes -i //sources, and hence in which the defect is absent; black lines show sources in which the defect is present.

Figure 2. Propagation of a fix

Figure 3 shows a defect which needs two different resolutions. A quick hack is made on the release branch so that a patched release can be shipped to customers. A more considered resolution for the same defect is made in the master sources and is eventually propagated to new release branches. (This is of course one of the reasons for having a series of release branches in the first place; hasty changes made on release branches don’t later need to be removed when you make a better-engineered resolution [Brooksby 1999-05-20, 9].)

Figure 3. Two fixes for a defect

For the Perforce Defect Tracking Integration Project [P4DTI], we developed a number of approaches for querying the fix database [P4DTI issues 1]:

  1. Defects present in a release.

  2. Defects absent from a release.

  3. Defects newly resolved in a release. (That is, absent in the release but present in the previous release.)

  4. Defects resolved between any two releases or release candidates.

  5. Defects resolved on a branch at a changelevel since the previous release.

  6. Graphs of defects by date (see section 8 for an example).

All of these queries are helpful in assessing the quality of the product sources and deciding whether to release.
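Most of these queries reduce to set operations on the output of p4 fixes. For instance, query 3 is a set difference; here is a sketch, reusing the hypothetical absent_defects helper from section 3, with made-up changelevels:

# Defects newly resolved in release A.1.2: absent from A.1.2
# but present in (that is, not absent from) release A.1.1.
new = absent_defects("//depot/spong/rel/A.1/...@12400")
old = absent_defects("//depot/spong/rel/A.1/...@12345")
newly_resolved = new - old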

5. Limitations

5.1. Accuracy of fixes

The results of the computation outlined in section 3 are only as good as the fixes database itself. A fix only indicates that someone believes that the changelist resolves the problem, not that it is really resolved. This has two implications:

  1. Developers need to be conscientious about recording the defects they believe they’ve resolved.

    In practice I don’t think this will be much of a barrier. If you’re using the Perforce jobs system as your defect tracker, or if you’re using the Perforce Defect Tracking Integration [P4DTI], then it may be easiest for developers to record their work using fixes. (With fixes, you record the fix through the same interface you use to submit your change; with a separate defect tracker, you usually have to switch interfaces.)

  2. It is vital that you have some process that assures the quality of the fixes database.

    Most organizations already have an SQA process that verifies that defects are resolved when developers claim that they are. It is straightforward to extend this to correcting the fixes database when a mistake is discovered.

When a mistake is discovered it is not a good idea simply to delete the fix. After all, it records the important information that the change was intended to resolve a defect. It’s better to keep the relationship, but change its status. For example, you might use the fix status “open” to indicate that a change affects a defect but leaves the defect open:

p4 fix -s open -c 12345 job000123

If you do this, then you’ll need to adjust the algorithm in section 3 so that it only takes account of fixes with status “closed”.
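One way to make that adjustment is to use Perforce’s tagged output, which reports each fix record’s fields on separate lines. Here is a sketch, reusing the p4_lines helper from the earlier sketch and assuming that the tagged output of p4 fixes includes a Status field:

def closed_fixes(filespec):
    # Parse "p4 -ztag fixes" output: each record is a block of
    # "... Field value" lines terminated by a blank line. Keep only
    # the jobs whose fix status is "closed".
    jobs, record = set(), {}
    for line in p4_lines("-ztag", "fixes", "-i", filespec) + [""]:
        if line.startswith("... "):
            field, _, value = line[4:].partition(" ")
            record[field] = value
        elif record:  # blank line ends the record
            if record.get("Status") == "closed":
                jobs.add(record["Job"])
            record = {}
    return jobs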

5.2. Need for precise changes

This method relies on being able to identify a changelist that resolves each defect. It doesn’t fail if a defect takes a number of changelists to resolve: you can make a fix with status “closed” between the defect and the last changelist in the sequence, and link the other changelists to the defect with fixes of status “open” (as in the case of a proposed fix that was found to be incorrect, discussed in section 5.1). However, if you often use multiple changelists, the method becomes potentially less accurate when you integrate defect resolutions between branches, because you might integrate the final change (the one with status “closed”) but miss one or more of the other changes that are essential to make the resolution work.

This means that some development methods won’t give you accurate results. In particular, working on several problems at the same time in the same set of sources will probably be ineffective.

One approach that works is to make a branch for each defect resolution (or for each resolution that you believe will be complex enough to require several changes). Then you can fix the defect with the reverse integration at the end of the branch.
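For example, the workflow might look like this, with hypothetical branch paths and changelist numbers:

p4 integrate //depot/spong/master/... //depot/spong/branch/job000123/...
p4 submit -d "Branch to resolve job000123"

(work on the branch, submitting changes as usual, then:)

p4 integrate //depot/spong/branch/job000123/... //depot/spong/master/...
p4 resolve -am
p4 submit -d "Merge resolution of job000123"
p4 fix -c 12360 job000123

(where 12360 is the changelist number reported by the final submit).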

This discipline can be relaxed when you’re working on new features rather than fixing defects. But you would lose the benefits of being able to track feature introduction as well as defect resolution using this method.

6. Benefits

Disciplined use of Perforce fixes has benefits beyond efficient computation of the relationship between releases and defects.

  1. You can check that change management policies (such as that in [Brooksby 1999-05-20]) are being followed. For example, you might have a policy that the only changes allowed on release branches are changes which resolve defects of a certain severity or higher. This is easy to check if you’re using fixes, and straightforward to enforce using a trigger in the Perforce server (see the sketch after this list).

  2. Developers can get an answer to the question “Why was this change made?” by discovering which defect it resolves.

  3. SQA can check that a defect was resolved properly by inspecting the change that resolved it.

  4. When you need to merge a defect resolution from one branch to another, you can reliably find the changelist which implemented that resolution.

  5. The discipline encourages developers to separate each resolution into its own changelist, making merging simpler.

  6. You can automatically generate parts of the release notes: the defects newly fixed in a release are those absent from the release but present in the previous release.

  7. You can quickly evaluate the quality of any set of sources: the algorithm doesn’t need a release to be made. This helps the build manager to decide when to make release candidates.

  8. SQA can determine exactly which defects are thought to be resolved in a particular release candidate, and so direct their regression testing efforts efficiently. (In some organizations SQA may only know that a resolution is planned for a release, but not know which candidate has the resolution: this leads to wasted testing effort.)
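Here is the trigger sketch promised in point 1, in Python. Everything in it is an assumption to adapt: the triggers table entry restricts the check to the hypothetical release branches, the jobspec is assumed to have been customized with a Severity field, and jobs are assumed to be attached to the changelist (with p4 fix -c or the Jobs field of the change form) before submission.

# Hypothetical triggers table entry, restricting the check to
# release branches:
#   release-guard change-submit //depot/spong/rel/...
#     "python guard.py %change%"

import subprocess
import sys

def p4_lines(*args):
    result = subprocess.run(["p4", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()

def severity(job):
    # Read the job form; assumes a Severity field in the jobspec.
    for line in p4_lines("job", "-o", job):
        if line.startswith("Severity:"):
            return line.split(":", 1)[1].strip()
    return None

change = sys.argv[1]
jobs = [line.split()[0] for line in p4_lines("fixes", "-c", change)]
if not any(severity(job) in ("critical", "essential") for job in jobs):
    # A non-zero exit rejects the submission; the message is shown
    # to the submitting user. The severity names here are made up.
    print("Changes on release branches must fix a severe defect.")
    sys.exit(1)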

7. Recording defect introduction

Figure 4 shows the resolution of a defect that is specific to a feature: the defect only affects the releases for customer A. The algorithm in section 3 gives incorrect results for this defect. It claims that the defect is present in the releases for customer B, but it isn’t.

In cases like this, although the algorithm gives the wrong answer, it does give a conservative approximation. So this problem won’t lead the release manager to make a release of poor quality.

Figure 4. A feature-specific defect

A number of extensions to the algorithm can be used to get more accurate results for feature-specific defects. For example, you could record for each defect which customers are affected, and take that into account when computing the defects present in a release to that customer. Or you could record for each defect the first release in which the feature affected by the defect was supported.

However, the most flexible extension is to record the changelist that introduced the defect, by making a fix with status “introduced”. You can record that a defect is specific to a customer by linking the defect to the change that created the development branch for that customer, as shown in figure 5. You can record that a defect is specific to a feature by linking the defect to the change that introduced the feature (or perhaps the change to the documentation that indicated that the feature was supported).

To use this information, we can modify the algorithm from section 3 to work like this:

  1. To compute the set of defects that are absent in release A.1.1, run a command like

    p4 fixes -i //depot/spong/rel/A.1/...@12345

  2. A defect is absent from the release if:

    1. There is a fix for that defect with status “closed” in the sources contributing to the release (that is, in the output of the p4 fixes command above); or

    2. There is some fix for that defect with status “introduced”, but no such fix in the sources contributing to the release.

  3. The set of defects present in a release is the complement of the set of defects absent in the release.
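Here is a sketch of the revised test, assuming a hypothetical helper fixes_by_status(status, filespec) in the style of closed_fixes from section 5.1 (called without a filespec it reports fixes anywhere in the depot, as p4 fixes does with no arguments):

def defect_absent(job, filespec):
    # Rule 2.1: a "closed" fix in the contributing sources.
    if job in fixes_by_status("closed", filespec):
        return True
    # Rule 2.2: introduced somewhere, but not in these sources.
    introduced_anywhere = job in fixes_by_status("introduced")
    introduced_here = job in fixes_by_status("introduced", filespec)
    return introduced_anywhere and not introduced_here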

Figure 5 shows this in operation. The defect is identified as being specific to customer A, so it is fixed with status “introduced” at the start of the development branch. The revised algorithm gives the correct results for each release.

Figure 5. Recording when a defect is introduced

You could go further than I suggest here and try to work out for each defect exactly which changelist introduced the defect. Most defects are present from the start of a project or from the point where a feature was introduced, but some defects are regression errors. In principle, this analysis should be cheap in many cases: the developer who fixes the defect has to identify the piece of code which must be changed to resolve the defect; all they then need to do is trace that piece of code backwards through the file’s revision history until they can see where it was introduced. In practice it’s tedious because of the lack of a tool for browsing through the revision history of a file (remember that this has to take into account all the branches on which the file has been edited). The approximations I suggest above are certainly good enough for almost all uses.

8. Is the product getting better or worse?

Mike Angelo argued in MozillaQuest magazine [Angelo 2001-08-25] that the quality of the Mozilla web browser was getting worse. In support, he showed the graph in figure 6. This graph shows the open issues in Mozilla over time; the total number of open issues is the sum of the three lines. The graph is generated directly from the Mozilla project’s defect tracking system [Mozilla]. Angelo argued as follows:

A particularly troubling indication in [figure 6] is the sharp increase in the New bugs curve that starts about one year ago. That seems to be the point at which the new bugs problem substantially accelerates. Mozilla’s bug problems have gotten worse, not better.

That’s not to say that the Mozilla folks have not fixed lots of bugs. They have fixed lots of bugs. However, bugs are cropping up faster than bugs are being fixed. […]

Nor is that to say that the Mozilla folks have not added many new features and improvements to their Mozilla browser suite. They have added many new features and improvements. However, in doing that they added a tremendous number of bugs to Mozilla.

Figure 6. Open issues in Mozilla on 2001-09-23

Angelo’s analysis is incomplete and possibly incorrect, because he doesn’t realize that there are two possible causes for the observed increase in the number of open defects:

  1. The number of defects in the product is going up.

  2. The rate of defect discovery is going up (as the Mozilla browser user base increases and as users become more sophisticated in their use of the product).

If the Mozilla project were to work out, for each defect, a reasonable approximation to when the defect was introduced, as described in section 7, then it would be possible to disentangle the relative contributions of the two causes, and so to estimate whether Mozilla is getting better or worse.

(One source of inaccuracy in this estimate is that defects take time to be discovered. So for example it may be the case that 40% of the defects introduced in release 1 have been discovered but only 20% of the defects introduced in release 2: this would make the proportion of defects introduced in release 2 seem too low and so give an unduly positive picture of product quality. This bias could be corrected, given enough evidence about the time it takes defects to be discovered, but that’s beyond the scope of this paper.)

In the Perforce Defect Tracking Integration Project [P4DTI] we approximate the introduction of each defect by identifying the first release to contain the defect. This allows us to draw the magenta line in figure 7 [P4DTI issues 2]. This line goes up when a defect is introduced, and down when a defect is fixed. It thus records our current knowledge about the number of defects present in the product on each date. In the case of the P4DTI, the majority of the issues were present in the product at the start, and relatively few were introduced with each release. So we believe that the quality of the product is going up.
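Drawing such a line is a simple event sweep over the defect records. Here is a sketch, assuming each defect has already been reduced to a pair of dates (introduction, resolution), with None for defects still open:

from collections import Counter

def open_defect_counts(defects):
    # defects: iterable of (introduced_date, resolved_date_or_None).
    # Returns (date, count) pairs: the count goes up by one when a
    # defect is introduced and down by one when it is resolved.
    events = Counter()
    for introduced, resolved in defects:
        events[introduced] += 1
        if resolved is not None:
            events[resolved] -= 1
    count, series = 0, []
    for date in sorted(events):
        count += events[date]
        series.append((date, count))
    return series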

Figure 7. Issues in the P4DTI

9. Conclusion

A software development organization can use Perforce fixes to get efficient approximations to the quality of their product at every stage of development, especially when there are many development branches, many customers, and many releases.

The Perforce Defect Tracking Integration project [P4DTI] demonstrates how the techniques described in this paper can be put to good use.

A. References

[Angelo 2001-08-25] “Mozilla roadmap update”; Mike Angelo; MozillaQuest.
[Brooksby 1999-05-20] “Product Quality through Change Management”; Richard Brooksby; Geodesic Systems.
[Mozilla] “Open issues in Mozilla”; Mozilla project.
[P4DTI] “Perforce Defect Tracking Integration”; Perforce.
[P4DTI issues 1] “P4DTI issues”; Ravenbrook.
[P4DTI issues 2] “Graph of P4DTI issues by date”; Ravenbrook.
[Perforce] “Perforce Software Configuration Management”; Perforce.

B. Document History

GDR Created.
GDR Fixed defects discovered by NDL and NB.