What should Mozilla look for in an automated review system?

In last week's All Hands when talking about possible process improvements, Mike Shaver asked if there was any interest in some sort of automated review system.  Google's Mondrian is probably the most well known review system.  Review Board seems to be the best known open source review system.  I've never used Review Board, but as an ex-Googler, I have used Mondrian.

I think Mozilla would really benefit from a good review system.  My review system experience is limited to Mondrian, which is, of course, proprietary to Google.  I don't know how Review Board and the other open source systems measure up, and, as I'm about to get to, I think Mondrian itself has some notable deficiencies.  My overall point is that if we decide to adopt a review system, we should choose carefully.  I'll try to offer some guidance about how to do that.

Quick Overview of Mondrian

Mondrian has a dashboard view for outstanding changelists but to me the essence of Mondrian is its UI for reviewing a single changelist, and that's what I'm going to talk about here.  At the highest level you have a list of files modified in the changelist.  You can click on each file and get a diff of that file against the baseline in a side-by-side view.  You can easily see which lines were added, modified, or deleted.  OK, everybody is familiar with side-by-side diffs these days.  Here's the cool part though: A reviewer can click on any line and leave an inline note or comment.  Afterwards, the reviewee can go through the diff and respond to the comments.  The way this often works in practice is the reviewer leaves an instruction in the comment and the reviewee responds by making the requested change and replying to the comment with "Done."  In fact, there's a "Done" button as well as a "Reply" button on the comment just to make this last part easy.

Once the reviewee has updated the changelist to the reviewer's satisfaction, the reviewer will approve the change and the reviewee can submit it to the source control system.

What's not to like?

Mondrian's review UI is great for simple line-by-line reviews, things like:
  • "You don't need to check for null here."
  • "You should check for an error here."
  • "Fix indentation" 
Sometimes this level of line-by-line scrutiny is really useful.  For example, exception-free C++ code often requires the programmer to check virtually every single function call for errors.  This is hard for even the best programmers to get 100% right.  But let's be clear here.  What Mondrian is really good for, all the time, is line-by-line nitpicking of code.  And frankly, line-by-line nitpicking of code is already pretty easy without some fancy tool like Mondrian to make it extra easy.

Mondrian's review comment system really seemed to encourage a style where there was a one-way flow of instructions from the reviewer to the reviewee: "Do this.  Do this.  Do this." and the reviewee replies with "Done.  Done.  Done."  Sometimes this is appropriate, but oftentimes it isn't.

Of course the reviewee always has the option of clicking "Reply" instead of "Done".  You could have a whole thread of comments and replies if you wanted to.  But given the limitations of the UI, that would be kind of like communicating with short messages written on post-it notes.  And not regular sized post-it notes either, but rather those extra tiny ones.

So Mondrian not only encouraged a review focus on individiual lines, it also tended to encourage a one-way flow of information from reviewer to reviewee which could easily degrade into a one-way flow of commands.

What would I want in a "good" review system?

It may seem like I'm arguing that a review system should actively discourage line-by-line review, and that's not actually the case.  I think that review style is often useful, and pretty much any review system, good or bad, will support it.

There are really two fundamental things that I do want out of a review system.
  • The system should go out of its way to support a bi-directional flow of information between the reviewer and the reviewee.  In the extreme, it should provide a means of carrying on an actual conversation.  This could be supported within the review system itself, but even a simple means of escalating a conversation to email (or even IRC) might be a big help.
  • The system should support higher level discussions about the code under review.  Actually I think it should go so far as to encourage this kind of information flow.  You can sort of do this with Mondrian, but you are usually better off just going to email.

Some general guidance

I've put a fair amount of thought into how I'd like an ideal review system to work.  Mostly I've been thinking in terms of concrete features, but that's of limited utility if what you want to do is choose between existing review systems rather than writing a review system yourself.

So I'm trying to figure out some general guidance.  I think what I'm trying to say in this post more than anything is that affordances matter.  Mondrian, for example, seems to afford the line-by-line scrutinizing, nitpicking approach to code reviews.  It also seems to afford a model where the reviewer simply gives instructions to the reviewee and the reviewee just automatically carries them out.

Mondrian offers some minimal affordance for discussing (as opposed to simply commenting on) a particular line of code, but it could do a lot more.  Notably it does not seem to offer any real affordance for discussion at the design level, which has always seemed to me like a serious omission.

A simple concrete example

Here's one simple way that Mondrian could be improved which I hope will illustrate my point about affordances.  As described above, Mondrian comments have two buttons, "Reply" and "Done".  It could offer other choices as well, so you might have:

"Reply", "Done", "Done w/comment", "Defend", etc.

These latter two functions could easily be done with "Reply", but if you give them their own buttons, you explicitly tell the reviewee that there are other options that can be considered here.  In particular, in this case, they give the reviewee permission to say "I had a reason for doing this the way I did, would you please consider my reasoning?".

My larger point here is that a review system's UI strongly shapes the code review process, and it can shape it in good ways or bad ways.  As a result, we want to think not just about what we want out of a code review system, but also what we want for the code review process itself.

Super-short Summary

  1. A good review system needs to support two-way information flow, and it should probably go so far as to actively encourage it.
  2. A good review system should support review at a higher level than simply line-by-line, and it should probably go so far as to actively encourage it.
  3. Affordances matter.

Notes

  • This post describes Mondrian as of about nine months ago when I last used it.  It may have received a major upgrade in that time, I don't know.  Mondrian may also have had useful features that I didn't know about -- not all of its features had good discoverability.
  • Probably the most common code review case at Google is a reviewer and reviewee who are peers, have known each other for months if not longer, and who sit near each other, often in the same room.  Mondrian's limitations are a lot less important in this scenario, since a lot of out-of-band communication can happen the old fashioned way, by talking face-to-face.
  • In some circumstances you can have reviewers and reviewees who work in different locations and who have never met in person.  This is not uncommon at Google, and it is in fact very common at Mozilla.  In this scenario, the way the review system works becomes much more important.  
  • The obvious way to get around Mondrian limitations is to fall back to email.  I ultimately started emailing concerned parties requesting informal reviews before requesting formal reviews through Mondrian.  Mondrian can still be used in this case -- it can see pretty much any outstanding changelist.  Nevertheless, by making an informal request in email, you could get a high-level review of a changelist without getting mired in low-level details.  And since this was through email rather than Mondrian comments, you could hold an actual conversation about it.  It turned out my team's superstar intern had independently started doing the same thing.  I jokingly referred to this as "doing a review before the review".
  • This post might lead you to believe that I'm a Mondrian-hater.  Actually, I think Mondrian is a very good tool, I just feel like they quit working on it before they were done.
7 responses
See also, the code review system Chromium and other open source projects from Google uses: http://codereview.chromium.org/
@samuelsidler:

Chromium is using "Rietveld" which is a rewrite of Mondrian to run on Google AppEngine and Subversion. GvR has a nice write up at http://code.google.com/appengine/articles/rietveld.html. It looks like it's functionally pretty similar to Mondrian. However, I find the following reassuring:

>>> I'm far from done with this application. ... The first users are also already asking for features I had never dreamed of, thanks to the many different styles of development found in the open source world.

I think the most important thing for a code review system for Mozilla is solid, thorough integration with Bugzilla. For better or worse, the Mozilla world, especially the code side, is run through Bugzilla, and creating a disconnect between bugs and reviews would not be a good thing, if only because switching back and forth between two systems is annoying. However, comments in a bug prior to a patch are often relevant for reviews, and the process and progress of the review is often relevant to more than just the reviewer (for instance, consulting other peers or super-reviewers when the reviewer is looking for additional input on architectural or security issues, or for drive-by reviews by other interested developers). All of this belongs in the bug, regardless of what sort of UI is used to present the review process.

I’ve noticed that some Google-led open source projects using Google Code and Rietveld (the Chromium review software Sam mentioned above, or a generic version of it, which was apparently derived from Mondrian) have severe cases of this disconnect. Code is reviewed in one system and then committed, often without any “bug report” to track the reason for the change, and code to fix existing bug reports is also reviewed and then committed without ever updating the bug report. In the latter case, I think it becomes easy to forget you even have a bug report since all the code/review interaction takes place in the other system.

(I don’t have an account that allows me to perform reviews with that tool, so it’s hard for me to comment on how it works as a user rather than simply as an observer, but it certainly does seem optimized for line-by-line nits, as you say. Given that making line-based nits is trivial in our current system, that optimization in the Google systems doesn’t seem particularly useful.)

Maybe this is just the way Google or those particular projects work, but bugs-in-one-system-reviews-in-another seems hard to follow, confusing, and error-prone, and I don’t think it’s a good fit for Mozilla (this Google way even “feels” less open, as if all of the review process is happening behind closed doors, even though that’s really not the case). Whatever UI gets adopted and features get exposed, the most important thing is that the system is well-integrated into Bugzilla.

(As an aside, I find the tendency of this comments box to keep altering its position, especially with regard to the scroll position of the entire page, very annoying.)

@Smokey Ardisson:

I think you're absolutely right about Bugzilla integration being critical.

I agree with Smokey here. I'm a peer on Google Breakpad, which is in google code, and the google folks use the code review toool heavily. However, I find it hard to keep track of development now because there often aren't issues filed in the tracker for these changes. It's frustrating to me, especially since I'm so used to the Mozilla model.
We sell a code review system: Code Collaborator. It is *free* for use by open source projects, however, so it might be of interest to you.

In particular:

1. Code Collaborator tries to encourage two-way information flow in several ways. For example, the comment system is real-time - so if two or more developers are entering comments at the same time then they can carry on a live conversation. It's essentially an IM client hanging off to the side of a diff window. The comment system also provides notifications so that if you are *not* entering/reading comments at the same time as other review participants, you can quickly see at a glance which conversations have been updated since you last took a look. There are additional notification/workflow features via email/RSS.

2. One thing we've heard consistently with respect to higher-level review is that artifacts *other* than the code need to be reviewed. So in our latest release (v5.0, currently in beta), we have added the ability to review PDF files, image files, and URLs.

3. With respect to integration with Bugzilla, Code Collaborator can easily be configured to link issues found during a code review with entries in Bugzilla. In addition, Code Collaborator can also be configured to easily link a comment in a review to an entry in Bugzilla. So if you had a Bugzilla entry that was the impetus for a review, you could include the URL to that Bugzilla entry in data that gets stored for that review. If the review uncovers additional issues that need to be tracked in Bugzilla then you could record those entries as well.

More information available here: http://smartbear.com/codecollab.php

You might also want to take a look at Gerrit: http://code.google.com/p/gerrit/

Gerrit is an open-source, Java-, and git-focused version of Rietveld that's being developed for the Android project. It supports line-by-line comments as well as general comments about the entire change. There's also an open issue to support replies via email: http://code.google.com/p/gerrit/issues/detail?id=228&q=email&colspec=ID%20Typ... — I think that would go a fair way toward encouraging the bi-directionality that you describe.

-sq