Using MemTrack to instrument Firefox's memory management


MemTrack is a library for tracking memory allocations in C++ programs.  I had two goals with MemTrack.  The first was a simple demonstration of the classic C/C++ preprocessor-based technique for instrumenting memory allocations, which I'm sure a lot of people have done, but few people have documented.  The second was as a testbed for a rather novel technique for tagging each memory allocation with type information, something that is not normally possible with the preprocessor macro-based approach.

The macro-based approach has some serious limitations, but for some codebases it can work quite well.  I tested against my FH-reader library (a library for reading Macromedia FreeHand files) which was an ideal test case.  All I needed to do was add a '#include "MemTrack.h"' directive in a strategic location and then add one call to the test harness to dump the gathered information after the test run was complete.

I was curious how well the technique would work for a large real-world codebase, so I tried it out on the Mozilla Firefox codebase, which is on the order of 2 million lines of C++.  This was kind of a brutal exercise, but I was able to get some potentially useful data out of the effort.  The Firefox team has already used other tools for the substantial memory usage optimizations that were done for Firefox 3.  MemTrack's memory tracking would be a poor substitute for what those tools provide.  However, MemTrack's ability to annotate allocations with C++ type information might be a useful extension.  My understanding is that type information was already available for XPCOM objects but not for other C++ types.

In order to use MemTrack at all on the Firefox codebase, I had to resort to a static preprocessing step.  One purpose of this step is to insert a '#include "MemTrack.h"' directive at the beginning of every .cpp file.  The other major purpose of the static pass is to temporarily suspend the new macro in places where it's going to cause problems.  A common case is any class that declares and defines a custom operator new.  The static preprocessing code is simply a Python script which relies on a rather scary regular expression to identify individual lines in the source code as problematic.  The script undefines the new macro above the problematic line and redefines the macro below it.

For example:

    #ifdef MEMTRACK_NEW        //:memtrack!
      #undef new               //:memtrack!
    #endif                     //:memtrack!
      void* operator new(size_t aSize, nsIPresShell* aHost) {
    #ifdef MEMTRACK_NEW        //:memtrack!
      #define new MEMTRACK_NEW //:memtrack!
    #endif                     //:memtrack!

There aren't that many files that declare or define operator new, but for expediency I ended up also wrapping any preprocessor directives that include system headers.  This is probably not a big deal either way, since one would rarely do much debugging or development against an instrumented build.  Nevertheless the amount of explicit instrumentation could probably be reduced, and that would be a good thing.
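The wrapping step described above can be sketched in a few lines of Python.  This is only an illustration of the idea, not the actual script: the function name `instrument`, the reduced two-alternative pattern, and the choice to tag the inserted include line are all assumptions made for the example.  The marker comment and the wrapper lines are copied from the example output shown earlier.

```python
import re

# Reduced pattern for illustration only (an assumption): it covers just
# "operator new" and system-header includes, not every problematic case.
NEW_IGNORE_PAT = re.compile(r"operator new|include [<]")

TAG = "//:memtrack!"  # marker comment, as seen in the example output

# Lines emitted above a problematic line: suspend the new macro.
SUSPEND = [
    "#ifdef MEMTRACK_NEW        " + TAG,
    "  #undef new               " + TAG,
    "#endif                     " + TAG,
]

# Lines emitted below a problematic line: restore the new macro.
RESUME = [
    "#ifdef MEMTRACK_NEW        " + TAG,
    "  #define new MEMTRACK_NEW " + TAG,
    "#endif                     " + TAG,
]


def instrument(source: str) -> str:
    """Prepend the MemTrack include and wrap each problematic line."""
    out = ['#include "MemTrack.h" ' + TAG]
    for line in source.splitlines():
        if NEW_IGNORE_PAT.search(line):
            out.extend(SUSPEND)
            out.append(line)
            out.extend(RESUME)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "class Foo {\n"
        "  void* operator new(size_t aSize);\n"
        "};\n"
        "int* p = new int;\n"
    )
    print(instrument(sample), end="")
```

Because the decision is made one line at a time, a sketch like this cannot handle a problematic construct that spans several lines; that limitation is part of why the approach is crude.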

To sum up the current state of this exercise: I have to statically preprocess the Firefox codebase in order to use MemTrack.  This preprocessing step is almost entirely automatic on the Mac, although it's still pretty crude.  To make it actually useful, MemTrack's tracking layer should be stripped out and the MemTrack instrumentation layer should be piggy-backed on top of the existing memory instrumentation code.  The required static preprocessing step is somewhat of a hassle, but it's not really all that onerous.  If one has to do a special build of Firefox to instrument memory usage anyway, it's probably not a big deal to include a static source preprocessing step as well.  The key question is whether the extra type information that can be gathered by MemTrack is sufficiently useful.  I think that it might be, but I'll need to do some more investigation before I can say for sure.

For future reference, the regular expression I'm using is:

    newIgnorePat = re.compile(r"operator new|include NEW_H|NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW|::new[ (]|include [<]|[)] *new[ (]|[*] new[ (]|!new[ (]|include.*nsILocalFileMac|include.*nsIInternetConfigService|new PathChar")

This pattern includes common cases like occurrences of "operator new" (not just "new") and include directives for system headers as well as some special cases like "new PathChar", which is specific to Firefox on the Mac.
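
To get a feel for what the pattern flags, here is a small check that runs it against a handful of sample lines.  The pattern is copied verbatim from above; the sample lines other than the operator new declaration (including names like nsFoo and aBuffer) are made up for illustration and are not taken from the Firefox source.

```python
import re

# The pattern exactly as quoted above.
newIgnorePat = re.compile(r"operator new|include NEW_H|NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW|::new[ (]|include [<]|[)] *new[ (]|[*] new[ (]|!new[ (]|include.*nsILocalFileMac|include.*nsIInternetConfigService|new PathChar")

# Hypothetical sample lines (only the first comes from the example above).
samples = [
    "void* operator new(size_t aSize, nsIPresShell* aHost) {",  # custom operator new
    "#include <stdlib.h>",                # system header include
    "return ::new (aBuffer) Foo();",      # globally qualified placement new
    "nsFoo* foo = new nsFoo();",          # ordinary allocation
    '#include "nsFoo.h"',                 # project header include
]

# True means the script would suspend the new macro around that line.
flagged = [bool(newIgnorePat.search(s)) for s in samples]

for line, hit in zip(samples, flagged):
    print("WRAP " if hit else "keep ", line)
```

The first three lines are flagged and would be wrapped; the ordinary allocation and the quoted project include are left alone, so the new macro stays active for them.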
4 responses
MemTrack seems very useful.
Does MemTrack work with g++? Where is the latest version of the code?

thanks and regards

@jose:

See http://www.almostinfinite.com/memtrack.html. The code is at the bottom of the document. It's probably misleading to call the code "the latest" since it's pretty close to the original code. You will probably have to adapt it to the codebase you want to use it in. If it's a small or medium-sized project that will be pretty easy. Something as large as Firefox is a completely different story.

I believe it will work with g++, but it might require a little tweaking.

MemTrack indeed works fine with g++. You just need to adjust a few includes and typecasts in order to get rid of warnings.

In my program, the resulting allocated type "[unknown]" wasn't very useful yet though. I'm still working on it. ;-)

@Moritz:

The most likely reasons for seeing blocks tagged with "[unknown]" are: a) library code that was compiled without MemTrack (the C or C++ standard libraries, for instance) and b) blocks allocated the old-fashioned way with malloc/calloc. It should be pretty easy to add malloc/calloc support to MemTrack. Pre-compiled libraries are another matter altogether. Depending on your environment you might be able to recompile them with MemTrack enabled, but that's going to vary based on your tool-chain and codebase.