As a consultant I often find myself in a position where I have to get to know a large existing code base quickly; I need to understand how the code is structured, how well it is written, whether there are any major issues, and if so, whether they are localised or whether they are spread throughout the code base. To get a feeling for the general quality of the code I have found Toxicity charts useful. To understand the structure, Dependency Structure Matrices come in handy. Conceptually somewhere between those two lie metrics tree maps, which I want to write about today.
A metrics tree map visualises the structure of the code by rendering the hierarchical package (namespace) structure as nested rectangles, with parent packages encompassing their child packages. The display area itself is taken up by the leaves of this structure, the classes. Have a look at the following tree map, which shows the JRuby code base, without worrying too much about the "metrics" part yet.
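To make the idea of nested rectangles concrete, here is a minimal sketch in Ruby of the simplest tree map layout, slice-and-dice. It is not necessarily the algorithm InfoVis uses. The sketch assumes the package tree is given as nested hashes, with each class carrying the metric that determines its area. The `tree` contents and sizes are hypothetical.

```ruby
# A node is either a Hash (a package, mapping child names to nodes) or a
# Numeric (a class, carrying the metric that determines its area).
Rect = Struct.new(:name, :x, :y, :w, :h)

# Leaves carry their own size; a package is as large as all of its children.
def node_size(node)
  node.is_a?(Hash) ? node.values.sum { |child| node_size(child) } : node
end

# Slice-and-dice layout: divide the rectangle among the children in
# proportion to their size, alternating the cut direction at each level.
# Only leaves (classes) produce output rectangles.
def layout(node, name, x, y, w, h, vertical = true, out = [])
  unless node.is_a?(Hash)
    out << Rect.new(name, x, y, w, h)
    return out
  end
  total = node_size(node).to_f
  offset = 0.0
  node.each do |child_name, child|
    share = total.zero? ? 0.0 : node_size(child) / total
    path = name.empty? ? child_name : "#{name}.#{child_name}"
    if vertical
      # Vertical cuts: children sit side by side.
      layout(child, path, x + offset * w, y, share * w, h, false, out)
    else
      # Horizontal cuts: children are stacked on top of each other.
      layout(child, path, x, y + offset * h, w, share * h, true, out)
    end
    offset += share
  end
  out
end

# Hypothetical sizes for a fragment of the JRuby package tree.
tree = {
  "org" => { "jruby" => { "compiler" => {
    "ASTCompiler" => 300,
    "ASTInspector" => 100,
    "util" => { "HandleFactory" => 100 }
  } } }
}

layout(tree, "", 0.0, 0.0, 100.0, 100.0).each do |r|
  printf("%-40s x=%5.1f y=%5.1f w=%5.1f h=%5.1f\n", r.name, r.x, r.y, r.w, r.h)
end
```

Each package's rectangle is exactly the union of its children's rectangles, which is what makes the package boundaries visible in the map. Slice-and-dice tends to produce long, thin rectangles. Layouts that aim for squarer rectangles are usually easier to read, but they are more involved.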
At the top right I have highlighted the org.jruby.compiler package. The tree map shows that this package contains a few classes, such as ASTCompiler and ASTInspector, as well as three subpackages, namely impl, ir, and util. The util package, for example, contains a class called HandleFactory, visible on the far right (in the full-size version). In the following I explain how tree maps visualise metrics and how to create such maps from Java source code. As usual, adapting this to other programming languages is relatively easy.
Showing metrics in the tree map
From the JRuby example above it is clear that tree maps can visualise two metrics by using the size and the colour of the rectangles. In the example the size of the rectangle represents the number of lines in the corresponding file (the actual length of the file, not the sum of the lines of code in all the methods) while the colour shows the sum of the cyclomatic complexity of all the methods in the class; the redder the class, the higher the complexity.
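The colour side of this mapping can be sketched in a few lines of Ruby. This is a minimal sketch that assumes a linear scale from white (no complexity) to pure red (the largest complexity considered). A tool may well use a different scale. The WMCcc values are those of three classes from the TM3 excerpt further down, normalised against the largest of the three.

```ruby
# Maps a complexity value onto a shade between white (no complexity) and
# pure red (max_value or more), returned as an RGB hex string.
def complexity_colour(value, max_value)
  ratio = max_value.zero? ? 0.0 : [value.to_f / max_value, 1.0].min
  # Red stays at full intensity; green and blue fade out as complexity rises.
  fade = (255 * (1.0 - ratio)).round
  format("#ff%02x%02x", fade, fade)
end

wmc = { "AnnotationBinder" => 87, "JavaMethodDescriptor" => 26, "Coercion" => 0 }
max = wmc.values.max
wmc.each { |name, value| puts "#{name}: #{complexity_colour(value, max)}" }
```

With a linear scale a single very complex class pushes all other classes towards white. A logarithmic scale would be a reasonable alternative when the values span several orders of magnitude.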
This map allows easy identification of large and complex classes. Usually file size and complexity are correlated, meaning that larger rectangles appear in a darker shade of red, and the left third of the JRuby tree map follows this pattern almost perfectly. However, the tree map also highlights an exception to this pattern: the IANA class in the org.jruby.ext.socket package. This class is large but has a very low complexity for a class of its size, which makes it worth investigating.
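The same kind of outlier can also be found without a picture. The following is a minimal sketch that flags classes which are large but have an unusually low complexity per line. The thresholds (`min_length`, `density_factor`) are arbitrary assumptions. The first three rows use values from the JRuby TM3 excerpt further down, and `ConstantsTable` is a made-up table-like class.

```ruby
ClassMetrics = Struct.new(:name, :flength, :wmccc)

# Complexity per line of a class; guards against empty files.
def density(c)
  c.wmccc.to_f / [c.flength, 1].max
end

# Returns classes that are at least min_length lines long but whose
# complexity per line is below density_factor times the median density
# (the upper median for an even number of classes).
def large_but_simple(classes, min_length:, density_factor: 0.25)
  return [] if classes.empty?
  sorted = classes.map { |c| density(c) }.sort
  median = sorted[sorted.size / 2]
  classes.select { |c| c.flength >= min_length && density(c) < density_factor * median }
end

classes = [
  ClassMetrics.new("AnnotationBinder", 487, 87),
  ClassMetrics.new("InvokerGenerator", 69, 11),
  ClassMetrics.new("JavaMethodDescriptor", 136, 26),
  ClassMetrics.new("ConstantsTable", 2000, 5) # hypothetical
]
p large_but_simple(classes, min_length: 400).map(&:name)
```

Such a filter only produces a list of candidates. Whether a class like IANA is a problem, or simply a large table of constants, still needs a human look.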
It should be obvious that other size metrics, such as method count or method length, could be mapped to the rectangle size, while metrics such as class fan-out or data abstraction coupling could be mapped to the colour. Going further, while it is intuitive to show a size-based metric as the size, it can be worthwhile to map other metrics to the size. For example, a tree map showing complexity via the size and fan-out via the colour makes it obvious which complex classes have high fan-out and which do not. This can be useful to identify which (overly) complex classes can be refactored more easily: in a class with lower fan-out the complexity is more "self-contained", likely allowing for one or more easy Extract Class refactorings.
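The reasoning about complexity and fan-out can be expressed as a simple filter. This is a minimal sketch rather than a replacement for the visual comparison. It uses the WMCcc and CFOC values of three classes from the JRuby TM3 excerpt further down, and the thresholds are arbitrary assumptions.

```ruby
Metrics = Struct.new(:name, :wmccc, :cfoc)

# Keeps classes whose summed cyclomatic complexity reaches min_complexity
# and whose fan-out is at most max_fan_out, least coupled first. In these
# classes the complexity looks "self-contained".
def extract_class_candidates(classes, min_complexity:, max_fan_out:)
  classes
    .select { |c| c.wmccc >= min_complexity && c.cfoc <= max_fan_out }
    .sort_by(&:cfoc)
    .map(&:name)
end

classes = [
  Metrics.new("AnnotationBinder", 87, 16),
  Metrics.new("InvokerGenerator", 11, 9),
  Metrics.new("JavaMethodDescriptor", 26, 3)
]
p extract_class_candidates(classes, min_complexity: 20, max_fan_out: 10)
```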
Going beyond these examples, it is also possible to map non-metric information onto the colour, for example the name of the dominant committer of a file. In such a case each committer would be mapped to an individual colour, and rectangles could be shaded grey if no single committer accounted for more than a certain percentage of commits. Such a view would show whether a development team has bought into collective code ownership, or whether developers have carved off individual packages for themselves. More ideas and other visualisations can be found here.
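The colouring rule for committers can be sketched as follows. This minimal sketch assumes the number of commits per author and file has already been extracted from version control. The 50% threshold, the palette, and the sample data are hypothetical.

```ruby
# commits_by_author maps author names to the number of commits that touched
# one file. Returns the dominant author, or nil when nobody's share exceeds
# the threshold; such files would be shaded grey.
def dominant_committer(commits_by_author, threshold: 0.5)
  total = commits_by_author.values.sum
  return nil if total.zero?
  author, count = commits_by_author.max_by { |_author, n| n }
  count.to_f / total > threshold ? author : nil
end

PALETTE = %w[#1f77b4 #ff7f0e #2ca02c #d62728 #9467bd].freeze
GREY = "#bbbbbb"

# Gives every dominant committer a distinct palette colour (cycling when
# there are more committers than colours) and all other files grey.
def committer_colours(files)
  owners = files.transform_values { |counts| dominant_committer(counts) }
  authors = owners.values.compact.uniq.sort
  owners.transform_values do |owner|
    owner ? PALETTE[authors.index(owner) % PALETTE.size] : GREY
  end
end

# Hypothetical commit counts per file.
files = {
  "Parser.java" => { "alice" => 9, "bob" => 1 },
  "Lexer.java" => { "alice" => 3, "bob" => 3, "carol" => 2 },
  "Socket.java" => { "bob" => 7, "carol" => 1 }
}
p committer_colours(files)
```

Sorting the author names before assigning colours keeps the mapping stable between runs, so the same developer keeps the same colour when the map is regenerated.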
Using Checkstyle to get metrics
Checkstyle is meant to report violations of coding guidelines, usually when a numeric threshold is exceeded. (I've shown how this data can be visualised in a previous article.) If all thresholds are set to zero, Checkstyle considers everything a violation and therefore reports the value of every metric. The following file shows the configuration used for this article.
With this file saved as "metrics.xml", Checkstyle can be invoked as follows to create an XML file containing all the metrics we are interested in.
For languages other than Java, other tools can be used to create a similar output file. The format does not need to be exactly the same (more on this later).
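Whatever tool produces the report, the metric values have to be pulled out of it before they can be turned into a tree map. For Checkstyle's XML output, a minimal sketch using Ruby's REXML could look like this. It relies on several assumptions. The mapping from check names to TM3 columns is hypothetical. The measured value is assumed to be the first number in each violation message, whose exact wording varies between Checkstyle versions. Values are aggregated per file, not per class. If the configuration enables more than one MethodCount limit, their reported values would be added together.

```ruby
require "rexml/document"

# Hypothetical mapping from the Checkstyle check that produced a violation
# (last part of the "source" attribute) to the TM3 column it feeds.
METRIC_FOR_CHECK = {
  "FileLengthCheck" => "FLENGTH",
  "ClassDataAbstractionCouplingCheck" => "CDAC",
  "ClassFanOutComplexityCheck" => "CFOC",
  "MethodCountCheck" => "MCOUNT",
  "MethodLengthCheck" => "WMCloc",
  "CyclomaticComplexityCheck" => "WMCcc"
}.freeze

# Returns { file_name => { column => value } }. Per-method checks report one
# violation per method, so adding up all values for a file yields sums such
# as the total cyclomatic complexity (WMCcc) or total method length (WMCloc).
def parse_checkstyle(xml)
  metrics = Hash.new { |hash, file| hash[file] = Hash.new(0) }
  REXML::Document.new(xml).elements.each("checkstyle/file") do |file|
    file.elements.each("error") do |error|
      check = error.attributes["source"].to_s.split(".").last
      column = METRIC_FOR_CHECK[check]
      next unless column
      # Assumes the measured value is the first number in the message; large
      # values may contain thousands separators such as "1,487".
      number = error.attributes["message"].to_s[/\d[\d,]*/]
      metrics[file.attributes["name"]][column] += number.delete(",").to_i if number
    end
  end
  metrics
end
```

It would be called as `parse_checkstyle(File.read(report))`, where `report` is the path of the XML file Checkstyle wrote.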
Drawing tree maps with InfoVis
I've been using the InfoVis Toolkit for years, and while development on it has stagnated, it just works; at least as long as a Java 5 runtime is available, which, unfortunately, is not the case on Mac OS X Snow Leopard. I'm open to suggestions of better or newer tools.
InfoVis is an interactive Java application with a decent user interface for mapping the fields of the input file onto different aspects of the tree map, size and colour being the two obvious ones. At this point it is probably worth mentioning that I generally also "sort" the tree map by whichever metric I have mapped to the size. The screenshot on the right shows the settings I used for the JRuby tree map above. In case you are wondering about the abbreviations for the metrics, FLENGTH is file length, and WMCcc stands for Weighted Method Count (cyclomatic complexity).
The remaining piece of the puzzle is the transformation of the Checkstyle output into a format that InfoVis can interpret. Luckily, the input file format for InfoVis, named TM3, is very simple. The following excerpt shows data for the first few classes of the JRuby code base.
FLENGTH	CDAC	CFOC	MCOUNT	WMCloc	WMCcc
INTEGER	INTEGER	INTEGER	INTEGER	INTEGER	INTEGER
487	6	16	15	422	87	org	jruby	anno	AnnotationBinder
21	0	1	0	0	0	org	jruby	anno	Coercion
14	0	0	0	0	0	org	jruby	anno	CoercionType
5	0	0	0	0	0	org	jruby	anno	FrameField
69	6	9	1	54	11	org	jruby	anno	InvokerGenerator
136	1	3	4	93	26	org	jruby	anno	JavaMethodDescriptor
20	0	0	0	0	0	org	jruby	anno	JRubyClass
The first line contains the names of the metrics, the second line their data types. Both are tab separated. The remaining lines contain the metrics for each of the classes followed by the tree structure, with all package nodes flattened out and separated by tabs as well. Sometimes non-XML formats are so refreshingly easy to work with.
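Writing this format is correspondingly simple. The following minimal sketch emits TM3 text, assuming each class is given as a list of path segments plus a hash of metric values. It assumes the column order of the excerpt above and writes missing metrics as 0. It is not the attached script.

```ruby
# Column order of the TM3 excerpt; every metric is an INTEGER.
COLUMNS = %w[FLENGTH CDAC CFOC MCOUNT WMCloc WMCcc].freeze

# rows: array of [path_segments, metrics], where path_segments is e.g.
# ["org", "jruby", "anno", "Coercion"] and metrics maps column names to
# values. Missing metrics are written as 0.
def to_tm3(rows, columns = COLUMNS)
  lines = [columns.join("\t"), (["INTEGER"] * columns.size).join("\t")]
  rows.each do |path, metrics|
    values = columns.map { |column| metrics.fetch(column, 0) }
    lines << (values + path).join("\t")
  end
  lines.join("\n") + "\n"
end

rows = [
  [%w[org jruby anno Coercion], { "FLENGTH" => 21, "CFOC" => 1 }],
  [%w[org jruby anno CoercionType], { "FLENGTH" => 14 }]
]
print to_tm3(rows)
```

The first data line this produces matches the Coercion row of the excerpt.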
A small Ruby script, which I have attached further down, can be used to convert Checkstyle's XML output into the TM3 format, using a command like this:
Currently, the script has the path separator hard-coded. If you intend to run this script on a Windows machine you have to change the path separator constant in line 6 manually.
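One way to avoid editing a constant is to accept both separators when turning a file name into a package path. The following minimal sketch shows this idea; it is not taken from the attached script. The `src_root` parameter, the directory that contains the top-level package, is a hypothetical addition.

```ruby
# Splits a source file name into package segments plus the class name.
# Both "/" and "\" are accepted as separators, so the same code runs on
# Windows and Unix. src_root (hypothetical) is the directory that contains
# the top-level package.
def package_path(file_name, src_root)
  separators = %r{[/\\]}
  segments = file_name.split(separators).reject(&:empty?)
  root = src_root.split(separators).reject(&:empty?)
  segments = segments.drop(root.size) if segments.first(root.size) == root
  segments[-1] = File.basename(segments[-1], ".java")
  segments
end

p package_path("C:\\work\\jruby\\src\\org\\jruby\\anno\\Coercion.java", "C:\\work\\jruby\\src")
p package_path("/home/dev/jruby/src/org/jruby/anno/Coercion.java", "/home/dev/jruby/src")
```

Both calls yield the same segments, which can then be appended to the metric values of a TM3 row.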
This Zip archive contains the Ruby script used to convert Checkstyle XML output format to the InfoVis TM3 file format. Remember that you might have to change the path separator.