21 November 2008
If you are somebody who writes code you probably know that moment when you look at some code you didn’t write, or some code you wrote a long time ago, and you think “that doesn’t look good.” Ok, more realistically, you probably think “WTF? I wouldn’t want to touch that with a barge-pole!” It is not even so much about whether the code does what it should do—that takes a bit longer to figure out—or whether the code is too slow. Even if it’s perfectly bug free and performs well, there’s something to the way it’s written. This is part of the internal quality of a software system, something that the users and development managers can’t observe directly; yet, it still affects them because code with poor internal quality is hard to maintain and extend.
Now, as a developer, how do you help managers and business people understand the internal quality of code? They generally want a bit more than “it’s horrible” before they prioritise cleaning up the code over implementing new features that directly deliver business value. Or even: how do you figure out for yourself how bad some code actually is in relation to some other code? These were questions that Chris Brown, Darren Hobbs, and I were asking ourselves a couple of years ago.
The answer came in the form of a simple bar chart, arguably not the most sophisticated visualisation but a very effective one. And our colleague Ross Pettit had the perfect name for it: The Toxicity Chart. Read on to see what it is and how it’s created.
As usual, let’s start with an example. What follows is the toxicity chart for a version of the Hibernate framework.
In a toxicity chart each bar represents a class and the height of the bar shows the toxicity score for that class. The score is based on several metrics (more on that below) and the higher the score the more toxic the class. The individual components of the score are coloured. For example, the contribution of the method length metric to the overall score is shown in orange. This makes it possible to see at a glance not only how toxic a codebase is but also how the problems are distributed. If there is a lot of purple and orange in the chart, this indicates long and complex methods, which means that the code is probably hard to test on a unit level. Lastly, classes that score zero points are left off the chart.
The calculation of the toxicity score is based on metrics and thresholds. For example, the method length metric has a threshold of 30. If a class contains a method that is longer it gets points proportional to the length of the method in relation to the threshold, e.g. a method with 45 lines of code would get 1.5 points because its length is 1.5 times the threshold. The score for all elements is added up. So, if a class comprises two methods, one that is 45 lines and another that is 60 lines long, the method length component of the score for the class will be 3.5 points. This means that elements are not just classified as toxic but the score reflects how toxic the element is.
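The scoring rule described above can be sketched in a few lines of Java. The class and method names here are illustrative, not taken from the actual tool; the sketch shows only the method length component of the score.

```java
// Sketch of the toxicity score contribution for the method length metric.
// A method only contributes to the score if it exceeds the threshold, and
// then it contributes its length divided by the threshold.
public class ToxicityScore {
    static final int METHOD_LENGTH_THRESHOLD = 30;

    static double methodLengthScore(int... methodLengths) {
        double score = 0.0;
        for (int length : methodLengths) {
            if (length > METHOD_LENGTH_THRESHOLD) {
                score += (double) length / METHOD_LENGTH_THRESHOLD;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // The example from the text: methods of 45 and 60 lines
        // contribute 1.5 + 2.0 = 3.5 points.
        System.out.println(methodLengthScore(45, 60)); // prints 3.5
    }
}
```

The same pattern applies to the other metrics: each violation contributes its measured value divided by the metric's threshold, and the per-metric sums become the coloured segments of a class's bar.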
The following table shows the metrics that make up the toxicity score and the corresponding base threshold on which the multipliers are based.
| Metric | Applies to | Threshold |
|---|---|---|
| Class Fan-Out Complexity | class | 30 |
| Class Data Abstraction Coupling | class | 10 |
| Anon Inner Length | inner class | 35 |
| Nested If Depth | statement | 3 |
| Nested Try Depth | statement | 2 |
| Boolean Expression Complexity | statement | 3 |
| Missing Switch Default | statement | 1 |
At this point you might wonder where the selection of metrics and thresholds comes from. Well, we made them up. When we designed the toxicity chart we made a call on what constitutes “toxic” as opposed to just bad. Of course, staying with the method length metric, we normally wouldn’t want to see a 15-line method but that might be disputed. However, we hope that nobody thinks that a 30-line method is acceptable. And in case you’re really uncomfortable with the thresholds, you can obviously change them to build your own toxicity score. We do suggest, though, that you try the values presented here first.
Like many visualisations the creation of the toxicity chart falls into two steps: data acquisition and rendering.
Step 1: For Java projects Checkstyle is the easiest way to get the metrics. The score table above easily translates into a Checkstyle configuration file, which, by the way, is included in the Zip archive attached to this page. Assuming this file is named “metrics.xml”, Checkstyle can be invoked like this:
java -jar checkstyle.jar -c metrics.xml -r <source_dir> -f xml -o toxicity.xml
Afterwards, the file “toxicity.xml” contains a list of all components for the toxicity score in an XML format.
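To give an idea of what the translation from the score table looks like, a fragment of such a configuration could be along these lines. This is only a sketch covering a few of the metrics; the metrics.xml in the attached archive is the complete version.

```xml
<?xml version="1.0"?>
<module name="Checker">
  <module name="TreeWalker">
    <!-- each row of the score table becomes one Checkstyle check,
         with the threshold as the check's "max" property -->
    <module name="ClassFanOutComplexity">
      <property name="max" value="30"/>
    </module>
    <module name="ClassDataAbstractionCoupling">
      <property name="max" value="10"/>
    </module>
    <module name="NestedIfDepth">
      <property name="max" value="3"/>
    </module>
    <!-- ...and so on for the remaining metrics -->
  </module>
</module>
```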
Similar tools for other languages should be able to create similar output. It is not a huge problem if the output is somewhat different as it’s always possible to change step 2 to handle different input formats.
Step 2: What is required next is to read the data from XML, aggregate it on a per-class level, and then render the bar chart. Microsoft Excel is good at the latter two but it needs a bit of help to read the XML file. This help comes in the form of a small piece of VBA code. The attached Excel workbook contains a sheet to load the data, an “Open XML” button backed by said VBA code, a pivot table to do the aggregation, and the chart based on the table. So step 2 really is to open the Excel workbook and load the XML file. That’s all.
Now, if you know me you know that I’ve been working with Macs for a very long time. So, naturally, I’d like to use Excel on the Mac but, alas, its current version does not support VBA anymore. So, unfortunately, it’s VMware Fusion for this one.
This Zip archive contains everything needed to create a toxicity chart for Java. The spreadsheet can be adapted to read input file formats created by tools for different languages; just edit the VBA behind the “Data” sheet.