I was recently forwarded a link to David Wheeler’s blog (link). David Wheeler is a prolific writer of both software and prose, with a lot of experience in the Free/Libre Open Source Software (FLOSS) space. In his blog he gives us a fairly positive review, and I was happy to see that he found our site generally useful (thanks David!).
However, in his blog he speculates that we are likely using his source code analysis tool (sloccount) to determine projects’ source lines of code (sloc). As David astutely points out, our focus is on providing a service: how we do it is mostly inconsequential to our users. So using his tool would have been a fine idea, and I would have liked to. In reality, we didn’t: we wrote our own tool from scratch, which we call ‘Lingo’.
Why not use sloccount? The simple answer is that it didn’t meet enough of our needs, which meant either extending sloccount significantly or writing something from scratch. After thinking through the list of required extensions, combined with my lack of Perl skills (some call it a Perl phobia), I chose to start from scratch. Lingo is written in Ruby with some native (C) extensions for performance reasons.
At this point, our chief challenge with sloc analysis is scalability. Beyond analyzing the last snapshot of each codebase, we actually go back in time to every checkin in the history of each project and analyze the source lines of code for every file in every checkin/patch. This allows us to know exactly who’s responsible for how many lines of code. It also turns out to be quite a workout for our lab. Each fix/feature we add to Lingo normally requires us to re-analyze our entire library of projects from scratch. Our poor servers don’t get much rest.
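To give a feel for what per-checkin attribution involves, here is a rough sketch in Python. It is not Lingo's code (Lingo is Ruby and C, and its internals aren't shown here); the function name `replay_history`, the commit format, and the rule of counting every non-blank line as a source line are all assumptions made for illustration. It replays commits in order, uses a line diff to keep ownership of unchanged lines, and credits added or changed lines to the commit's author.

```python
from collections import Counter
from difflib import SequenceMatcher


def replay_history(commits):
    """Replay commits in chronological order and count surviving lines per author.

    `commits` is a hypothetical format: an iterable of (author, changes),
    where `changes` maps a file path to its full new text, or to None if
    the file was deleted in that commit.
    """
    owners = {}  # path -> list of (line_text, author) for the current snapshot

    for author, changes in commits:
        for path, text in changes.items():
            if text is None:
                owners.pop(path, None)  # file removed: its lines no longer count
                continue

            old = owners.get(path, [])
            old_lines = [line for line, _ in old]
            new_lines = text.splitlines()

            updated = []
            matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "equal":
                    # Unchanged lines keep their original author.
                    updated.extend(old[i1:i2])
                else:
                    # Inserted or replaced lines belong to this commit's author;
                    # for pure deletions new_lines[j1:j2] is empty.
                    updated.extend((line, author) for line in new_lines[j1:j2])
            owners[path] = updated

    totals = Counter()
    for lines in owners.values():
        for line, author in lines:
            if line.strip():  # assumption: any non-blank line counts as sloc
                totals[author] += 1
    return totals
```

Even this toy version diffs every changed file in every commit, which hints at why doing it for real, with language-aware comment and blank-line handling across an entire library of projects, keeps the servers busy.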