We like to tout how many projects we have in Ohloh. It is right on the home page – today, the number is 550,869. That's a lot of projects! By any measure, the world of FOSS is a big one. GitHub has over 4.7 million repositories. SourceForge hosts almost 325,000 projects. But really, is the raw number of projects in Ohloh a good measure of the actual magnitude of FOSS development efforts?
As of the end of March, Ohloh has code analysis for 271,372 projects – i.e., there has been a working code repository at some point in the project’s history on Ohloh. That is a little under half the projects. Yup – just half the projects on Ohloh have ever had code associated with them. That cuts things down considerably! So, how many of the projects with code on Ohloh have recent activity? Let's have a look:
- 96,824 with a commit in the past 2 years.
- 46,883 with a commit in the past year.
- 29,303 with a commit in the past 6 months.
- 21,251 with a commit in the past 3 months.
- 12,870 with a commit in the past month.
- 5,629 with a commit in the past week.
- 1,224 with a commit in the past day (3/30-3/31, a weekend)
This is still a mammoth amount of development activity! And, there is no denying the value of all that code out there, even the code that is not being actively developed now. This source code commons is an amazing gold mine of value, built over decades by developers around the world, and available under FOSS licenses to be forged into new code, new projects, and new innovations by future developers, some of whom haven’t even learned to program yet – or even been born.
But the real work of FOSS is happening on a small subset of this code commons at any given time. For the sake of discussion, let's focus attention on just the active projects, and let's define “active” as having had a commit in the last year. 46,883 projects have had a commit in the past year – just 17.3% of the projects with a code analysis.
This analysis confirms the conventional wisdom that FOSS plants many seeds, but only a small percentage really take root and thrive over time.
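To make the drop-off easier to see, here is a small sketch that turns the counts above into shares of the 271,372 analyzed projects. The numbers are the ones quoted in this post; the names `WINDOWS` and `share_of_analyzed` are just labels chosen for the illustration.

```python
# Counts quoted in this post (Ohloh, end of March).
ANALYZED = 271_372

# Projects with at least one commit inside each look-back window.
WINDOWS = {
    "2 years": 96_824,
    "1 year": 46_883,
    "6 months": 29_303,
    "3 months": 21_251,
    "1 month": 12_870,
    "1 week": 5_629,
    "1 day": 1_224,
}


def share_of_analyzed(count, total=ANALYZED):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {window: share_of_analyzed(n) for window, n in WINDOWS.items()}

for window, pct in shares.items():
    print(f"commit in past {window:>8}: {pct:5.1f}% of analyzed projects")
```

The one-year row is the 17.3% figure used as the definition of "active" here.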
How many of these active projects have a team of developers working on them? If we look at the all-time number of committers for these active projects we can see another important factor at work. Let's take the most generous definition of “community” we can imagine – at least two developers working on a project. How many active projects have a community? A little over half of the active projects have never had more than a single committer. 49.3% of active projects have had at least two committers over the lifetime of their code repositories. This is 8.5% of all analyzed projects, and just 4.2% of all the projects on Ohloh.
That's right – just 4.2% of projects on Ohloh, or a little more than 23,000 projects, are active and have a community of at least two. Most of the “famous” projects we all know about are in this group – very few projects indeed are highly used but have such a mature code base that there are no commits at all to their repositories.
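The whole funnel can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in this post; the project count for "active with a community" is derived from the 49.3% share, so it is an approximation rather than a number read out of Ohloh.

```python
# Figures quoted in this post.
TOTAL = 550_869          # all projects on Ohloh
ANALYZED = 271_372       # projects that have ever had a code analysis
ACTIVE = 46_883          # analyzed projects with a commit in the past year
COMMUNITY_SHARE = 0.493  # share of active projects with at least two committers


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Derived estimate: active projects with at least two all-time committers.
community = round(ACTIVE * COMMUNITY_SHARE)

print(f"analyzed / total:      {pct(ANALYZED, TOTAL)}%")
print(f"active / analyzed:     {pct(ACTIVE, ANALYZED)}%")
print(f"community projects:    about {community:,}")
print(f"community / analyzed:  {pct(community, ANALYZED)}%")
print(f"community / total:     {pct(community, TOTAL)}%")
```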
In my presentation at the Linux Collaboration Summit last week where I discussed these results, I proposed a new metric which I called “liveness” that captures this concept. Wouldn’t it be useful to have a score for projects that places them on a continuum in sensible relation to each other, that spreads out the values so that well-known very active projects do not all bunch up at the high end of the scale, and lesser-known but still active projects still have a meaningful spread? Such a score would let developers gauge the relative energy going into projects – a key factor in assessing whether to adopt a project, either as a contributor or user.
Suppose projects with no activity in the past year, or with fewer than two committers, get a liveness score of zero. For everything else, we can weight more recent months’ activity higher and less recent activity lower. Using an exponential time-weighted decay, and normalizing the score such that the most “live” project (the Linux Kernel of course!) is at 1000, we get scores for well-known projects that seem to pass a “sniff test” – they line up roughly with expectations.
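As a rough illustration of the idea, here is a minimal sketch of such a score. It is not the formula Ohloh uses: the three-month half-life, the use of monthly commit counts as the activity signal, and the example projects are all assumptions made up for this sketch.

```python
import math


def raw_liveness(monthly_commits, committers, half_life_months=3.0):
    """Exponentially time-weighted commit activity over the past 12 months.

    monthly_commits[0] is the most recent month. The half-life is an
    assumed parameter, not a value taken from Ohloh.
    """
    recent = monthly_commits[:12]
    # Zero score: fewer than two committers, or no commits in the past year.
    if committers < 2 or sum(recent) == 0:
        return 0.0
    decay = math.log(2) / half_life_months
    return sum(c * math.exp(-decay * age) for age, c in enumerate(recent))


def liveness_scores(projects, top=1000.0):
    """Normalize raw scores so the most live project lands exactly at `top`."""
    raw = {name: raw_liveness(m, k) for name, (m, k) in projects.items()}
    best = max(raw.values(), default=0.0)
    if best == 0:
        return {name: 0.0 for name in raw}
    return {name: top * (r / best) for name, r in raw.items()}


# Hypothetical projects: (monthly commits, newest first; all-time committers).
EXAMPLE = {
    "big-kernel": ([5000] * 12, 900),
    "steady-lib": ([40] * 12, 6),
    "solo-tool": ([25] * 12, 1),                   # single committer
    "dormant-lib": ([0] * 12 + [300, 280], 12),    # nothing in the past year
}

scores = liveness_scores(EXAMPLE)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {score:8.1f}")
```

With a purely linear normalization like this one, smaller projects end up crowded near zero next to a very large one, so a real score would probably need some compression (a logarithm, say) to get the spread described above.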
The audience at my talk had some excellent feedback. First, “liveness” is a value-laden term – it implies that projects are either alive or dead, and who wants to even look at dead projects? So they suggested I come up with a better name. Several developers also commented that basing an activity score on commits has some inherent problems: some projects make only a few large commits to their main trunk, with much of the development activity going on in branches. If all commits are counted equally, such projects will have a skewed score that is too low by comparison to projects that have many smaller commits. They suggested that perhaps some blend of LOC delta and commit count might yield a more useful metric.
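One way such a blend could look, purely as a sketch: take the logarithm of both the commit count and the LOC delta, then mix them with a weight. The audience only suggested "some blend", so the function name, the log scaling, and the 50/50 default weight are all assumptions for illustration.

```python
import math


def blended_activity(commits, loc_delta, commit_weight=0.5):
    """Blend commit count and lines-of-code change into one activity value.

    Both inputs are log-scaled so that neither a huge commit count nor a
    huge LOC delta dominates. commit_weight is an assumed tuning knob.
    """
    if not 0.0 <= commit_weight <= 1.0:
        raise ValueError("commit_weight must be between 0 and 1")
    commit_part = math.log1p(commits)
    loc_part = math.log1p(abs(loc_delta))
    return commit_weight * commit_part + (1.0 - commit_weight) * loc_part


# Two hypothetical projects that changed the same amount of code:
# one lands it in a few large merges, the other in many small commits.
few_large = blended_activity(commits=5, loc_delta=20_000)
many_small = blended_activity(commits=400, loc_delta=20_000)

# The same comparison when only commits are counted (commit_weight=1.0).
few_large_commits_only = blended_activity(5, 20_000, commit_weight=1.0)
many_small_commits_only = blended_activity(400, 20_000, commit_weight=1.0)

print(f"blended ratio:      {few_large / many_small:.2f}")
print(f"commits-only ratio: {few_large_commits_only / many_small_commits_only:.2f}")
```

The blended ratio is closer to 1 than the commits-only ratio, which is the effect the commenters were after: the project that merges in big chunks is penalized less.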
What do you think? Would you want to see Ohloh report on such a metric? What would you call it? How would you like to see it presented? Any ideas on how to keep projects from artificially inflating their score or for Ohloh to filter out such spurious activity? Please comment!