Open Data and Ohloh

CLS12 Saturday Session Grid

PORTLAND, OR: The Community Leadership Summit is an unconference organized by Jono Bacon, author of The Art of Community and manager of the Ubuntu community. Jono has been organizing this event for several years on the weekend before OSCON, and it has become famous as a great opportunity for some really smart, savvy folks in the FOSS world to come together and learn from each other. CLS is loosely organized around what it takes to shepherd some of the most potent developer communities on earth. It is about helping to channel the energy of the FOSS movement for positive change and building awesome software.

At CLS, I saw such a hunger for data! FOSS community people are into metrics, analysis and measurement, because data is vital to understanding the opportunities in building communities, projects, organizations, and use cases for FOSS. And open data was a topic of interest to many at the conference, because what rights we have to use data makes all the difference, just as with code. Open data is a resource that fires the imaginations of creative people like those I engaged with at CLS. With open data, the FOSS world can gain new insights, invent new and exciting uses, and leverage this knowledge to engage developers and empower communities.

So we’re incredibly proud and excited to announce that Ohloh is diving into the open data world. As of now, Ohloh data is licensed under the Creative Commons Attribution 3.0 Unported License!

This is a huge step forward for Ohloh and, we hope, a huge contribution to FOSS. By licensing the Ohloh data freely, we think that we’ll help power even more innovation and insight, as projects and developers understand their code and communities better.

We know you’ll have questions – we’ve got answers here in our FAQ. And please read and understand the Ohloh site’s new Terms Of Use, which together with the new copyright license are intended to give you significantly more freedom to access and use the site.

We’re listening! Please let us know what you think, and share with us any ideas you have for making our Open Data Initiative a success!

About Rich Sands

I'm the Principal and Founder of RSands Consulting, a developer/FOSS strategy, product management, and marketing consultancy. I was formerly Ohloh's PM, and Black Duck is now a client of mine.
  • Pingback: Sprint 38 Deployment (18 July 2012) | Ohloh Meta

  • Congratulations on this huge step forward!

    A FAQ perhaps so obvious it isn’t one: where can I obtain the Ohloh open data? I assume via browsing the site and using the API, unless I missed the presence of dumps of some form, which would of course be very convenient.

    The next huge step forward (again, assuming I just don’t know where to look) would be to publish the methodology and code used to accumulate the data. 🙂

    • richsands

      @mlinsva, good point on that bit of obviousness. Sometimes we’re so immersed in something we don’t realize that those who aren’t, might not see something we think is crystal clear. I’ll add that bit to the FAQ.

  • John Sullivan

    Hi Rich, I’m glad to see this announcement, but confused. The API license appears to still prohibit nearly all access. Will this be updated? I’m unclear how access to the data can be restricted to personal, noncommercial use, but then the data can be licensed under CC-BY. The API key bullet list also still says you are not allowed to transfer the data to anyone else. Are these just historical artifacts that will be removed? Will you clarify how this is meant to work? I’m interested, because the FSF will likely want to look at this data and publish results from doing so, but we would want to share the data we use with other people so they could examine what we did. -john

    • richsands

      John,

      Thanks for the comment. We’re trying to do something difficult: open the data while protecting its integrity, and keeping it a valuable resource for the FOSS world. We want to reassure you that we’re doing this precisely so that organizations like the FSF can gain benefit from this body of data. FSF, as a non-profit FOSS foundation, can use our API for the stuff you’re interested in.

      One thing I want to clarify – the API Agreement is not intended to prevent you from transferring the data to someone else. I looked at it again and am unclear on where you see that language – feel free to drop me a line through email (rsands at blackducksoftware.com) and let me know where you see that in the agreement.

      We’re keen for you guys to be able to use Ohloh data – that’s why we’re doing this! We’re also keen to protect the data set and its reputation as a trustworthy resource for everyone. We’re listening – and thanks!

    • I believe it’s just the key that you aren’t allowed to transfer to anyone else.

  • John Oberhauser

    I wanted to go this past year but couldn’t. I hope to next year. Any dates on this for 2013?

  • I can see why you want a ToU for your service separate from the data license. That’s pretty standard, and they are two different things. I haven’t looked at the language, but if the ToU is intended to address how the service is accessed and not place additional restrictions on data once downloaded, that makes perfect sense.

    But (and the reason why I’m commenting again) I don’t see what this has to do with integrity and reputation of the data. These are addressed technically through things like publishing hashes of datasets and, much more importantly, socially through scrutiny and reproduction of the methodology and data by third parties (which is what I was getting at in my first comment), i.e. [data] science.

    If you really meant protecting the reputation of Ohloh (as is often what a data publisher means when they talk about protecting integrity and reputation “of the data”), I encourage you to check out http://epsiplatform.eu/content/cc-tools-and-psi-supporting-attribution-protecting-reputation-and-preserving-integrity which explores the various ways CC licenses (including CC-BY) support such desires.

  • any name

    Something that’d be cool: a year-round graph of all commits to all tracked projects. I want to see the effect of GSoC, GCI, holidays etc. on FOSS activity 🙂

  • It would be easier to exercise the rights granted under the license if you would provide some sort of data dumps at intervals (yearly? quarterly? monthly?), with vastly simplified terms of service, i.e. “no DoSing”.

    In fact, if you provided them via BitTorrent, you wouldn’t need any terms of service…