The What and Why of the Open Hub Open Data Initiative
With the Open Data Initiative, announced on July 18, 2012, Black Duck is licensing Open Hub data under the Creative Commons Attribution 3.0 Unported (CC-BY) license. By doing so, we’re making Open Hub data freely available for nearly any use, with only the requirement to attribute that use back to Black Duck.
We’re eager to contribute our storehouse of analytical intelligence on the FOSS ecosystem back to the community. We can’t predict all the ways this data will be used, but we feel confident that, as with FOSS-licensed source code, the results will be innovative and exciting! With this initiative, we want to help communities, foundations, projects, individual developers, corporate contributors, and the broader software industry leverage the FOSS development model. We believe that Open Hub data, freely licensed, can help everyone in the ecosystem better understand how they’re engaging with FOSS and how the open source model works in practice, and turn that understanding into more and better code.
Creative Commons licenses are the most popular and widely respected content licenses in the free culture world, of which FOSS is a central part. Wikipedia uses a Creative Commons license, as does StackExchange with their Creative Commons data dump. When we went searching for a content copyright license that embodies the values in FOSS, in particular, the right to freely copy and use content, the Creative Commons Attribution license became the obvious best choice.
We want to make Open Hub data easily consumable by a very broad range of prospective users. While the ShareAlike license’s copyleft requirement can keep data open, it might make it less practical to combine the data with other sources. Because of this potential for limiting its value, Black Duck prefers a more liberal license for Open Hub data.
Black Duck gains three major benefits from the broad use of Open Hub data:
- If Open Hub data is valuable in helping people understand their communities better and drive more engagement and participation, then this data can help create better code, more active development communities, and greater adoption, all of which directly increase demand for Black Duck’s products.
- The CC-BY license requires those who use Open Hub data to attribute the data to us. Such attribution builds Black Duck’s reputation as a trusted source of the best intelligence about FOSS, enhances awareness of Black Duck and Open Hub, and increases demand for our products.
- We’re hopeful that as the FOSS ecosystem uses this data more, its value will help create an incentive for community members to join Open Hub and update information in the database to help improve it even more.
We maximize these benefits when we eliminate barriers to using Open Hub data, and so it makes a lot of business sense for us to give it away. By doing so, we solidify Black Duck’s position as a premier source of FOSS market intelligence.
Using the Data
You can access the data either by visiting the site’s pages, or by using the Open Hub API. At this time, we don’t have a data dump or data warehouse – all access is live.
One of our primary motivations for this license change is to make it easy for FOSS organizations to use Open Hub data. Many organizations have rules against relying on tools or data that are not freely licensed. This has been an obstacle for some groups in using Open Hub data, and this new license is intended to solve that problem. We hope this initiative will make Open Hub data much more acceptable and useful to FOSS communities and organizations.
The Terms of Use (TOU) include all the usual prohibitions against unsavory or illegal activity, warranty disclaimers, limitations of liability, and so on. They also describe some rules about using the site, such as the requirement to use the API for automated access, the requirement to ask us before using the site or API for commercial purposes, specific attribution requirements, and rules on contributing your content.
That is correct. While the data itself is not restricted, except by the very free terms of CC-BY, we want to make sure that we respect our users’ privacy and personal information, and that the Open Hub data retains its value, and so we’re putting some limits in place as to how the data is accessed.
In addition to data integrity, we’re interested in helping to protect developers’ privacy. Open Hub does not publish developers’ email addresses, but given a code repository URL and a committer ID, both of which are available from Open Hub, it is possible for someone to invade that corresponding person’s privacy. This has always been possible with publicly accessible code repositories entirely without Open Hub, and there are many vital use cases for Open Hub data that argue against fully anonymizing data from the API.
- If you’re a non-profit organization, a FOSS project, or an individual, you can access our site or API to use the data any way you like, for non-commercial purposes. That’s right – any way you like. You really aren’t limited at all by the TOU, except the usual “don’t do anything illegal” prohibitions. That new Android app with your customized project metrics dashboard? No problem.
- If you’re an analyst, blogger, journalist, or other source of news, commentary, or analysis, then you can access our site or API to use Open Hub data in your editorial content, even when your articles appear on sites that are supported by advertising.
- If you’re a business, you can access our site or API to use Open Hub data for any internal purpose. Build your own tool leveraging Open Hub data combined with other data sources? Go for it! You’re also free to use Open Hub data in presentations and marketing materials. We draw the line at using the site and API to create or target advertisements and direct sales solicitations for your products and services, and at building data obtained from the site or API directly into your products, services, or a commercial website without permission.
- If you’re an academic researcher or scientist, then your use of the site or API to get access to the data is governed by our Academic Use Agreement, which lets you do most anything research-related as long as you anonymize data you publish and attribute the data to us.
If you want to access our site or API to use Open Hub data commercially, we need to know who you are and what you have in mind before we say yes. We want to make sure that the community can crowdsource, validate, and build upon a single consistent, neutral dataset. Unless your plans might mess that up, we’ll very likely grant you permission.
This is where we believe we’re breaking some new ground. Open Hub gains its value to the FOSS community by:
- Establishing a cross-FOSS site and database where everyone can work together to crowdsource new metadata about projects.
- Providing consistently collected data and analysis to compare projects, people, and organizations.
- Compiling useful metrics to help the FOSS world know better how to create more community, more code.
- Acting as a neutral third party without an “axe to grind” on statistics and analysis.
There is a common theme here: the data is much more valuable to the FOSS community if it is trustworthy, in one place, and consistent. Source code is different from data – it can be independently tested to validate what it does or does not do, and an individual component’s value is independently evident. Data can’t be validated this way, and its value comes from its aggregation with other similar data and its integrity. Our challenge is in licensing the data as broadly as possible, while doing everything we can to protect its integrity.
CC-BY does not limit use at all. And we felt we needed this very open copyright license to deliver all the freedoms that the FOSS ecosystem expects. At the same time, we felt that we should try to avoid fragmenting the data and the resulting confusion as to its source and quality. We can’t restrict how someone can use the data, without creating a conflict with CC-BY, nor do we want to. But we can control what someone using our site and API may do.
The Open Source Definition (point 6), Freedom 0 of the Free Software Definition, and other similar statements of the core values of the FOSS world require that a FOSS license not restrict use in any field of endeavor, including commercial uses. The CC-BY copyright license meets that test but the non-commercial variants of the Creative Commons licenses do not. We want people to use Open Hub data for commercial purposes as long as we can protect the data’s integrity, so we’re requiring you to ask first when using the site and API. We picked the least restrictive of all the Creative Commons copyright licenses, and are limiting use of the sites and API as little as we can, while still promoting the objective of maintaining a single site. In the end, our approach is more free than CC-BY-NC and will be easier to manage.
The source code in Open Hub Code is licensed under various open source licenses depending on the project, and the specific licenses applied to the particular source code files. Open Hub’s data licensing does not supersede or govern the use of this code in any way. All source code is covered by its individual license. It is up to you to be aware of the copyright and license for the content you access, and to assess for yourself whether the license is appropriate to your needs, as with every other use of FOSS code.
The details are in the TOU but in a nutshell:
- You must put a visual indication that data originates on the Open Hub site, near where you use Open Hub content.
- You must link back to the original source of the data on Open Hub in such a way that the link is visible and can be followed both by humans and by search engines.
There are a few other detailed requirements but those are the main ones.
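As a rough illustration of the two main requirements above, the following Python sketch builds an HTML snippet that shows a visible credit and links back to the source page with a plain anchor that both people and search engines can follow. The function name, CSS class, label text, and example URL are all hypothetical; the TOU remains the authority on the exact wording and placement of attribution.

```python
from html import escape


def attribution_html(source_url: str, label: str = "Data from Open Hub") -> str:
    """Return a visible attribution snippet that links back to the source page.

    `source_url` should be the Open Hub page the data came from.
    The label text and CSS class are illustrative assumptions, not TOU wording.
    """
    if not source_url.startswith(("http://", "https://")):
        raise ValueError("source_url must be an absolute http(s) URL")
    # A plain <a href> without rel="nofollow" can be followed by humans and crawlers.
    return '<p class="openhub-attribution"><a href="{}">{}</a></p>'.format(
        escape(source_url, quote=True), escape(label)
    )


# Placeholder URL for illustration only.
print(attribution_html("https://www.openhub.net/p/example-project"))
```

Place the resulting snippet near the chart, table, or text that uses the Open Hub content, so that the credit is visible where the data appears.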
Not directly. Our focus is on providing a valuable, unique resource to the FOSS ecosystem, and reaping the reputational and community benefits of doing so. We believe that a natural consequence of this new data license will be more traffic, but our real aim is to help add value to the open source world.
Commercial Use, and the Open Hub API
While the Creative Commons Attribution license doesn’t prohibit commercial use of the data, the Open Hub TOU limits use of the site itself to non-commercial or business-internal uses only. The TOU also prohibits the use of automated “scraping” software or spiders to automatically visit pages and collect data. That said, you can access the data programmatically through the Open Hub API, which has its own API Use Agreement.
The new Open Hub API Use Agreement, like the Open Hub TOU, limits use of the API to creating applications for non-commercial or business-internal use. We’re certainly not against commercial use of the Open Hub API or data. But we want to know who is using it commercially, and to assess whether proposed commercial uses respect the privacy of our users and complement the purpose of the Open Hub site.
Open Hub has always been a free resource for the community and it is our current plan to continue to offer the API free of charge. And while we have no plans to change this at this time, we do reserve the right to make changes in the future.
No, we will not. Black Duck believes that making our data freely available to open source foundations, projects, and contributors is a great way to help foster more rapid FOSS development. Accordingly, we won’t charge such direct participants in the FOSS ecosystem for access to Open Hub data through the Open Hub API. The combination of a restriction-free copyright license such as CC-BY, and this promise, means that the FOSS world can count on Open Hub data being both free as in freedom, and free as in no-charge.
You must have the right to contribute it to us under a CC-BY license. If you don’t have that right – for instance, if it is owned by someone else, or your company owns it and hasn’t authorized you to contribute it to Open Hub – you must not contribute the content! Beyond this key requirement, we have the usual prohibitions against spam, illegal content, and so forth that most sites prohibit.
Since you must be an Open Hub account holder to contribute to the site, we roll up all of the changes you make to a project in the project edit history under your Open Hub user name. Your other user-specific content (such as reviews) is associated with your user name as well.