Open Hub Open Data FAQ

The What and Why of the Open Hub Open Data Initiative

What is the Open Hub Open Data Initiative?

With the Open Data Initiative, announced on July 18, 2012, Black Duck is licensing Open Hub data under the Creative Commons Attribution 3.0 Unported (CC-BY) license. By doing so, we’re making Open Hub data freely available for nearly any use, with only the requirement to attribute that use back to Black Duck.

Why is Black Duck making Open Hub data more available?

We’re eager to contribute our storehouse of analytical intelligence on the FOSS ecosystem back to the community. We can’t predict all the ways this data will be used, but we feel confident that, as with FOSS-licensed source code, the results will be innovative and exciting! With this initiative, we want to help communities, foundations, projects, individual developers, corporate contributors, and the broader software industry leverage the FOSS development model. We believe that Open Hub data, freely licensed, can help everyone in the ecosystem better understand how they’re engaging with FOSS and how the open source model works in practice, and then leverage that understanding into more and better code.

Why did Black Duck choose the Creative Commons Attribution license?

Creative Commons licenses are the most popular and widely respected content licenses in the free culture world, of which FOSS is a central part. Wikipedia uses a Creative Commons license, as does StackExchange with their Creative Commons data dump. When we went searching for a content copyright license that embodies the values in FOSS, in particular, the right to freely copy and use content, the Creative Commons Attribution license became the obvious best choice.

Why didn’t you choose a ShareAlike variant of the Creative Commons license?

We want to make Open Hub data easily consumable by a very broad range of prospective users. While the ShareAlike license’s copyleft requirement can keep data open, it might make it less practical to combine the data with other sources. Because of this potential for limiting its value, Black Duck prefers a more liberal license for Open Hub data.

Why is Black Duck giving all this data away? Don’t you have an incentive to charge for it?

Black Duck gains three major benefits from the broad use of Open Hub data:

  • If Open Hub data is valuable in helping people understand their communities better and drive more engagement and participation, then this data can help create better code, more active development communities, and greater adoption, all of which directly increase demand for Black Duck’s products.
  • The CC-BY license requires those who use Open Hub data to attribute the data to us. Such attribution builds Black Duck’s reputation as a source of the best intelligence about FOSS, enhances awareness of Black Duck and Open Hub, and increases demand for our products.
  • We’re hopeful that as the FOSS ecosystem uses this data more, its value will help create an incentive for community members to join Open Hub and update information in the database to help improve it even more.

We maximize these benefits when we eliminate barriers to using Open Hub data, and so it makes a lot of business sense for us to give it away. By doing so, we solidify Black Duck’s position as a premier source of FOSS market intelligence.

Using the Data

How do I access the data in Open Hub?

You can access the data either by visiting the site’s pages, or by using the Open Hub API. At this time, we don’t have a data dump or data warehouse – all access is live.

How can I use the data in Open Hub?

Our intention is that you can use Open Hub data any way you like, as long as you attribute your use to us. In addition to our open data license, our site Terms of Use and API license help to protect Open Hub’s users, and preserve the value of the data for everyone. Full details of the CC-BY license are on the Creative Commons website, where you will find the “deed” describing the rights this license grants in plain terms.

How does the Open Data Initiative benefit individual developers?

Open Hub data has always been available to individual developers for their own personal use, so this announcement doesn’t change things much for individuals. There is one subtle change: under the new CC-BY license, individuals are also required to follow the attribution requirements specified in our Terms of Use.

How could this change help FOSS foundations like Eclipse, Apache, FSF, and Mozilla?

One of our primary motivations for this license change is to make it easy for FOSS organizations to use Open Hub data. Many organizations have rules about not relying on tools or data that are not freely licensed. This has been an obstacle for some groups in using Open Hub data, and this new license is intended to solve that problem. We hope this initiative will make Open Hub data much more acceptable and useful to FOSS communities and organizations.

What can analysts, journalists and bloggers do with this data now?

We’ve been pretty liberal in the past with our data for use by analysts and the press, occasionally helping with some custom queries and such. Frankly, we love the idea of Open Hub data informing articles, blogs, and analyses. But our old Terms of Use prohibited using Open Hub data for any commercial purpose. And we know that some people have interpreted that as meaning they couldn’t use Open Hub data without permission. So we’re calling out this use case in our new Terms, so there is no mistaking that we’re eager for our data to be used in preparing editorial content.

Can I use Open Hub data for academic research?

In much the same way as we’ve helped out analysts and editors, we’ve often given students and researchers some extra help in using Open Hub data to investigate the FOSS development model and related social phenomena. We’re also calling out this use case in our Terms of Use, along with a special Academic Use Agreement that spells out the particular rights and requirements of researchers.

How do the license and the site’s Terms of Use work together to determine what people can do?

The Creative Commons license is a copyright license – it governs what you can do with the data. The Terms of Use is a site usage agreement, and governs what you can do using the site’s web pages and the rules you must follow in accessing the data through the tools we provide. We’ve also got a Privacy Policy, an API Use Agreement, and the aforementioned Academic Use Agreement. It sounds a lot more complicated than it really is!

What is in the Terms of Use and API Agreement?

The TOU includes all the usual prohibitions against unsavory or illegal activity, warranty disclaimers, limitations of liability, rules on contributing your content, and so on. It also sets out some rules about using the site, such as the requirement to use the API for automated access, the requirement to ask us before using the site or API for commercial purposes, and specific attribution requirements.

So the data is licensed under Creative Commons Attribution, but there are Terms of Use that limit how I use the site?

That is correct. While the data itself is not restricted, except by the very free terms of CC-BY, we want to make sure that we respect our users’ privacy and personal information, and that the Open Hub data retains its value, and so we’re putting some limits in place as to how the data is accessed.

Are there other benefits to the Terms of Use and API Agreement for FOSS developers?

In addition to data integrity, we’re interested in helping to protect developers’ privacy. Open Hub does not publish developers’ email addresses, but given a code repository URL and a committer ID, both of which are available from Open Hub, it is possible for someone to invade that corresponding person’s privacy. This has always been possible with publicly accessible code repositories entirely without Open Hub, and there are many vital use cases for Open Hub data that argue against fully anonymizing data from the API.

Open Hub doesn’t cause any privacy problems that aren’t already inherent in an open development model. But by making it easy to access and sort through huge numbers of committer IDs, Open Hub could potentially be used by the wrong people for the wrong things. Our Terms of Use and API Agreement give us the ability to control this, and while we can’t promise that someone’s email won’t be misused as a result of their contributing to FOSS, we can stop someone from using our servers to do it. That seems like a good idea to us!

How do the Terms of Use and API Agreement limit what I can do?

There are very few things you can’t do under the Terms of Use. Here is what you can do:

  • If you’re a non-profit organization, a FOSS project, or an individual, you can access our site or API to use the data any way you like, for non-commercial purposes. That’s right – any way you like. You really aren’t limited at all by the TOU, except the usual “don’t do anything illegal” prohibitions. That new Android app with your customized project metrics dashboard? No problem.
  • If you’re an analyst, blogger, journalist, or other source of news, commentary, or analysis, then you can access our site or API to use Open Hub data in your editorial content, even when your articles appear on sites that are supported by advertising.
  • If you’re a business, you can access our site or API to use Open Hub data for any internal purpose. Build your own tool leveraging Open Hub data combined with other data sources? Go for it! You’re also free to use Open Hub data in presentations and marketing materials. We draw the line at using the site and API to create or target advertisements and direct sales solicitations for your products and services, or at building data obtained from the site or API directly into your products and services, or into a commercial website, without permission.
  • If you’re an academic researcher or scientist, then your use of the site or API to get access to the data is governed by our Academic Use Agreement, which lets you do most anything research-related as long as you anonymize data you publish and attribute the data to us.

If you want to access our site or API to use Open Hub data commercially, we need to know who you are and what you have in mind before we say yes. We want to make sure that the community can crowdsource, validate, and build upon a single consistent, neutral dataset. Unless your plans might mess that up, we’ll very likely grant you permission.

Why are there any restrictions on use of the site and API?

This is where we believe we’re breaking some new ground. Open Hub gains its value to the FOSS community by:

  • Establishing a cross-FOSS site and database where everyone can work together to crowdsource new metadata about projects.
  • Providing consistently collected data and analysis to compare projects, people, and organizations.
  • Compiling useful metrics to help the FOSS world know better how to create more community, more code.
  • Acting as a neutral third party without an “axe to grind” on statistics and analysis.

There is a common theme here: the data is much more valuable to the FOSS community if it is trustworthy, in one place, and consistent. Source code is different from data – it can be independently tested to validate what it does or does not do, and an individual component’s value is independently evident. Data can’t be validated this way, and its value comes from its aggregation with other similar data and its integrity. Our challenge is in licensing the data as broadly as possible, while doing everything we can to protect its integrity.

Aren’t such restrictions in conflict with the CC-BY copyright license?

No. CC-BY is a copyright license granting rights to the content on Open Hub. The Terms of Use and API Agreement are contracts governing people’s use of the site and API.

CC-BY does not limit use at all. And we felt we needed this very open copyright license to deliver all the freedoms that the FOSS ecosystem expects. At the same time, we felt that we should try to avoid fragmenting the data and the resulting confusion as to its source and quality. We can’t restrict how someone can use the data, without creating a conflict with CC-BY, nor do we want to. But we can control what someone using our site and API may do.

Our Terms of Use and API Agreement require that when you’re using our computers to access this data, you do it in a way that maintains its integrity. It isn’t a perfect solution, but we’d rather put this storehouse of FOSS intelligence in the hands of the community even with the potential risks of fragmentation. We’re hoping that the community will recognize the value of maintaining a singular source and focus for developers to contribute to and aggregate this metadata. We believe this is a workable compromise, balancing a very open copyright license with additional terms that protect the data, and the quality and brand promise of the Open Hub trademark.

Why didn’t you just use the Non-Commercial variant of Creative Commons?

The Open Source Definition (point 6), Freedom 0 of the Free Software Definition, and other similar statements of the core values of the FOSS world require that a FOSS license not restrict use in any field of endeavor, including commercial uses. The CC-BY copyright license meets that test but the non-commercial variants of the Creative Commons licenses do not. We want people to use Open Hub data for commercial purposes as long as we can protect the data’s integrity, so we’re requiring you to ask first when using the site and API. We picked the least restrictive of all the Creative Commons copyright licenses, and are limiting use of the sites and API as little as we can, while still promoting the objective of maintaining a single site. In the end, our approach is more free than CC-BY-NC and will be easier to manage.

How does Open Hub’s Open Data Initiative apply to the source code in Open Hub Code?

The source code in Open Hub Code is licensed under various open source licenses depending on the project, and the specific licenses applied to the particular source code files. Open Hub’s data licensing does not supersede or govern the use of this code in any way. All source code is covered by its individual license. It is up to you to be aware of the copyright and license for the content you access, and to assess for yourself whether the license is appropriate to your needs, as with every other use of FOSS code.

Attribution Requirements

How does attribution work under the Creative Commons Attribution License?

All Creative Commons licenses require that the user of the licensed material attribute that material to the original author and/or to “Attribution Parties” – others to whom that attribution credit is owed. In the case of Open Hub data, the Attribution Party is Black Duck, and we have established some specific, straightforward requirements for this attribution in our Terms of Use.

What are the attribution requirements for using Open Hub data?

The details are in the TOU but in a nutshell:

  • You must put a visual indication that data originates on the Open Hub site, near where you use Open Hub content.
  • You must link back to the original source of the data on Open Hub in such a way that the link is visible and can be followed both by humans and by search engines.

There are a few other detailed requirements but those are the main ones.
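As a rough illustration of the two requirements above, here is a minimal Python sketch that builds an HTML attribution fragment. The function name, the CSS class, the label wording, and the example project URL are all hypothetical, not taken from the TOU; check the TOU for the exact wording and placement it requires.

```python
from html import escape


def attribution_html(source_url: str, label: str = "Data from Open Hub") -> str:
    """Build a visible attribution fragment that links back to the data's source page.

    Hypothetical helper: the label text and markup are illustrative assumptions,
    not wording prescribed by the Open Hub Terms of Use.
    """
    if not source_url.startswith(("http://", "https://")):
        raise ValueError("source_url must be an absolute http(s) URL")
    # A plain anchor with no rel="nofollow", so both people and search
    # engines can follow the link back to the original source.
    return '<p class="openhub-attribution"><a href="{}">{}</a></p>'.format(
        escape(source_url, quote=True), escape(label)
    )


# Hypothetical project page URL, used only for illustration.
print(attribution_html("https://www.openhub.net/p/example-project"))
```

Place the resulting fragment near wherever the Open Hub content appears on your page.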

Is this Open Data Initiative really about driving more traffic to Open Hub?

Not directly. Our focus is on providing a valuable, unique resource to the FOSS ecosystem, and reaping the reputational and community benefits of doing so. We believe that a natural consequence of this new data license will be more traffic, but our real aim is to help add value to the open source world.

Commercial Use, and the Open Hub API

Can I use Open Hub data to help power my commercial site?

While the Creative Commons Attribution license doesn’t prohibit commercial use of the data, the Open Hub TOU limits use of the site itself to non-commercial or business-internal uses only. The TOU also prohibits the use of automated “scraping” software or spiders to automatically visit pages and collect data. That said, you can access the data programmatically through the Open Hub API, which has its own API Use Agreement.

What is the Open Hub API?

The Open Hub API is a simple, RESTful interface to the Open Hub service.
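To give a feel for what a call to a RESTful interface like this might look like, here is a minimal Python sketch that builds a request URL for one project and reads the project name out of an XML response. Everything specific in it is an assumption made for illustration: the base URL, the `/projects/{id}.xml` path, the `api_key` query parameter, and the shape of the sample response. Consult the Open Hub API documentation for the real details, and note that you need your own API key.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: base URL and path pattern are illustrative, not confirmed here.
BASE_URL = "https://www.openhub.net"


def project_url(project_id: str, api_key: str) -> str:
    """Build the (assumed) URL for a single project's XML record."""
    path = "/projects/{}.xml".format(urllib.parse.quote(project_id, safe=""))
    return BASE_URL + path + "?" + urllib.parse.urlencode({"api_key": api_key})


def parse_project_name(xml_text: str) -> str:
    """Extract the project name from an (assumed) XML response body."""
    root = ET.fromstring(xml_text)
    status = root.findtext("status")
    if status != "success":
        raise RuntimeError("API reported status: {!r}".format(status))
    name = root.findtext("result/project/name")
    if name is None:
        raise ValueError("no project name found in response")
    return name


def fetch_project_name(project_id: str, api_key: str) -> str:
    """Perform the live request; needs network access and a valid API key."""
    with urllib.request.urlopen(project_url(project_id, api_key), timeout=10) as resp:
        return parse_project_name(resp.read().decode("utf-8"))


# Hypothetical sample response, used so the parsing can be tried offline.
SAMPLE = (
    "<response><status>success</status><result><project>"
    "<id>1</id><name>Example Project</name>"
    "</project></result></response>"
)
print(parse_project_name(SAMPLE))
```

Keeping URL construction and parsing separate from the network call makes it easy to test the logic without touching the live service.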

Can I use the Open Hub API to access data for commercial use?

The new Open Hub API Use Agreement limits use of the API to creating applications for non-commercial or business-internal use, as is the case for the Open Hub TOU. We’re certainly not against commercial use of the Open Hub API or data. But we want to know who is using it commercially, and to assess whether proposed commercial uses respect the privacy of our users and complement the purpose of the Open Hub site.

How can I get permission to use the Open Hub API in building my commercial service?

Once you have acquired an API key through the normal method for your application, please contact us through our standard support email address with your request. We’ll evaluate your request based on your proposed use, and grant approval on a case-by-case basis.

Is Black Duck planning to charge a fee for commercial use of the API?

Open Hub has always been a free resource for the community, and our current plan is to continue offering the API free of charge. We have no plans to change this, but we do reserve the right to make changes in the future.

Will Black Duck charge FOSS Foundations and projects to use the Open Hub API?

No, we will not. Black Duck believes that making our data freely available to open source foundations, projects, and contributors is a great way to help foster more rapid FOSS development. Accordingly, we won’t charge such direct participants of the FOSS ecosystem for access to Open Hub data through the Open Hub API. The combination of a restriction-free copyright license such as CC-BY, and this promise, means that the FOSS world can count on Open Hub data being both free as in freedom, and free as in no-charge.

Contributing Content

I’m a FOSS contributor and a regular user of Open Hub. How do the new data license and Terms of Use affect me?

The most important change is to the terms under which you contribute content to Open Hub, such as project descriptions, tags, and links. Now, under our new Terms of Use, you make your inbound contributions under the same license we are using for outbound licensing. We need you to contribute under CC-BY to grant us all the rights we need to operate Open Hub and offer this resource to the community. You retain the copyright to your contributions, but license them to us under the same liberal license under which we offer Open Hub data to the world.

What are the rules for contributing content like project descriptions and metadata to Open Hub?

You must have the right to contribute it to us under a CC-BY license. If you don’t have that right – for instance, if it is owned by someone else, or your company owns it and hasn’t authorized you to contribute it to Open Hub – you must not contribute the content! Beyond this key requirement, we have the usual prohibitions against spam, illegal content, and so forth that you’ll find on most sites.

How does Open Hub attribute my contributions to me?

Since you must be an Open Hub account holder to contribute to the site, we roll up all of the changes you make to a project in the project edit history under your Open Hub user name. Your other user-specific content (such as reviews) is associated with your user name as well.