Open Hub in 2016

Hail Hubbites!

There has been a lot of activity behind the scenes at Open Hub Central, with a steady stream of improvements rolling into production. We’d like to talk about them and also tell you what we have coming up in 2016.

2015 In Quick Review

  • Project PURR (Platform Upgrade Ruby and Rails): we wrote a whole new Open Hub UI in the latest tech with 99.5% test coverage (I kid you not!)
  • Effective Spammer Throttling: using verification tools to ensure there is a real, verified human behind new accounts. Spammer account creation has dropped from way over 700% to a very manageable 13%
  • Focus on Infrastructure
    • Improved a few critically slow queries that dragged the site down
    • Performed the first VACUUM FULL in ages on some critical tables
    • Improved average site performance 5X in 2015. Of course, it was pretty bad at times
  • New Inventions: Security Data. Let’s talk about that; please keep reading.

2016 In Plan

Security Data

New Security Info Button

We started by adding a new button to project pages. When we have vulnerability data from the National Vulnerability Database and/or VulnDB, we add a “Review Security Info” button in the Quick Reference section. This will take you to a new security feature we’re trying out. We’ll show you a graph of the number of vulnerabilities reported by version for the last 10 releases grouped by category.
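To make the shape of that graph a little more concrete, here is a rough sketch of the grouping in Python. It is only an illustration and not our production code: the record layout (the `version` and `category` keys), the category labels, and the function name are all assumptions made up for this example.

```python
from collections import Counter


def vulnerabilities_by_release(records, releases, limit=10):
    """Count vulnerabilities per release, grouped by category.

    records  -- iterable of dicts with hypothetical keys "version" and "category"
    releases -- release names ordered oldest to newest
    limit    -- how many of the most recent releases to keep
    """
    # Keep only the newest `limit` releases (guard against limit <= 0,
    # because releases[-0:] would return the whole list).
    recent = releases[-limit:] if limit > 0 else []

    # One counter per release; dicts keep insertion order, so the
    # result stays ordered oldest to newest.
    counts = {version: Counter() for version in recent}

    for record in records:
        version = record["version"]
        if version in counts:
            counts[version][record["category"]] += 1
    return counts
```

Each entry in the result maps a release to its per-category counts, which is the data a stacked bar chart per release would need.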

We’ve gotten some very nice feedback from this initial feature and have decided to do more.

Project Vulnerability Report

The first thing we’re going to roll out is a new Project Vulnerability Report (PVR) that will show two ways of considering project vulnerability data across a project’s lifetime. One way will be a weighted absolute score: the Project Security Score, where a lower value will be better. The other will be a scaled scoring of projects based upon the weighted score against time: the Project Vulnerability Score, where a higher value will be better. When we roll this new feature out, we’ll include a blog post that details the ideas behind it.

Project Security Pages

Based upon the interest in and feedback on the security info button, we are going to add some new pages to the set of project pages. These will follow the current focus of the Open Hub: the facts about Open Source Software projects. We’ll show the number of open defects over time, broken down into groups by severity, with trends, scores, and other factual data about vulnerability reports.

More Spammer Cleanup

This has already started, and some of you received email requests during the pilot run of this program. We are running a long-term email campaign asking nearly all users to verify their accounts. If you have positions claimed, we do not intend to bother you with these emails. However, you will be required to verify your account when you come back to the Open Hub if you’ve not already done so. Our expectation for the outreach effort is that the vast majority of account holders don’t really exist. Account holders will have a generous period of about half a year, plus a few reminders (not too many!), to verify their accounts before they are flagged as spam accounts and eventually deleted.

More Infrastructure

You may remember when we lost a crawler last year, had no new data for about two weeks, and then took a few months to get caught back up. (I do!) We recognized that our crawler infrastructure had been getting more and more fragile, so we started an effort to virtualize our crawlers and are pilot testing that work now. This will give us greater stability, a simpler code base, cleaner architecture, and horizontal scalability in our back end.

After this new Fetch, Import, and SLOC code (FISbot) is in place and serving the Open Hub and the Black Duck Knowledge Base, we will start work on separating the analytics database from the web application database. This will give each part of the Open Hub (the data collection side and the data presentation side) its own dedicated database that can be optimized for its primary purpose.

We’re also going to switch from using the C-based Ohcount to the Java-based Ohcount4J for line counting so that all Black Duck products are reporting the same project statistics.

More Other Stuff

We would also like to make some updates to our UI and may roll out updated pages incrementally (rather than wait until we can touch the entire site). We’d like to get some connection to GitHub with data on Stars, Watches, and Forks, and maybe StackOverflow to show the top questions, most recent questions, best answers, and answerers on the project pages. It would be pretty cool if we could connect Open Hub accounts to StackExchange accounts and let folks click through to see the answerer’s Open Hub account page with their Open Source resume as well as their answers on StackExchange.

So Far in 2016


GitHub Repositories

So far this year we have the “Review Security Info” button with the security.openhub.net security page, the Project Vulnerability Report, which will be pushed out into production soon, and significant improvements to our back end that have yielded an additional 2X speed improvement on the site. On top of those, we have just released a new feature to bulk-add GitHub repositories to your project. When you add a new code location, select “GitHub Repositories” as the SCM type and then enter the GitHub account name. We’ll then add all the public repositories in that GitHub account to the project.
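For the curious, the core of the bulk-add idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Open Hub implementation: it assumes the repository list has already been fetched and decoded from GitHub's REST API (whose repository objects carry `private` and `clone_url` fields), and the function name is hypothetical.

```python
def public_clone_urls(repos):
    """Return the clone URLs of the public repositories in a GitHub account.

    repos -- list of dicts as decoded from GitHub's repository-list JSON;
             only the "private" and "clone_url" fields are used here.
    """
    urls = []
    for repo in repos:
        if repo.get("private", False):
            continue  # skip anything that is not public
        urls.append(repo["clone_url"])
    # Sort so the resulting code locations come out in a stable order.
    return sorted(urls)
```

Each returned URL would then become one code location on the project.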

There are other variations that we’re considering:

  • Bulk create new projects for each GitHub repository
  • Bulk create new projects from other forges

Also, we’re looking at the possibility of defining a new organization type, Distribution, so that we can identify organizations that package and distribute projects but don’t necessarily own or manage them. Think “Fedora”, “Debian”, etc. This will require some internal changes to allow a project to be included in a distribution even if it is “claimed” by some other organization or is already part of a different distribution. We think this kind of distinction is long overdue and can be very helpful.

And, penultimately, we’ve been working hard on responding to those users who have contacted us via Twitter or email, or have posted on the Forums. Thank you so very much for reaching out to us! And thank you for your continued patience as we work to get your issue resolved or question answered.

One more point: It’s time to say goodbye to “code.openhub.net”. In the near future, we will take the site down and replace it with a curtain message. There are several reasons. The Black Duck product underneath this offering has been discontinued, and the infrastructure is very expensive to run and maintain. Most importantly, it seems the most popular use of the Code Search site is to see if one’s own project is there and up to date. We’ve not been able to confirm a significant number of users who actually use the site to search other repositories for code. On the other hand, we’ve not updated that site in a while, so it may be that users who were doing that have realized the data is out of date and aren’t coming back. If you have an opinion, I’d love to hear it.

Thanks as always for being a member of the Open Source Software community and a member of the Open Hub. I’m always open to your email and tweets and am very interested in your thoughts and opinions.

About Peter Degen-Portnoy

Mars-One Round 3 Candidate. Engineer on the Open Hub development team at Black Duck Software. Family man, athlete, inventor