Upgrading Crawlers

Hail Hubbites!

At This Point

Here is where things stand: the Open Hub is running on Ruby 1.9.3 on Ubuntu 14 (yay!), the database is running on a new RAID array made of disks supported by our hardware and rated for the use we are giving it (yay!), and we are making progress on our next major release, Orgs Phase 2 (yay!). We did a release on Thursday, August 7 that broke some images: sparklines and animated GIFs. We deployed a release on Monday, August 11, but fixing the sparklines necessitated backing out a fix to an encoding-based search defect, so we’re working on addressing that in a different way.

At the heart of the complexities are a few key factors. The largest is that we are struggling under heavy technical debt: Rails is now out in version 4.1.0 and we are running Rails 2.3.18, and while our web servers have recently been updated to Ubuntu 14 and Ruby 1.9.3, our crawlers are still on CentOS with REE 1.8.7. Additionally, our large database is encoded as SQL_ASCII, which has been causing huge complications as we move towards the latest Ruby and Rails. Just search for “incompatible character encoding Rails”.
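For readers who have not run into this class of error, here is a minimal sketch of how it shows up in Ruby 1.9 and later. It is not code from the Open Hub; the helper names `safe_concat` and `retag_as_utf8` are made up for illustration, and it assumes the stored bytes happen to be valid UTF-8, which a database that does not track encodings cannot guarantee.

```ruby
# Text whose encoding Ruby knows: a UTF-8 string containing a non-ASCII character.
utf8 = "caf\u00E9"

# The same bytes tagged as binary, standing in for data that arrives
# without a trustworthy encoding (an assumption for this illustration).
raw = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)

# Hypothetical helper: concatenate, reporting the encoding error instead of crashing.
def safe_concat(left, right)
  left + right
rescue Encoding::CompatibilityError => e
  "failed: #{e.class}"
end

# Hypothetical helper: relabel bytes as UTF-8, refusing input that is not valid UTF-8.
def retag_as_utf8(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "bytes are not valid UTF-8" unless text.valid_encoding?
  text
end

puts safe_concat(utf8, raw)                 # mixing UTF-8 and binary text fails
puts safe_concat(utf8, retag_as_utf8(raw))  # works once both sides are UTF-8
```

Ruby 1.8.7 treated strings as plain bytes, so this kind of mix went unnoticed there; from 1.9 on, every string carries an encoding and incompatible ones cannot be combined.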

Up Next

Next up in the upgrade plan is to upgrade our crawlers. Orgs Phase 2, which will take the Organization feature out of Beta, has a new type of analysis to pre-calculate stats about orgs on the Open Hub.  This analysis will run on the crawlers, but only if they are running Ruby 1.9.3.  So, the crawlers need to be updated.

To update the crawlers, our plan is to turn off ALL the crawlers, replace the OS on half of the 18 crawlers, and bring everything back up, running with half the crawlers while we install libraries and the application on the new OSes. Why do it this way? Because any crawler can run an analysis that may update repositories stored on any of the other crawlers. The repositories are stored on their own large-capacity drives, which means we can install a new OS, bring up the server, mount the storage drive, and it will be immediately accessible to other servers performing analysis.

Then, we do it again, but with the remaining crawlers.
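The two rounds described above can be sketched as an ordered list of steps. This is only an illustration of the sequencing, not our actual tooling: the host names, the step names, and the `upgrade_round` and `rolling_upgrade_plan` helpers are all hypothetical.

```ruby
# Hypothetical host names; the only given is that there are 18 crawlers.
CRAWLERS = (1..18).map { |n| format("crawler-%02d", n) }

# One round: stop everything, replace the OS on one batch, bring every
# server back so its storage drive is mounted, crawl on the servers that
# were not touched, and finish setting up the freshly installed batch.
def upgrade_round(all, batch)
  [
    [:stop_crawling, all],
    [:replace_os, batch],
    [:boot_and_mount_storage, all],
    [:resume_crawling, all - batch],
    [:install_libraries_and_app, batch],
    [:start_and_verify_scheduler, batch]
  ]
end

# Split the fleet into two halves and run one round per half.
def rolling_upgrade_plan(crawlers)
  half = (crawlers.size / 2.0).ceil
  crawlers.each_slice(half).flat_map { |batch| upgrade_round(crawlers, batch) }
end

rolling_upgrade_plan(CRAWLERS).each do |step, hosts|
  puts "#{step}: #{hosts.size} hosts"
end
```

Note that in the second round the servers that resume crawling are the ones upgraded in the first round, so analysis only stops completely while an OS is being replaced.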

Impact

Our analysis lag time is typically 3 days. This means that within a 3-day period, all active projects are checked and their analyses updated. Right now, the median analysis age is 15 days and 10% of analyses are older than 26 days. Ow.
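For anyone curious how figures like these are derived, here is a small sketch. The `median` and `percentile` helpers are written for this post (using the nearest-rank method for the percentile), and the sample ages are invented for illustration; they are not real Open Hub measurements.

```ruby
# Median: the middle value, or the mean of the two middle values.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Nearest-rank percentile: the smallest value such that at least
# pct percent of the values are less than or equal to it.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Invented analysis ages in days, purely for illustration.
sample_ages = [2, 4, 7, 11, 14, 16, 19, 22, 25, 31]

puts "median age: #{median(sample_ages)} days"
puts "90th percentile age: #{percentile(sample_ages, 90)} days"
```

Saying "10% of analyses are older than 26 days" is the same as saying the 90th percentile of analysis age is 26 days.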

We could wait until most of the analysis has caught up and then do the OS updates. However, then we cannot deploy Orgs Phase 2, which is nearing its Ready To Ship point. Additionally, there should be some performance improvement by switching to Ruby 1.9.3.

It should take less than an hour to update the OS on half of the crawlers (it can be done in parallel). During that time no analysis will be done. Then we’ll install the libraries and application, and bring up the Job Scheduler daemon (and verify it works!). We’ll run that for a bit, then do it again with the remaining servers. Again, there will be no analysis performed while we are upgrading the OSes. This means that the analysis will slip a bit further behind. But, with the improved infrastructure, we should see the crawlers get caught up much, much faster.

We should start this process in the next few days.

Thanks for your continued patience as we effect these updates and improvements. And, as always, thank you for being part of the open source community!

About Peter Degen-Portnoy

Mars-One Round2 Candidate. Engineer on the Ohloh development team at Black Duck Software. Family man, athlete, inventor.
