We’re a bit behind

We’ve had a number of questions about why project analyses have gotten so far out of date.

As mentioned in the last blog post, we upgraded PostgreSQL from 9.2 to 9.4, which caused a significant degradation in performance. Not because of any problem with PostgreSQL 9.4 itself! Not at all. The upgrade process performs a series of incremental analyze steps so that the new query planner has suitable table statistics to work with. Even with this approach, our critical tables are so massive that those small, incremental analyze passes could not characterize them well enough to generate accurate table statistics.

After about a week of running on the upgraded version and trying the standard tools we use to maintain our performance targets, it was pretty clear that we needed to bite the bullet and perform a “vacuum full” on those critical tables. We did, and we saw a sizable decrease in response time (which is what we wanted).
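For anyone curious what that looks like in practice, here is a minimal sketch of the sort of maintenance one can run from a Rails console (the table names below are hypothetical, not our actual schema):

```ruby
# Illustrative sketch only: the table names are hypothetical, not Open Hub's
# actual schema.  VACUUM FULL rewrites each table and holds an exclusive lock,
# so it is only safe during a maintenance window, and it cannot run inside a
# transaction; ANALYZE then rebuilds the planner statistics.
%w(analyses sloc_sets sloc_metrics).each do |table|
  ActiveRecord::Base.connection.execute("VACUUM FULL ANALYZE #{table}")
end
```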

Shortly after addressing the table statistics issue, we had to push an update to our crawlers. We’ve not done this in quite a while. We had updated the Ruby version running the old Ohloh code set from 1.9.3 to 2.0.0, which is the highest Ruby version we can use on our old Rails 2.3.18 code base.  But we only did that on the web heads. There was a good deal more complexity in the crawlers that made us uncomfortable doing that upgrade on all 18 crawlers.

But we had to. We’ve made some database schema improvements and the crawler code had to be brought back into alignment. We upgraded the Ruby version on all the crawlers and then pushed the latest version of code to them all. Everything looked OK and so we started the job scheduler and saw everything come up and start processing. Whoot!

But a day later, we noticed that a massive number of CompleteJobs — these perform Fetch, Import, and SLOC, in order — were failing because they were being killed by the host. No other information. This led us into over a week of checking logs, monitoring processes, instrumenting the code, etc. We were right to be concerned about updating Ruby on the crawlers. 🙂
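To give a sense of the kind of instrumentation involved, here is a small sketch (not our actual code, and it assumes the jobs were being killed for memory use) that logs a crawler process’s resident memory around a job so a runaway job shows up in the logs before the host kills it:

```ruby
require "logger"

LOGGER = Logger.new($stdout)

# Hypothetical instrumentation sketch: record the process's resident set size
# (RSS) before and after a job so memory growth is visible in the logs.
def log_rss(label)
  # /proc/<pid>/status is Linux-specific; VmRSS is reported in kilobytes.
  rss_kb = File.read("/proc/#{Process.pid}/status")[/VmRSS:\s+(\d+)/, 1].to_i
  LOGGER.info "#{label}: RSS #{rss_kb / 1024} MB"
end

log_rss("before CompleteJob")
# ... Fetch, Import, and SLOC steps would run here ...
log_rss("after CompleteJob")
```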

We found the root cause recently and quickly deployed a fix.

So, we’re up and running on our upgraded database with our crawlers now in sync with our updated database schema and the CompleteJobs running as expected. Unfortunately, these challenges have built up a sizable backlog of jobs. We apologize for this delay; it only strengthens our resolve to separate the analytics processing database from the front end web application database and ensure each is optimized to perform its most important tasks independently.
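For a feel of where we’re headed with that split, here is a rough sketch (the connection name and models are hypothetical) of pointing the analysis models at their own database connection while the web models stay on the primary:

```ruby
# Hypothetical sketch of the planned split.  Assumes an "analytics" entry has
# been added to config/database.yml alongside the primary database.
class AnalyticsRecord < ActiveRecord::Base
  self.abstract_class = true        # no table of its own
  establish_connection :analytics   # separate connection for analysis work
end

class Analysis < AnalyticsRecord
  # Analysis jobs read and write here without contending with web traffic.
end
```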

About Peter Degen-Portnoy

Mars-One Round 3 Candidate. Engineer on the Open Hub development team at Black Duck Software. Family man, athlete, inventor.
  • Any ETA on when the stats will be reasonably current again? I’d like to put some in a presentation about some project, though the stats are 2 months out of date ATM.

    • Hi Jeroen;

      We’re monitoring things closely, but some Analysis jobs are still taking a long time to complete.

      Please contact us at info@openhub.net with the project you are interested in and we’ll try to get it updated for you.

    • Here are some more details: In the last 24 hours we completed 4560 jobs, which is an average of 190 per hour. That’s way too slow; we should be able to complete 10x that. We’re seeing long-running Project Analysis Jobs. These jobs should complete within a few minutes in the best cases, although they have been taking 3-6 hours over the past 12-18 months. Recently, they’ve been taking 2+ days. The trend isn’t good, but we are working on the fix, which is pretty significant — migrating the analysis to its own data schema.

  • Hannah von Reth

    Did I screw up the settings, or was the analyzer faster than the crawler after I changed the repo location?

    • Hi Hannah;

      We deployed a fix to some analysis code that has dramatically reduced most analysis processing time, so it is really quite possible that after you changed some repository information, one change was picked up and processed before you finished all your changes. If you’d like to let us know the project, we’d be happy to take a look and make sure all is well.

  • Martin Storsjö

    How should one report stale projects that haven’t had updates for many months? Earlier, these things were handled via the “Technical Issue Help” forum, but there doesn’t seem to be any answer there (and when listing the forum, posts appear in almost random order, so many recent posts aren’t visible at all; there also seem to have been some issues with spam there recently). Anyway, a few weeks ago I posted about the fact that the x264 project (https://www.openhub.net/p/x264/enlistments) hasn’t been analyzed for 8 (now 9) months: https://www.openhub.net/topics/9971

    • Hi Martin;

      Thanks for reaching out to us, and I apologize for the delay in getting back to you on the forum. A few weeks ago, we were swamped trying to keep the site running. Recently, we made a fix that has helped tremendously with throughput.

      I saw that the Complete Job that performs Fetch, Import, and SLOC failed a few months ago. I rescheduled it at a higher priority, and it has already completed, as has the Analysis job.

      Now that we’re not working so intently on keeping the site up and running, we’re making our way through the forums to respond to folks such as yourself who are overdue a response.

  • Ron Fluegge

    Any updates on when the project analyses will be updated? gadsos hasn’t been run in 3 months.

    • Thanks for contacting us. Most of the backlog has been cleared up, and now we’re hunting down those projects that have remained in a failed state. I’ve started a complete job (fetch, import, and SLOC) for Gadsos, but I see that there are a lot of codeplex.com repositories that are not responding, so there may be other issues to resolve.

  • Things seem generally faster now. When I add repos, I get stats back quite quickly.

    Thanks!

    • We’re so pleased to hear from you, Marc! Happy Holidays and a Joyous New Year!

  • Brian Drummond

    Are you still “a bit behind”? I’m having trouble adding a code location to the “ghdl” project … the “may be several hours” before the update happens seems to be either (a) stretching beyond a day or (b) not happening at all…

    • Hi Brian;

      Sorry to hear of your trouble, though on the positive side we have long since recovered.

      Please contact us at info@openhub.net with the repo you are trying to add and we’ll be happy to take a look.
