That was not fun

The Open Hub is up and running again after a full day of being unavailable. We apologize for any inconvenience this unexpected downtime caused and want to share what we know about what happened.

In brief: while performing a major version upgrade of our PostgreSQL database from version 9.4 to 9.6, the upgrade process failed catastrophically and we lost the entire database.

Fortunately, we had made a backup before starting the process and were able to restore from it. However, we did lose a few days of data and changes. For that we are truly sorry.

We’ve done these upgrades before. As a general rule, we don’t like to get more than two minor revisions behind in anything in our stack. So we planned the upgrade, tested it rigorously in our staging environment, and carefully documented each step and command that would need to be executed. Normally we would only do this kind of work on a Sunday morning, when the Open Hub has the least traffic.

The decision to proceed with the upgrade rests entirely with me as team lead.

We expected the upgrade itself to take about 20 minutes, followed by an ANALYZE to generate the necessary statistics, which could take up to an hour. We figured the site would be back up in less than two hours.

But very early in the process, one of the first pg_upgrade commands generated an error because the target data directory had been erroneously entered as the mount point, which is owned by root, instead of a subdirectory owned by the postgres user. That should simply have produced the error; we would have fixed the command and continued on our way.

However, when we checked the file systems, it was immediately apparent that the original 9.4 data directory was completely gone, along with all our data. We’ve scoured the history files and the logs for anything else that could have been a factor, but have found nothing. We have even read the source code of pg_upgrade (available at https://doxygen.postgresql.org/pg__upgrade_8c.html#a3c04138a5bfe5d72780bb7e82a18e627).

We are now going over the entire site and re-applying the updates we know we made after the backup was taken. Please don’t hesitate to ping us on Twitter at @bdopenhub, or contact us at info@openhub.net with any observations, insults, questions, or comments.

About Peter Degen-Portnoy

Mars-One Round 3 Candidate. Engineer on the Open Hub development team at Black Duck Software. Family man, athlete, inventor
  • XIAO LEE

    Is there a fast way for researchers to obtain part of the data from Open Hub, instead of crawling it through the Open Hub API?