We found and fixed a problem that had been dragging down site performance for the past few months.
The problem was that some of our automated clean-up jobs were not completing. The jobs would start, which is why we weren’t getting any failure notifications, but then they would quietly die, and nothing detected the death. The clue came while we were assessing site performance and wondering why so many projects had old code fetches. Those old records that hadn’t been cleaned out were dragging database performance down.
We found a number of old jobs that had not been cleaned up and were able to trace them back to the clean-up cron jobs.
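For the curious, here is one way a job that starts fine but dies quietly can be caught. This is only a minimal sketch in Python, not the code behind our fix. It assumes each run records a start time and, on success, a finish time; `JobRun`, `find_stalled_runs`, and the one-hour limit are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRun:
    """One recorded run of a scheduled job (hypothetical record shape)."""
    name: str
    started_at: float                    # seconds since some epoch
    finished_at: Optional[float] = None  # stays None if the job never finished


def find_stalled_runs(runs: List[JobRun], now: float,
                      max_runtime_seconds: float) -> List[JobRun]:
    """Return runs that started but recorded no finish within the allowed time."""
    return [
        run for run in runs
        if run.finished_at is None
        and now - run.started_at > max_runtime_seconds
    ]


if __name__ == "__main__":
    runs = [
        JobRun("cleanup-a", started_at=0, finished_at=120),  # finished normally
        JobRun("cleanup-b", started_at=0),                   # started, never finished
        JobRun("cleanup-c", started_at=3000),                # still within its limit
    ]
    stalled = find_stalled_runs(runs, now=4000, max_runtime_seconds=3600)
    print([run.name for run in stalled])  # prints ['cleanup-b']
```

The point of checking for a missing finish, rather than waiting for an error, is that a job that dies quietly never raises the failure notification in the first place.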
The fix is in and performance is better. Here is the before and after story:
Check out the scale on the left. It goes up to 1.2 seconds (!!). The average server response time for this period was nearly 800 ms.
K. Now check this out:
Check out the scale now. The max value is about 1/3 of the previous maximum. Current average server response times are around 220 ms.
Of course, when we don’t run analysis, the server responds with average times of less than 80 ms, which is where we want to be. We’ll get there when we separate the Analytics database from the Web Application database later this year.
But first, we’re working on re-enabling new account creation. The initial spike work has been completed and we are reviewing the results. Thanks for your patience and support during this period.
Oh, and another result of fixing the clean-up scripts is that outdated projects are now automatically getting caught up, as they are supposed to. Sweet.