As we talked about in Bad Disk == Bad Performance, we had a bad disk in a RAID array that was causing huge database access times, dragging the new Open Hub’s performance down and sometimes showing a “This website is under heavy load” error message. We also talked about how we had a spare disk and were going to swap the two.
We did, and the new disk rebuilt fairly quickly. Website performance improved a bit, but quickly went back to huge database access times. Here are some pictures. In these images, the large yellow area is our database access time and the thin blue line is our application execution time. The top image is from when we were tracking the application as “Ohloh”. You can see how the problem started early on Sunday, July 20.
This picture is from after we changed the tracking and reporting of the application to “Open Hub”. We see a clear drop in access times once the new disk came online on the 23rd, but then the access times pushed back up.
We were sad and disappointed. Those numbers were supposed to go down and stay down. What was going on? Our outstanding DevOps engineer found that even though all the disks in the RAID array were in good operating order, as was the RAID controller itself, the entire system was running pretty hot: 60° C to 70° C, with a critical cap of 100° C. These disks, and we will opt not to mention the manufacturer, are designed to degrade performance once they cross roughly 50° C. So the disks were healthy, but they were deliberately slowing themselves down. Not good for a new application that was getting pounded by bots, sales and marketing demos, and all you lovely folks.
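For the curious, the temperature numbers above boil down to a simple check. The sketch below is only an illustration, not our actual monitoring: the function name and the sample readings are made up, and the two thresholds are the approximate figures quoted above (about 50° C where these disks start degrading, and the 100° C critical cap).

```python
# Thresholds taken from the figures in this post; both are approximate.
THROTTLE_C = 50.0    # disks begin degrading performance around here
CRITICAL_C = 100.0   # critical cap for the system


def classify_temperature(temp_c: float) -> str:
    """Return a coarse status for a disk temperature in degrees Celsius."""
    if temp_c >= CRITICAL_C:
        return "critical"
    if temp_c >= THROTTLE_C:
        return "throttled"
    return "ok"


if __name__ == "__main__":
    # Hypothetical readings, chosen to resemble the 60-70 C range we saw.
    readings = {"disk0": 62.0, "disk1": 70.0, "disk2": 45.0}
    for name, temp_c in readings.items():
        print(f"{name}: {temp_c:.0f} C -> {classify_temperature(temp_c)}")
```

The point is that a disk at 60° C to 70° C passes every health check and sits far from the critical cap, yet still lands squarely in the range where it slows itself down.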
We have better disks on order that are designed for our hardware and the type of load we put on them. We have another RAID array that we can repurpose until the better disks arrive. Oh, and we throttled back the bots and found some queries we could cache to improve performance. Because you, our users, are what is important to us. Thanks as always for being part of the open source community.