Until we were testing our latest build last Thursday. We were playing on seven PlayStation 3 devkits when suddenly four of them froze for about 90 seconds. Argh! We had thought this really difficult bug had magically disappeared, but now it was back with a vengeance!
Again, there was no relevant logging. Yet there was something really interesting nevertheless: the PlayStations were in different online matches, so they weren’t even communicating with each other! So how could they all freeze at the exact same time? We concluded that the only thing they had in common was that they all talked to the same Sony matchmaking servers, so we started investigating all our code related to that. Still, we couldn’t find anything.
So we added a lot more logging, did some really advanced stuff to get more info on the stack during the freeze (which is difficult to get from an executable that has been stripped of all debugging info), and started playing again. I let the PlayStations perform automatic testing all night, but the bug didn’t occur. Then we played the game for five more hours with the entire team and BAM, it finally happened again on two consoles!
This time we had more info, and it turned out that the game froze in a different spot on each console, and neither spot contained any calls to Sony’s matchmaking servers. In fact, the freeze happened in between two logging calls, in a spot where nothing relevant was happening. So we concluded there were only two possible causes: either other threads were hogging the entire CPU (due to how the scheduling system on the PlayStation 3 works, high-priority threads can do this permanently), or the logging itself was broken.
So we started experimenting around that, and then we finally found the cause of this ‘bug’: when the PC that is tracking logs goes into sleep mode, the connected consoles freeze a little while later. Once the PC is active again, the consoles continue as well a little later. The PC that was tracking the logs automatically went into sleep mode after nobody had touched it for 30 minutes. This only happened during extensive play-testing, because normally someone is actually using that PC. So it wasn’t even a bug in our code! ARGH!
This may all seem really obvious in hindsight, but in general when we have a bug, freeze or crash, it is in our own code, not in one of the tools we use. With such a big codebase, it is easy not to even consider anything else. Also, in the chaos of 14 people playing the game on 7 consoles, it is easy to overlook one specific PC going into sleep mode right before the consoles freeze… To be honest, I still don’t know why not all consoles connected to that PC froze. But I intend to leave it at that…”
Awesomenauts will be out this spring on PSN, XBLA and Steam.