Lots of folks are asking what happened, so we thought you’d like an explanation. This morning, there was a routine maintenance event in one of our European data centres. This typically causes no disruption because accounts are simply served out of another data centre.
Unexpected side effects of some new code that tries to keep data geographically close to its owner caused another data centre in Europe to become overloaded, and that caused cascading problems from one data centre to another. It took us about an hour to get it all back under control.
But we’re inclined to apply Occam’s Razor and take Google’s explanation at face value: the Gmail network, like any mammoth-scale cloud or SaaS project, is fantastically complex, and no change to it (like the plan to “keep data geographically close to its owner”) can be assumed to work flawlessly until it’s actually deployed. Someone screwed up.