Google apologizes for cloud outage that was a 'comedy of errors'

Google executive chairman Eric Schmidt (Business Insider)

Google is investing billions in its data centres and hiring salespeople like crazy to take on Amazon in the cloud computing industry. It just held a big conference in March to get the IT world excited about its cloud.

So it was not good that on Monday, Google Compute Engine went down for 18 minutes, starting at 7 p.m. Pacific, for nearly all of its customers worldwide.

Compute Engine is the cloud service Google launched to compete head-to-head with Amazon Web Services' EC2. It lets companies rent computing capacity on Google's servers and access it over the internet.

The world did not spin into an apocalyptic frenzy because of the outage, which didn't affect Google's consumer services like Search, Maps or Gmail. Still, an outage that big was a black eye. Google says its Compute Engine customers include Brightcove, DataStax, Evite, HTC and Zulily.

More importantly, this is Google. Going dark for nearly 20 minutes just isn’t supposed to happen. The company has systems and backup systems to prevent that.

So on Wednesday, Google published an apology and a lengthy explanation. It also offered customers a credit of 10% to 25% of their monthly bill, more than its service agreement promises for an outage like this.

The non-technical TL;DR version: someone was making a semi-routine update to the network and hit a bug. Then the automated failsafe software, which should have caught the problem and fixed it automatically, hit a bug of its own. The software then went haywire, sent incorrect routing information across the whole network and, boom, the network went down.

All told, that 18-minute outage prompted Google to make “14 distinct engineering changes” to ensure its cloud won’t go down like this again. More changes are coming “as our engineering teams review the incident with other senior engineers across Google in the coming week.”

The Google Cloud team says, “We recognise the severity of this outage, and we apologise to all of our customers for allowing it to occur.”

But the whole thing still leaves a little egg on Google’s face.
