Sizable pockets of the Internet world went into a tizzy on Dec. 10, when Gmail suffered an outage that lasted 18 minutes. The outage occurred at the same time Google’s Chrome browser started crashing. The culprit, according to the latest reports, was a bad load balancing change that affected several Google products, including the Chrome sync service, which lets users sync bookmarks and browser settings across several computers and mobile devices.
The real cause of the problem was good ol’ human error. According to Google engineer Tim Steele, the Chrome Sync Server relies on a backend infrastructure component that enforces quotas on per-datatype sync traffic. That quota component ran into traffic problems because of a bad load balancing configuration change. The change was made to a core part of the infrastructure that Google services depend upon, which meant that several services were affected simultaneously.
Steele said that the browser crash itself was caused by faulty client-side logic for handling throttled data types when the client does not recognize those data types.
Interestingly, if the Chrome sync service had gone down entirely, the Chrome browser crashes would not have happened: Steele said that the crash wouldn’t occur if the sync server itself could not be reached. Instead, a backend service that the sync servers depend on became overwhelmed, and the sync servers responded by telling all of the clients to throttle all data types.
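To make that failure mode a little more concrete, here is a minimal sketch of how a sync client might handle a "throttle these data types" instruction defensively, skipping any type it does not recognize instead of falling over. This is not Chrome's actual code (Chrome's sync client is not written in Python), and every name in it (`DataType`, `apply_throttle`, the type strings) is hypothetical and invented purely for illustration.

```python
from enum import Enum


class DataType(Enum):
    # Hypothetical set of data types this client knows how to sync.
    BOOKMARKS = "bookmarks"
    PREFERENCES = "preferences"
    PASSWORDS = "passwords"


def apply_throttle(throttled_names, active_types):
    """Apply a server throttle directive to the client's active data types.

    Returns a tuple (still_active, throttled, ignored), where `ignored`
    lists the names the client did not recognize.
    """
    known = {t.value: t for t in DataType}
    throttled = set()
    ignored = []
    for name in throttled_names:
        data_type = known.get(name)
        if data_type is None:
            # Unrecognized type: record it and move on rather than
            # treating it as a fatal error.
            ignored.append(name)
            continue
        throttled.add(data_type)
    still_active = set(active_types) - throttled
    return still_active, throttled, ignored


# Example: the server asks the client to throttle everything, including
# a type this client has never heard of.
active = {DataType.BOOKMARKS, DataType.PREFERENCES}
still_active, throttled, ignored = apply_throttle(
    ["bookmarks", "preferences", "passwords", "some_future_type"], active
)
```

In this sketch the unknown name simply ends up in `ignored`, and the known types are paused as requested. The point is only that an unexpected value from the server gets treated as data to skip, not as a reason to crash.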
This sort of outage often leads people to proclaim that cloud computing won’t work. What it actually shows is that cloud computing, just like regular computing, is vulnerable to human error, and that a single point of failure can affect many services.
So, if you were ready to jump off a cliff because of this short Gmail outage, take comfort in the fact that such outages rarely occur :) In the meantime, be sure to note the cleanup that has been going on with the Google search results page!