A change meant for a small number of servers was instead pushed out to a larger set of them.
What you need to know
- A Google services outage occurred on Sunday, June 2.
- Google's investigation shows it was caused by mistakenly applying a configuration change to the wrong servers.
- Rolling out the fix was slowed by the same network congestion the outage caused.
Over the weekend, Google had a rare outage that affected millions of users. It disrupted YouTube, Gmail, and Google Cloud, and, to a lesser extent, Google Search.
Google's own services were not the only ones hit: apps and services that rely on Google Cloud were affected as well, including Snapchat, Discord, and Pokemon Go.
At the time, Google attributed the outage to network congestion and launched a full investigation into the matter. We now have the results of that investigation. According to a post on the Google Cloud blog:
The root cause of Sunday's disruption was a configuration change that was intended for a small number of servers in a single region. The configuration was incorrectly applied to a larger number of servers across several neighboring regions, and it caused those regions to stop using more than half of their available network capacity.
Essentially, the outage was caused by mistakenly applying changes to the wrong servers. With much less capacity available, the network could not carry all of its traffic, so it dropped larger, less time-sensitive data in favor of smaller requests. That's why YouTube may have been down for you while Google Search kept working, if sometimes with a slight delay.
Although Google engineers detected the issue within seconds, repairing it took longer than expected. Unfortunately, the same network congestion that was plaguing Google services also slowed the rollout of the fix.
All in all, the outage had a significant impact: YouTube views fell by 2.5% for one hour, Google Cloud saw a 30% reduction in traffic, and 1% of Gmail users experienced issues.
As part of the investigation, Google is now taking steps to prevent similar outages and is looking into ways to restore service more quickly if one does occur.