
Azure outage was DDoS attack made worse by failed mitigation

Microsoft says the recent outage affecting Azure and other cloud services was down to a DDoS attack and some faulty protection.

This week’s Microsoft Azure and Microsoft 365 outages were caused by a Distributed Denial of Service (DDoS) attack, according to the tech giant.

In an Azure status mitigation statement, Microsoft says that:

A subset of customers may have experienced issues connecting to a subset of Microsoft services globally. Impacted services included Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts, Azure Policy, as well as the Azure portal itself and a subset of Microsoft 365 and Microsoft Purview services.

A DDoS attack involves sending large amounts of traffic from multiple sources to a service or website, intending to overwhelm it. A huge influx of traffic all at once can tie up all the site’s resources, denying access to legitimate users. In this case, it seems that the increased traffic caused intermittent errors, timeouts, and latency spikes.
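To make the arithmetic concrete, here is a minimal sketch of why a flood of junk traffic shuts out real users. It is a toy model, not a description of Azure: the function name `legit_success_rate`, the fixed per-window capacity, and the assumption that the service cannot tell attack traffic from legitimate traffic are all hypothetical.

```python
def legit_success_rate(capacity, legit_requests, attack_requests):
    """Fraction of legitimate requests served in one time window.

    Toy assumption: the service handles at most `capacity` requests per
    window and cannot distinguish attack traffic from real users, so
    every request has the same chance of being served.
    """
    total = legit_requests + attack_requests
    if total == 0:
        return 1.0  # nothing to serve, so nothing is denied
    return min(1.0, capacity / total)


# Hypothetical numbers, chosen only for illustration.
normal = legit_success_rate(capacity=1000, legit_requests=800, attack_requests=0)
under_attack = legit_success_rate(capacity=1000, legit_requests=800, attack_requests=9200)

print(f"normal: {normal:.0%}, under attack: {under_attack:.0%}")
# prints "normal: 100%, under attack: 10%"
```

In this simplified picture, nine out of ten genuine requests fail or time out during the attack, which is roughly how intermittent errors and latency spikes show up for users.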

But doesn’t Microsoft have protection in place against this kind of attack, you may wonder? Well yes, but as I told the Daily Mail, sometimes a flaw or error in the infrastructure amplifies an attack rather than warding it off.

Rather than fending off the attack, something in Microsoft’s cloud architecture overreacted and made things worse. It’s very similar to how an ignorant person can ask more questions in an hour than a wise man can answer in a lifetime.

Microsoft confirmed this, saying that the attack activated its DDoS protection mechanisms. However, initial investigations suggest that an error in the implementation of those defenses amplified the impact of the attack rather than mitigating it.

Once Microsoft figured out the nature of the attack, it implemented networking configuration changes to support the DDoS protection efforts and performed failovers to alternate networking paths to provide relief.

When this didn’t result in 100% availability, it proceeded with an updated mitigation approach, rolling it out first across regions in Asia Pacific and Europe. Once that proved successful, it rolled the fix out to regions in the Americas.
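The region-by-region approach described here is a common pattern often called a staged rollout. The sketch below shows the general idea only. Microsoft has not published its tooling, so `staged_rollout`, `apply_fix`, `is_healthy`, and the way the regions are grouped into waves are hypothetical names and assumptions.

```python
def staged_rollout(waves, apply_fix, is_healthy):
    """Apply a fix wave by wave, stopping at the first unhealthy wave.

    `waves` is a list of lists of region names. Returns a tuple of
    (regions completed successfully, the wave that failed or None).
    """
    completed = []
    for wave in waves:
        for region in wave:
            apply_fix(region)
        # Only move on once every region in this wave looks healthy.
        if not all(is_healthy(region) for region in wave):
            return completed, wave
        completed.extend(wave)
    return completed, None


# Hypothetical grouping that mirrors the order in the article.
waves = [["Asia Pacific", "Europe"], ["Americas"]]
patched = []

done, failed_wave = staged_rollout(
    waves,
    apply_fix=patched.append,         # stand-in for the real change
    is_healthy=lambda region: True,   # stand-in for real health checks
)
print(done, failed_wave)
```

The point of the design is to limit the blast radius: if the updated mitigation had made things worse, only the first group of regions would have been affected.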

Microsoft plans to publish a final Post Incident Review with any additional details and learnings. Whether we will learn who was behind the attack remains to be seen. It may have been a more targeted attack against one service, which was amplified by the implementation error in the DDoS protection mechanisms.

Either way, users around the world are starting to get fed up with outages, BitLocker blue screens, and other disruptions to their daily lives.

If the final Post Incident Review tells us something new and interesting, we will let you know. So stay tuned.