Major Microsoft Azure outage was caused by a simple typo
Date:
Mon, 05 Jun 2023 09:00:08 +0000
Description:
Microsoft Azure DevOps was out for 10+ hours in parts of Brazil because of a coding accident.
FULL STORY ======================================================================
A Microsoft Azure DevOps outage in the South Brazil Region, which lasted over 10 hours, was caused by a typo in the code that led to the deletion of 17 production databases.
Having apologized to impacted customers for the outage, Microsoft has now issued a full post-mortem covering the investigation, from when the outage was first noticed at 12:10 UTC on May 24 until it was resolved at 22:31 UTC the same day.
Microsoft principal software engineering manager Eric Mattingly shared details of the code base upgrade which formed part of Sprint 222. Inside the pull request was a hidden typo bug in the snapshot deletion job, which ended up deleting the Azure SQL Server rather than the individual Azure SQL Database.
Coding error
Mattingly explained that when the job deleted the Azure SQL Server, it also deleted all seventeen production databases for the scale unit. He confirmed that no data had been lost in the accidental deletion.
The outage was detected within 20 minutes, at which point the company's on-call engineers got to work. According to the event log, however, the root cause was not identified until 16:04, almost four hours after the outage had begun.
Microsoft blamed the fix time of more than ten hours on the fact that customers themselves are unable to restore Azure SQL Servers, as well as on backup redundancy complications and a "complex set of issues with [its] web servers."
Having learned from its mistake, Microsoft has now promised to roll out Azure Resource Manager Locks to its key resources, in an effort to prevent future accidental deletion.
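To illustrate the failure mode and the promised safeguard, here is a minimal toy sketch in Python. It is not Microsoft's code and does not call any Azure API: the class, the function, the resource names and the idea that the snapshot database sits on the same server as the production databases are all hypothetical, chosen only to show how deleting a parent server takes its child databases with it, and how a delete lock would stop that call.

```python
# Toy in-memory model. Every name here is hypothetical, not an Azure API.

class LockedError(Exception):
    """Raised when a delete is attempted on a resource that has a delete lock."""


class SqlServer:
    def __init__(self, name, databases, delete_locked=False):
        self.name = name
        self.databases = set(databases)   # child databases hosted on this server
        self.delete_locked = delete_locked
        self.deleted = False

    def delete_database(self, db_name):
        # Intended operation: remove one database and leave the rest alone.
        self.databases.discard(db_name)

    def delete(self):
        # Deleting the parent server removes every child database with it.
        if self.delete_locked:
            raise LockedError(f"{self.name} has a delete lock")
        self.databases.clear()
        self.deleted = True


def make_server(delete_locked=False):
    # Assumption for illustration: 17 production databases plus one snapshot.
    names = [f"prod-db-{i}" for i in range(1, 18)] + ["snapshot-db"]
    return SqlServer("scale-unit-sql", names, delete_locked)


# Intended call: only the snapshot database goes away.
ok = make_server()
ok.delete_database("snapshot-db")
print(len(ok.databases))       # 17

# Mistaken call: the server, and with it every database, is deleted.
bad = make_server()
bad.delete()
print(len(bad.databases))      # 0

# With a delete lock, the mistaken call fails before anything is removed.
guarded = make_server(delete_locked=True)
try:
    guarded.delete()
except LockedError as err:
    print("blocked:", err)
print(len(guarded.databases))  # 18
```

In this sketch the lock is checked only on the server-level delete, so routine clean-up of individual databases still works while the destructive parent-level call is refused.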
Despite a same-day fix, customers in the region were left without access to some services for several hours. The incident is a reminder of how easily things can go wrong, and of the importance of backup plans that reduce reliance on a single service provider, including for cloud storage and other off-prem infrastructure.
======================================================================
Link to news story:
https://www.techradar.com/news/major-microsoft-azure-outage-was-caused-by-a-simple-typo
--- Mystic BBS v1.12 A47 (Linux/64)
* Origin: tqwNet Technology News (1337:1/100)