
“Has anyone started having discussions with their CIO/CEO about going back to an in-house mail server? I advocate for it”

Given the scale of its user base, and with a contract worth up to $10 billion in the bag to run the back end of a superpower’s military, Microsoft might want to start thinking about how it can build a staging process for its Azure cloud that lets it deploy changes and reliably roll those changes back when things break.

(We know, it is easy to say so from a safe distance…)

Redmond was at it again late Monday, knocking an (evidently sizeable) “subset of customers in the Azure Public and Azure Government clouds” offline for three hours, with swathes of users globally encountering errors when performing authentication operations. A number of services were affected, including Microsoft 365.

The company blamed the issue on a “recent configuration change [that] impacted a backend storage layer, which caused latency to authentication requests.” (Read: users could not log in to Teams, Azure and more for hours because of the snafu.)

A full root cause analysis is pending. (We will update this piece when we see it.) The outage was felt by users from 22:25 BST on Sep 28 2020 to 01:23 BST.

The issue comes a fortnight after a protracted outage in Microsoft’s UK South region triggered by a cooling system failure in a data centre. With temperatures rising, automated systems shut down all network, compute, and storage resources “to protect data durability” as engineers rushed to take manual control.

Earlier this month, meanwhile, Gartner said it “continues to have concerns related to the overall architecture and implementation of Azure, despite resilience-focused engineering efforts and improved service availability metrics during the past year”.

Microsoft Azure CTO Mark Russinovich said in July 2019 that Azure had formed a new Quality Engineering team in his CTO office, working alongside Microsoft’s Site Reliability Engineering (SRE) team to “pioneer new approaches to deliver an even more reliable platform” following customer concern at a string of outages.

He wrote at the time: “Outages and other service incidents are a challenge for all public cloud providers, and we continue to improve our understanding of the complex ways in which factors such as operational processes, architectural designs, hardware issues, software flaws, and human factors can align to cause service incidents.”

“Has anyone started having discussions with their CIO/CEO about going back to an in-house mail server? I advocate for it,” one frustrated user noted on a global Outages mailing list meanwhile… If cloud is your compressed audio stream that you’re not sure you own, it may not be long before in-house mail servers become the vintage quality vinyl of the IT world: old, but very much back in demand.

Stranger things have happened.