Microsoft’s cloud services, including Xbox Live, were disrupted Thursday because of a DNS error.
Services connected to Microsoft’s Windows Azure cloud suffered a disruption Thursday, the second interruption in under a month. Online reports indicate that affected services included Microsoft.com, Outlook.com, Office 365, and Xbox Live. Microsoft had resolved many of the problems by Thursday night, avoiding the potentially disastrous possibility that Xbox Live would be down when Xbox One units went on sale just after midnight Friday morning.
The disruptions began at 2:22 p.m. PT and stretched across multiple regions. Microsoft corporate vice president Scott Guthrie confirmed via Twitter that the issue did not involve Azure itself. Rather, “The issue is a DNS name server issue outside of azure.” Microsoft said Thursday evening that Azure was running normally.
As of Friday morning, the Azure service dashboard showed most services were functioning as intended, though partial interruptions were plaguing compute functions in Asia, Europe, and the United States. Despite the outage, Windows Azure has generally proved as reliable as its competitors, many of which have also endured widespread disruptions. Amazon, for example, suffered a major failure over Easter weekend in 2011.
Glitches that knock multiple regions offline are especially rare because Microsoft, Amazon, and other major cloud providers typically organize datacenters into clusters (or “stamps,” in Microsoft parlance) of 1,000 servers each.
These stamps include independent power, networking, and storage infrastructure. Theoretically, this design prevents a problem in one location from spreading to others, keeping services such as Azure available even when problems inevitably arise.
As Guthrie’s tweet implies, if a DNS failure was the culprit, Microsoft’s stamps weren’t part of the problem. Rather, Azure was operating as it should; customers just couldn’t reach it.
Though Azure outages are rare, Microsoft has typically been transparent when they’ve occurred. The company published a technical report following its most notorious disruption, the Leap Day interruption on Feb. 29, 2012. In that case, faulty security certificates incorrectly indicated that servers were failing, which triggered the cloud’s governing software to transfer virtual machines inappropriately. The fact that the new VMs carried incorrect certificates themselves exacerbated the problem. Microsoft deployed a fix within 10 hours.
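The distinction between a name that will not resolve and a service that is actually down can be sketched in a few lines of Python. This is only an illustration of the general idea, not a description of Microsoft's tooling: the function name `diagnose`, its return labels, and the injectable `resolve` and `connect` parameters are all assumptions made for the example.

```python
import socket


def diagnose(host, port=443, resolve=socket.getaddrinfo,
             connect=socket.create_connection, timeout=3.0):
    """Classify a host as 'dns_failure', 'service_unreachable', or 'ok'.

    Hypothetical helper: `resolve` and `connect` default to the standard
    library calls but can be swapped out, e.g. for offline testing.
    """
    # Step 1: can the name be turned into an address at all?
    try:
        resolve(host, port)
    except socket.gaierror:
        # The lookup failed, so the service itself may still be healthy.
        return "dns_failure"

    # Step 2: the name resolved, so try to open a TCP connection.
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return "service_unreachable"
    conn.close()
    return "ok"
```

Under these assumptions, an outage like the one Guthrie described would surface as `dns_failure` for affected customers, even though the servers behind the name were still running.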
Azure outage affected Xbox Live hours before Xbox One went on sale.
Another significant outage occurred at the end of October. In that case, Azure GM Mike Neil told InformationWeek’s Charles Babcock this week, the disruption stemmed from a bug in the API for staging systems. Neil said Microsoft will release its full analysis of the October incident this year. When a problem occurs, Microsoft focuses on restoring operations as quickly as possible to reduce the effect on customers. More in-depth forensic determinations, such as the root cause of the problem, are saved until later.
Some businesses remain hesitant to embrace the cloud because of concerns over security and reliability. Service disruptions such as the one that happened Thursday do little to persuade these skeptics. Nonetheless, Azure and the products it supports are among Microsoft’s most promising assets.
Neil told Babcock that Microsoft’s cloud is gaining 1,000 customers per day. The company reported in September that its Azure-backed Office 365 products were on pace to post $1.5 billion in annual revenue. Microsoft also said this year that more than 300,000 Azure servers would support enhanced Xbox One experiences.