My latest column over at Data Center Knowledge picks through some of the recent thinking on datacenter resiliency from experts such as Uptime Institute, as well as the actual resiliency strategies deployed by some cloud service providers.
The upshot is that some cloud service providers are investing heavily in new approaches to IT and datacenter resiliency. These rely on distributed, virtualized applications, instances, or containers that use middleware, orchestration, and distributed databases spread across multiple datacenters.
“The evolution of compute from single-site data centers with proven in-house engineering to a multi-site hybrid computing model is enabling distributed models of resiliency that can achieve high availability and rapid recovery alongside improved performance,” Uptime Institute stated in a recent webcast. “The recent move by many to the public cloud is further accelerating this cloud-based resiliency approach. The benefits are potentially vast.”
Now, an easy conclusion to jump to would be that improved resiliency at the IT level should lessen the importance of any individual datacenter. As such, it should be possible to design and build new facilities with less redundant M&E equipment: fewer generators, fewer UPS units, and so on. However, while that might be possible in some instances – perhaps in edge datacenters – the reality is that most operators continue to build to Uptime Institute Tier III or equivalent (sorry Uptime, I know there is no real equivalent).
As this week’s outage in Microsoft’s Azure service shows, even cloud service providers with advanced resiliency in place – Microsoft recently introduced Availability Zones – aren’t immune to downtime. True, Microsoft said that “customers with redundant virtual machines deployed across multiple isolated hardware clusters would not have been affected by the outage,” but it seems some customers were affected.
The takeaway is that, for now, advanced approaches to redundancy – based on software and networks – should probably be viewed as an additional “belt” to support the existing “braces” of conventional single-site redundant M&E and IT.
For more on the issue, check out this very good webcast overview of Uptime’s current thinking on advanced resiliency.