My latest Critical Thinking column over at Data Center Knowledge looks at flash storage in the data center.
To understand more about how flash storage has gone from a relative outlier to an accepted and core part of the data center infrastructure stack, we spoke with Alex McMullan, CTO EMEA, of flash specialist Pure Storage.
As well as explaining how flash can help improve overall data center efficiency, he also discussed how it supports and enables other disruptive technologies, such as machine learning (ML).
McMullan estimates that up to 20 percent of Pure’s customer base are investing significantly in machine learning and deep learning right now, including what he says are some of the biggest AI projects in the world.
My latest Critical Thinking column over at Data Center Knowledge is part of the site’s focus on all things AI in the datacenter industry this month.
The hope is that AI-driven management software (likely cloud-based) will monitor and control IT and facilities infrastructure, as well as applications, seamlessly and holistically – potentially across multiple sites. Cooling, power, compute, workloads, storage, and networking will flex dynamically to achieve maximum efficiency, productivity, and availability.
While it’s easy to get caught up in the exciting and disruptive potential of AI, it’s also important to reflect on the reality of how most data centers continue to be designed, built, and operated. The fact is that a lot of the processes – especially on the facilities side – are still firmly rooted in the mundane and manual.
And as Google nearly found to its cost, the answers and actions delivered by AI systems may not always be what was originally anticipated.
Just as Skynet in the film The Terminator took a dispassionate, logical view of preventing conflict, finding that mankind was the problem, Google’s algorithm reached a very simple and accurate conclusion about improving the efficiency of its sites:
The model’s first recommendation for achieving maximum energy conservation was to shut down the entire facility, which, strictly speaking, wasn’t inaccurate, but wasn’t particularly helpful either.
Writing about datacenters and tech, I am always looking for parallels with other industries to try to contextualise some of the issues that emerge.
Managing datacenters is challenging but what about other types of critical infrastructure like airports, railways and power stations?
I think I have found another great example.
I just listened to a recent episode of the always excellent This American Life podcast. Titled ‘Human Error in Volatile Situations’, it does pretty much what it says on the tin.
The first story in the episode is the most gripping and probably the most infamous. For anyone who’s had experience of managing complex facilities equipment, it’s a must listen.
“In 1980, deep in a nuclear missile silo in Arkansas, a simple human error nearly caused the destruction of a giant portion of the Midwest.”
A devastating explosion, and a near nuclear incident, was caused by human error – use of the wrong tool – but exacerbated by extremely poor decision making from above and emergency operating procedures that seemed comprehensive but didn’t extend to the unthinkable.
I’m planning to check out the book on which some of the podcast is based next – Command and Control: Nuclear Weapons, the Damascus Accident, and the Illusion of Safety – but I’m also conscious that where nuclear incident safety is concerned, ignorance is also bliss.
I was recently lucky enough to speak with Steve Helvie, VP of channel at the non-profit Open Compute Project (OCP) Foundation.
Helvie said OCP is targeting several key markets in 2018 as it looks to maintain its momentum and grow beyond hyperscalers. These include telcos, service providers (from SaaS to colocation), financial services (including blockchain), high-performance computing, healthcare, and government.
For colocation operators, the group has released guidelines and a checklist to help with adoption of OCP equipment in colocation facilities. There are also plans for some kind of stamp or certification, which has been discussed for over a year now.
However, the exact form the OCP-ready stamp will take is still being developed, according to Helvie. “We are likely not going to have another brand, but it will be a level of formal recognition. I want enterprises to be able to go into our marketplace and say, ‘Where can I find someone who is ready to host Open Compute?’”
I was lucky enough to be involved recently in an in-depth European Union research project called RenewIT. The project had a number of outputs but the main one was a web-based tool to enable different datacenter designs, and locations for those designs, to be compared across Europe in terms of energy efficiency and carbon emission reduction.
I just published an overview of the tool, which was a finalist in the recent Datacenter Dynamics awards, over at Verne Global’s site. The tool has particular relevance for the colocation and cloud services operator as its facilities are based in Iceland. Verne benefits from Iceland’s cheap and plentiful renewable energy and is encouraging more organisations to locate their workloads at its facilities.
Head over to Verne’s website to access the full blog. The RenewIT tool also has its own dedicated site and there is a separate site with more background on the project and its other outputs.
The sentiment in the headline is a pithy reminder of the importance of understanding the past.
The unfortunately long list of datacenter operators that suffered outages in 2017 would do well to heed those words.
Specifically, how can operators that don’t undertake a thorough root-cause analysis after an outage expect to prevent further downtime in the future?
I’ve been working with UK datacenter design company Future-tech, which provides specialist forensic engineering services to root out the causes of downtime and help harden facilities against future outages.
Head over to Future-tech’s site to see their take on the importance of thoroughly investigating the causes of unplanned downtime.
I recently spoke with Iceland-based colocation and cloud services provider Verne Global about their new HPC-as-a-service (HPCaaS) offering hpcDIRECT.
Verne’s managing director Dominic Ward explained how hpcDIRECT is a natural extension of its colocation services but will also take the company into some new areas in the future.
“I think the balance over time will shift towards more customers wanting to consume more HPCaaS. However for now I think the balance will remain that customers will want the majority – anything over 50% – in a colocation environment while wanting to start to test our HPCaaS. But I do think there will be gradual migration in the same way we have seen that shifting for enterprise cloud environments, or enterprise applications, I do think that is coming for HPC as well.”
One of the issues examined by the Uptime panel was how data center operators should respond to extreme weather events caused by global warming.
Uptime CTO Chris Brown argued that hardening facilities against extreme weather and temperatures was not the only issue. Operators also need to put the right procedures in place around data center staffing to better manage extreme weather events. “These last few storms have got people thinking about the operations personnel,” he said. “If you have a major storm coming through, people living and working in that area have their own homes, their own families, their own things to worry about. They are usually going to give those things their attention first before the data center. That is just human nature.”