Rethinking resilience: Preparing data centers for the next wave of AI

Tim Hysell, co-founder and CEO of ZincFive, explains why the need for resilient electrical infrastructure has never been more pressing for data center operators.

Recent high-profile outages, including one at Twitter (now X), have highlighted the critical importance of data center resilience. Even more alarming was the outage at Australian telecommunications provider Optus, which caused transport delays, banking problems and hospital phone line outages for 12 hours, affecting more than 10 million users (nearly 40% of the population) and 400,000 businesses. The consequences of data center outages are further underscored by Uptime Intelligence’s 2024 Data Center Outages report.

The report reveals that 55% of operators have experienced an outage in the past three years. More than half of respondents said their most recent significant outage cost more than $100,000, and 16% said it cost more than $1 million. As artificial intelligence drives data center energy consumption to unprecedented levels, the need for resilient electrical infrastructure has never been more urgent. The International Energy Agency (IEA) predicts that data center electricity consumption will double by 2026, while training new AI models consumes 50 times more electricity than previous generations. As more sectors integrate AI into their operations, keeping the infrastructure that supports these services resilient becomes both more important and more difficult.

Faced with these challenges, data centers must take decisive steps to improve their resilience and adapt to the demands of AI. By addressing the most common causes of outages, namely power issues and human error, including the lack of regular, comprehensive uninterruptible power supply (UPS) testing, data center operators can ensure the reliability and stability of their facilities in an increasingly complex and demanding technological landscape.

Resolving these issues requires first examining their causes. According to Uptime Intelligence’s March 2024 report, power issues consistently emerge as the most common cause of major outages. A striking 42% of respondents cited UPS failure as the leading cause of power-related outages, while 30% of incidents involved problems with the transfer switch to a generator and 20% were attributed to failure of the generator itself.

Human error also plays a role in nearly 40% of data center outages, highlighting the importance of adequate training and of following established procedures. Of those who reported an outage caused by human error, 48% cited staff failing to follow procedures, while 45% cited incorrect procedures. These findings underscore the need for data center operators to prioritize both the modernization and maintenance of their electrical infrastructure and the training and continuing education of their staff.

Comprehensive staff training and process reviews present a significant opportunity to reduce disruptions related to human error. To limit the risk of power outages, data center operators should regularly perform rigorous, real-world maintenance and testing of backup power systems. They can also adopt more advanced and reliable UPS battery technologies, such as nickel-zinc.

Unlike lead-acid and lithium-ion backup batteries, nickel-zinc batteries continue to discharge and accept charge even when an individual cell weakens or fails. The battery string therefore keeps functioning, turning what would otherwise be an emergency into a routine cell replacement at the next scheduled maintenance cycle, without additional maintenance costs or operational impact. Nickel-zinc batteries offer several additional benefits for data center reliability and efficiency. Unlike lithium-ion batteries, they are not capable of thermal runaway at the cell level, and they can operate reliably at higher temperatures, which can also lower cooling costs. They also offer higher power density than the alternatives, delivering the same power in a significantly smaller footprint. This frees valuable space for revenue-generating equipment such as servers and racks while still providing ample backup power for AI-intensive applications. Nickel-zinc batteries are also more sustainable than lead-acid and lithium-ion batteries, and they can serve as a convenient replacement for lead-acid batteries, allowing a seamless transition to this more advanced technology.
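The string-level argument for nickel-zinc can be illustrated with a toy Python model. This is not a battery simulator and uses no vendor data: cells are wired in series, and a failed cell either becomes an open circuit (the whole string stops delivering power) or stays conductive while contributing no voltage (the string keeps working at a reduced voltage). The `Cell` class, the `fails_open` flag and the 2.0 V per-cell figure are illustrative assumptions; which behavior applies to a given chemistry is left to the caller, since the article only states that nickel-zinc strings keep working.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    volts: float          # illustrative per-cell voltage, not a datasheet value
    failed: bool = False  # True once the cell has weakened or died


def string_voltage(cells: List[Cell], fails_open: bool) -> float:
    """Usable voltage of a series string in this toy model.

    fails_open=True:  a failed cell breaks the circuit, so the string delivers 0 V.
    fails_open=False: a failed cell still conducts but adds no voltage, so the
                      string keeps delivering power at a reduced voltage.
    """
    if fails_open and any(cell.failed for cell in cells):
        return 0.0
    return sum(cell.volts for cell in cells if not cell.failed)


# A hypothetical 12-cell string with one dead cell.
string = [Cell(volts=2.0) for _ in range(12)]
string[5].failed = True

print(string_voltage(string, fails_open=True))   # 0.0  -> string down, emergency
print(string_voltage(string, fails_open=False))  # 22.0 -> string up, replace cell at next service
```

A real UPS would also compare the reduced string voltage with the inverter's minimum DC input voltage; that check is deliberately left out of this sketch.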

As data center energy consumption continues to increase due to the growing demands of artificial intelligence and other energy-intensive applications, ensuring a resilient energy infrastructure has become more important than ever. By adopting comprehensive staff training, rigorous testing and maintenance practices, and safe, reliable, and sustainable battery technologies like nickel-zinc, data center operators can significantly strengthen the resilience of their facilities and ensure they are equipped to handle the growing demands of the digital landscape. Taking proactive measures to address the root causes of outages and implementing innovative solutions helps facility operators ensure that customers can rely on them, no matter what the future holds.

Data center cooling accounts for about 1.2% of global electricity consumption, or roughly the amount needed to power more than 19 million homes. Because liquid cooling uses about half the energy of air cooling, it has become the logical choice for many operators, especially as chip power densities rise beyond what air cooling can support. Within the liquid cooling landscape (immersion vs. direct-to-chip, and single-phase vs. two-phase), two-phase direct-to-chip systems have received increasing attention for their high reliability and superior ability to cool high-power chips.

From a sustainability perspective, stakeholders should be concerned not only with energy and water consumption but also with global warming and human health. For the latter two, the most pressing issues relate to the two-phase working fluids used in these systems, mirroring the sustainability issues surrounding refrigeration and HVAC equipment. Most, if not all, two-phase data center cooling solutions currently use refrigerants that fall into the category of perfluoroalkyl and polyfluoroalkyl substances (PFAS). Sustainability concerns about PFAS have only grown with the European Union’s recent series of legislative proposals to regulate or ban PFAS, the so-called “forever chemicals.” What are PFAS? And should potential stakeholders in two-phase data center cooling (including investors, data center operators, server manufacturers, integrators, and others) be concerned about the sustainability of PFAS refrigerants in two-phase cooling systems?

Should all PFAS be considered “forever chemicals”? First, we need to understand that the family of chemicals covered by the term PFAS is diverse. In this respect, PFAS are no different from hydrocarbons. Hydrocarbons range from natural gas to gasoline, mineral oil, candle wax and plastics; although they all consist of essentially the same atoms, their properties and applications vary widely. The main chemical characteristic that distinguishes these seemingly diverse subgroups is the size of the molecules (larger molecules tend to be solid, smaller molecules tend to be gaseous, and liquids fall in between). Likewise, PFAS can take vapor or liquid form (refrigerants) or solid form (plastics or plastic coatings, such as polytetrafluoroethylene [PTFE]). The defining characteristic of PFAS is the presence of fluorine atoms bonded to the carbon skeleton of the molecule; the OECD’s widely cited 2021 definition, for example, requires at least one fully fluorinated methyl (CF3) or methylene (CF2) carbon atom. Adding fluorine to a molecule tends to provide performance benefits, including lower chemical reactivity, lower flammability, lower toxicity and lower friction, which is why fluorinated chemicals are used in many everyday applications.
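The exact definition matters when deciding which refrigerants count as PFAS. As one concrete reference point, the OECD’s 2021 terminology describes PFAS as fluorinated substances containing at least one fully fluorinated methyl (CF3) or methylene (CF2) carbon atom, with no H, Cl, Br or I attached to it. The Python sketch below applies a simplified version of that structural test to a hypothetical toy encoding in which each carbon is a `Counter` of its non-carbon neighbors. The helper names are illustrative, the test ignores bond order and the exceptions the OECD notes, and the result says nothing about the regulatory status of any substance.

```python
from collections import Counter
from typing import List

# Atoms that stop a carbon from counting as "fully fluorinated" in the OECD wording.
DISQUALIFYING = ("H", "Cl", "Br", "I")


def is_fully_fluorinated(carbon: Counter) -> bool:
    """A CF3 or CF2 carbon: at least two F atoms and no H/Cl/Br/I attached."""
    return carbon["F"] >= 2 and all(carbon[atom] == 0 for atom in DISQUALIFYING)


def meets_simplified_pfas_test(molecule: List[Counter]) -> bool:
    """Hypothetical helper: True if any carbon in the molecule is fully fluorinated.

    Each carbon is a Counter of its non-carbon neighbors; C-C bonds are implicit.
    Bond order and the OECD's listed exceptions are ignored.
    """
    return any(is_fully_fluorinated(carbon) for carbon in molecule)


# Toy encodings of three common fluorinated refrigerants.
hfc_32 = [Counter(H=2, F=2)]                             # CH2F2
hfc_134a = [Counter(F=3), Counter(H=2, F=1)]             # CF3-CH2F
hfo_1234yf = [Counter(F=3), Counter(F=1), Counter(H=2)]  # CF3-CF=CH2

for name, mol in [("HFC-32", hfc_32), ("HFC-134a", hfc_134a), ("HFO-1234yf", hfo_1234yf)]:
    print(name, meets_simplified_pfas_test(mol))
# HFC-32 False
# HFC-134a True
# HFO-1234yf True
```

The contrast between HFC-32 and HFC-134a shows why "contains fluorine" and "is PFAS" are not the same test: HFC-32 has two fluorine atoms on a carbon that also carries hydrogen, so it lacks a fully fluorinated carbon.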

Fortunately, lower-molecular-weight liquid or gaseous PFAS (such as those used in HVAC, refrigeration and data center two-phase cooling systems) have a different life cycle (see Figure 2). In the event of a leak, PFAS refrigerants vaporize and eventually make their way into the upper atmosphere. There, one of two things can happen, depending on the global warming potential (GWP) of the vaporized refrigerant.

Summary
Recently, the data center industry has increasingly turned to liquid cooling for its energy efficiency benefits. Particularly strong interest in two-phase direct-to-chip cooling stems from its exceptional ability to handle the intense thermal loads of high-power chips, combined with its superior reliability compared with other liquid cooling methods. However, many stakeholders hesitate to adopt PFAS refrigerants in two-phase solutions because of a lack of understanding about which PFAS are truly “forever” and which are not.

Source: datacenterfrontier.com
