How to Build an AI Data Center - By Brian Potter
BRIAN POTTER
JUN 10, 2024
This piece is the first in a new series from the Institute for Progress (IFP), called Compute in
America: Building the Next Generation of AI Infrastructure at Home. In this series, we
examine the challenges of accelerating the American AI data center buildout. Future pieces will be
published at this link.
We often think of software as having an entirely digital existence, a world of “bits” that’s
entirely separate from the world of “atoms." We can download endless amounts of data
onto our phones without them getting the least bit heavier; we can watch hundreds of
movies without once touching a physical disk; we can collect hundreds of books without
owning a single scrap of paper.
But digital infrastructure ultimately requires physical infrastructure. All that software
requires some sort of computer to run it. The more computing that is needed, the more
physical infrastructure is required. We saw that a few weeks ago when we looked at the
enormous $20 billion facilities required to manufacture modern semiconductors. And we
also see it with state-of-the-art AI software. Creating a cutting-edge Large Language
Model requires a vast amount of computation, both to train the models and to run them
once they’re complete. Training OpenAI’s GPT-4 required an estimated 21 billion
petaFLOP (a petaFLOP is 10^15 floating point operations). 1 For comparison, an iPhone 12
is capable of roughly 11 trillion floating point operations per second (0.01 petaFLOP per
second), which means that if you were able to somehow train GPT-4 on an iPhone 12, it
would take you more than 60,000 years to finish. On a 100 MHz Pentium processor from
1997, capable of a mere 9.2 million floating-point operations per second, training would
theoretically take more than 66 billion years. And GPT-4 wasn’t an outlier, but part of a
long trend of AI models getting ever larger and requiring more computation to create.
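The back-of-envelope arithmetic above is easy to check. A minimal sketch in Python, using only the figures quoted in the text (21 billion petaFLOP for GPT-4, ~11 teraFLOP/s for the iPhone 12, ~9.2 MFLOP/s for the Pentium):

```python
# Figures quoted in the text
GPT4_TRAINING_FLOP = 21e9 * 1e15  # 21 billion petaFLOP, in raw FLOP
IPHONE12_FLOP_PER_S = 11e12       # ~11 trillion FLOP per second
PENTIUM_FLOP_PER_S = 9.2e6        # 100 MHz Pentium, ~9.2 million FLOP per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

iphone_years = GPT4_TRAINING_FLOP / IPHONE12_FLOP_PER_S / SECONDS_PER_YEAR
pentium_years = GPT4_TRAINING_FLOP / PENTIUM_FLOP_PER_S / SECONDS_PER_YEAR

print(f"iPhone 12: {iphone_years:,.0f} years")   # ~60,000 years
print(f"Pentium:   {pentium_years:,.0f} years")  # tens of billions of years
```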
But, of course, GPT-4 wasn’t trained on an iPhone. It was trained in a data center, tens of
thousands of computers and their required supporting infrastructure in a specially-
designed building. As companies race to create their own AI models, they are building
enormous compute capacity to train and run them. Amazon plans on spending $150 billion
on data centers over the next 15 years in anticipation of increased demand from AI. Meta
plans on spending $37 billion on infrastructure and data centers, largely AI-related, in 2024
alone. CoreWeave, a startup that provides cloud and computing services for AI companies,
has raised billions of dollars in funding to build out its infrastructure and is building 28
data centers in 2024. The so-called “hyperscalers,” technology companies like Meta,
Amazon, and Google with massive computing needs, have enough estimated data centers
planned or under development to double their existing capacity. In cities around the
country, data center construction is skyrocketing.
But even as demand for capacity skyrockets, building more data centers is likely to become
increasingly difficult. In particular, operating a data center requires large amounts of
electricity, and available power is fast becoming the binding constraint on data center
construction. Nine of the top ten utilities in the U.S. have named data centers as their main
source of customer growth, and a survey of data center professionals ranked availability and
price of power as the top two factors driving data center site selection. With record levels of
data centers in the pipeline to be built, the problem is only likely to get worse.
The downstream effects of losing the race to lead AI are worth considering. If the rapid
progress seen over the last few years continues, advanced AI systems could massively
accelerate scientific and technological progress and economic growth. Powerful AI systems
could also be highly important to national security, enabling new kinds of offensive and
defensive technologies. Losing the bleeding edge on AI progress would seriously weaken
our national security capabilities, and our ability to shape the future more broadly. And
another transformative technology largely invented and developed in America would be
lost to foreign competitors.
AI relies on the availability of firm power. American leadership in innovating new sources
of clean, firm power can and should be leveraged to ensure the AI data center buildout of
the future happens here.
A data center is a fundamentally simple structure: a space that contains computers or other
IT equipment. It can range from a small closet with a server in it, to a few rooms in an
office building, to a large, stand-alone structure built specifically to house computers.
What we think of as modern data centers, specially-built massive buildings that house tens
of thousands of computers, are largely an artifact of the post-internet era. Google’s first
“data center” was 30 servers in a 28 square-foot cage, in a space shared by AltaVista, eBay,
and Inktomi. Today, Google operates millions of servers in 37 purpose-built data centers
around the world, some of them nearly one million square feet in size. These, along with
thousands of other data centers around the world, are what power internet services like
web apps, streaming video, cloud storage, and AI tools.
A large, modern data center contains tens of thousands of individual computers, specially
designed to be stacked vertically in large racks. Racks hold several dozen computers at a
time, along with other equipment needed to operate them, like network switches, power
supplies, and backup batteries. Inside the data center are corridors containing dozens or
hundreds of racks.
Server rack, via “The Datacenter as a Computer.” Racks are measured in
“units," where a unit is 1.75 inches high. Common rack capacities are 42U or
48U, though many others are available.
The amount of computer equipment they house means that data centers consume large
amounts of power. A single computer isn’t particularly power hungry: A rack-mounted
server might use a few hundred watts, or about 1/5th the power of a hair dryer. But tens of
thousands of them together create substantial demand. Today, large data centers can
require 100 megawatts (100 million watts) of power or more. That’s roughly the power
required by 75,000 homes, or needed to melt 150 tons of steel in an electric arc furnace. 2
Power demand is so central, in fact, that data centers are typically measured by how much
power they consume rather than by square feet (this CBRE report estimates that there are
3,077.8 megawatts of data center capacity under construction in the US, though exact
numbers are unknown). Their power demand means that data centers require large
transformers, high-capacity electrical equipment like switchgears, and in some cases even a
new substation to connect them to transmission lines.
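The comparisons above can be reproduced with a couple of divisions. A sketch in Python using the article's figures; the ~1.33 kW average household draw is an illustrative assumption, and the 650 kWh-per-ton figure comes from footnote 2:

```python
# Rough scale of a 100 MW data center
DATA_CENTER_KW = 100_000   # 100 megawatts
AVG_HOME_KW = 1.33         # assumed continuous draw per home (illustrative)
EAF_KWH_PER_TON = 650      # electric arc furnace energy per ton of steel (footnote 2)

homes_powered = DATA_CENTER_KW / AVG_HOME_KW
steel_tons_per_hour = DATA_CENTER_KW / EAF_KWH_PER_TON

print(f"~{homes_powered:,.0f} homes")            # ~75,000 homes
print(f"~{steel_tons_per_hour:.0f} tons/hour")   # ~150 tons of steel per hour
```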
All that power eventually gets turned into heat inside the data center, which means the
building needs similarly robust equipment to remove that heat as quickly as it is generated.
Racks sit on raised floors, and are kept cool by large volumes of air pulled up from below
and through the equipment. Racks are typically arranged to have alternating “hot aisles”
(where hot air is exhausted) and “cold aisles” (where cool air is pulled in). The hot exhaust is
removed by the data center’s cooling systems, chilled, and then recirculated. These cooling
systems might be complex, with multiple “cooling loops” of heat exchange fluids, though
nearly all data centers use air to cool the IT equipment itself.
Hot aisle cold aisle data center arrangement, via 42U.
These cooling systems are large, unsurprisingly. The minimum amount of air needed to
remove a kilowatt of power is roughly 120 cubic feet per minute; for 100 megawatts, that
means 12 million cubic feet per minute. Data center chillers can have thousands of times the
capacity of a typical home air conditioner. Even relatively small
data centers will have enormous air ducts, high-capacity chilling equipment, and large
cooling towers. This video shows a data center with a one million gallon “cold battery”
water tank: Water is cooled down during the night, when power is cheaper, and used to
reduce the burden on the cooling systems during the day.
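The airflow arithmetic scales linearly, so it can be captured in a one-line function using the ~120 CFM-per-kilowatt figure from the text:

```python
# Airflow needed to remove server heat, at ~120 cubic feet per minute per kilowatt
CFM_PER_KW = 120

def required_airflow_cfm(load_kw):
    """Minimum airflow (CFM) to carry away a given IT load in kilowatts."""
    return CFM_PER_KW * load_kw

print(required_airflow_cfm(100_000))  # 100 MW -> 12,000,000 CFM
```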
Because of the amount of power they consume, substantial effort has gone into making data
centers more energy efficient. A common data center performance metric is power usage
effectiveness (PUE), the ratio of the total power consumed by a data center to the amount of
power consumed by its IT equipment. The lower the ratio, the less power is used on things
other than running computers, and the more efficient the data center.
Data center PUE has steadily fallen over time. In 2007, the average PUE for large data
centers was around 2.5: For every watt used to power a computer, 1.5 watts were used on
cooling systems, backup power, or other equipment. Today, the average PUE has fallen to a
little over 1.5. And the hyperscalers do even better: Meta’s average data center PUE is just
1.09, and Google’s is 1.1. These improvements have come from things like more efficient
components (such as uninterruptible power supply systems with lower conversion losses),
better data center architecture (changing to a hot-aisle, cold-aisle arrangement), and
operating the data center at a higher temperature so that less cooling is required.
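The PUE definition and the overhead numbers quoted above fit in a few lines. A sketch in Python, using the 2007 average (2.5) and Meta's reported average (1.09) from the text:

```python
# PUE = total facility power / IT equipment power
def pue(total_kw, it_kw):
    return total_kw / it_kw

# 2007-era average: 2.5 kW drawn for every 1 kW of IT load
legacy = pue(total_kw=2.5, it_kw=1.0)
# Meta's reported average: 1.09 kW per 1 kW of IT load
meta = pue(total_kw=1.09, it_kw=1.0)

# Watts of overhead (cooling, UPS, etc.) per watt of IT load is PUE - 1
legacy_overhead = legacy - 1.0   # 1.5 W per IT watt
meta_overhead = meta - 1.0       # 0.09 W per IT watt
```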
There have also been efficiency improvements after the power reaches the computers.
Computers must convert AC power from the grid into DC power; on older computers, this
conversion was only 60-70% efficient, but modern components can achieve conversion
efficiencies of up to 95%. Older computers would also use almost the same amount of
power whether they were doing useful work or not. But modern computers are more
capable of ramping their power usage down when they’re idle, reducing electricity
consumption. And the energy efficiency of computation itself has improved over time due
to Moore’s Law: Smaller and smaller transistors mean less electricity is required to run
them, which means less power is required for a given amount of computation. From 1970 to
2020, the energy efficiency of computation has doubled roughly once every 1.5 years.
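Compounded over 50 years, a 1.5-year doubling time is an enormous multiplier. A quick check of the scale implied by the text:

```python
# Efficiency doubling every 1.5 years from 1970 to 2020 (trend cited in the text)
years = 2020 - 1970
doublings = years / 1.5              # ~33 doublings
efficiency_multiple = 2 ** doublings

print(f"{efficiency_multiple:.1e}x") # roughly a ten-billion-fold improvement
```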
Because of these steady increases in data center efficiency, while individual data centers
have grown larger and more power-intensive, power consumption in data centers overall
has been surprisingly flat. In the U.S., data center energy consumption doubled between
2000 and 2007 but was then flat for the next 10 years, even as worldwide internet traffic
increased by more than a factor of 20. Between 2015 and 2022, worldwide data center
energy consumption rose an estimated 20 to 70%, but data center workloads rose by 340%,
and internet traffic increased by 600%.
Beyond power consumption, reliability is another critical factor in data center design. A
data center may serve millions of customers, and service interruptions can easily cost tens
of thousands of dollars per minute. Data centers are therefore designed to minimize the
risk of downtime. Data center reliability is graded on a tiered system, ranging from Tier I
to Tier IV, with higher tiers more reliable than lower tiers. 3
Most large data centers in the U.S. fall somewhere between Tier III and Tier IV. They have
backup diesel generators, redundant components to prevent single points of failure,
multiple independent paths for power and cooling, and so on. A Tier IV data center will
theoretically achieve 99.995% uptime, though in practice human error tends to reduce this
level of reliability.
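To make the Tier IV target concrete, 99.995% uptime translates into a downtime budget of under half an hour per year:

```python
# Downtime budget implied by the Tier IV 99.995% uptime target
TIER_IV_UPTIME = 0.99995
MINUTES_PER_YEAR = 365.25 * 24 * 60

allowed_downtime_min = (1 - TIER_IV_UPTIME) * MINUTES_PER_YEAR
print(f"{allowed_downtime_min:.1f} minutes/year")  # ~26 minutes per year
```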
A 2N redundant power system, where every power component (utility feed,
generator, UPS, etc.) has a full backup. Via The Data Center Builder’s Bible.
Today data centers are still a small fraction of overall electricity demand. The IEA
estimates that worldwide data centers consume 1 to 1.3% of electricity as of 2022 (with
another 0.4% of electricity devoted to crypto mining). But this is expected to grow over
time. SemiAnalysis predicts that data center electricity consumption could triple by 2030,
reaching 3 to 4.5% of global electricity consumption. And because data center construction
tends to be highly concentrated, data centers are already some of the largest consumers of
electricity in some markets. In Ireland, for example, data centers use almost 18% of
electricity, which could increase to 30% by 2028. In Virginia, the largest market for data
centers in the world, 24% of the power sold by Virginia Power goes to data centers.
Power availability has already become a key bottleneck to building new data centers. Some
jurisdictions, including ones where data centers have historically been a major business,
are curtailing construction. Singapore is one of the largest data center hubs in the world,
but paused construction of them between 2019 and 2022, and instituted strict efficiency
requirements after the pause was lifted. In Ireland, a moratorium has been placed on new
data centers in the Dublin area until 2028. Northern Virginia is the largest data center
market in the world, but one county recently rejected a data center application for the first
time in the county’s history due to power availability concerns.
In the U.S., the problem is made worse by difficulties in building new electrical
infrastructure. Utilities are building historically low amounts of transmission lines, and
long interconnection queues are delaying new sources of generation. Data centers can be
especially challenging from a utility perspective because their demand is more or less
constant, providing fewer opportunities for load shifting and creating more demand for
firm power. One data center company owner claimed that the U.S. was nearly “out of
power” for available data centers, primarily due to insufficient transmission capacity. Meta
CEO Mark Zuckerberg has made similar claims, noting that “we would probably build out
bigger clusters than we currently can if we could get the energy to do it." One energy
consultant pithily summed up the problem as “data centers are on a one to two-year build
cycle, but energy availability is three years to none."
Part of the electrical infrastructure problem is a timing mismatch. Utility companies see
major electrical infrastructure as a long-term investment to be built in response to
sustained demand growth. Any new piece of electrical infrastructure will likely be used far
longer than a data center might be around, and utilities can be reluctant to build new
infrastructure purely to accommodate them. In some cases, long-term agreements between
data centers and utilities have been required to get new infrastructure built. An Ohio power
company recently filed a proposal that would require data centers to buy 90% of the
electricity they request from the utility, regardless of how much they use. Duke Energy,
which supplies power to Northern Virginia, has similarly introduced minimum take
requirements for data centers that require them to buy a minimum amount of power.
Data center builders are responding to limited power availability by exploring alternative
locations and energy sources. Historically, data centers were built near major sources of
demand (such as large metro areas) or major internet infrastructure to reduce latency. 4 But
lack of power and rising NIMBYism in these jurisdictions may shift their construction to
smaller cities, where power is more easily available. Builders are also experimenting with
alternatives to utility power, such as local solar and wind generation connected to
microgrids, natural gas-powered fuel cells, and small modular reactors.
Influence of AI
What impact will AI have on data center construction? Some have projected that AI models
will become so large, and training them so computationally intensive, that within a few
years data centers might be using 20% of all electricity. Skeptics point out that historically
increasing data center demand has been almost entirely offset by increased data center
efficiency. They point to things like Nvidia's new, more efficient AI supercomputer (the
GB200 NVL72), more computationally efficient AI models, and future potential ultra-
efficient chip technologies like photonics or superconducting chips as evidence that this
trend will continue.
We can divide the likely impact of AI on data centers into two separate questions: the
impact on individual data centers and the regions where they're built and the impact of
data centers overall on aggregate power consumption.
For individual data centers, AI will likely continue driving them to be larger and more
power-intensive. As we noted earlier, training and running AI models requires an
enormous amount of computation, and the specialized computers designed for AI consume
enormous amounts of power. While a rack in a typical data center will consume on the
order of 5 to 10 kilowatts of power, a rack in an Nvidia superPOD data center containing 32
H100s (special graphics processing units, or GPUs, designed for AI workloads that Nvidia
is selling by the millions) can consume more than 40 kilowatts. And while Nvidia’s new
GB200 NVL72 can train and run AI models more efficiently, it consumes much more power
in an absolute sense, using an astonishing 120 kilowatts per rack. Future AI-specific chips
may have even higher power consumption. Even if future chips are more computationally
efficient (and they likely will be), they will still consume much larger amounts of power.
Not only is this amount of power far more than what most existing data centers were
designed to deliver, but the amount of exhaust heat begins to bump against the boundaries
of what traditional, air-based cooling systems can effectively remove. Conventional air
cooling is likely limited to around 20 to 30 kilowatt racks, perhaps 50 kilowatts if rear heat
exchangers are used. One data center design guide notes that AI demands might require
such large amounts of airflow that equipment will need to be spaced out, with such large
airflow corridors that IT equipment occupies just 10% of the floor space of the data center.
For its H100 superPOD, Nvidia suggests either using fewer computers per rack, or spacing
out the racks to spread out power demand and cooling requirements.
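The rack densities and cooling limits above can be lined up directly. A sketch using the text's figures (5-10 kW typical, ~40 kW for an H100 superPOD rack, 120 kW for the GB200 NVL72, and air-cooling limits of ~30 kW, or ~50 kW with rear heat exchangers); the 8 kW "typical" value is an illustrative point within the quoted range:

```python
# Rack power densities from the text, against rough air-cooling limits
AIR_COOLING_LIMIT_KW = 30    # upper end for conventional air cooling
REAR_DOOR_HX_LIMIT_KW = 50   # with rear heat exchangers

racks_kw = {
    "typical rack": 8,            # text gives a 5-10 kW range
    "H100 superPOD rack": 40,
    "GB200 NVL72 rack": 120,
}

cooling = {}
for name, kw in racks_kw.items():
    if kw <= AIR_COOLING_LIMIT_KW:
        cooling[name] = "air cooling"
    elif kw <= REAR_DOOR_HX_LIMIT_KW:
        cooling[name] = "air + rear heat exchangers"
    else:
        cooling[name] = "liquid cooling"

for name, method in cooling.items():
    print(f"{name}: {racks_kw[name]} kW -> {method}")
```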
Because current data centers aren’t necessarily well-suited for AI workloads, AI demand
will likely result in data centers designed specifically for AI. SemiAnalysis projects that by
2028, more than half of data centers will be devoted to AI. Meta recently canceled several
data center projects so they could be redesigned to handle AI workloads. AI data centers
will need to be capable of supplying larger amounts of power to individual racks, and of
removing that power when it turns into waste heat. This will likely mean a shift from air
cooling to liquid cooling, which uses water or another heat-conducting fluid to remove heat
from computers and IT equipment. In the immediate future, this probably means direct-to-
chip cooling, where fluid is piped directly around a computer chip. This strategy is already
used by Google’s tensor processing units (TPUs) designed for AI work and for
Nvidia’s GB200 NVL72. In the long term, we may see immersion cooling, where the entire
computer is immersed in a heat-conducting fluid.
Regardless of the cooling technology used, the enormous power consumption of these AI-
specific data centers will require constructing large amounts of new electrical
infrastructure: transmission lines, substations, and, to meet tech companies' climate
goals, firm sources of low-carbon power. Unblocking the construction of this
infrastructure will be critical for the U.S. to keep up in the AI race.
Our second question is what AI’s impact will be on the aggregate power consumption of
data centers. Will AI drive data centers to consume an increasingly large fraction of
electricity in the US, imperiling climate goals? Or will increasing efficiency mean a
minimal increase in data center power consumption in aggregate, even as individual AI
data centers grow monstrous?
This is more difficult to predict, but the outcome is likely somewhere in between. Skeptics
are correct to note that historically data center power consumption rose far less than
demand, that chips and AI models will likely get more efficient, and that naive
extrapolation of current power requirements is likely to be inaccurate. But there's also
reason to believe that data center power consumption will nevertheless rise substantially.
In some cases, efficiency improvements are being exaggerated. The efficiency improvement
of Nvidia's NVL72 is likely to be far less in practice than the 25x number used by Nvidia for
marketing purposes. Many projections of power demand, such as those used internally by
hyperscalers, already take future efficiency improvements into account. And while novel,
ultra-low-power chip technologies like superconducting chips or photonics might be
plausible options in the future, these are far-off technologies that will do nothing to
address power concerns over the next several years.
In some ways, there are far fewer opportunities for data center energy reductions than
there used to be. Historically, data center electricity consumption was flat largely due to
improving PUE (less electricity spent on cooling, UPS systems, etc.). But many of these
gains have already been achieved: the best data centers already use just 10% of their
electricity for cooling and other non-IT equipment.
Skeptics also fail to appreciate how enormous AI models are likely to become, and how
easily increased chip efficiency might get eaten by demands for more computation. Internet
traffic took roughly 10 years to increase by a factor of 20, but cutting-edge AI models are
getting four to seven times as computationally intensive every year. Data center projections
by SemiAnalysis, which take into account factors such as current and projected AI chip
orders, tech company capital expenditure plans, and existing data center power
consumption and PUE, suggest that global data center power consumption will more than
triple by 2030, reaching 4.5% of global electricity demand. Regardless of aggregate trends,
rising power demands for individual data centers will still create infrastructure and siting
challenges that will need to be addressed.
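The gap between these two growth rates is the core of the argument, and it compounds quickly. A sketch using the text's figures (20x traffic growth over ~10 years; 4-7x annual growth in frontier training compute):

```python
# Internet traffic grew ~20x over ~10 years; frontier AI compute grows 4-7x per year
traffic_annual_growth = 20 ** (1 / 10)   # ~1.35x per year
ai_annual_growth_low = 4                 # low end of the 4-7x range

# After five years, the gap is already three orders of magnitude
ai_five_year = ai_annual_growth_low ** 5         # 1024x
traffic_five_year = traffic_annual_growth ** 5   # ~4.5x

print(f"AI compute after 5 years:  {ai_five_year}x")
print(f"traffic-rate after 5 years: {traffic_five_year:.1f}x")
```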
Conclusion
The rise of the internet and its digital infrastructure has required the construction of vast
amounts of physical infrastructure to support it: data centers that hold tens of thousands of
computers and other IT equipment. And as demands on this infrastructure rose, data
centers became ever larger and more power-intensive. Modern data centers demand as
much power as a small city, and campuses of multiple data centers can use as much power
as a large nuclear reactor.
The rise of AI will accelerate this trend, requiring even more data centers that are
increasingly power-intensive. Finding enough power for them will become increasingly
challenging. This is already starting to push data center construction to areas with available
power, and as demand continues to increase from data center construction and broader
electrification, the constraint is only likely to get more binding.
2 Per the steel presentation, a typical electric arc furnace makes between 130 and 180 tons per hour,
and requires 650 kilowatt-hours of power per ton. That yields 97,500 kilowatts, or 97.5 megawatts.
3 Other countries sometimes have their own data center grading systems that broadly correspond
to this tiered system. Some providers claim they have even more reliable Tier V data centers, an
unofficial tier that doesn’t seem to be endorsed by the Uptime Institute, a data center trade
organization.
4 Being near major internet infrastructure is part of the reason why Northern Virginia became a
data center hotspot.