Power Capacity Estimation ML Model

Introduction

Data centers are critical infrastructure that house powerful computing resources used for storing, processing, and managing vast amounts of data. The power capacity of a data center refers to the total amount of electrical power it can consume to operate its IT equipment and supporting infrastructure, such as cooling systems and power conditioning units. This capacity is a crucial factor in determining a data center's ability to handle workloads and maintain uptime.

The US Data Center Power Capacity Estimation Model is an approach designed to predict the power capacity of data centers across the United States. The model draws on a dataset that includes detailed information on facility size, IT infrastructure, and other facility attributes. By employing an ensemble modeling technique, we combine multiple predictive algorithms to improve the accuracy and reliability of our estimates. The model is intended as a tool for data center operators, energy planners, and policymakers, giving them the insights needed to optimize energy usage and plan for future capacity needs.

About Data Center Power Capacity

The power consumption of data centers is influenced by various factors, with artificial intelligence (AI) being a major contributor to recent increases. According to Meta's latest sustainability report, its data centers consumed 14,975 GWh in 2023. For instance, the data center located in Prineville, OR consumed 1,243 GWh, and a simple formula lets us estimate the facility's power capacity from that figure.

Power Capacity (MW) = Energy Consumption (MWh) / (Time (hours) x Percentage Utilization)

Assuming the data center operates continuously over the year (8,760 hours) at a utilization rate between 70% and 90%, we can estimate a power capacity between roughly 158 MW (at 90% utilization) and 203 MW (at 70% utilization).
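The arithmetic behind this estimate can be reproduced with a few lines of Python. This is a minimal sketch of the formula above; the function and variable names are illustrative and not part of the published methodology.

```python
HOURS_PER_YEAR = 8_760  # continuous operation over a non-leap year


def estimate_power_capacity_mw(energy_mwh: float, utilization: float,
                               hours: float = HOURS_PER_YEAR) -> float:
    """Capacity (MW) = energy (MWh) / (hours x utilization)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return energy_mwh / (hours * utilization)


# Prineville, OR: 1,243 GWh expressed in MWh.
prineville_mwh = 1_243 * 1_000

# Higher utilization implies a smaller installed capacity for the same energy.
low = estimate_power_capacity_mw(prineville_mwh, utilization=0.90)
high = estimate_power_capacity_mw(prineville_mwh, utilization=0.70)
print(f"{low:.1f} MW to {high:.1f} MW")  # 157.7 MW to 202.7 MW
```

Note that the lower bound of the range comes from the higher utilization rate, because utilization sits in the denominator.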

Furthermore, as projected by Newmark, power usage will reach 35 GW by 2030, nearly double the 17 GW in 2022. This growth highlights the critical need for precise power capacity estimates to support advanced technologies. Many existing facilities face challenges adapting to new energy and cooling demands, leading to a surge in new construction aimed at meeting these higher performance standards, which directly influences how companies estimate power requirements to stay competitive.

Our Power Capacity Estimation Methodology

This section outlines the methodology employed to estimate the power capacity of data centers across multiple providers and locations, considering various project stages, including announced, active, and under construction.

In our most recent update, we developed two estimation models: one for hyperscalers and one for colocation and enterprise data centers (non-hyperscalers).

Power Capacity Estimation Model for Non-Hyperscalers

Data Compilation

We compiled a dataset that includes the key variables affecting power capacity, such as facility size, power consumption, provider details, geographic proximity between data centers, and operational status. This dataset serves as the foundation for our analysis.

Outlier Detection

Because accurate data representation is essential, the initial phase of modeling focused on identifying potential outliers. We define outliers as data points that deviate from the expected linear relationship between facility size and power capacity. These anomalies may arise from advancements in cooling technologies or server efficiencies, which lead to facilities consuming less power relative to their size. Identifying them was intended to make the modeling process more robust.

Insights were gained from identifying potential outliers among data centers that are currently under construction, those that have been announced, and the latest active facilities. These buildings deviate from the expected linear correlation between facility size and power capacity, likely due to the new innovations these companies are implementing to enhance efficiency.
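The source does not state which outlier detection technique was used. As one possible illustration, the sketch below fits a straight line between facility size and power capacity and flags points whose residual is unusually large according to a modified z-score based on the median absolute deviation (MAD). The threshold of 3.5, the function names, and the sample data are all assumptions made for this example.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(xs, ys, threshold=3.5):
    """Flag points whose residual from the fitted line is unusually large.

    Uses the modified z-score 0.6745 * |r - median(r)| / MAD, which is less
    distorted by the outliers themselves than a mean/std-based score.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:  # all residuals identical: nothing stands out
        return [False] * len(xs)
    return [0.6745 * abs(r - med) / mad > threshold for r in residuals]


# Invented data: size in thousand sq ft, capacity in MW. The fifth facility
# draws far less power than its size would suggest.
sizes = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
power_mw = [10, 20, 30, 40, 20, 60, 70, 80, 90, 100]

flags = flag_outliers(sizes, power_mw)
print([size for size, is_outlier in zip(sizes, flags) if is_outlier])  # [500]
```

A robust score is used here because a single extreme point can inflate an ordinary standard deviation enough to hide itself.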

Model Evaluation

Following the outlier detection, we trained a series of regression models and compared their performance in predicting power capacity. The primary metrics were the coefficient of determination (R²), which quantifies the proportion of variance in the dependent variable (power capacity) that can be explained by the independent variables (facility size and other factors), and the Mean Squared Error (MSE), reported here as its square root (RMSE), which measures how far the predictions deviate from the actual values.

An ensemble learning model achieved the highest performance, with an R² of 0.97 and an RMSE of 12.72, and was selected for further analysis.
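For reference, both metrics can be computed directly from actual and predicted values. The sketch below uses the standard definitions; the sample values are invented for illustration and are not taken from the model's evaluation set.

```python
from math import sqrt


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Square root of the mean squared error, in the units of the target."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return sqrt(mse)


# Invented capacities in MW, for illustration only.
actual_mw = [20.0, 35.0, 50.0, 80.0, 120.0]
predicted_mw = [22.0, 33.0, 55.0, 78.0, 117.0]

print(f"R2   = {r_squared(actual_mw, predicted_mw):.4f}")  # R2   = 0.9927
print(f"RMSE = {rmse(actual_mw, predicted_mw):.2f}")       # RMSE = 3.03
```

Because RMSE is expressed in the same units as the target, it is easier to interpret alongside capacity figures than the raw MSE.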

Prediction Generation

Utilizing the selected model, we generated predictions for the power capacity of the remaining data centers. To enhance the interpretability of these predictions, we also established a prediction interval with a confidence level of 80%. This interval provides a range within which we expect the true power capacity of each data center to lie, accounting for the inherent uncertainty in our estimates.
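The source does not describe how the 80% interval is derived. One common approach, shown here purely as an assumption, is to take the 10th and 90th percentiles of the residuals observed on held-out data and add them to each new point prediction. The residual values and function name below are invented for the example.

```python
from statistics import quantiles


def prediction_interval(prediction, holdout_residuals):
    """80% interval from the empirical 10th/90th percentiles of residuals.

    Residuals are (actual - predicted) on data the model was not trained on.
    """
    deciles = quantiles(holdout_residuals, n=10, method="inclusive")
    lower_offset, upper_offset = deciles[0], deciles[-1]
    return prediction + lower_offset, prediction + upper_offset


# Invented held-out residuals in MW.
residuals_mw = [-12, -8, -5, -3, -1, 0, 2, 4, 7, 9, 15]

interval = prediction_interval(100.0, residuals_mw)
print(interval)  # (92.0, 109.0)
```

The interval need not be symmetric around the prediction: if the model tends to underestimate, the upper offset will be larger than the lower one.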

Insights

The following chart illustrates the relationship between the data center size and the power capacity, including our new estimations:

In data centers that are under construction or in the announcement phase, the lines are steeper, suggesting higher power requirements as the facility size increases. This aligns with expectations as newer data centers may have greater demands for power infrastructure due to more extensive and modern equipment being installed. For those in the active phase, the slope is more moderate, reflecting potentially more optimized and stable power usage patterns.

This differentiation makes sense when considering the evolving nature of data centers. Recent advancements in technology, particularly in artificial intelligence (AI) chips and the more efficient distribution of IT workloads, have contributed to an overall trend of achieving more computational power without the need for dramatically increasing physical space. These innovations, combined with power management improvements, are creating more power-efficient data centers, allowing for greater productivity and performance in less space, which is reflected in the tilting trend lines.
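The slope comparison described above can be reproduced by fitting a separate least-squares line for each project stage. The sketch below is illustrative only: the records are invented and the stage labels are assumed to match those used in the dataset.

```python
from collections import defaultdict


def slope_by_stage(records):
    """Least-squares slope of capacity vs. size for each project stage.

    records: iterable of (stage, size, power_mw) tuples.
    """
    grouped = defaultdict(list)
    for stage, size, power in records:
        grouped[stage].append((size, power))

    slopes = {}
    for stage, points in grouped.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slopes[stage] = sxy / sxx
    return slopes


# Invented records: (stage, size in thousand sq ft, capacity in MW).
records = [
    ("Active", 100, 10), ("Active", 200, 20), ("Active", 300, 30),
    ("Under Construction", 100, 15), ("Under Construction", 200, 30),
    ("Under Construction", 300, 45),
    ("Announced", 100, 20), ("Announced", 200, 40), ("Announced", 300, 60),
]

slopes = slope_by_stage(records)
for stage, slope in slopes.items():
    print(f"{stage}: {slope:.2f} MW per thousand sq ft")
```

A larger slope means each additional unit of floor area is associated with more power capacity, which is how the steeper lines in the chart should be read.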

Power Capacity Estimation Model for Hyperscalers

This second model focuses exclusively on data centers operated by hyperscalers: large-scale, efficient facilities designed to support cloud and computing services for companies like Amazon or Google.

As of July 2, 2025, we have identified 1,514 hyperscaler buildings, 47% of which are currently Active.

Data Compilation

Key variables for this model are:

Number of Generators
Project Stage
Facility Size
US State
Provider
Data Center campus vs Data Center building

The number of generators is estimated from satellite imagery (see the Data Collection section for more details).
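To make the input structure concrete, the sketch below shows one way the six variables listed above could be represented and turned into a numeric feature vector using one-hot encoding for the categorical fields. The field names, the square-foot unit, the category lists, and the sample record are all hypothetical; the source does not describe the actual encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperscalerRecord:
    generators: int            # estimated from satellite imagery
    project_stage: str         # e.g. "Announced", "Under Construction", "Active"
    facility_size_sqft: float  # unit is an assumption
    us_state: str
    provider: str
    is_campus: bool            # True for a campus, False for a single building


def one_hot(value, categories):
    """1.0 in the position of the matching category, 0.0 elsewhere."""
    return [1.0 if value == category else 0.0 for category in categories]


def to_feature_vector(rec, stages, states, providers):
    """Numeric fields first, then one-hot blocks for each categorical field."""
    return ([float(rec.generators), rec.facility_size_sqft, float(rec.is_campus)]
            + one_hot(rec.project_stage, stages)
            + one_hot(rec.us_state, states)
            + one_hot(rec.provider, providers))


# Hypothetical category lists and record, for illustration only.
STAGES = ["Announced", "Under Construction", "Active"]
STATES = ["OR", "VA"]
PROVIDERS = ["AWS", "Google", "Microsoft", "Meta"]

record = HyperscalerRecord(generators=24, project_stage="Active",
                           facility_size_sqft=450_000.0, us_state="OR",
                           provider="Meta", is_campus=False)
vector = to_feature_vector(record, STAGES, STATES, PROVIDERS)
print(len(vector), vector)
```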

Outlier Detection

In this scenario, we applied the same techniques to remove potential outlier records, reducing the risk of overfitting or underfitting.

Model Evaluation

We applied the same evaluation metrics (R² and RMSE) to this estimation model. Here, R² quantifies the proportion of variance in the dependent variable (power capacity) that can be explained by the independent variables, including the number of generators.

The final ensemble model achieved an R² of 0.94 and an RMSE of 10.0182.

Prediction practices are consistent with those used in the non-hyperscaler model.
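The source describes both final models as ensembles but does not specify how the base models are combined. As a minimal sketch of the general idea, the example below averages the predictions of several base models, optionally with weights. The two base models and their coefficients are invented stand-ins, not the models used in this methodology.

```python
from typing import Callable, Optional, Sequence

Model = Callable[[Sequence[float]], float]


def ensemble_predict(models: Sequence[Model], features: Sequence[float],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of base model predictions (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    return sum(w * m(features) for w, m in zip(weights, models)) / total


# Hypothetical base models. features = [size in thousand sq ft, generators].
def size_model(features):
    return 0.1 * features[0]   # invented coefficient


def generator_model(features):
    return 2.5 * features[1]   # invented MW per generator


features = [500.0, 24.0]
print(ensemble_predict([size_model, generator_model], features))          # 55.0
print(ensemble_predict([size_model, generator_model], features, [3, 1]))  # 52.5
```

Averaging tends to reduce the variance of individual models; more elaborate schemes such as stacking or boosting follow the same principle of combining several predictors.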

Insights

In previous releases, we used the same estimation model for both non-hyperscalers and hyperscalers. The new model improves accuracy, especially for major providers like AWS, Google, Microsoft, and Meta. Estimated power capacity has increased to reflect recent announcements and a deeper analysis of generator requirements, aligning better with current industry trends.

