WindAI: A Deep Learning Approach to Global Wind Resource Assessment Using Multi-Source Reanalysis Data

Authors: WindAI.tech
Date: March 2026
Version: 1.0

Abstract

Accurate wind resource assessment is a prerequisite for wind energy project development, yet conventional methods remain costly, time-consuming, and geographically constrained. This paper presents WindAI, a deep learning system that predicts hourly wind farm capacity factors for any location on Earth using freely available meteorological reanalysis data. The model is a deep neural network with multiple hidden layers, batch normalization, and regularization, trained on over 10 million hourly generation-weather observation pairs from 300+ wind farms across eight countries: Australia, the United Kingdom, Belgium, Denmark, Canada (Ontario), the United States (Texas), New Zealand, and Brazil. Input features are drawn from four independent data sources — ERA5 reanalysis, MERRA2 reanalysis, ERA5 static fields, and Copernicus DEM elevation data — sampled at multiple spatial grid points surrounding each site, yielding 400+ features per observation. Evaluated on six geographically and technologically diverse held-out wind farms never seen during training, WindAI achieves an hourly root mean square error (RMSE) of 0.147 and a coefficient of determination (R²) of 0.777. When aggregated to annual mean capacity factors, prediction errors range from 2.1% to 7.8% across the test plants. The system provides predictions in minutes at negligible marginal cost, compared to weeks and tens of thousands of dollars for traditional consultant-led assessments.

1. Introduction

1.1 The Problem

Wind energy is one of the fastest-growing sources of electricity worldwide, yet the fundamental bottleneck in wind project development remains the same as it was decades ago: determining whether a given site has enough wind to justify construction. Traditional wind resource assessments (WRAs) are expensive, slow, and inherently local. A preliminary assessment from a specialized consultant typically costs $8,000 to $9,000 and takes two to four weeks. A bankable WRA — the level of analysis required to secure project financing — can cost $15,000 to $50,000 or more and take four to twelve weeks, often requiring the installation and operation of on-site meteorological masts for one or more years.

Physics-based wind resource modeling tools such as WAsP (developed by DTU Wind Energy) and DNV's WindFarmer provide high-fidelity predictions but require detailed site-specific inputs, expert calibration, and per-site software licenses. A WAsP license costs approximately €2,100, while DNV WindFarmer carries an annual fee of approximately €5,639.

1.2 The Opportunity

A convergence of three developments creates an opportunity to address this bottleneck. First, decades of high-quality hourly weather reanalysis data are now freely available through ERA5 and MERRA2. Second, growing volumes of publicly disclosed wind farm generation data provide ground-truth observations. Third, advances in deep learning enable models that learn the complex, nonlinear mapping between gridded weather variables and actual power output.

1.3 Our Approach

WindAI takes a data-driven approach. Rather than modeling the physics of atmospheric flow and turbine aerodynamics from first principles, we train a single neural network on the joint distribution of weather conditions and observed power output across hundreds of wind farms spanning diverse geographies, climates, turbine technologies, and terrain types. The model ingests 400+ features per hourly observation and outputs a scalar capacity factor. The system is deployed as a REST API that accepts latitude, longitude, and turbine specifications, fetches the relevant reanalysis data on demand, and returns hourly capacity factor predictions, annual energy production estimates, and statistical summaries — all within minutes.

2. Data Sources

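WindAI integrates data from six distinct sources: five supply model inputs (Sections 2.1 to 2.5) and one supplies the generation records used as training targets (Section 2.7). Each contributes a different part of the information needed to predict wind farm power output.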

2.1 ERA5 Reanalysis

ERA5 is the fifth-generation atmospheric reanalysis produced by ECMWF. It provides hourly estimates of atmospheric variables on a global grid at 0.25-degree spatial resolution. Six ERA5 variables are extracted at 16 grid points per site (6 × 16 = 96 features):

Variable	Description
u100	Eastward wind at 100 metres
v100	Northward wind at 100 metres
u10	Eastward wind at 10 metres
v10	Northward wind at 10 metres
t2m	Air temperature at 2 metres
sp	Surface air pressure

2.2 MERRA2 Reanalysis

MERRA2 is produced by NASA at 0.5° × 0.625° resolution. Including MERRA2 alongside ERA5 provides an independent estimate of atmospheric conditions. Two MERRA2 wind variables (U50M, V50M) are extracted at 16 grid points, yielding 32 features.

2.3 ERA5 Boundary Layer Height

The planetary boundary layer height (BLH) serves as a proxy for atmospheric stability. It is extracted at 16 grid points, contributing 16 features.

2.4 ERA5 Static and Invariant Fields

This group comprises six time-invariant fields describing terrain and surface characteristics: geopotential at surface, land-sea mask, standard deviation of orography, slope, anisotropy, and angle of sub-grid orography. Each field is extracted at 16 grid points (6 × 16 = 96 features).

2.5 Copernicus Digital Elevation Model

The Copernicus GLO-30 DEM provides global terrain elevation at 30-metre resolution. Ten summary statistics are computed per site (min, p20, p50, p80, max, std, mean, range, slope_mean, slope_std).
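As an illustration of how the ten statistics could be derived from a DEM tile, the sketch below uses only the Python standard library. The function names, the linear-interpolation percentile rule, the use of population standard deviation, and the central-difference slope (rise over run) are all assumptions made for this example; the paper does not specify how WindAI computes them.

```python
import math
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def dem_summary(elev, cell_size_m=30.0):
    """Summarise a 2-D list of elevations (metres); needs at least a 3 x 3 tile."""
    flat = sorted(v for row in elev for v in row)
    # Slope magnitude (rise over run) from central differences at interior cells.
    slopes = []
    for i in range(1, len(elev) - 1):
        for j in range(1, len(elev[0]) - 1):
            dz_dx = (elev[i][j + 1] - elev[i][j - 1]) / (2 * cell_size_m)
            dz_dy = (elev[i + 1][j] - elev[i - 1][j]) / (2 * cell_size_m)
            slopes.append(math.hypot(dz_dx, dz_dy))
    return {
        "min": flat[0],
        "p20": percentile(flat, 20),
        "p50": percentile(flat, 50),
        "p80": percentile(flat, 80),
        "max": flat[-1],
        "std": statistics.pstdev(flat),
        "mean": statistics.fmean(flat),
        "range": flat[-1] - flat[0],
        "slope_mean": statistics.fmean(slopes),
        "slope_std": statistics.pstdev(slopes),
    }


# Synthetic tile: a plane rising 10 m per 10 m cell in the x direction.
tile = [[10.0 * j for j in range(5)] for _ in range(5)]
stats = dem_summary(tile, cell_size_m=10.0)
```

On this synthetic plane every interior cell has the same slope, so slope_std is zero and slope_mean equals the gradient of the plane.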

2.6 Spatial Sampling: The 16-Point Grid

Rather than extracting weather data at a single grid point, WindAI samples at 16 points in a 4×4 grid surrounding each site. This captures spatial gradients in wind speed, pressure, and temperature. Points are enumerated in a perimeter-spiral order:

 1  2  3  4
12 13 14  5
11 16 15  6
10  9  8  7
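The numbering above can be reproduced with a short routine that walks the grid clockwise from the top-left corner and spirals inward. This is only a sketch of the enumeration; the function name is hypothetical, and how grid points are positioned relative to the site is not covered here.

```python
def spiral_order(n=4):
    """Return an n x n grid holding the 1-based visiting order of a
    clockwise inward spiral that starts at the top-left corner."""
    grid = [[0] * n for _ in range(n)]
    top, bottom, left, right = 0, n - 1, 0, n - 1
    k = 1
    while top <= bottom and left <= right:
        for c in range(left, right + 1):          # top row, left to right
            grid[top][c] = k
            k += 1
        for r in range(top + 1, bottom + 1):      # right column, downwards
            grid[r][right] = k
            k += 1
        if top < bottom:
            for c in range(right - 1, left - 1, -1):  # bottom row, right to left
                grid[bottom][c] = k
                k += 1
        if left < right:
            for r in range(bottom - 1, top, -1):      # left column, upwards
                grid[r][left] = k
                k += 1
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return grid


for row in spiral_order(4):
    print(" ".join(f"{v:2d}" for v in row))
```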

2.7 Wind Farm Generation Data

Hourly generation records from wind farms in eight countries are sourced from grid operators and regulatory bodies (AEMO, ENTSO-E, IESO, ERCOT, ONS). In total, the dataset contains approximately 10.5 million hourly observations across 300+ wind farms, spanning 2006 to 2020.

Country	Data Source
Australia (South)	AEMO dispatch SCADA
Australia (West)	SW Australia facility data
Brazil	ONS hourly generation
United Kingdom	ENTSO-E
Belgium	ENTSO-E
Denmark	ENTSO-E
Canada (Ontario)	IESO
USA (Texas)	ERCOT

3. Model Architecture

3.1 Network Design

WindAI employs a multi-layer deep neural network with 400+ input features. The architecture uses batch normalization to handle heterogeneous input scales, dropout regularization to improve generalization, and a funnel-shaped design that progressively compresses representations from high-dimensional input to a scalar capacity factor output.

3.2 Design Rationale

Batch Normalization stabilizes training by normalizing internal activations across heterogeneous input scales (wind speed in m/s, pressure in Pascals, temperature in Kelvin). Dropout regularization reduces co-adaptation and improves generalization to unseen locations. The funnel architecture forces progressively compressed representations, distilling hundreds of raw and physics-derived features into a single prediction.
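A funnel-shaped network of this kind might look like the following PyTorch sketch. The class name, the layer widths (512, 256, 128, 64), the ReLU activation, and the dropout rate of 0.2 are placeholders chosen for illustration; the paper does not disclose the actual values, so this should not be read as the WindAI architecture itself.

```python
import torch
from torch import nn


class FunnelMLP(nn.Module):
    """Illustrative funnel MLP: each block narrows the representation."""

    def __init__(self, n_features=400, widths=(512, 256, 128, 64), dropout=0.2):
        super().__init__()
        layers = []
        in_dim = n_features
        for width in widths:  # hypothetical widths, progressively narrower
            layers += [
                nn.Linear(in_dim, width),
                nn.BatchNorm1d(width),   # stabilises activations across scales
                nn.ReLU(),
                nn.Dropout(dropout),     # regularisation against co-adaptation
            ]
            in_dim = width
        layers.append(nn.Linear(in_dim, 1))  # scalar capacity factor output
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(-1)


model = FunnelMLP()
model.eval()  # use running batch-norm statistics and disable dropout
with torch.no_grad():
    prediction = model(torch.randn(8, 400))
```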

3.3 Feature Categories

The model combines raw meteorological variables with physics-derived features including wind speed, wind shear exponents, wind direction components, and air density. These are computed from the underlying reanalysis data at multiple spatial grid points surrounding each site.
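The derived quantities follow from standard relations: wind speed is the magnitude of the (u, v) vector, the shear exponent comes from the power law between 10 m and 100 m, and air density follows from the ideal gas law. The sketch below applies these at a single grid point. The function name, the dry-air assumption, and the sine/cosine direction encoding are assumptions for illustration, not a description of WindAI's exact feature code.

```python
import math

R_DRY_AIR = 287.05  # J/(kg K), specific gas constant of dry air


def derived_features(u100, v100, u10, v10, t2m, sp):
    """Derive physics features from ERA5 variables at one grid point.

    Winds are in m/s, t2m in kelvin, sp in pascals.
    """
    ws100 = math.hypot(u100, v100)
    ws10 = math.hypot(u10, v10)
    # Power-law shear exponent between 10 m and 100 m; undefined in calm air.
    if ws100 > 0 and ws10 > 0:
        shear = math.log(ws100 / ws10) / math.log(100.0 / 10.0)
    else:
        shear = float("nan")
    # Meteorological direction: where the wind blows FROM, clockwise from north.
    direction = math.atan2(-u100, -v100)
    # Ideal gas law with dry air (humidity is ignored in this sketch).
    rho = sp / (R_DRY_AIR * t2m)
    return {
        "ws100": ws100,
        "ws10": ws10,
        "shear": shear,
        "dir_sin": math.sin(direction),
        "dir_cos": math.cos(direction),
        "air_density": rho,
    }


features = derived_features(u100=6.0, v100=8.0, u10=3.0, v10=4.0,
                            t2m=288.15, sp=101325.0)
```

With these inputs the 100 m wind speed is twice the 10 m wind speed, so the shear exponent is log10(2), roughly 0.30, and the density is close to the standard sea-level value of 1.225 kg/m³.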

3.4 Feature Inventory (400+ total)

Category	Description
Plant attributes	Hub height, turbine count, rated power, rotor diameter, etc.
Spatial distances	Distance from each grid point to plant location
ERA5 meteorological	Wind, temperature, and pressure variables at multiple grid points
ERA5 boundary layer	Boundary layer height at multiple grid points
MERRA2 wind	Independent wind estimates at multiple grid points
ERA5 static fields	Invariant terrain and surface fields at multiple grid points
Elevation	Terrain statistics from Copernicus DEM
Temporal encoding	Hour-of-day and month-of-year
Derived physics	Wind speed, shear, direction, air density
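Two of the simpler rows in this inventory can be sketched with the standard library. The great-circle (haversine) formula is one common way to compute the distance from a grid point to the plant, and a sine/cosine pair is one common way to encode hour-of-day and month-of-year so that hour 23 sits next to hour 0. Both choices, and the function names, are assumptions; the paper does not state which distance formula or temporal encoding WindAI uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cyclical(value, period):
    """Encode a periodic quantity as a (sin, cos) pair."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# One ERA5 grid step (0.25 degrees of longitude) at the equator.
grid_step_km = haversine_km(0.0, 0.0, 0.0, 0.25)

hour_sin, hour_cos = cyclical(6, 24)         # 06:00
month_sin, month_cos = cyclical(3 - 1, 12)   # March, with months indexed from 0
```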

4. Training

4.1 Data Split

The dataset is split by plant identity rather than by random sampling. Six wind farms are held out entirely for evaluation. This plant-level holdout ensures the model is evaluated on its ability to generalize to completely unseen locations and turbine configurations.

Plant	Country	Type	Turbines	Rated Power per Turbine (kW)
Albany Grasmere	Australia	Onshore	6	2,300
Amazon Wind Farm TX	USA (Texas)	Onshore	110	2,300
Belwind I	Belgium	Offshore	55	3,000
Bobcat Bluff TX	USA (Texas)	Onshore	100	1,500
Comber	Canada (Ontario)	Onshore	72	2,300
Kingsbridge I	Canada (Ontario)	Onshore	22	1,800
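A plant-level holdout can be expressed as a filter on plant identity rather than a random row split. The sketch below assumes each observation is a dict with a "plant" key; that record layout, the function name, and the training plant name in the example are hypothetical.

```python
HELD_OUT_PLANTS = {
    "Albany Grasmere",
    "Amazon Wind Farm TX",
    "Belwind I",
    "Bobcat Bluff TX",
    "Comber",
    "Kingsbridge I",
}


def split_by_plant(rows, held_out=HELD_OUT_PLANTS):
    """Send every row of a held-out plant to the test set, all others to train."""
    train, test = [], []
    for row in rows:
        (test if row["plant"] in held_out else train).append(row)
    return train, test


# Tiny illustrative dataset; "Example Farm A" is a made-up training plant.
rows = [
    {"plant": "Example Farm A", "cf": 0.31},
    {"plant": "Comber", "cf": 0.27},
    {"plant": "Example Farm A", "cf": 0.40},
    {"plant": "Belwind I", "cf": 0.52},
]
train_rows, test_rows = split_by_plant(rows)
```

Because the split is keyed on plant identity, no hour from a test plant can leak into training, which a random row-level split would not guarantee.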

4.2 Optimization

Parameter	Value
Optimizer	AdamW (weight_decay = 1e-4)
Learning rate schedule	OneCycleLR (0.001 → 0.005)
Batch size	8,192
Epochs	50
Loss function	Mean Squared Error (MSE)

The model is implemented in PyTorch. The full training run completes in approximately 4 minutes on an NVIDIA A10G GPU. The model weights and normalization statistics are exported to a portable NumPy archive (~1.6 MB).
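The optimizer settings in the table map onto PyTorch roughly as follows. This sketch reads "0.001 → 0.005" as an initial learning rate of 0.001 rising to a peak of 0.005, which corresponds to div_factor=5 in OneCycleLR; that reading is an assumption. The stand-in linear model, the random data, and the tiny steps_per_epoch are placeholders so the example runs quickly.

```python
import torch
from torch import nn

model = nn.Linear(400, 1)   # stand-in for the real network
epochs = 50
steps_per_epoch = 4         # placeholder; in practice ceil(n_train_rows / 8192)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-3,
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    div_factor=5.0,          # initial lr = max_lr / div_factor = 1e-3
)
loss_fn = nn.MSELoss()

lrs = []
for _ in range(epochs * steps_per_epoch - 1):
    x = torch.randn(32, 400)   # random features, illustration only
    y = torch.rand(32)         # random capacity factors in [0, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x).squeeze(-1), y)
    loss.backward()
    optimizer.step()
    lrs.append(scheduler.get_last_lr()[0])  # learning rate used for this step
    scheduler.step()
```

The scheduler is stepped once per batch, not once per epoch, which is how OneCycleLR is intended to be used.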

5. Results

5.1 Overall Performance

Metric	Value
RMSE	0.147
MAE	0.100
R²	0.777

An hourly RMSE of 0.147 capacity factor units means that the root-mean-square deviation of hourly predictions from actuals is approximately 15 percentage points of installed capacity; the mean absolute deviation (MAE) is 10 percentage points. The practical relevance lies in aggregation to monthly and annual scales, where random hourly fluctuations largely cancel out.
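For reference, the three metrics in the table can be computed from paired hourly series as in the sketch below. The function name and the four-value example series are invented purely to show the arithmetic; they are not WindAI results.

```python
import math


def regression_metrics(actual, predicted):
    """Return (RMSE, MAE, R^2) for two equal-length sequences."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up capacity factors, for illustration only.
actual = [0.1, 0.5, 0.9, 0.3]
predicted = [0.2, 0.4, 0.9, 0.6]
rmse, mae, r2 = regression_metrics(actual, predicted)
```

RMSE is never smaller than MAE because squaring weights large errors more heavily, which is why the reported RMSE (0.147) exceeds the reported MAE (0.100).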

5.2 Per-Plant Results

Plant	Country	Actual CF	Predicted CF	Relative Error
Albany Grasmere	Australia	23.7%	23.2%	2.1%
Amazon Wind Farm TX	USA	44.3%	42.3%	4.5%
Belwind I (offshore)	Belgium	35.9%	37.7%	5.0%
Bobcat Bluff TX	USA	33.6%	36.1%	7.4%
Comber	Canada	29.0%	28.2%	2.8%
Kingsbridge I	Canada	30.6%	28.2%	7.8%

The mean absolute relative error across these plants is approximately 5%. Four of the six plants are predicted within about 5% of their actual annual capacity factor; all six are within 8%.
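The relative errors in the table are the absolute difference between predicted and actual capacity factor, divided by the actual value. The short check below recomputes them from the rounded figures shown in the table, so the results may differ in the last digit from values computed on unrounded data.

```python
# (actual CF %, predicted CF %) as printed in the per-plant table.
plants = {
    "Albany Grasmere": (23.7, 23.2),
    "Amazon Wind Farm TX": (44.3, 42.3),
    "Belwind I": (35.9, 37.7),
    "Bobcat Bluff TX": (33.6, 36.1),
    "Comber": (29.0, 28.2),
    "Kingsbridge I": (30.6, 28.2),
}

# Relative error in percent: |predicted - actual| / actual * 100.
rel_err = {
    name: abs(pred - act) / act * 100.0
    for name, (act, pred) in plants.items()
}
mean_rel_err = sum(rel_err.values()) / len(rel_err)

for name, err in rel_err.items():
    print(f"{name:22s} {err:4.1f}%")
print(f"{'Mean':22s} {mean_rel_err:4.1f}%")
```

Recomputed this way, the per-plant values match the table and the mean comes out just under 5%.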

5.3 Temporal Aggregation Effects

Prediction accuracy improves substantially with temporal aggregation. While hourly RMSE is ~0.147, monthly errors are typically 2 to 5 percentage points, and annual relative errors range from about 2% to 8%. For the primary use case of estimating annual energy production, the relevant metric is annual accuracy.
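Annual energy production follows directly from the mean capacity factor: it is the mean capacity factor multiplied by installed capacity and the number of hours in a year. The sketch below works this through for Albany Grasmere, assuming the rated power in the held-out plant table is per turbine (6 turbines at 2,300 kW) and ignoring leap years and availability losses. The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def mean_capacity_factor(hourly_cf):
    """Aggregate hourly capacity factors to a single mean value."""
    return sum(hourly_cf) / len(hourly_cf)


def annual_energy_mwh(mean_cf, n_turbines, rated_kw_per_turbine):
    """Annual energy production in MWh from a mean capacity factor."""
    capacity_mw = n_turbines * rated_kw_per_turbine / 1000.0
    return mean_cf * capacity_mw * HOURS_PER_YEAR


# Albany Grasmere with the predicted annual capacity factor of 23.2%.
aep_predicted = annual_energy_mwh(0.232, n_turbines=6, rated_kw_per_turbine=2300)
# Same plant with the actual annual capacity factor of 23.7%.
aep_actual = annual_energy_mwh(0.237, n_turbines=6, rated_kw_per_turbine=2300)
```

Because energy is linear in the capacity factor, the relative error in annual energy equals the relative error in annual capacity factor.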

6. Comparison with Alternative Approaches

Characteristic	WindAI	WAsP	WindFarmer	Consultant WRA
Cost per site	$49.99	€2,100 (license)	€5,639/year	$8,000–50,000+
Time per site	2–5 minutes	Days	Days–weeks	2–12 weeks
Calibration required	No	Yes	Yes	Yes (met mast)
Wake modeling	Implicit (learned)	Explicit	Explicit	Explicit
Global coverage	Yes	Requires local data	Requires local data	Per-site
Temporal resolution	Hourly	Statistical	Statistical	Statistical

7. Limitations and Future Work

7.1 Current Limitations

Reanalysis resolution: ERA5's 0.25-degree resolution (~28 km) means terrain features smaller than this scale are not explicitly resolved. Sites in exceptionally complex terrain may exhibit larger prediction errors.
No site-specific calibration: The model does not incorporate site-specific measurement data. It cannot capture local channeling effects or unusual turbulence regimes.
No explicit wake modeling: Wake effects are learned implicitly from aggregate farm-level data but cannot be modeled for different turbine layouts.
Geographic training bias: Training data is concentrated in temperate/subtropical climates. Performance in tropical or extreme-latitude regions may be less reliable.

7.2 Future Work

Higher-resolution data: ERA5-Land at 0.1-degree resolution (~11 km) for improved predictions in complex terrain.
Temporal lag features: Incorporating lagged features and rolling statistics for improved hourly accuracy.
Transfer learning: Fine-tuning on site-specific SCADA data for sites with short-term measurements.
Uncertainty quantification: Monte Carlo dropout or quantile regression for prediction intervals.
Expanded training data: Northern Europe, continental Europe, and Asia for improved generalization.

8. Conclusion

WindAI demonstrates that a single, globally trained deep learning model can produce useful wind resource assessments for diverse locations worldwide, using only freely available reanalysis data and basic turbine specifications as inputs. Trained on over 10 million hourly observations from 300+ wind farms across eight countries, the model achieves an hourly RMSE of 0.147 and R² of 0.777 on six held-out test plants spanning four countries. Annual capacity factor predictions fall within 2–8% of observed values.

The model's practical value lies in its ability to provide rapid, low-cost pre-feasibility assessments at scale. Whereas traditional wind resource assessments cost $8,000 to $50,000 and require weeks to months, WindAI delivers results in minutes at a fraction of the cost. This enables developers to screen large portfolios of candidate sites efficiently, focusing detailed assessment resources on the most promising locations.