Aircraft-aware clear-air turbulence prediction with XGBoost on PIREPs and ERA5



Towards AI


Predicting The Invisible: An ML Model For the Turbulence that Causes 71% of Weather-Related Flight Accidents.

KadirGokdeniz

5 min read · Apr 19, 2026


[Figure: Spatial distribution map of CAT reports across US airspace]

The Problem No One Can See

At 40,000 feet, a plane suddenly drops. Passengers slam into the ceiling. Drinks fly across the cabin. There was no warning; the sky was perfectly clear.

This is the story of how I built a system that predicts this, and why matching meteorology to aircraft aerodynamics was the key insight.

This is Clear Air Turbulence (CAT). Unlike storm-related turbulence, CAT is completely invisible. Pilots can’t see it. Radar can’t detect it. It strikes without warning, and turbulence is responsible for 71% of all weather-related aviation accidents. In the US alone, it costs the industry $100–200 million per year in injuries, diversions, and delays.

And it’s getting worse. Climate change is destabilizing jet streams, making CAT more frequent and more intense every year.

Why I Took This On

In 2024, I was selected for SAYZEK, Turkey’s National Defense AI Program run by the Presidency of Defence Industries. The program pairs university researchers with real defense and aviation challenges. My assignment was environmental situational awareness: specifically, predicting atmospheric hazards that are invisible to conventional systems.

I chose to focus on Clear Air Turbulence because existing approaches had a fundamental gap. Traditional methods treat turbulence prediction as a binary problem: turbulence exists, or it doesn’t. But a Boeing 737 and an Airbus A380 experience the same atmospheric conditions very differently. A pocket of air that barely registers on a wide-body aircraft could throw a regional jet sideways. Nobody was modeling this.

My question was: what if we could predict not just whether turbulence exists, but how severely a specific aircraft type would feel it?

What I Built

I designed a machine learning pipeline that integrates three data sources that had never been combined for this purpose:

Pilot Reports (PIREPs): 19,213 CAT events reported across US airspace between 2022 and 2024, sourced from the Iowa Environmental Mesonet. Each report includes location, altitude, aircraft type, and perceived turbulence severity.

ERA5 Reanalysis Data: Meteorological variables from ECMWF at 0.25° spatial resolution and 1-hour temporal resolution. I engineered 14 turbulence diagnostic features from this data, including the Richardson number, vertical wind shear, and the TI3 instability index.

BADA Aircraft Database: Aerodynamic parameters from EUROCONTROL for each aircraft type, including wing loading, drag coefficients, lift-to-drag ratios, and aspect ratios. This was the novel ingredient: matching each pilot report to the specific aerodynamic profile of the aircraft that filed it.

The final dataset: 38,426 balanced samples across five pressure levels (200–350 hPa) over US airspace, covering the cruise altitudes where commercial aircraft actually fly.
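To make the matching step concrete, here is a minimal sketch of attaching an aerodynamic profile to each pilot report by aircraft type. It is not the pipeline's actual code: the field names (`aircraft_type`, `wing_loading`, `aspect_ratio`), the `attach_aero` helper, and the numeric values are all hypothetical placeholders, and the numbers are not taken from BADA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AeroProfile:
    wing_loading: float  # kg/m^2, placeholder value
    aspect_ratio: float  # dimensionless, placeholder value


# Illustrative placeholders only; these are NOT real BADA parameters.
AERO_BY_TYPE = {
    "B737": AeroProfile(wing_loading=600.0, aspect_ratio=9.0),
    "A380": AeroProfile(wing_loading=650.0, aspect_ratio=7.5),
}


def attach_aero(reports, aero_by_type):
    """Split reports into those with a known aircraft profile and those without.

    Each report is a dict with an 'aircraft_type' key (hypothetical schema).
    Matched reports are copied and extended with the aerodynamic fields.
    """
    matched, unmatched = [], []
    for report in reports:
        # Normalize the free-text type code before the lookup.
        key = report["aircraft_type"].strip().upper()
        profile = aero_by_type.get(key)
        if profile is None:
            unmatched.append(report)
            continue
        matched.append({
            **report,
            "wing_loading": profile.wing_loading,
            "aspect_ratio": profile.aspect_ratio,
        })
    return matched, unmatched


reports = [
    {"aircraft_type": "b737 ", "severity": "MOD"},
    {"aircraft_type": "ZZZZ", "severity": "LGT"},
]
matched, unmatched = attach_aero(reports, AERO_BY_TYPE)
print(len(matched), len(unmatched))
```

Keeping the unmatched reports in a separate list, rather than silently dropping them, makes it easy to see how many pilot reports lack a usable aircraft profile.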

[Figure: Data pipeline diagram showing three data sources feeding the XGBoost model]

The Hard Parts

Data alignment was the first bottleneck. Pilot reports come with imprecise timestamps and coordinates. ERA5 data sits on a fixed grid. Matching a pilot’s “somewhere over Oklahoma at around 2pm” to exact meteorological conditions at that point in the atmosphere required careful spatiotemporal interpolation across five pressure levels.
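As a rough illustration of what this kind of alignment involves, the sketch below interpolates a gridded field to a report's position and time on a single pressure level: bilinear in latitude and longitude, then linear between two hourly snapshots. The function names and the tiny 2×2 grid are assumptions for the example; the article does not say which interpolation scheme was actually used, and vertical interpolation between pressure levels is left out.

```python
import math

GRID_DEG = 0.25  # ERA5 horizontal spacing stated in the article


def bilinear(field, lat0, lon0, lat, lon, step=GRID_DEG):
    """Bilinear interpolation on a regular grid.

    field[i][j] holds the value at (lat0 + i*step, lon0 + j*step).
    """
    fi = (lat - lat0) / step
    fj = (lon - lon0) / step
    if not (0 <= fi <= len(field) - 1 and 0 <= fj <= len(field[0]) - 1):
        raise ValueError("point lies outside the grid")
    # Clamp so a point exactly on the last row/column still has a cell.
    i = min(math.floor(fi), len(field) - 2)
    j = min(math.floor(fj), len(field[0]) - 2)
    ti, tj = fi - i, fj - j
    near = field[i][j] * (1 - tj) + field[i][j + 1] * tj
    far = field[i + 1][j] * (1 - tj) + field[i + 1][j + 1] * tj
    return near * (1 - ti) + far * ti


def interp_space_time(field_t0, field_t1, hour_fraction, lat0, lon0, lat, lon):
    """Interpolate in space on two hourly snapshots, then linearly in time.

    hour_fraction is 0.0 at the first snapshot and 1.0 at the second.
    """
    v0 = bilinear(field_t0, lat0, lon0, lat, lon)
    v1 = bilinear(field_t1, lat0, lon0, lat, lon)
    return v0 * (1 - hour_fraction) + v1 * hour_fraction


# Toy 2x2 grids standing in for one variable at two consecutive hours.
wind_t0 = [[0.0, 1.0], [2.0, 3.0]]
wind_t1 = [[2.0, 3.0], [4.0, 5.0]]
value = interp_space_time(wind_t0, wind_t1, 0.5, 35.0, -98.0, 35.125, -97.875)
print(value)
```

A report whose position or time is uncertain by more than one grid cell or one hour cannot be fixed by interpolation alone, which is part of why this step needs care.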


Feature engineering required domain knowledge, not just ML intuition. I couldn’t just throw raw weather variables at a model. I computed 14 physics-based turbulence diagnostics, each grounded in atmospheric science literature, because the relationship between raw variables and turbulence is highly non-linear. A temperature gradient alone means little; the Richardson number, which combines thermal stability with wind shear, tells you whether the flow is close to breaking down into turbulence.

The aerodynamic integration was an open question. No one had published results on whether aircraft-specific features actually improve CAT prediction. I ran six parallel experiment scenarios (with and without aerodynamic features, across three turbulence severity groupings) to test rigorously whether this approach added real value or just noise.

Results

I benchmarked five algorithms: XGBoost, LightGBM,...
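Returning to the Richardson number from the feature engineering discussion: a minimal sketch of a finite-difference version between two vertical levels is shown below. It uses the standard definition, static stability (g/θ)(Δθ/Δz) divided by squared vertical wind shear. The function name, argument layout, and sample values are assumptions for illustration; the article does not show how its 14 diagnostics were computed.

```python
import math

G = 9.80665  # standard gravity, m/s^2


def richardson_number(theta_lo, theta_hi, u_lo, u_hi, v_lo, v_hi, dz):
    """Finite-difference Richardson number between two levels.

    theta_*: potential temperature (K) at the lower and upper level
    u_*, v_*: wind components (m/s) at the lower and upper level
    dz: height difference between the levels (m), must be positive
    """
    if dz <= 0:
        raise ValueError("dz must be positive")
    theta_mean = 0.5 * (theta_lo + theta_hi)
    # Static stability term (squared Brunt-Vaisala frequency), in s^-2.
    n_squared = (G / theta_mean) * (theta_hi - theta_lo) / dz
    # Squared vertical shear of the horizontal wind, in s^-2.
    shear_squared = ((u_hi - u_lo) / dz) ** 2 + ((v_hi - v_lo) / dz) ** 2
    if shear_squared == 0.0:
        # No shear at all: this sketch simply reports infinity.
        return math.inf
    return n_squared / shear_squared


# Made-up layer: 2 K of warming and 20 m/s of shear over 1 km.
ri = richardson_number(330.0, 332.0, 30.0, 50.0, 0.0, 0.0, 1000.0)
print(round(ri, 3))
```

Values below roughly 0.25 are the classical indicator that shear can overcome stratification, which is why this single number carries more signal than the temperature gradient or the wind shear taken separately.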
