Atmospheric aerosols (tiny particles suspended in the air) are among the least understood components of Earth system models (ESMs). They contribute substantially to radiative forcing (the change in Earth's energy balance, for example through scattering and absorbing sunlight) and strongly influence how air quality affects human health. One common type, secondary organic aerosol, forms through extremely complex chemistry that realistically involves hundreds of thousands of chemical reactions and species. Representing all of these reactions and species in 3D models is computationally prohibitive.

To bridge this gap, our team at NSF NCAR experimented with two machine-learning approaches trained on data generated by GECKO-A, a detailed chemistry model that simulates all known reactions involved in aerosol formation. First, we demonstrated that multilayer perceptron and gated recurrent unit (GRU) neural networks can emulate organic aerosol formation from common precursor compounds such as toluene, dodecane, and α-pinene. The emulators had small errors (only 2–8%) and ran up to five orders of magnitude faster than GECKO-A, with the GRU's hidden-state memory providing superior long-term numerical stability (Schreck et al., 2022). This capability opens the door to embedding emulators of fully explicit organic aerosol chemistry directly into Earth system models, which is critical for accurately predicting organic aerosol burdens and their effects on radiative forcing and air quality.
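To make the idea of a recurrent emulator concrete, here is a minimal sketch of how a GRU cell can be stepped forward in time so that each predicted chemical state feeds the next prediction. It is an illustration only, not the architecture or code from Schreck et al. (2022): the weights are random and untrained, and the state layout (precursor, gas, aerosol), the environment inputs, and the names `TinyGRUCell`, `rollout`, and `W_out` are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRUCell:
    """One GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Input (W) and recurrent (U) weights for the update gate z,
        # the reset gate r, and the candidate state n. Biases are omitted.
        self.W = {g: rand_matrix(n_hidden, n_in, rng) for g in "zrn"}
        self.U = {g: rand_matrix(n_hidden, n_hidden, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # The new hidden state blends the candidate with the previous memory.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def rollout(cell, W_out, state0, env, n_steps):
    """Step the emulator forward, feeding each prediction back in as input."""
    h = [0.0] * cell.n_hidden
    state = list(state0)
    trajectory = [state]
    for _ in range(n_steps):
        h = cell.step(state + env, h)      # hidden state carries memory across steps
        delta = matvec(W_out, h)           # linear readout: predicted change in state
        state = [s + d for s, d in zip(state, delta)]
        trajectory.append(state)
    return trajectory


# Hypothetical normalized state [precursor, gas, aerosol] and two environment inputs.
cell = TinyGRUCell(n_in=5, n_hidden=4, seed=1)
W_out = rand_matrix(3, 4, random.Random(2))
trajectory = rollout(cell, W_out, [1.0, 0.0, 0.0], [0.5, -0.2], n_steps=10)
print(len(trajectory), "states in the rollout")
```

In a rollout like this, errors can compound because each output becomes the next input, which is one reason long-term stability matters for emulators embedded in a time-stepping model. A trained emulator would replace the random weights with ones fitted to GECKO-A trajectories.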

A second study (Mouchel-Vallon & Hodzic, 2023) showed that random forest emulators achieve similarly strong performance. Critically, the authors found that training on a broad sampling of chemical regimes is essential for reliable performance across the diversity of tropospheric (lower-atmospheric) conditions. Together, these results establish that machine learning can faithfully capture the molecular-level complexity of organic chemistry while remaining fast enough for large-scale Earth system simulations.
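One generic way to obtain a broad sampling of conditions for box-model training runs is Latin hypercube sampling, which places exactly one sample in each equal-width slice of every variable's range. The sketch below illustrates that technique only; it is not the sampling procedure used in the study, and the variable names and ranges in `RANGES` are hypothetical placeholders.

```python
import random

# Hypothetical variables and ranges, for illustration only.
RANGES = {
    "temperature_K": (250.0, 310.0),
    "nox_ppb": (0.01, 10.0),
    "ozone_ppb": (10.0, 150.0),
}


def latin_hypercube(ranges, n_samples, seed=0):
    """Return n_samples dicts of conditions, one per stratum of each variable."""
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in ranges.items():
        # Draw one point inside each of n equal-width strata on [0, 1),
        # then shuffle so strata are paired randomly across variables.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)
        columns[name] = [lo + p * (hi - lo) for p in points]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


samples = latin_hypercube(RANGES, n_samples=8)
for s in samples[:3]:
    print({k: round(v, 2) for k, v in s.items()})
```

Each sampled dictionary could then define the initial conditions of one box-model simulation. Variables that span orders of magnitude, such as NOx, are often sampled on a logarithmic scale instead; the linear scaling here is kept only for brevity.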

Building on these proof-of-concept results, our team has been awarded a UCAR President's Innovation Seed Fund grant (2025 AI-Chem Initiative) to coordinate an international community initiative spanning NSF NCAR, the Barcelona Supercomputing Center, Deutsches Zentrum für Luft- und Raumfahrt (DLR), and the University of Cambridge. The initiative focuses on extending our work to the MOZART chemistry mechanisms used in the Community Earth System Model (CESM) and the Model for Prediction Across Scales (MPAS), establishing unified box-model testbeds, generating community training datasets, and ultimately embedding AI chemistry emulators into next-generation ESMs such as CESM-MLe and MPAS-LES.

For more information or for partnership opportunities, please contact Alma Roux.

ai-chem-1.png

Figure: Overview of the UCAR President's Fund 2025 AI-Chem Initiative, structured around four interconnected activities: building community chemistry datasets; developing a unified box-model testbed to generate synthetic ML training data for standard chemical mechanisms, in alignment with the CESM-MLe efforts; extending the NSF NCAR CREDIT foundational AI framework to atmospheric chemistry; and building a sustained global community.