Geoscience Data Exchange (GDEX)
The Geoscience Data Exchange (GDEX) is NSF NCAR’s integrated research data commons: a modern platform for storing, sharing, and analyzing Earth system science data in one place. By making large scientific datasets findable, accessible, and usable, GDEX reduces technical barriers and enables researchers to run advanced analyses, including AI and machine learning, directly alongside the data. Because researchers work with data where it lives, they have less need to download or duplicate large volumes of data.
Designed with both accessibility and scale in mind, GDEX integrates with NSF NCAR’s community high-performance computing resources, allowing researchers to collaborate, stream AI-ready data, and iterate quickly without time-consuming download workflows.
The Challenge: Fragmented and Difficult-to-Use Data Services
Historically, Earth system science data have been distributed across discipline-specific repositories, each with its own interfaces, formats, and access methods. This fragmentation created inconsistent user experiences and required researchers to navigate multiple systems before they could even begin analysis. Many datasets were stored in formats that were not optimized for scalable AI or machine learning workflows, requiring substantial preprocessing before they could be used effectively.
The traditional download–clean–analyze cycle was both time-consuming and resource-intensive. Institutions without significant local storage or compute capacity faced additional barriers, limiting who could fully participate in large-scale data-driven research. Integration with NSF NCAR’s community computing services was often limited, further slowing the transition from data discovery to analysis and insight.
The Solution: A Unified, AI-Optimized Data Commons
GDEX addresses these challenges by applying user-centered design and Findable, Accessible, Interoperable, and Reusable (FAIR) principles to streamline data discovery and analysis workflows.
Selected datasets are transformed into chunked, cloud-native formats with structured metadata to support scalable, programmatic access. Researchers can launch analyses directly within NSF NCAR computing environments, eliminating redundant data transfers. Through the Open Science Data Federation (OSDF), GDEX enables global data streaming, allowing researchers to work with large datasets remotely and efficiently. API-enabled access and server-side data subsetting further streamline workflows, allowing scientists to retrieve only the data they need and accelerate experimental cycles in notebook-based or programmatic environments.
This integrated approach supports reproducible research and rapid AI/ML experimentation at scale.
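The effect of chunking on transfer volume can be illustrated with a minimal sketch. It is not GDEX code: the array shape, the chunk shape, and the function names chunks_for_selection and total_chunks are assumptions chosen for illustration. The sketch counts which chunks of a chunked array overlap a requested index range, which is why a chunked layout lets a client fetch only part of a dataset.

```python
from itertools import product


def chunks_for_selection(shape, chunk_shape, selection):
    """Return the grid indices of every chunk that overlaps `selection`.

    shape, chunk_shape: tuples of ints, one entry per dimension.
    selection: tuple of (start, stop) half-open index ranges, one per dimension.
    """
    per_dim = []
    for size, chunk, (start, stop) in zip(shape, chunk_shape, selection):
        if not 0 <= start < stop <= size:
            raise ValueError(f"selection {(start, stop)} is outside 0..{size}")
        first = start // chunk        # chunk holding the first requested index
        last = (stop - 1) // chunk    # chunk holding the last requested index
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


def total_chunks(shape, chunk_shape):
    """Count all chunks in the array, including partial edge chunks."""
    count = 1
    for size, chunk in zip(shape, chunk_shape):
        count *= -(-size // chunk)    # integer ceiling division
    return count


# Hypothetical (time, lat, lon) array and chunk shape, for illustration only.
shape = (8760, 721, 1440)
chunk_shape = (24, 103, 144)

# Request: first 48 time steps over a small latitude/longitude window.
selection = ((0, 48), (300, 400), (100, 200))

needed = chunks_for_selection(shape, chunk_shape, selection)
print(len(needed), "of", total_chunks(shape, chunk_shape), "chunks are needed")
```

In this hypothetical layout the request touches 8 of 25,550 chunks. Real savings depend on how well a dataset's chunk shape matches typical queries.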
How We Prepare Data
Guided by FAIR data principles, we design curated data pipelines that support modern AI research workflows. These pipelines:
- Convert selected datasets into analysis-optimized, cloud-friendly formats
- Automate metadata gathering to support data search, discovery, and subsetting
- Enable streaming and subsetting to reduce the data volume transferred
- Host observational and model data to enable end-to-end AI workflows
These capabilities lower barriers to entry while supporting advanced users working at scale.
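As an illustration of how gathered metadata can drive search and discovery, the following minimal sketch filters a small in-memory catalog by variable name and time range. The record fields, the dataset identifiers, and the search function are hypothetical and do not describe the GDEX catalog or its API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; the fields are illustrative only."""
    dataset_id: str
    variables: frozenset
    start: date
    end: date


def search(catalog, variable, start, end):
    """Return records that contain `variable` and overlap [start, end]."""
    return [
        rec for rec in catalog
        if variable in rec.variables and rec.start <= end and rec.end >= start
    ]


# Made-up catalog entries, not real GDEX datasets.
catalog = [
    DatasetRecord("example-reanalysis", frozenset({"t2m", "u10"}),
                  date(1980, 1, 1), date(2020, 12, 31)),
    DatasetRecord("example-model-run", frozenset({"t2m"}),
                  date(2015, 1, 1), date(2100, 12, 31)),
    DatasetRecord("example-station-obs", frozenset({"precip"}),
                  date(1950, 1, 1), date(2010, 12, 31)),
]

hits = search(catalog, "t2m", date(2018, 1, 1), date(2019, 12, 31))
print([rec.dataset_id for rec in hits])
```

Because the filter runs on metadata alone, a client can decide which datasets are relevant before any data values are read or transferred.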
Learn more about AI-ready data.
The Impact: Accelerating Discovery Through Data-Proximate Compute
GDEX transforms how Earth system scientists engage with data. Researchers can directly read datasets from NSF NCAR’s community computing resources, stream data remotely, and leverage notebook examples that demonstrate best practices for analysis and model development. AI-ready datasets, including reanalyses, CESM outputs, and observational records, are prepared to reduce preprocessing time and lower the barrier to entry for machine learning applications.
Dedicated Data Analysis Allocations further expand access for U.S. university researchers, helping ensure that large-scale data and compute resources are not limited to institutions with significant local infrastructure.
By bringing compute to the data and optimizing datasets for modern workflows, GDEX enables faster experimentation, broader participation, and more equitable access to large-scale scientific discovery.
Resources:
- GDEX Home
- GDEX by-the-Numbers
- Access datasets through GDEX
- Contribute datasets to the research data commons
- Collaborate on AI-ready dataset development and pipelines
- Contribute your AI scientific workflows
For general inquiries and data support questions, please contact the NSF NCAR Research Data Help Desk at datahelp@ucar.edu or use the NSF NCAR Geoscience Data Exchange Help Desk.



