How does NSF NCAR commit to Responsible and Reliable Innovation?

NSF NCAR develops and applies AI and machine learning in responsible, transparent, and scientifically rigorous ways. Our approach pairs innovation with accountability to ensure new capabilities strengthen research quality, reliability, and public value.

Responsible AI in Earth system research requires more than technical solutions. It depends on ongoing dialogue with researchers and stakeholders, shared standards and best practices, cross-institutional partnerships, and continuous learning as methods and needs evolve. This approach encourages the community to feel connected and committed to responsible development.

Through this work, NSF NCAR supports the academic research community and helps connect cutting-edge research to real-world decision-making.

How NSF NCAR Plans to Implement Responsible and Reliable AI:

Priority 1: Rigorous Data Stewardship and Fitness-for-Purpose

Establish comprehensive frameworks to ensure AI models are trained on high-quality, well-documented, and appropriately representative datasets. This includes implementing data quality and provenance standards, practicing full data-lifecycle stewardship, systematically detecting and documenting biases, developing fitness-for-purpose tools to match datasets to appropriate applications, and providing clear guidance on data limitations and use boundaries.

See how we engage in rigorous data stewardship, and review our standards and commitments to data quality.

Priority 2: Reproducible, Reliable, and Responsible Workflows

Design infrastructure and standards that make responsible AI development the easiest path for scientists to follow. This includes building automated checks and guardrails that guide responsible development and prevent misuse, establishing reproducibility frameworks with version control, implementing continuous testing and validation, and providing clear guidance on the appropriate use of models and tools.

Priority 3: Comprehensive Model Evaluation

Develop holistic evaluation frameworks that assess AI models across multiple dimensions beyond traditional accuracy metrics. This includes evaluating physical plausibility and consistency with established theory, assessing interpretability and scientific explainability, testing out-of-sample performance, benchmarking against diverse conditions, and documenting model capabilities and limitations transparently.

Learn more about our approaches and commitments to benchmarking and evaluation in AI for Earth system science (ESS).

Priority 4: Scientific Foundations for Confidence and Reliability

Establish rigorous scientific grounds for when and why we should have confidence in AI model outputs for different applications. This includes developing frameworks that connect model performance to fitness for specific scientific uses, creating application-specific validation protocols, establishing confidence assessment methods appropriate for different use cases, and building an understanding of when AI results demonstrate reliability for actionable contexts.

Explore how NSF NCAR researchers are building our understanding of what makes AI trustworthy in weather modeling.

Priority 5: Systematic Uncertainty Quantification and Communication

Create structures and tools to quantify, assess, and communicate uncertainty, error, and bias throughout AI pipelines. This includes developing uncertainty-quantification methods appropriate for AI approaches, implementing bias-tracking and propagation analysis, creating clear communication frameworks for different audiences, and establishing standards for reporting model uncertainties, limitations, and risks.

Priority 6: Holistic Risk Assessment

Develop frameworks to identify and assess potential risks related to scientific uncertainty and error, as well as ethical considerations, at critical points in AI development and deployment. This includes systematic risk identification protocols, impact assessment for potential misuse or misapplication, evaluation of ethical implications of model and data use, and development of mitigation strategies for identified risks.

Priority 7: Open Science and Ethical Dissemination

Ensure that AI tools, training and evaluation data, and outputs are shared ethically and openly with the research community and society. This includes implementing and evolving open-access standards for AI models and datasets, providing comprehensive documentation and use guidance, creating mechanisms for community feedback and improvement, and developing governance structures for the responsible sharing of sensitive applications.


Figure: Comparison of gross primary productivity between observed reference data (GBAF) and model output (CLM4.5). (a) Reference data: observed values, with grey areas indicating no data for Antarctica, Greenland, and parts of the Sahara. (b) Model data: predictions cover all land areas, including those that are grey in the reference data. (c) Bias map: the difference between the two datasets (red = model overestimates; blue = model underestimates). Only areas where both datasets have data are compared, so the grey areas remain. (d) Bias scores: computed over the areas where both datasets define land (green = good model performance; red = poor model performance). From: Collier et al. (2018).
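The comparison described in the caption can be illustrated with a minimal sketch. It assumes two small gridded datasets in which missing cells are marked with `None`, computes the bias only where both grids have data, and maps each bias to a score between 0 and 1. The function names, the sample values, and the exponential form of the score are assumptions chosen for illustration, not the method used to produce the figure. Plain Python is used to keep the example self-contained; real analyses would typically use an array library.

```python
import math


def bias_map(reference, model):
    """Return model minus reference per cell; None where either grid lacks data."""
    return [
        [None if r is None or m is None else m - r for r, m in zip(ref_row, mod_row)]
        for ref_row, mod_row in zip(reference, model)
    ]


def bias_score(bias, scale):
    """Map each bias to (0, 1]: 1 means no bias, values near 0 mean large bias.

    The exponential form and the `scale` parameter are illustrative assumptions.
    """
    return [
        [None if b is None else math.exp(-abs(b) / scale) for b in row]
        for row in bias
    ]


# Hypothetical 2x3 grids of gross primary productivity (arbitrary units).
# None marks cells with no data, like the grey areas in panel (a).
reference = [[2.0, None, 1.0], [3.0, 4.0, None]]
model = [[2.5, 1.0, 1.0], [2.0, 4.0, 0.5]]

bias = bias_map(reference, model)      # positive = model overestimates
scores = bias_score(bias, scale=1.0)   # higher = better agreement
```

Cells that are missing in the reference stay missing in both the bias map and the scores, even though the model provides values there, which mirrors why the grey areas persist in panels (c) and (d).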