Ingest observations from NNJA (via Brightband) and convert to DART observation sequence format
Summary
This project will ingest observation data from the Brightband nnja-ai API (the AI-ready NOAA-NASA Joint Archive, NNJA) and convert it into the DART observation sequence (obs_seq) format. The goal is to enable direct assimilation of NNJA observations in DART-based workflows, bridging modern observational archives and existing data assimilation systems.
Motivation
- The nnja-ai dataset provides a modern, well-structured, cloud-native observational archive (in Parquet / tabular format) for a wide range of sensors (satellites, radiosondes, surface stations, etc.).
- DART requires observations in its obs_seq structure (with associated metadata, error specifications, and observation types) to perform assimilation.
- By building a conversion pipeline, we unlock the potential of NNJA observations for assimilation experiments, operational workflows, and hybrid AI–DA systems.
- This also helps users avoid manual, ad-hoc conversions and ensures consistency, traceability, and robustness in data handling.
Goals
- Develop a conversion tool that queries or ingests NNJA data from the Brightband nnja-ai API.
- Map NNJA variables, sensor identifiers, timestamps, locations, and metadata to DART observation definitions (obs_def).
- Generate valid DART obs_seq files from the ingested data.
- Validate output by testing small examples using the DART obs_sequence_tool.
- Provide documentation and example notebooks demonstrating conversion workflows.
- (Optional) Automate periodic ingestion / updates so new NNJA observations can be converted on demand.
Approach / Methodology
- Familiarize yourself with the nnja-ai API / SDK
  - Use the Brightband nnja-ai SDK or API to query or download observations programmatically.
  - Explore the data schemas, partitioning (e.g. date, sensor type), and how to filter for desired subsets.
- Define the mapping between the NNJA observation schema and DART observation definitions
  - Determine how NNJA fields (e.g. sensor, variable, quality flags, geolocation) map to DART's obs_type, obs_error, obs_kind, etc.
  - Handle sensor-specific nuances (e.g. satellite radiances vs in-situ data).
- Build conversion routines
  - Read NNJA data into a notebook.
  - Apply filters, quality control, and coordinate/time transformations (if needed).
  - Create DART-compatible data structures and metadata.
  - Write out obs_seq files.
- Testing & validation
  - Use small subsets of NNJA data to test conversions.
  - Run the DART obs_sequence_tool to confirm that DART can read the resulting observation sequences.
  - Compare statistics (observation count, error distributions) before and after conversion.
- Documentation and automation
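The mapping and transformation steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the input column names (`OBS_TIMESTAMP`, `LAT`, `LON`, `PRLC`, `TMDB`), the `VARIABLE_MAP` table, and the error variance are hypothetical placeholders that would need to be checked against the actual nnja-ai schema and DART's observation type list. The sketch relies on two DART conventions: times are expressed as days and seconds since 1601-01-01, and obs_seq files store longitude (0 to 360) and latitude in radians. Writing the obs_seq file itself is left out, since its exact layout should follow the DART documentation.

```python
from datetime import date

import numpy as np
import pandas as pd

# Days between DART's calendar origin (1601-01-01) and the Unix epoch.
UNIX_EPOCH_IN_DART_DAYS = (date(1970, 1, 1) - date(1601, 1, 1)).days

# Hypothetical mapping: NNJA column -> (DART observation type, error variance).
# Both the column name and the error variance are placeholders.
VARIABLE_MAP = {
    "TMDB": ("RADIOSONDE_TEMPERATURE", 1.0),
}


def nnja_to_dart_table(df: pd.DataFrame, variable_map=VARIABLE_MAP) -> pd.DataFrame:
    """Reshape an NNJA-like table into one row per DART observation."""
    # Interpret timestamps as UTC and convert to whole seconds since 1970.
    ts = pd.to_datetime(df["OBS_TIMESTAMP"], utc=True)
    unix_s = ((ts - pd.Timestamp("1970-01-01", tz="UTC"))
              // pd.Timedelta(seconds=1)).to_numpy()

    frames = []
    for column, (obs_type, error_variance) in variable_map.items():
        frames.append(pd.DataFrame({
            "obs_type": obs_type,
            "value": df[column].to_numpy(),
            # Wrap longitude into [0, 360) before converting to radians.
            "lon_rad": np.deg2rad(np.mod(df["LON"].to_numpy(), 360.0)),
            "lat_rad": np.deg2rad(df["LAT"].to_numpy()),
            "vert_pressure_pa": df["PRLC"].to_numpy(),
            "days": unix_s // 86400 + UNIX_EPOCH_IN_DART_DAYS,
            "seconds": unix_s % 86400,
            "error_variance": error_variance,
        }))
    return pd.concat(frames, ignore_index=True)


sample = pd.DataFrame({
    "OBS_TIMESTAMP": ["2024-01-01T06:00:00Z"],
    "LAT": [40.0],
    "LON": [-105.0],
    "PRLC": [85000.0],
    "TMDB": [280.5],
})
dart_table = nnja_to_dart_table(sample)
print(dart_table)
```

Keeping the intermediate result as a flat table, one row per observation, makes it straightforward to compare counts and value distributions against the NNJA input before any file is written.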
Skills Needed or to Be Gained
- Python programming (file I/O, data processing)
- Experience with data handling libraries (Pandas, PyArrow, Dask, xarray)
- Familiarity with Parquet, columnar data formats, and large-volume data reading
- Understanding of DART observation sequence format, obs_def, obs_seq conventions
- Some knowledge of remote sensing / satellite observation metadata if working with radiance data
- Comfort with time coordinate systems, geospatial transforms, and quality flags
Possible Challenges & Open Questions
- Some observations may lack full metadata (e.g. sensor angles, calibration) needed by DART.
- Time zone, time reference, or timestamp precision mismatches between NNJA and DART.
- Ensuring that coordinate systems align (e.g. lat/lon grids, altitude levels).
- Performance issues when converting large volumes of data (memory, I/O).
- Consistency with DART observation error and quality control expectations.
- Handling edge cases — missing data, sensor blacklisting, quality flags, or observation duplicates.
- Maintaining compatibility as the nnja-ai schema evolves or updates (versioning).
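Several of these edge cases (timestamp reference, missing data, duplicates) can be handled in a small cleaning pass before conversion. The following is a minimal sketch, assuming the data is already in a pandas DataFrame; the column names `OBS_TIMESTAMP`, `LAT`, `LON`, and `TMDB` are hypothetical and would need to match the real nnja-ai schema. It also reports row counts before and after, which is one simple way to keep the cleaning traceable.

```python
import pandas as pd


def clean_observations(df, value_col, key_cols=("OBS_TIMESTAMP", "LAT", "LON")):
    """Normalize timestamps to UTC, drop incomplete rows and exact duplicates."""
    out = df.copy()
    # Treat all timestamps as UTC so later day/second arithmetic is unambiguous.
    out["OBS_TIMESTAMP"] = pd.to_datetime(out["OBS_TIMESTAMP"], utc=True)
    # Rows without a value or a complete time/location cannot be converted.
    out = out.dropna(subset=[value_col, *key_cols])
    # Keep the first of any observations repeated at the same time and place.
    out = out.drop_duplicates(subset=[*key_cols, value_col], keep="first")
    return out.sort_values("OBS_TIMESTAMP").reset_index(drop=True)


raw = pd.DataFrame({
    "OBS_TIMESTAMP": [
        "2024-01-01T12:00:00Z",
        "2024-01-01T06:00:00Z",
        "2024-01-01T06:00:00Z",  # duplicate of the previous row
        "2024-01-01T18:00:00Z",  # missing value
    ],
    "LAT": [40.0, 41.0, 41.0, 42.0],
    "LON": [-105.0, -104.0, -104.0, -103.0],
    "TMDB": [281.0, 279.5, 279.5, None],
})
cleaned = clean_observations(raw, value_col="TMDB")
print(f"rows before: {len(raw)}, rows after: {len(cleaned)}")
```

Sensor blacklists and quality-flag thresholds are deliberately not shown, because they depend on the sensor and on the quality control conventions chosen for the DART experiment.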
References