
Conversation

@mgharamti
Contributor

Description:

This PR adds two new in-situ ocean converters (ARVOR profiling floats and SVP surface drifters). It also introduces a reusable CSV parsing utility in parse_args_mod. Both converters make use of this CSV interface, which simplifies the code. Documentation has also been added for both converters.

The CSV parsing utilities build on the existing parsing infrastructure (essentially wrapping it). The functionality mimics our NetCDF handling in the sense that a file is opened, and data is accessed with a single call before the file is closed. A few helper functions have also been added; these can be used to access the header, check whether a field exists, find the dimensions, etc.
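
As a rough illustration, here is how the calling pattern might look. The module and routine names below appear later in this thread; the argument lists are assumptions, not the merged interface:

program csv_usage_sketch

! sketch only: routine names are from this PR, but the argument
! lists here are assumed, not the actual interface
use read_csv_mod

implicit none

type(csv_file_type) :: cf
real, allocatable   :: temp(:)
integer             :: nobs

! open once; the header and dimensions are cached in the handle
call csv_open('arvor_profile.csv', cf, 'csv_usage_sketch')

! query the cached metadata before reading
nobs = csv_get_obs_num(cf)
if (.not. csv_field_exists(cf, 'TEMP')) stop 'no TEMP column'

! a single call retrieves a whole column, netcdf-style
allocate(temp(nobs))
call csv_get_field(cf, 'TEMP', temp)

call csv_close(cf)

end program csv_usage_sketch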

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

Tested both converters using actual raw ASCII data files.

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

ARVOR: /glade/derecho/scratch/gharamti/inacawo/DART/observations/obs_converters/ARVOR/work/obs_files.txt
SVP: /glade/derecho/scratch/gharamti/inacawo/DART/observations/obs_converters/SVP/work/obs_files.txt

@mgharamti mgharamti added the Enhancement (New feature or request) and obs_converters (converting observations to DART format) labels on Nov 25, 2025
@nancycollins
Collaborator

moha - i'm going to file a review on the code in just a bit, but up front i wanted to say that it's great to pull out the CSV parsing into a module so it can be reused, tested and updated independently of the calling code.

if you were willing to do a bit more work on this, i think that the CSV routines are self-contained enough to merit their own separate module. they can call code from the parse module, but i think they're different enough to stand alone. let me know what you think about this. i'll put other more specific comments into my review.

also - do you have any tests you used on this code that could be added to the repo?

Collaborator

@nancycollins nancycollins left a comment

the converters themselves are easy to read and understand, which is good. i had a few comments - the biggest one is probably moving the csv routines to their own module.

cf%delim = detect_delim(line)

call split_fields(line, cf%delim, cf%ncols, cf%fields)
call close_file(iunit)
Collaborator

i would leave the file open here in csv_open(), leave it open in all subsequent calls, and close it in csv_close(). you can add the iunit to the same structure and reuse it until close is called. you can call "rewind()" if you need to start reading at the beginning of the file in subsequent calls.
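
For illustration, a minimal sketch of the handle pattern being suggested here; field and routine names beyond those quoted in the thread are illustrative, and the bodies are stubs:

module csv_handle_sketch

! illustrative only: shows the open-once / rewind / close-at-the-end pattern
implicit none

type :: csv_file_type
   integer          :: iunit   = -1     ! unit stays open between calls
   integer          :: nrows   = 0
   integer          :: ncols   = 0
   character(len=1) :: delim   = ','
   logical          :: is_open = .false.
end type csv_file_type

contains

subroutine csv_open(fname, cf)
   character(len=*),    intent(in)    :: fname
   type(csv_file_type), intent(inout) :: cf
   open(newunit=cf%iunit, file=fname, status='old', action='read')
   cf%is_open = .true.
   ! ... read the header line; fill nrows, ncols, delim ...
end subroutine csv_open

subroutine csv_read_variable(cf)
   type(csv_file_type), intent(inout) :: cf
   rewind(cf%iunit)   ! back to the top of the file for each new variable
   ! ... skip the header, then read the requested column ...
end subroutine csv_read_variable

subroutine csv_close(cf)
   type(csv_file_type), intent(inout) :: cf
   close(cf%iunit)
   cf%is_open = .false.
   cf%iunit   = -1
end subroutine csv_close

end module csv_handle_sketch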

Contributor Author

Done

cf%ncols = 0
cf%delim = ','
cf%fields = ''
cf%is_open = .false.
Collaborator

if you add iunit to the cf structure, close cf%iunit here.

Contributor Author

Done

file_out = 'obs_seq.arvor',
obs_error_temp = 0.02, ! temperature error standard deviation (C)
obs_error_sal = 0.02, ! salinity error standard deviation (PSU)
avg_obs_per_file = 500000, ! pre-allocation hint
Collaborator

i'd say this is more than a 'hint' because i don't see anywhere that the converter can recover if there are more obs than were originally allocated for. maybe use 'limit' instead of 'hint'?
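
For context, a sketch of why the value acts as a hard limit rather than a hint; all names here are hypothetical, not the converter's:

program prealloc_limit_sketch

! illustrative only: demonstrates one-shot pre-allocation with no regrowth path
implicit none

integer, parameter :: num_input_files  = 4
integer, parameter :: avg_obs_per_file = 500000
integer            :: max_obs, num_obs
real, allocatable  :: lat(:), lon(:), vals(:)

! allocation happens once up front, so num_input_files * avg_obs_per_file
! caps the total number of obs the converter can hold
max_obs = num_input_files * avg_obs_per_file
allocate(lat(max_obs), lon(max_obs), vals(max_obs))

num_obs = 0
! ... inside the read loop, per accepted observation ...
if (num_obs + 1 > max_obs) stop 'too many obs: increase avg_obs_per_file'
num_obs = num_obs + 1

end program prealloc_limit_sketch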

Contributor Author

Done

* - ``avg_obs_per_file``
- integer
- ``500000``
- Estimate of valid obs per file.
Collaborator

Add a second sentence something like 'Used for pre-allocation. Number of files times this number must be larger than the total number of output observations.'

Contributor Author

Done


! Open csv file and get dims
call csv_open(filename, cf, routine)
nobs = cf%nrows
Collaborator

ditto the comment about an accessor function here. i think this is the only one missing.

Contributor Author

Done

@mgharamti
Contributor Author

Nancy, thanks for the review. I should be able to address all of the comments. I'll also move the routines to their own module as suggested. The data I used for testing can be found here:
For ARVOR:
  • /glade/work/gharamti/inacawo/data_snippets/arvorc/
  • /glade/work/gharamti/inacawo/data_snippets/arvori/
For SVP:
  • /glade/work/gharamti/inacawo/data_snippets/svp/20251006/
These are the same as those listed in obs_files.txt (in my PR description). Do you want me to add some of those ASCII files to the repo?

@nancycollins
Collaborator

hi moha - thanks. no, i don't think they need to be added to the repo. i just wanted to see some of the input files so maybe i could make a couple of simple test programs that mimic what the read routines are expected to parse.

@nancycollins
Collaborator

i made a small test program and pushed it to my fork of your code here:

https://github.com/nancycollins/moha/tree/insitu_ocean_converters/developer_tests/utilities

it's called csv_read_test.f90 (and a corresponding update to work/quickbuild.sh). i think it should be added to your pull request but i'm rusty with github so i left it there in my repo. it works fine in a couple test cases but the csv field read code doesn't cope correctly with embedded blanks in data fields (test 3 fails).

@nancycollins
Collaborator

i went back to the parse_args_mod and made a new routine get_csv_words_from_string() which must be passed a string and a delimiter character, and it returns a word count and word array. it handles embedded blanks and quoted fields so they can contain the delimiter character inside the field. i pushed this to my fork and also added a parse_csv_test.f90 and parse_csv_test.in test for this.
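
A sketch of how a call to the new routine might look, per the description above; the exact argument order is an assumption:

character(len=128) :: line
character(len=64)  :: words(32)
integer            :: nwords

line = '1234,"Boulder, CO",23.4'

! returns the word count and word array; quoting lets a field
! contain the delimiter character
call get_csv_words_from_string(line, ',', nwords, words)

! expected here: nwords = 3 and words(2) = 'Boulder, CO'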

@hkershaw-brown
Member

Nancy's pull request to Moha's pull request is here:
mgharamti#1

@mgharamti
Contributor Author

Ran the converters with the recent changes and everything worked as intended. I also added documentation of the new module.
Happy to address any remaining issues.

Member

@hkershaw-brown hkershaw-brown left a comment

Hi Moha,

Looking good. I put a comment in on reverting parse_args_mod, and a couple of comments in the docs. The main one is that people should know about the \escape.

I'll test the build and run Nancy's tests next.

Cheers,
Helen

Comment on lines 20 to 26
Other modules used
------------------

::

types_mod
utilities_mod
Member

Question on this, do you find it helpful for the docs to list the other modules used?

I feel like this gets added to documentation because people have added it to documentation previously. I'd remove it unless you think it does help people reading the documentation.
I never trust this to be up-to-date and would look at the code to check the module usage.

Contributor Author

No, not really very helpful. As you mentioned, I put it in to mimic documentation of other modules. I'll remove.

Comment on lines 94 to 95
rules. This routine is exposed primarily to support consistent parsing behavior
in other code.
Member

I think get_csv_words_from_string is exposed for the test programs included in this pull request

Suggested change:
- rules. This routine is exposed primarily to support consistent parsing behavior
- in other code.
+ rules.

convert_goes_ABI_L1b
MOD29E1D_to_obs
hf_to_obs

Member

Add new executables to .gitignore

Suggested change:
+ arvor_to_obs
+ svp_to_obs

@mgharamti mgharamti force-pushed the insitu_ocean_converters branch from b6a7683 to b03bb73 on January 7, 2026 at 20:07
@hkershaw-brown
Member

hkershaw-brown commented Jan 7, 2026 via email

@hkershaw-brown
Member

This is out of date with NCAR main, the info is at the bottom of the pull request:
[screenshot: GitHub notice that the branch is out of date with NCAR main, January 7, 2026]

This update introduces a new set of general-purpose CSV utilities
to `parse_args_mod` for use across DART observation converters and
other modules that ingest ASCII/tabular data.

New utilities added:
- `csv_file_type`: cached CSV handle storing filename, nrows, ncols, delimiter, and header fields.
- `csv_open`/`csv_close`: initialize/reset CSV handle and preload header/dimensions.
- `csv_get_field_char`, `csv_get_field_int`, `csv_get_field_real`: unified interface through `csv_get_field` for retrieving column strings, integers, or reals.
- Normalization of delimiters (`,` or `;`) with support for empty fields.
- `csv_get_obs_num`: count data rows (excluding header)
- `csv_find_field`: header lookup
- Other internal helpers such as `split_fields`, `detect_delim`, `normalize_delims`

These routines provide a reusable framework that is modeled after our existing
NetCDF utilities.
A new ocean converter that uses profiling floats.
The converter harvests temperature and salinity data
at different depths and times. Depths are converted
from pressure in dbar to height in meters.

The converter uses the csv parsing utilities
to read data from the raw input files.
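
The dbar-to-meters step mentioned above is roughly one-to-one. The converter's exact formula is not shown in this thread; a minimal hydrostatic sketch would be:

! sketch: a plain hydrostatic conversion, not necessarily the exact
! formula in arvor_to_obs; 1 dbar = 1.0e4 Pa, and a common DART
! convention is negative height (m) below the sea surface
real, parameter :: rho = 1025.0    ! mean seawater density (kg/m^3)
real, parameter :: g   = 9.80665   ! gravitational acceleration (m/s^2)
real :: p_dbar, height_m

p_dbar   = 1000.0                         ! example profile level
height_m = -(p_dbar * 1.0e4) / (rho * g)  ! approx -994.7 m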
This is an ocean converter that uses surface drifters.
It collects SST and surface current data. It uses
the csv parsing utilities to read the incoming ASCII files.
- `csv_get_field_index`: Get column index of a field
- `csv_field_exists`: Check if field exists in file
- `csv_print_header`: print the field names (my favorite)

Additional debugging statements in the converters
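
A sketch of the accessors listed above in use; whether each is a function or a subroutine is an assumption here:

! after csv_open(filename, cf, routine):
integer :: icol

call csv_print_header(cf)                  ! print all field names

if (csv_field_exists(cf, 'PSAL')) then     ! assumed logical function
   icol = csv_get_field_index(cf, 'PSAL')  ! column index of the field
endif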
nancycollins and others added 19 commits January 7, 2026 13:31
it must be told what the delimiter is (generally comma or semicolon)
and splits up the fields based on the delimiter.  it handles quotes
inside the fields to allow the delimiter to be part of the string.

added a test program and test input file.
Stripped all csv routines from the parse_args_mod
and added them into their own csv module.
Improved the opening and closing logic. Now, the
file is opened once and rewound for reading different
variables.
Content of the csv file structure is now private. Added
the necessary accessor functions to retrieve data.
Cleaned up parse_args_mod and slightly modified the
new converters' code to use the new read_csv_mod.
Also made small readme changes.
remove the routine that adds spaces and call the new parse routine
directly.  add an option on open to specify the delimiter which is
passed through to the detect routine.  make the test program use the
testeverything code.  it now handles fields with embedded spaces and
alternative delimiters.
moved the csv parse routine into the csv module.
added more tests and made them easier to understand
what was being tested.
re-enabled the 2 tests that provoke a (correct) fatal error.
added a set_term_level() routine to the utilities mod.
Also removed unused routines from the converter
@mgharamti mgharamti force-pushed the insitu_ocean_converters branch from b03bb73 to db2580e on January 7, 2026 at 20:31
@hkershaw-brown hkershaw-brown added the release! (bundle with next release) label on Jan 8, 2026
Member

@hkershaw-brown hkershaw-brown left a comment

Approved!
Nice work Moha and Nancy, awesome to have the csv utilities with this.

@hkershaw-brown hkershaw-brown merged commit c70e184 into NCAR:main Jan 8, 2026
4 checks passed