R bugfixes across dbfiles, remote helpers, and met alignment#3673

Closed

shbhmexe wants to merge 1 commit intoPecanProject:developfrom

shbhmexe:fix/r-bugfixes-scope-limited

Contributor

shbhmexe commented Nov 23, 2025

What

Small, targeted fixes to prevent runtime errors and subtle misbehavior in DB utilities, remote job helpers, and meteorology alignment/debiasing routines. No new features, no API changes.

Why

dbfiles: wrong get.id argument order prevented lookups; file-type guard always evaluated TRUE.
align_met: incorrect rep(..., each=stamps.hr) used a vector as a scalar, leading to length mismatch.
debias_met_regression: ensemble-size checks and sampling counts were incorrect; could error or oversample.
insert.format.vars: invalid indexing and collect() usage, wrong type check for skip, and unsafe unit checks could raise errors.

Scope/Risk

Changes are confined to the listed functions; default behavior is preserved except to correct the bugs.
No interfaces or schemas changed; CI and existing callers should remain unaffected.

Testing hints

Run existing R tests for affected packages (PEcAn.DB, PEcAn.data.atmosphere, PEcAn.remote) if available.
Smoke test align.met() with coarse source and finer training data.
Call insert.format.vars() with a tibble for formats_variables and confirm validations and DB writes.
Exercise dbfile.move() with supported types ("nc"/"clim").


          fix: targeted R bugfixes in db, remote, met (no API changes)

8a6c25a

- dbfiles: correct get.id usage and file-type guard in dbfile.move()
- align_met: fix repetition factor when coercing coarser source data to training timestep
- debias_met_regression: fix ensemble pairing check and sampling counts; use TRUE/FALSE
- insert.format.vars: stricter validation (numeric skip, row-wise field checks, safer unit tests)

Signed-off-by: Shubham Shukla <shubhushukla586@gmail.com>

github-actions bot added Modules Base labels

infotroph requested changes

View reviewed changes

Member

infotroph left a comment

Thanks! A bunch of nice fixes here.

I left comments on a few places where the existing behavior was preferable -- in fairness, all of them were confusing and I see why it would look like the change was needed. If you have ideas on how to further clarify those parts of the code I'd love to hear them.

base/db/R/insert.format.vars.R

-                name_test_df <- as.data.frame(name_test)
-                if(!is.null(name_test_df[1,1])){
+                name_test <- dplyr::tbl(con, "formats") %>%
+                  dplyr::select(.data$id, .data$name) %>%

Member

infotroph Nov 24, 2025

Use of .data inside select() is deprecated -- this trips me up too; the distinction is between "data-masking contexts" and "tidy selection contexts". But long story short, use quoted strings inside select now.

Suggested change

      
                dplyr::select(.data$id, .data$name) %>%
          
                dplyr::select("id", "name") %>%

base/db/R/insert.format.vars.R

Comment on lines +61 to +64

    
                 # Test if skip is numeric

                if(!is.numeric(skip)){

                  PEcAn.logger::logger.error(

                    "Skip must be of type numeric"

Member

infotroph Nov 24, 2025

I see that enforcing character here was in clear violation of both the comment and the Roxygen documentation saying this should be an integer, but at the same time I think it was serving a good purpose -- below it's inserted into a database column of type character. Maybe the right move is to enforce numeric here and then coerce it to character in formats_df below?

base/db/R/insert.format.vars.R

                 # Test if notes are a character string
-                if(!is.character(notes)&!is.null(notes)){
+                if(!is.character(notes) & !is.null(notes)){

Member

infotroph Nov 24, 2025

Suggested change

      
              if(!is.character(notes) & !is.null(notes)){
          
              if(!is.character(notes) && !is.null(notes)){

base/db/R/insert.format.vars.R

-                      ## Test if variable_id already exists ##
-                      var_id_test <- dplyr::tbl(con, "variables") %>% dplyr::select("id") %>% dplyr::filter(.data$id %in% !!formats_variables[[i, "variable_id"]]) %>% dplyr::collect(.data$id)
-                      if(!is.null(var_id_test[1,1])){
+                    if(identical(suppress, FALSE)){

Member

infotroph Nov 24, 2025

am I missing a reason this can't just be

Suggested change

      
                  if(identical(suppress, FALSE)){
          
                  if (!suppress) {

?

base/db/R/insert.format.vars.R

+                    if(identical(suppress, FALSE)){
+                      ## Test if variable_id exists in variables table ##
+                      var_id_test <- dplyr::tbl(con, "variables") %>%
+                        dplyr::select(.data$id) %>%

Member

infotroph Nov 24, 2025

As above

Suggested change

      
                      dplyr::select(.data$id) %>%
          
                      dplyr::select("id") %>%

base/db/R/insert.format.vars.R

+                    vid <- formats_variables[["variable_id"]][i]
+                    if(!is.na(u1) && u1 != ""){
+                      u2 <- dplyr::tbl(con, "variables") %>%
+                        dplyr::select(.data$id, .data$units) %>%

Member

infotroph Nov 24, 2025

Suggested change

      
                      dplyr::select(.data$id, .data$units) %>%
          
                      dplyr::select("id", "units") %>%

base/db/R/insert.format.vars.R

+                  for(i in seq_len(nrow(formats_variables))){
+                    u1 <- formats_variables[["unit"]][i]
+                    vid <- formats_variables[["variable_id"]][i]
+                    if(!is.na(u1) && u1 != ""){

Member

infotroph Nov 24, 2025

Why skip the rows with empty units? If both u1 and u2 are empty then there's no problem; if only one is empty it ought to emit a warning (as they will if the parseable/convertible tests below are applied to them), not be ignored.

(And there's also no need to check for NAs here because they were all filled as "" on line 128)

base/db/R/insert.format.vars.R

+                        )
+                      }
+                      # Grab the bety units and compare only if we have both units
+                      if(length(u2) > 0 && !is.na(u2)){

Member

infotroph Nov 24, 2025

An empty result seems worth checking for and warning (erroring?) about rather than skipping over. Not sure how u2 would be NA here -- units are constrained to be not null in the variables table.

base/db/R/insert.format.vars.R

Comment on lines +159 to 160

		header = header,
		skip = skip,

Member

infotroph Nov 24, 2025

Both header and skip need to be of type character when they get inserted into the formats table of BetyDB. As commented above, it was a bit weird that skip was converted way up at the top of the function while header stayed logical until here. Let's coerce both in the same place:

Suggested change

      
                  header = header,
          
                  skip = skip,
          
                  header = as.character(header),
          
                  skip = as.character(skip),

base/db/R/insert.format.vars.R

                     mimetype_id = mimetype_id,
                     notes = notes,
-                    name = format_name,
-                    stringsAsFactors = FALSE

Member

infotroph Nov 24, 2025

Nice catch!

shbhmexe closed this

shbhmexe deleted the fix/r-bugfixes-scope-limited branch

November 24, 2025 09:50

Member

infotroph commented Nov 26, 2025 •

edited

Loading

@shbhmexe Why did you close this PR? You were on the right track and that it only needed minor changes to be mergeable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels