Skip to content

R bugfixes across dbfiles, remote helpers, and met alignment#3673

Closed
shbhmexe wants to merge 1 commit intoPecanProject:developfrom
shbhmexe:fix/r-bugfixes-scope-limited
Closed

R bugfixes across dbfiles, remote helpers, and met alignment#3673
shbhmexe wants to merge 1 commit intoPecanProject:developfrom
shbhmexe:fix/r-bugfixes-scope-limited

Conversation

@shbhmexe
Copy link
Contributor

What

Small, targeted fixes to prevent runtime errors and subtle misbehavior in DB utilities, remote job helpers, and meteorology alignment/debiasing routines. No new features, no API changes.

Why

  • dbfiles: wrong get.id argument order prevented lookups; file-type guard always evaluated TRUE.
  • align_met: incorrect rep(..., each=stamps.hr) used a vector as a scalar, leading to length mismatch.
  • debias_met_regression: ensemble-size checks and sampling counts were incorrect; could error or oversample.
  • insert.format.vars: invalid indexing and collect() usage, wrong type check for skip, and unsafe unit checks could raise errors.

Scope/Risk

  • Changes are confined to the listed functions; default behavior is preserved except to correct the bugs.
  • No interfaces or schemas changed; CI and existing callers should remain unaffected.

Testing hints

  • Run existing R tests for affected packages (PEcAn.DB, PEcAn.data.atmosphere, PEcAn.remote) if available.
  • Smoke test align.met() with coarse source and finer training data.
  • Call insert.format.vars() with a tibble for formats_variables and confirm validations and DB writes.
  • Exercise dbfile.move() with supported types ("nc"/"clim").

- dbfiles: correct get.id usage and file-type guard in dbfile.move()
- align_met: fix repetition factor when coercing coarser source data to training timestep
- debias_met_regression: fix ensemble pairing check and sampling counts; use TRUE/FALSE
- insert.format.vars: stricter validation (numeric skip, row-wise field checks, safer unit tests)

Signed-off-by: Shubham Shukla <shubhushukla586@gmail.com>
Copy link
Member

@infotroph infotroph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A bunch of nice fixes here.

I left comments on a few places where the existing behavior was preferable -- in fairness, all of them were confusing and I see why it would look like the change was needed. If you have ideas on how to further clarify those parts of the code I'd love to hear them.

name_test_df <- as.data.frame(name_test)
if(!is.null(name_test_df[1,1])){
name_test <- dplyr::tbl(con, "formats") %>%
dplyr::select(.data$id, .data$name) %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of .data inside select() is deprecated -- this trips me up too; the distinction is between "data-masking contexts" and "tidy selection contexts". But long story short, use quoted strings inside select now.

Suggested change
dplyr::select(.data$id, .data$name) %>%
dplyr::select("id", "name") %>%

Comment on lines +61 to +64
# Test if skip is numeric
if(!is.numeric(skip)){
PEcAn.logger::logger.error(
"Skip must be of type numeric"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that enforcing character here was in clear violation of both the comment and the Roxygen documentation saying this should be an integer, but at the same time I think it was serving a good purpose -- below it's inserted into a database column of type character. Maybe the right move is to enforce numeric here and then coerce it to character in formats_df below?


# Test if notes are a character string
if(!is.character(notes)&!is.null(notes)){
if(!is.character(notes) & !is.null(notes)){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(!is.character(notes) & !is.null(notes)){
if(!is.character(notes) && !is.null(notes)){

## Test if variable_id already exists ##
var_id_test <- dplyr::tbl(con, "variables") %>% dplyr::select("id") %>% dplyr::filter(.data$id %in% !!formats_variables[[i, "variable_id"]]) %>% dplyr::collect(.data$id)
if(!is.null(var_id_test[1,1])){
if(identical(suppress, FALSE)){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am I missing a reason this can't just be

Suggested change
if(identical(suppress, FALSE)){
if (!suppress) {

?

if(identical(suppress, FALSE)){
## Test if variable_id exists in variables table ##
var_id_test <- dplyr::tbl(con, "variables") %>%
dplyr::select(.data$id) %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

Suggested change
dplyr::select(.data$id) %>%
dplyr::select("id") %>%

vid <- formats_variables[["variable_id"]][i]
if(!is.na(u1) && u1 != ""){
u2 <- dplyr::tbl(con, "variables") %>%
dplyr::select(.data$id, .data$units) %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dplyr::select(.data$id, .data$units) %>%
dplyr::select("id", "units") %>%

for(i in seq_len(nrow(formats_variables))){
u1 <- formats_variables[["unit"]][i]
vid <- formats_variables[["variable_id"]][i]
if(!is.na(u1) && u1 != ""){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why skip the rows with empty units? If both u1 and u2 are empty then there's no problem; if only one is empty it ought to emit a warning (as they will if the parseable/convertible tests below are applied to them), not be ignored.

(And there's also no need to check for NAs here because they were all filled as "" on line 128)

)
}
# Grab the bety units and compare only if we have both units
if(length(u2) > 0 && !is.na(u2)){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty result seems worth checking for and warning (erroring?) about rather than skipping over. Not sure how u2 would be NA here -- units are constrained to be not null in the variables table.

Comment on lines +159 to 160
header = header,
skip = skip,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both header and skip need to be of type character when they get inserted into the formats table of BetyDB. As commented above, it was a bit weird that skip was converted way up at the top of the function while header stayed logical until here. Let's coerce both in the same place:

Suggested change
header = header,
skip = skip,
header = as.character(header),
skip = as.character(skip),

mimetype_id = mimetype_id,
notes = notes,
name = format_name,
stringsAsFactors = FALSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@shbhmexe shbhmexe closed this Nov 24, 2025
@shbhmexe shbhmexe deleted the fix/r-bugfixes-scope-limited branch November 24, 2025 09:50
@infotroph
Copy link
Member

infotroph commented Nov 26, 2025

@shbhmexe Why did you close this PR? You were on the right track and that it only needed minor changes to be mergeable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants