R bugfixes across dbfiles, remote helpers, and met alignment#3673
R bugfixes across dbfiles, remote helpers, and met alignment#3673shbhmexe wants to merge 1 commit intoPecanProject:developfrom
Conversation
- dbfiles: correct get.id usage and file-type guard in dbfile.move() - align_met: fix repetition factor when coercing coarser source data to training timestep - debias_met_regression: fix ensemble pairing check and sampling counts; use TRUE/FALSE - insert.format.vars: stricter validation (numeric skip, row-wise field checks, safer unit tests) Signed-off-by: Shubham Shukla <shubhushukla586@gmail.com>
infotroph
left a comment
There was a problem hiding this comment.
Thanks! A bunch of nice fixes here.
I left comments on a few places where the existing behavior was preferable -- in fairness, all of them were confusing and I see why it would look like the change was needed. If you have ideas on how to further clarify those parts of the code I'd love to hear them.
| name_test_df <- as.data.frame(name_test) | ||
| if(!is.null(name_test_df[1,1])){ | ||
| name_test <- dplyr::tbl(con, "formats") %>% | ||
| dplyr::select(.data$id, .data$name) %>% |
There was a problem hiding this comment.
Use of .data inside select() is deprecated -- this trips me up too; the distinction is between "data-masking contexts" and "tidy selection contexts". But long story short, use quoted strings inside select now.
| dplyr::select(.data$id, .data$name) %>% | |
| dplyr::select("id", "name") %>% |
| # Test if skip is numeric | ||
| if(!is.numeric(skip)){ | ||
| PEcAn.logger::logger.error( | ||
| "Skip must be of type numeric" |
There was a problem hiding this comment.
I see that enforcing character here was in clear violation of both the comment and the Roxygen documentation saying this should be an integer, but at the same time I think it was serving a good purpose -- below it's inserted into a database column of type character. Maybe the right move is to enforce numeric here and then coerce it to character in formats_df below?
|
|
||
| # Test if notes are a character string | ||
| if(!is.character(notes)&!is.null(notes)){ | ||
| if(!is.character(notes) & !is.null(notes)){ |
There was a problem hiding this comment.
| if(!is.character(notes) & !is.null(notes)){ | |
| if(!is.character(notes) && !is.null(notes)){ |
| ## Test if variable_id already exists ## | ||
| var_id_test <- dplyr::tbl(con, "variables") %>% dplyr::select("id") %>% dplyr::filter(.data$id %in% !!formats_variables[[i, "variable_id"]]) %>% dplyr::collect(.data$id) | ||
| if(!is.null(var_id_test[1,1])){ | ||
| if(identical(suppress, FALSE)){ |
There was a problem hiding this comment.
am I missing a reason this can't just be
| if(identical(suppress, FALSE)){ | |
| if (!suppress) { |
?
| if(identical(suppress, FALSE)){ | ||
| ## Test if variable_id exists in variables table ## | ||
| var_id_test <- dplyr::tbl(con, "variables") %>% | ||
| dplyr::select(.data$id) %>% |
There was a problem hiding this comment.
As above
| dplyr::select(.data$id) %>% | |
| dplyr::select("id") %>% |
| vid <- formats_variables[["variable_id"]][i] | ||
| if(!is.na(u1) && u1 != ""){ | ||
| u2 <- dplyr::tbl(con, "variables") %>% | ||
| dplyr::select(.data$id, .data$units) %>% |
There was a problem hiding this comment.
| dplyr::select(.data$id, .data$units) %>% | |
| dplyr::select("id", "units") %>% |
| for(i in seq_len(nrow(formats_variables))){ | ||
| u1 <- formats_variables[["unit"]][i] | ||
| vid <- formats_variables[["variable_id"]][i] | ||
| if(!is.na(u1) && u1 != ""){ |
There was a problem hiding this comment.
Why skip the rows with empty units? If both u1 and u2 are empty then there's no problem; if only one is empty it ought to emit a warning (as they will if the parseable/convertible tests below are applied to them), not be ignored.
(And there's also no need to check for NAs here because they were all filled as "" on line 128)
| ) | ||
| } | ||
| # Grab the bety units and compare only if we have both units | ||
| if(length(u2) > 0 && !is.na(u2)){ |
There was a problem hiding this comment.
An empty result seems worth checking for and warning (erroring?) about rather than skipping over. Not sure how u2 would be NA here -- units are constrained to be not null in the variables table.
| header = header, | ||
| skip = skip, |
There was a problem hiding this comment.
Both header and skip need to be of type character when they get inserted into the formats table of BetyDB. As commented above, it was a bit weird that skip was converted way up at the top of the function while header stayed logical until here. Let's coerce both in the same place:
| header = header, | |
| skip = skip, | |
| header = as.character(header), | |
| skip = as.character(skip), |
| mimetype_id = mimetype_id, | ||
| notes = notes, | ||
| name = format_name, | ||
| stringsAsFactors = FALSE |
|
@shbhmexe Why did you close this PR? You were on the right track and that it only needed minor changes to be mergeable. |
What
Small, targeted fixes to prevent runtime errors and subtle misbehavior in DB utilities, remote job helpers, and meteorology alignment/debiasing routines. No new features, no API changes.
Why
rep(..., each=stamps.hr)used a vector as a scalar, leading to length mismatch.skip, and unsafe unit checks could raise errors.Scope/Risk
Testing hints
align.met()with coarse source and finer training data.insert.format.vars()with a tibble forformats_variablesand confirm validations and DB writes.dbfile.move()with supported types ("nc"/"clim").