M sstats big/work/20260513 arrow chunking followups#17
Conversation
R/clean_spectronaut.R:9-12: added block_size parameter (default 16L * 1024L * 1024L) with coerce + validation. R/clean_spectronaut.R:44: CsvReadOptions$create now uses the parameter. R/converters.R:120-125: new @param block_size roxygen with the straddling-object workaround note. R/converters.R:148-156: bigSpectronauttoMSstatsFormat gains block_size, plumbed to reduceBigSpectronaut. tests/testthat/test-converters.R:97-163: validation tests (rejects negative/zero/NA/vector/string) + plumbing tests (default forwards 16 MiB, override forwards user's value). man/bigSpectronauttoMSstatsFormat.Rd: regenerated from roxygen.
…setnames so the package is data.table-aware (cedta()). R/clean_spectronaut.R:103-187: rewrote cleanSpectronautChunk in data.table: setDT(input) at entry; subsequent operations modify in place via :=. Two-step rename (setnames for standardize, then setnames with skip_absent = TRUE to map standardized→MSstats) matches the MSstatsConvert family pattern. Conditional NA assignment uses mask form dt[cond, Intensity := NA_real_]. Q-value filters preserve dplyr::if_else NA semantics via explicit is.na(EGQvalue) | EGQvalue >= cutoff. Dropped the leftover dplyr::collect(head(dplyr::select(...))) pattern — was a no-op residue from a prior refactor. Function shrank from ~88 lines to ~64. DESCRIPTION:20: added data.table to Imports. NAMESPACE: regenerated, now imports :=, .SD, setDT, setnames from data.table. tests/testthat/test-converters.R:97-211: 5 new tests — schema smoke test, filter_by_excluded, filter_by_identified, filter_by_qvalue (covering the NA-q-value case), and FFrgLossType row drop.
📝 WalkthroughWalkthroughThis PR refactors Spectronaut CSV processing to use Apache Arrow record-batch streaming instead of readr chunking, converts chunk processing from dplyr to data.table operations, and introduces a configurable ChangesSpectronaut Arrow Streaming & Data.Table Refactor
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@R/clean_spectronaut.R`:
- Around line 155-167: The current filter block assumes columns Excluded,
Identified, EGQvalue, PGQvalue, and FFrgLossType always exist and will error if
any are missing; update the logic in clean_spectronaut.R to guard each filter by
checking column presence (e.g., use "if ('Excluded' %in% names(input))" before
the Excluded filter, similarly for Identified, EGQvalue and PGQvalue before
applying qvalue-based NA assignment, and check for FFrgLossType before
subsetting), so each conditional only runs when its target column exists and the
behavior remains unchanged otherwise.
- Around line 25-58: The computed needed_cols is never passed to Arrow, so the
CSV reader still parses all columns; update the CsvConvertOptions usage to
include the projection by calling
arrow::CsvConvertOptions$create(include_columns = needed_cols) (or set the
include_columns field on convert_opts after creation) so that convert_opts
includes needed_cols before calling arrow::open_dataset/Scanner$create;
reference the symbols needed_cols and convert_opts (and the call
arrow::CsvConvertOptions$create) and ensure this happens prior to creating
ds/reader.
In `@tests/testthat/test-converters.R`:
- Around line 106-113: The test's expect_error calls are too broad—change them
to assert the error message mentions "block_size" so they only pass when
block_size validation fails; update each expect_error(reduceBigSpectronaut(...),
...) in tests/testthat/test-converters.R to include a second argument (string or
regex) that matches "block_size" (e.g., "block_size" or "block_size.*invalid")
for the negative values and vector cases, and similarly for the
suppressed-warning call so all invalid-block_size cases are checked by message
content rather than any error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9ebfd981-78cf-43a3-b93a-fd55ccfaeccc
📒 Files selected for processing (7)
DESCRIPTIONNAMESPACER/clean_spectronaut.RR/converters.Rman/bigSpectronauttoMSstatsFormat.Rdman/dot-prefixedPath.Rdtests/testthat/test-converters.R
| # Columns cleanSpectronautChunk actually consumes; Arrow's | ||
| # convert_options$include_columns drops everything else at parse time so | ||
| # we never materialize the ~35 unused columns Spectronaut exports. | ||
| needed_cols <- c("R.FileName", "R.Condition", "R.Replicate", | ||
| "PG.ProteinAccessions", "EG.ModifiedSequence", | ||
| "FG.LabeledSequence", "FG.Charge", | ||
| "F.FrgIon", "F.Charge", | ||
| "EG.Identified", "F.ExcludedFromQuantification", | ||
| "F.FrgLossType", "PG.Qvalue", "EG.Qvalue", | ||
| intensity) | ||
| if (calculateAnomalyScores) { | ||
| needed_cols <- c(needed_cols, anomalyModelFeatures) | ||
| } | ||
|
|
||
| # Arrow's CSV reader replaces readr::read_delim_chunked. Arrow releases | ||
| # per-batch state as soon as a batch is consumed, so peak memory is | ||
| # bounded by one record batch instead of growing with the dataset (readr | ||
| # keeps a string-interning pool that accumulates across chunks). The | ||
| # `delim` switch above already covers comma / tab / semicolon variants; | ||
| # Arrow's CSV reader handles all three the same way through | ||
| # CsvParseOptions$delimiter. | ||
| parse_opts <- arrow::CsvParseOptions$create(delimiter = delim) | ||
| convert_opts <- arrow::CsvConvertOptions$create() | ||
| read_opts <- arrow::CsvReadOptions$create(block_size = block_size) | ||
|
|
||
| ds <- arrow::open_dataset( | ||
| input_file, | ||
| format = "csv", | ||
| parse_options = parse_opts, | ||
| convert_options = convert_opts, | ||
| read_options = read_opts | ||
| ) | ||
|
|
||
| reader <- arrow::Scanner$create(ds)$ToRecordBatchReader() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether column projection is actually wired in the Arrow path.
rg -n -C2 'needed_cols|CsvConvertOptions\\$create|include_columns|Scanner\\$create|open_dataset\\(' R/clean_spectronaut.RRepository: Vitek-Lab/MSstatsBig
Length of output: 289
🏁 Script executed:
#!/bin/bash
# First, check for needed_cols usage in the file
echo "=== Searching for needed_cols usage ==="
rg -n 'needed_cols' R/clean_spectronaut.R
echo ""
echo "=== Searching for CsvConvertOptions and include_columns ==="
rg -n 'CsvConvertOptions\|include_columns' R/clean_spectronaut.R
echo ""
echo "=== Checking context around CsvConvertOptions create call ==="
rg -n -B3 -A3 'CsvConvertOptions\$create' R/clean_spectronaut.RRepository: Vitek-Lab/MSstatsBig
Length of output: 688
🏁 Script executed:
#!/bin/bash
# Check what happens after reader creation
echo "=== Code from line 50 onwards ==="
sed -n '50,80p' R/clean_spectronaut.R
echo ""
echo "=== Complete function to see full context ==="
wc -l R/clean_spectronaut.RRepository: Vitek-Lab/MSstatsBig
Length of output: 1118
🏁 Script executed:
#!/bin/bash
# Search for include_columns usage in the entire codebase
echo "=== Searching for include_columns in codebase ==="
rg -n 'include_columns' .
echo ""
echo "=== Searching for CsvConvertOptions usage patterns ==="
rg -n 'CsvConvertOptions' . -A2
echo ""
echo "=== Check the comment context more carefully ==="
sed -n '25,50p' R/clean_spectronaut.RRepository: Vitek-Lab/MSstatsBig
Length of output: 1892
Wire needed_cols into Arrow's column projection.
needed_cols is computed but never applied to CsvConvertOptions, so the reader still parses all CSV columns. This defeats the intended memory/performance reduction on wide Spectronaut exports.
💡 Proposed fix
- convert_opts <- arrow::CsvConvertOptions$create()
+ convert_opts <- arrow::CsvConvertOptions$create(
+ include_columns = needed_cols
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@R/clean_spectronaut.R` around lines 25 - 58, The computed needed_cols is
never passed to Arrow, so the CSV reader still parses all columns; update the
CsvConvertOptions usage to include the projection by calling
arrow::CsvConvertOptions$create(include_columns = needed_cols) (or set the
include_columns field on convert_opts after creation) so that convert_opts
includes needed_cols before calling arrow::open_dataset/Scanner$create;
reference the symbols needed_cols and convert_opts (and the call
arrow::CsvConvertOptions$create) and ensure this happens prior to creating
ds/reader.
| if (filter_by_excluded) { | ||
| input <- dplyr::mutate( | ||
| input, Intensity = dplyr::if_else(Excluded, NA_real_, Intensity)) | ||
|
|
||
| input[Excluded == TRUE, Intensity := NA_real_] | ||
| } | ||
|
|
||
| if (filter_by_identified) { | ||
| input <- dplyr::mutate( | ||
| input, Intensity = dplyr::if_else(Identified, Intensity, NA_real_)) | ||
| input[Identified == FALSE, Intensity := NA_real_] | ||
| } | ||
|
|
||
| if (filter_by_qvalue) { | ||
| input <- dplyr::mutate( | ||
| input, | ||
| Intensity = dplyr::if_else(EGQvalue < qvalue_cutoff, Intensity, NA_real_)) | ||
| input <- dplyr::mutate( | ||
| input, | ||
| Intensity = dplyr::if_else(PGQvalue < qvalue_cutoff, Intensity, NA_real_)) | ||
| # Preserve dplyr::if_else semantics: rows with NA q-values become NA. | ||
| input[is.na(EGQvalue) | EGQvalue >= qvalue_cutoff, Intensity := NA_real_] | ||
| input[is.na(PGQvalue) | PGQvalue >= qvalue_cutoff, Intensity := NA_real_] | ||
| } | ||
|
|
||
| input <- dplyr::filter(input, FFrgLossType == "noloss") | ||
| if (is.element("LabeledSequence", colnames(input))) { | ||
| input <- dplyr::mutate(input, IsLabeled = grepl("Lys8", LabeledSequence) | grepl("Arg10", LabeledSequence)) | ||
| input <- dplyr::mutate(input, IsotopeLabelType := dplyr::if_else(IsLabeled, "H", "L")) | ||
|
|
||
| input <- input[FFrgLossType == "noloss"] |
There was a problem hiding this comment.
Guard filter steps when selected columns are absent.
After present_orig subsetting, downstream filters still assume Excluded, Identified, EGQvalue, PGQvalue, and FFrgLossType exist. Missing columns here will raise runtime errors.
🛡️ Proposed fix
- if (filter_by_excluded) {
+ if (filter_by_excluded && "Excluded" %in% colnames(input)) {
input[Excluded == TRUE, Intensity := NA_real_]
}
- if (filter_by_identified) {
+ if (filter_by_identified && "Identified" %in% colnames(input)) {
input[Identified == FALSE, Intensity := NA_real_]
}
if (filter_by_qvalue) {
- input[is.na(EGQvalue) | EGQvalue >= qvalue_cutoff, Intensity := NA_real_]
- input[is.na(PGQvalue) | PGQvalue >= qvalue_cutoff, Intensity := NA_real_]
+ if ("EGQvalue" %in% colnames(input)) {
+ input[is.na(EGQvalue) | EGQvalue >= qvalue_cutoff, Intensity := NA_real_]
+ }
+ if ("PGQvalue" %in% colnames(input)) {
+ input[is.na(PGQvalue) | PGQvalue >= qvalue_cutoff, Intensity := NA_real_]
+ }
}
-
- input <- input[FFrgLossType == "noloss"]
+ if ("FFrgLossType" %in% colnames(input)) {
+ input <- input[FFrgLossType == "noloss"]
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (filter_by_excluded) { | |
| input <- dplyr::mutate( | |
| input, Intensity = dplyr::if_else(Excluded, NA_real_, Intensity)) | |
| input[Excluded == TRUE, Intensity := NA_real_] | |
| } | |
| if (filter_by_identified) { | |
| input <- dplyr::mutate( | |
| input, Intensity = dplyr::if_else(Identified, Intensity, NA_real_)) | |
| input[Identified == FALSE, Intensity := NA_real_] | |
| } | |
| if (filter_by_qvalue) { | |
| input <- dplyr::mutate( | |
| input, | |
| Intensity = dplyr::if_else(EGQvalue < qvalue_cutoff, Intensity, NA_real_)) | |
| input <- dplyr::mutate( | |
| input, | |
| Intensity = dplyr::if_else(PGQvalue < qvalue_cutoff, Intensity, NA_real_)) | |
| # Preserve dplyr::if_else semantics: rows with NA q-values become NA. | |
| input[is.na(EGQvalue) | EGQvalue >= qvalue_cutoff, Intensity := NA_real_] | |
| input[is.na(PGQvalue) | PGQvalue >= qvalue_cutoff, Intensity := NA_real_] | |
| } | |
| input <- dplyr::filter(input, FFrgLossType == "noloss") | |
| if (is.element("LabeledSequence", colnames(input))) { | |
| input <- dplyr::mutate(input, IsLabeled = grepl("Lys8", LabeledSequence) | grepl("Arg10", LabeledSequence)) | |
| input <- dplyr::mutate(input, IsotopeLabelType := dplyr::if_else(IsLabeled, "H", "L")) | |
| input <- input[FFrgLossType == "noloss"] | |
| if (filter_by_excluded && "Excluded" %in% colnames(input)) { | |
| input[Excluded == TRUE, Intensity := NA_real_] | |
| } | |
| if (filter_by_identified && "Identified" %in% colnames(input)) { | |
| input[Identified == FALSE, Intensity := NA_real_] | |
| } | |
| if (filter_by_qvalue) { | |
| # Preserve dplyr::if_else semantics: rows with NA q-values become NA. | |
| if ("EGQvalue" %in% colnames(input)) { | |
| input[is.na(EGQvalue) | EGQvalue >= qvalue_cutoff, Intensity := NA_real_] | |
| } | |
| if ("PGQvalue" %in% colnames(input)) { | |
| input[is.na(PGQvalue) | PGQvalue >= qvalue_cutoff, Intensity := NA_real_] | |
| } | |
| } | |
| if ("FFrgLossType" %in% colnames(input)) { | |
| input <- input[FFrgLossType == "noloss"] | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@R/clean_spectronaut.R` around lines 155 - 167, The current filter block
assumes columns Excluded, Identified, EGQvalue, PGQvalue, and FFrgLossType
always exist and will error if any are missing; update the logic in
clean_spectronaut.R to guard each filter by checking column presence (e.g., use
"if ('Excluded' %in% names(input))" before the Excluded filter, similarly for
Identified, EGQvalue and PGQvalue before applying qvalue-based NA assignment,
and check for FFrgLossType before subsetting), so each conditional only runs
when its target column exists and the behavior remains unchanged otherwise.
| expect_error(reduceBigSpectronaut(input_file, output_file, block_size = -1L)) | ||
| expect_error(reduceBigSpectronaut(input_file, output_file, block_size = 0L)) | ||
| expect_error(reduceBigSpectronaut(input_file, output_file, block_size = NA_integer_)) | ||
| expect_error(reduceBigSpectronaut(input_file, output_file, block_size = c(1L, 2L))) | ||
| expect_error(suppressWarnings( | ||
| reduceBigSpectronaut(input_file, output_file, block_size = "16MB") | ||
| )) | ||
| }) |
There was a problem hiding this comment.
Strengthen invalid block_size expectations to avoid false positives.
These assertions currently pass on any error. If reducer internals fail for another reason, this test can still pass even when block_size validation regresses. Constrain the expected error to include block_size in the message.
Suggested tightening
- expect_error(reduceBigSpectronaut(input_file, output_file, block_size = -1L))
- expect_error(reduceBigSpectronaut(input_file, output_file, block_size = 0L))
- expect_error(reduceBigSpectronaut(input_file, output_file, block_size = NA_integer_))
- expect_error(reduceBigSpectronaut(input_file, output_file, block_size = c(1L, 2L)))
+ expect_error(reduceBigSpectronaut(input_file, output_file, block_size = -1L), regexp = "block_size")
+ expect_error(reduceBigSpectronaut(input_file, output_file, block_size = 0L), regexp = "block_size")
+ expect_error(reduceBigSpectronaut(input_file, output_file, block_size = NA_integer_), regexp = "block_size")
+ expect_error(reduceBigSpectronaut(input_file, output_file, block_size = c(1L, 2L)), regexp = "block_size")
expect_error(suppressWarnings(
reduceBigSpectronaut(input_file, output_file, block_size = "16MB")
- ))
+ ), regexp = "block_size")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/testthat/test-converters.R` around lines 106 - 113, The test's
expect_error calls are too broad—change them to assert the error message
mentions "block_size" so they only pass when block_size validation fails; update
each expect_error(reduceBigSpectronaut(...), ...) in
tests/testthat/test-converters.R to include a second argument (string or regex)
that matches "block_size" (e.g., "block_size" or "block_size.*invalid") for the
negative values and vector cases, and similarly for the suppressed-warning call
so all invalid-block_size cases are checked by message content rather than any
error.
| anomalyModelFeatures=c()) { | ||
| calculateAnomalyScores=FALSE, | ||
| anomalyModelFeatures=c(), | ||
| block_size = 16L * 1024L * 1024L) { |
There was a problem hiding this comment.
- For people who may want to increase their block size to increase the speed of processing, add a recommendation on how to estimate the adequate block size to maximize speed while reducing the risk of the system crashing
- Make it more clear in the MSstatsBig / MSstatsConvert documentation on which columns we actually need from users for Spectronaut
Motivation and Context
reduceBigSpectronaut()previously streamed Spectronaut CSV exports throughreadr::read_delim_chunked, which holds a string-interning pool that grows across chunks and pushed peak memory well above one batch's working set. This branch ports the reader to Arrow (which releases per-batch state) and follows up with the three remaining items fromTODO-arrow_chunking_followups.md:block_sizewas Arrow's 256 KiB default — too small for wide Spectronaut rows, causingInvalid: straddling object straddles two block boundarieserrors and excessive parser overhead.cleanSpectronautChunk()ran every batch through ~13 sequentialdplyrverbs on adata.frame, producing repeated transient column allocations and fragmenting R's allocator. The rest of theMSstatsConvertfamily isdata.table-native.Solution: replace
readrwith Arrow'sCsvReadOptions/Scanner/ToRecordBatchReader, raise the defaultblock_sizeto 16 MiB and expose it as a user parameter, document the straddling-object workaround at the parameter, and rewritecleanSpectronautChunk()in puredata.table.Changes
readr::read_delim_chunkedinreduceBigSpectronaut(). Usesarrow::open_dataset()+Scanner+ToRecordBatchReaderto stream record batches; preserves the existing comma/tab/semicolon delimiter switch viaCsvParseOptions$delimiter. Per-batch progress logging every 1,000 batches.block_sizeparameter on bothreduceBigSpectronaut()and the user-facingbigSpectronauttoMSstatsFormat(). Default16L * 1024L * 1024L(16 MiB) replaces Arrow's 256 KiB default. Coerced to integer and validated (length 1, non-NA, positive).@param block_sizedocuments the exact error string (Invalid: straddling object straddles two block boundaries) and the recommended override (64L * 1024L * 1024L) so users hitting the straddling error on pathological rows have a self-service fix.cleanSpectronautChunk()rewritten indata.table.setDT(input)at entry; column selection viadt[, cols, with = FALSE]; two-step rename viasetnames(..., skip_absent = TRUE)matching theMSstatsConvertfamily convention; in-place column updates via:=; conditional NA assignment via mask formdt[cond, Intensity := NA_real_]. Function shrank from ~88 lines to ~64.is.na(EGQvalue) | EGQvalue >= cutoffso rows with missing q-values still getIntensity = NA, matching the previousdplyr::if_elsebehavior (a naivedata.tabletranslation would silently change this).dplyr::collect(head(dplyr::select(...)))pattern at the old lines 140/144 was a no-op residue from an earlier Arrow-Table refactor and is gone.data.tableadded toImportsinDESCRIPTIONand imported via@importFrom data.table := .SD setDT setnamesso the package iscedta()-aware. RegeneratedNAMESPACEandman/bigSpectronauttoMSstatsFormat.Rdviadevtools::document().Testing
All tests run via
devtools::test(): 51 PASS, 0 FAIL, 0 WARN, 0 SKIP.New tests added in
tests/testthat/test-converters.R:reduceBigSpectronautvalidatesblock_size: rejects negative, zero,NA, length-2 vector, and unparseable string inputs.bigSpectronauttoMSstatsFormatplumbsblock_sizethrough: spies onreduceBigSpectronautviamockery::stub, asserts the default forwards16L * 1024L * 1024Land an explicit override forwards the user's value.cleanSpectronautChunkschema smoke test: synthetic minimal Spectronaut-shaped input, asserts output column set and basic values.cleanSpectronautChunkfilter_by_excluded: rows withExcluded == "True"getIntensity = NA.cleanSpectronautChunkfilter_by_identified: rows withIdentified == "False"getIntensity = NA.cleanSpectronautChunkfilter_by_qvalue(incl. NA case): rows below cutoff are kept, above cutoff becomeNA, andNAq-values becomeNA— the explicit semantic guarantee from the rewrite.cleanSpectronautChunkdrops rows whereF.FrgLossType != "noloss".Before this branch
cleanSpectronautChunkhad no direct test coverage (existing tests stubbedreduceBigSpectronautout entirely).Checklist Before Requesting a Review
Motivation & Context
The Spectronaut CSV parsing pipeline in
reduceBigSpectronaut()previously usedreadr::read_delim_chunked(), which accumulates string-interning state across chunks, leading to unbounded memory growth on large files. Additionally, the readr pipeline encountered "straddling object" errors where rows span block boundaries, preventing successful processing of certain Spectronaut exports. ThecleanSpectronautChunk()function was implemented with dplyr pipelines, which is less memory-efficient for large-scale data processing.Solution
Replace the readr-based chunked reader with Apache Arrow's streaming CSV reader (
arrow::open_dataset()+Scanner+ToRecordBatchReader), which bounds memory to a single record batch and properly handles block boundaries. RewritecleanSpectronautChunk()in data.table for better performance and memory efficiency. Expose a configurableblock_sizeparameter (default 16 MiB) to allow tuning for pathological cases (up to 64 MiB). Add progress logging every 1,000 batches during streaming.Detailed Changes
Core Implementation (
R/clean_spectronaut.R)reduceBigSpectronaut()rewrite:readr::read_delim_chunked()with Arrow CSV streaming pipelineCsvParseOptions$create(delimiter = delim)+CsvConvertOptions$create()+CsvReadOptions$create(block_size = block_size)needed_colsto prevent materializing unused Spectronaut columns (~35 unused columns dropped at parse time)reader$read_next_batch()→ convert to data.frame → callcleanSpectronautChunk()pos), batch index (batch_idx), elapsed time, and processing rate (rows/sec)block_sizeparameter: coerced to integer, validated (length 1, non-NA, positive)reduceBigSpectronaut(..., anomalyModelFeatures=c(), block_size = 16L * 1024L * 1024L)cleanSpectronautChunk()rewrite from dplyr to data.table:setDT()for in-place conversiondt[, cols, with = FALSE]setnames(..., skip_absent = TRUE))Intensity := as.numeric(Intensity), logical conversions forExcludedandIdentified(character "True" → logicalTRUE)dt[cond, col := NA_real_])is.na(EGQvalue) | EGQvalue >= cutoffIsotopeLabelTypederived fromLabeledSequenceor defaults to "L"Public API (
R/converters.R)bigSpectronauttoMSstatsFormat()updated:block_sizeparameter added afterconnection = NULL16L * 1024L * 1024L(16 MiB)reduceBigSpectronaut()callDocumentation Updates
man/bigSpectronauttoMSstatsFormat.Rd:block_sizeparameter to usage signature\arguments{}section describing block size controlman/dot-prefixedPath.Rd(new file):.prefixedPath(prefix, path)Package Metadata
DESCRIPTION:data.tabletoImportslistNAMESPACE:importFromdirectives:importFrom(data.table, ":="),importFrom(data.table, ".SD"),importFrom(data.table, setDT),importFrom(data.table, setnames)Unit Tests
Two new test cases added to
tests/testthat/test-converters.R:Block size validation test (
test_that("reduceBigSpectronaut block_size argument validation", ...)):reduceBigSpectronaut()rejects invalidblock_sizeinputs: negative values, zero,NA, vector lengths > 1, non-numeric stringsstopifnot()failure as expectedBlock size forwarding test (
test_that("bigSpectronauttoMSstatsFormat forwards block_size to reduceBigSpectronaut", ...)):reduceBigSpectronautto capture the forwardedblock_sizeargument16L * 1024L * 1024L) whenblock_sizenot explicitly provided8L * 1024L * 1024L) is correctly forwardedon.exit()handlersCoding Guidelines
No violations identified. The implementation follows R best practices:
stopifnot()with clear assertionson.exit()