Skip to content

feat(dock_from_renv): default to non-root user 'rstudio' out of the box#106

Merged
VincentGuyader merged 1 commit intomasterfrom
feat/user-cache-out-of-box-100
May 6, 2026
Merged

feat(dock_from_renv): default to non-root user 'rstudio' out of the box#106
VincentGuyader merged 1 commit intomasterfrom
feat/user-cache-out-of-box-100

Conversation

@VincentGuyader
Copy link
Copy Markdown
Member

Summary

Closes #100.

Running the container as root is bad practice; the package's own
default FROM = "rocker/r-base" ships an rstudio user precisely
to make non-root operation easy. Up to now dock_from_renv()
required the caller to pass user = "rstudio" explicitly, and even
then the cache mount target was wrong (left at /root/.cache/R/renv
which rstudio cannot write to).

This PR makes user = "rstudio" the default and ensures the
generated Dockerfile actually works for any non-root user passed.

Three changes, applied as a unit

1. Default user flips from NULL to "rstudio"

The runtime container now drops privilege to rstudio by default.
Pass user = NULL to opt out and keep the previous root behaviour.

2. Defensive useradd at the top of the Dockerfile

RUN id -u <user> >/dev/null 2>&1 || useradd -m -d /home/<user> -s /bin/bash <user>

Idempotent: a no-op on rocker/* images where the user already
exists, a real useradd on r-base, ubuntu:*, debian:*. Works
for any user (rstudio, kevin, myapp, ...) on debian/ubuntu.

For alpine-based images, pass user = NULL and create the user
yourself (alpine uses adduser, not useradd).

3. Auto-derive renv_paths_cache and chown it before USER

renv_paths_cache is auto-derived from user when not supplied:

  • user = NULL -> /root/.cache/R/renv (legacy).
  • user = "rstudio" -> /home/rstudio/.cache/R/renv (new default).
  • user = "kevin" -> /home/kevin/.cache/R/renv.

The Dockerfile emits an explicit
RUN mkdir -p && chown -R <user>:<user> ${RENV_PATHS_CACHE}
step right before the USER <user> directive, so the
--mount=type=cache,target=${RENV_PATHS_CACHE} later is writable.

The USER directive is moved from immediately after the ARG/ENV
setup (where it broke apt-get because apt-get needs root) to
right before the renv::restore() cache-mount RUN. All install
steps run as root; only renv::restore() and the runtime container
run as <user>.

Behaviour change

For users regenerating their Dockerfile without specifying user:

+RUN id -u rstudio >/dev/null 2>&1 || useradd -m -d /home/rstudio ...
-ARG RENV_PATHS_CACHE=/root/.cache/R/renv
+ARG RENV_PATHS_CACHE=/home/rstudio/.cache/R/renv
 ...
+RUN mkdir -p ${RENV_PATHS_CACHE}
+RUN chown -R rstudio:rstudio ${RENV_PATHS_CACHE}
+USER rstudio
 RUN --mount=type=cache,...,target=${RENV_PATHS_CACHE} R -e 'renv::restore()'

Test plan

  • TDD red-first: 5 new tests cover all branches.
    • user = NULL: no USER, no chown, no useradd, cache at /root.
    • default (user = "rstudio"): cache at /home/rstudio.
    • ordering invariant: apt-get -> chown -> USER -> renv::restore.
    • explicit renv_paths_cache + user: explicit path honoured, chown applied.
    • defensive useradd at the top of the Dockerfile.
  • Existing test dock_from_renv works updated; fixture
    tests/testthat/renv_Dockerfile regenerated.
  • devtools::test() -> [ FAIL 0 | WARN 3 | SKIP 0 | PASS 253 ].
  • R CMD check --as-cran -> 0 errors / 0 warnings / 0 notes
    (2m 45s).

Context

Tier A item from the post-CRAN-prep roadmap. Companion to #105
(strict_install for dock_from_desc).

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates dock_from_renv() to generate Dockerfiles that run as a non-root user by default (now user = "rstudio"), fixing the prior mismatch where a non-root USER could be combined with a root-owned RENV_PATHS_CACHE. It also adjusts Dockerfile ordering so root-required steps run before dropping privileges, and expands tests/docs accordingly.

Changes:

  • Default user changes from NULL to "rstudio", with user = NULL preserving legacy root behavior.
  • renv_paths_cache now defaults to NULL and is auto-derived from user, with a pre-USER mkdir/chown to ensure the cache mount target is writable.
  • Adds a defensive useradd (guarded by id -u) and updates tests, fixtures, NEWS, and generated Rd docs.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
R/dock_from_renv.R Implements new defaults, cache path derivation, defensive user creation, and moves privilege drop to just before renv::restore().
tests/testthat/test-dock_from_renv.R Adds/updates tests to cover new user/cache behavior and Dockerfile ordering invariants.
tests/testthat/renv_Dockerfile Regenerates the expected Dockerfile fixture to reflect new instructions/order.
man/dock_from_renv.Rd Updates generated documentation for new parameter defaults and behavior.
NEWS.md Documents the behavior change and opt-out for legacy root behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread R/dock_from_renv.R
Comment on lines +157 to +165
if (!is.null(user)) {
dock$RUN(
sprintf(
"id -u %s >/dev/null 2>&1 || useradd -m -d /home/%s -s /bin/bash %s",
user,
user,
user
)
)
Comment thread R/dock_from_renv.R Outdated
Comment on lines +328 to +329
dock$RUN("mkdir -p ${RENV_PATHS_CACHE}")
dock$RUN(sprintf("chown -R %s:%s ${RENV_PATHS_CACHE}", user, user))
Comment thread R/dock_from_renv.R Outdated
Comment on lines +89 to +90
#' emits a `RUN mkdir -p && chown -R <user>:<user> ${RENV_PATHS_CACHE}`
#' step right before the `USER <user>` directive so the cache mount
Comment thread tests/testthat/test-dock_from_renv.R Outdated
Comment on lines +458 to +469
test_that("dock_from_renv with user = NULL keeps the RENV_PATHS_CACHE at /root", {
skip_if(is_rdevel, "skip on R-devel")
out <- dock_from_renv(
lockfile = the_lockfile,
FROM = "rocker/verse",
user = NULL,
renv_version = "0.0.0"
)
df <- paste(out$Dockerfile, collapse = "\n")
expect_match(df, "ARG RENV_PATHS_CACHE=/root/.cache/R/renv", fixed = TRUE)
})

Closes #100.

Running the container as root is bad practice; the package's own
default `FROM = "rocker/r-base"` ships an `rstudio` user precisely
to make non-root operation easy. Up to now `dock_from_renv()`
required the caller to pass `user = "rstudio"` explicitly, and even
then the cache mount target was wrong (left at `/root/.cache/R/renv`
which `rstudio` cannot write to).

Three changes, applied as a unit:

1. Default `user` flips from `NULL` to `"rstudio"`. Pass `user = NULL`
   to opt out and keep the previous root behaviour.

2. The Dockerfile gains a defensive useradd at the top:
   `RUN id -u <user> >/dev/null 2>&1 || useradd -m -d /home/<user> -s /bin/bash <user>`.
   This is idempotent: a no-op on rocker/* images where the user
   already exists, a real `useradd` on `r-base`, `ubuntu:*`,
   `debian:*`. debian/ubuntu only.

3. `renv_paths_cache` is auto-derived from `user` when not supplied:
   `/home/<user>/.cache/R/renv` for non-NULL user, `/root/.cache/R/renv`
   for `user = NULL`. The Dockerfile emits an explicit
   `RUN mkdir -p && chown -R <user>:<user> ${RENV_PATHS_CACHE}`
   step right before the `USER <user>` directive, so the
   `--mount=type=cache,target=${RENV_PATHS_CACHE}` later is writable.

The `USER` directive is moved from immediately after the ARG/ENV
setup (where it broke `apt-get` because apt-get needs root) to
right before the `renv::restore()` cache-mount RUN. All the install
steps run as root; only `renv::restore()` and the runtime container
run as `<user>`.

Test fixture `tests/testthat/renv_Dockerfile` regenerated to match
the new default output. Five new tests in `test-dock_from_renv.R`
cover the new contract:

- user = NULL: no USER, no chown, no useradd, cache at /root.
- default (user = "rstudio"): cache at /home/rstudio.
- ordering invariant: apt-get -> chown -> USER -> renv::restore.
- explicit renv_paths_cache: honored, chown applied to it.
- defensive useradd: emitted at the top, before any install RUN.

R CMD check --as-cran pending (run after push). devtools::test():
[ FAIL 0 | WARN 3 | SKIP 0 | PASS 253 ].
@VincentGuyader VincentGuyader force-pushed the feat/user-cache-out-of-box-100 branch from e06619d to 891b75c Compare May 6, 2026 07:21
@VincentGuyader VincentGuyader requested a review from Copilot May 6, 2026 07:48
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread R/dock_from_renv.R
Comment on lines +160 to +166
if (is.null(renv_paths_cache)) {
renv_paths_cache <- if (is.null(user)) {
"/root/.cache/R/renv"
} else {
sprintf("/home/%s/.cache/R/renv", user)
}
}
Comment thread R/dock_from_renv.R
Comment on lines +356 to +363
dock$RUN(
sprintf(
'mkdir -p "${RENV_PATHS_CACHE}" && chown -R %s:%s "${RENV_PATHS_CACHE}"',
user,
user
)
)
dock$USER(user)
Comment thread NEWS.md
with the existing `renv_version = NULL` behaviour. Closes #94.
- `dock_from_renv()` now defaults to running the runtime container as
the `rstudio` user (previously root). The generated Dockerfile gains
a defensive `RUN id -u rstudio || useradd -m -d /home/rstudio -s /bin/bash rstudio`
@VincentGuyader VincentGuyader merged commit f0624ff into master May 6, 2026
10 checks passed
VincentGuyader added a commit that referenced this pull request May 6, 2026
The renv_paths_cache bullet still described the pre-#106 behaviour
(default = "/root/.cache/R/renv"). After #106, the signature default
is NULL with auto-derivation from `user`: /root for user = NULL,
/home/<user> otherwise. Update the bullet to reflect the actual
runtime, closing the doc-runtime drift introduced by the merge.

No code change. No NAMESPACE / Rd impact.
VincentGuyader added a commit that referenced this pull request May 6, 2026
…text

Closes the post-#106 security audit follow-up. Adds 9 .validate_*
helpers in R/utils.R (@nord) that reject inputs which would
otherwise be interpolated raw into the generated Dockerfile's shell
commands. Each helper raises stop() with a clear message naming the
offending parameter.

Validators added:

- .validate_FROM (image reference: `^[a-zA-Z0-9][a-zA-Z0-9._/-]*(:tag)?(@sha256:hex)?$`)
- .validate_AS (build-stage name: `^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)
- .validate_repos (https URLs + simple-identifier names; closes the
  `dput()`-backtick injection vector on names)
- .validate_extra_sysreqs (Debian package name: `^[a-z0-9][a-z0-9.+-]+$`)
- .validate_renv_version (semver-like, NULL allowed)
- .validate_lockfile (basename: alphanumerics + `._-`)
- .validate_renv_paths_cache (absolute path, no metacharacters)
- .validate_r_version (defensive depth on lockfile-derived value)
- .validate_scalar_logical (factored out from strict_install; used
  for use_pak too, closing a real injection vector via
  `echo "options(renv.config.pak.enabled = %s, ...)"`)

dock_from_desc() and dock_from_renv() call the relevant validators
at function entry, just after match.arg(github_pat).

The pr-reviewer agent flagged two additional vectors I missed in
the first pass: use_pak (B1, blocked) and names(repos) (B2, blocked
via the dput() backtick path). Both are closed by this commit.

Tests: 9 new expect_error() blocks across test-dock_from_desc.R
and test-dock_from_renv.R, each TDD red-against-buggy. One
existing test patched: renv_version = "banana" -> "1.2.3" since
the new validator rejects non-semver strings (the test was
exercising the install_version code path, which "1.2.3" still
covers).

devtools::test() -> [ FAIL 0 | WARN 3 | SKIP 0 | PASS 302 ].
R CMD check --as-cran: 0 errors / 0 warnings / 1 note (the
"unable to verify current time" host note, unrelated).

Out of scope for this PR (per the threat model agreed with the
maintainer):
- R6 helpers in R/add.R (dock$RUN, dock$COPY, ...) are
  author-controlled API; if the author writes a malicious RUN it
  is self-foot-shooting, not injection.
- distro is deprecated and ignored at runtime.
- github_pat is match.arg'd against a fixed enum.

NEWS.md gains a `## Security` section above `## New features`.
VincentGuyader added a commit that referenced this pull request May 6, 2026
…text

Closes the post-#106 security audit follow-up. Adds 9 .validate_*
helpers in R/utils.R (@nord) that reject inputs which would
otherwise be interpolated raw into the generated Dockerfile's shell
commands. Each helper raises stop() with a clear message naming the
offending parameter.

Validators added:

- .validate_FROM (image reference: `^[a-zA-Z0-9][a-zA-Z0-9._/-]*(:tag)?(@sha256:hex)?$`)
- .validate_AS (build-stage name: `^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)
- .validate_repos (https URLs + simple-identifier names; closes the
  `dput()`-backtick injection vector on names)
- .validate_extra_sysreqs (Debian package name: `^[a-z0-9][a-z0-9.+-]+$`)
- .validate_renv_version (semver-like, NULL allowed)
- .validate_lockfile (basename: alphanumerics + `._-`)
- .validate_renv_paths_cache (absolute path, no metacharacters)
- .validate_r_version (defensive depth on lockfile-derived value)
- .validate_scalar_logical (factored out from strict_install; used
  for use_pak too, closing a real injection vector via
  `echo "options(renv.config.pak.enabled = %s, ...)"`)

dock_from_desc() and dock_from_renv() call the relevant validators
at function entry, just after match.arg(github_pat).

The pr-reviewer agent flagged two additional vectors I missed in
the first pass: use_pak (B1, blocked) and names(repos) (B2, blocked
via the dput() backtick path). Both are closed by this commit.

Tests: 9 new expect_error() blocks across test-dock_from_desc.R
and test-dock_from_renv.R, each TDD red-against-buggy. One
existing test patched: renv_version = "banana" -> "1.2.3" since
the new validator rejects non-semver strings (the test was
exercising the install_version code path, which "1.2.3" still
covers).

devtools::test() -> [ FAIL 0 | WARN 3 | SKIP 0 | PASS 302 ].
R CMD check --as-cran: 0 errors / 0 warnings / 1 note (the
"unable to verify current time" host note, unrelated).

Out of scope for this PR (per the threat model agreed with the
maintainer):
- R6 helpers in R/add.R (dock$RUN, dock$COPY, ...) are
  author-controlled API; if the author writes a malicious RUN it
  is self-foot-shooting, not injection.
- distro is deprecated and ignored at runtime.
- github_pat is match.arg'd against a fixed enum.

NEWS.md gains a `## Security` section above `## New features`.
VincentGuyader added a commit that referenced this pull request May 6, 2026
…text

Closes the post-#106 security audit follow-up. Adds 9 .validate_*
helpers in R/utils.R (@nord) that reject inputs which would
otherwise be interpolated raw into the generated Dockerfile's shell
commands. Each helper raises stop() with a clear message naming the
offending parameter.

Validators added:

- .validate_FROM (image reference: `^[a-zA-Z0-9][a-zA-Z0-9._/-]*(:tag)?(@sha256:hex)?$`)
- .validate_AS (build-stage name: `^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)
- .validate_repos (https URLs + simple-identifier names; closes the
  `dput()`-backtick injection vector on names)
- .validate_extra_sysreqs (Debian package name: `^[a-z0-9][a-z0-9.+-]+$`)
- .validate_renv_version (semver-like, NULL allowed)
- .validate_lockfile (basename: alphanumerics + `._-`)
- .validate_renv_paths_cache (absolute path, no metacharacters)
- .validate_r_version (defensive depth on lockfile-derived value)
- .validate_scalar_logical (factored out from strict_install; used
  for use_pak too, closing a real injection vector via
  `echo "options(renv.config.pak.enabled = %s, ...)"`)

dock_from_desc() and dock_from_renv() call the relevant validators
at function entry, just after match.arg(github_pat).

The pr-reviewer agent flagged two additional vectors I missed in
the first pass: use_pak (B1, blocked) and names(repos) (B2, blocked
via the dput() backtick path). Both are closed by this commit.

Tests: 9 new expect_error() blocks across test-dock_from_desc.R
and test-dock_from_renv.R, each TDD red-against-buggy. One
existing test patched: renv_version = "banana" -> "1.2.3" since
the new validator rejects non-semver strings (the test was
exercising the install_version code path, which "1.2.3" still
covers).

devtools::test() -> [ FAIL 0 | WARN 3 | SKIP 0 | PASS 302 ].
R CMD check --as-cran: 0 errors / 0 warnings / 1 note (the
"unable to verify current time" host note, unrelated).

Out of scope for this PR (per the threat model agreed with the
maintainer):
- R6 helpers in R/add.R (dock$RUN, dock$COPY, ...) are
  author-controlled API; if the author writes a malicious RUN it
  is self-foot-shooting, not injection.
- distro is deprecated and ignored at runtime.
- github_pat is match.arg'd against a fixed enum.

NEWS.md gains a `## Security` section above `## New features`.
VincentGuyader added a commit that referenced this pull request May 6, 2026
…text (#109)

* docs(NEWS): align renv_paths_cache bullet with post-#106 default

The renv_paths_cache bullet still described the pre-#106 behaviour
(default = "/root/.cache/R/renv"). After #106, the signature default
is NULL with auto-derivation from `user`: /root for user = NULL,
/home/<user> otherwise. Update the bullet to reflect the actual
runtime, closing the doc-runtime drift introduced by the merge.

No code change. No NAMESPACE / Rd impact.

* feat(security): validate user-supplied strings flowing into shell context

Closes the post-#106 security audit follow-up. Adds 9 .validate_*
helpers in R/utils.R (@nord) that reject inputs which would
otherwise be interpolated raw into the generated Dockerfile's shell
commands. Each helper raises stop() with a clear message naming the
offending parameter.

Validators added:

- .validate_FROM (image reference: `^[a-zA-Z0-9][a-zA-Z0-9._/-]*(:tag)?(@sha256:hex)?$`)
- .validate_AS (build-stage name: `^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)
- .validate_repos (https URLs + simple-identifier names; closes the
  `dput()`-backtick injection vector on names)
- .validate_extra_sysreqs (Debian package name: `^[a-z0-9][a-z0-9.+-]+$`)
- .validate_renv_version (semver-like, NULL allowed)
- .validate_lockfile (basename: alphanumerics + `._-`)
- .validate_renv_paths_cache (absolute path, no metacharacters)
- .validate_r_version (defensive depth on lockfile-derived value)
- .validate_scalar_logical (factored out from strict_install; used
  for use_pak too, closing a real injection vector via
  `echo "options(renv.config.pak.enabled = %s, ...)"`)

dock_from_desc() and dock_from_renv() call the relevant validators
at function entry, just after match.arg(github_pat).

The pr-reviewer agent flagged two additional vectors I missed in
the first pass: use_pak (B1, blocked) and names(repos) (B2, blocked
via the dput() backtick path). Both are closed by this commit.

Tests: 9 new expect_error() blocks across test-dock_from_desc.R
and test-dock_from_renv.R, each TDD red-against-buggy. One
existing test patched: renv_version = "banana" -> "1.2.3" since
the new validator rejects non-semver strings (the test was
exercising the install_version code path, which "1.2.3" still
covers).

devtools::test() -> [ FAIL 0 | WARN 3 | SKIP 0 | PASS 302 ].
R CMD check --as-cran: 0 errors / 0 warnings / 1 note (the
"unable to verify current time" host note, unrelated).

Out of scope for this PR (per the threat model agreed with the
maintainer):
- R6 helpers in R/add.R (dock$RUN, dock$COPY, ...) are
  author-controlled API; if the author writes a malicious RUN it
  is self-foot-shooting, not injection.
- distro is deprecated and ignored at runtime.
- github_pat is match.arg'd against a fixed enum.

NEWS.md gains a `## Security` section above `## New features`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Design: user + renv_paths_cache interaction in dock_from_renv()

2 participants