fix(provision): never block startup on a missing fduty CLI; stage it on binary-only installs #71

Merged: ysyneu merged 1 commit into feat/ai-sre from feat/fduty-provision-nonfatal on Jun 15, 2026
@ysyneu ysyneu commented Jun 15, 2026

Collaborator

Problem

A runner refused to start when the fduty CLI was absent. This was most visible on macOS, where flashduty-runner serve died immediately with "fduty CLI not ready" right after install.sh reported "Already at vX, nothing to do."

Two coupled defects:

  1. The startup gate was fatal. An error from ensureFdutyCLI() aborted run/serve whenever fduty couldn't be provisioned or verified.
  2. install.sh never staged fduty on binary-only / darwin installs. install_bundled_fduty sat after the early return on the darwin/--no-service path, so the fduty bundled inside the release archive was extracted but never placed. Re-running install.sh didn't help either: the "already up to date" short-circuit returned before staging.

Net on macOS: no fduty anywhere the runtime looks → the (then-fatal) gate bricked startup.

Fix

Gate is now non-fatal (cmd/provision.go, serve.go, main.go). ensureFdutyCLI() still auto-stages fduty on a best-effort basis and self-checks it on the bash PATH, but on failure it logs an actionable manual-install hint and the runner starts anyway. A runner can still do non-fduty work, and a missing CLI is far easier to fix on a running, loudly-logging runner than on one that won't boot. Until the operator resolves it, fduty calls fail with exit status 127 (command not found): surfaced, not silently fatal.
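The 127 mentioned above is the exit status a POSIX shell reports when a command cannot be found. A minimal sketch of that behavior, using a deliberately nonexistent, hypothetical command name rather than the real fduty:

```shell
#!/bin/sh
# Hypothetical command name, chosen so that it is not found on any PATH.
missing_cmd="fduty-not-installed-example"

# Run it in a child shell and capture the status without aborting this script.
status=0
sh -c "$missing_cmd" 2>/dev/null || status=$?

# A POSIX shell reports 127 for "command not found".
echo "exit status: $status"
```

The script keeps running after the failed call, which mirrors the intent of the change: the missing CLI shows up as a visible per-call failure rather than preventing startup.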

install.sh staging fixed. install_bundled_fduty → stage_bundled_fduty(dir), now called on both paths:

  • service installs → $BIN_DIR (env-pinned runtime tools dir)
  • binary-only / darwin → $INSTALL_DIR (on the user's PATH, and the dir darwin's bundledFdutyNextToExe() probes — os.Executable() there returns the /usr/local/bin symlink, not its target; verified empirically)
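The two staging targets above can be sketched roughly as follows. This is not the actual install.sh code: the function names ending in `_sketch`, the two-argument signature, the `pick_stage_dir` helper, and the mode strings are all assumptions made for illustration. Only `stage_bundled_fduty`, `$BIN_DIR`, and `$INSTALL_DIR` come from the PR.

```shell
#!/bin/sh
# Sketch only: the real stage_bundled_fduty in install.sh may differ.
# Assumption: $1 is the directory the release archive was extracted into,
# $2 is the destination directory for the staged binary.
stage_bundled_fduty_sketch() {
    extract_dir=$1
    dest_dir=$2

    # Archive has no fduty member: nothing to stage, and not an error.
    [ -f "$extract_dir/fduty" ] || return 0

    mkdir -p "$dest_dir" || return 1
    cp "$extract_dir/fduty" "$dest_dir/fduty" || return 1
    chmod 0755 "$dest_dir/fduty"
}

# Hypothetical helper: choose the staging directory from the install mode.
# "service" maps to BIN_DIR; anything else (binary-only, darwin) to INSTALL_DIR.
pick_stage_dir() {
    mode=$1
    if [ "$mode" = "service" ]; then
        printf '%s\n' "$BIN_DIR"
    else
        printf '%s\n' "$INSTALL_DIR"
    fi
}
```

A caller would then do something like `stage_bundled_fduty_sketch "$extract_dir" "$(pick_stage_dir "$mode")"` on both install paths, so neither path can return before staging.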

The "already up to date" short-circuit now also requires fduty_present (mode-aware) so a re-run self-heals a missing fduty instead of declaring success while the CLI is absent.
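A rough sketch of the strengthened short-circuit, under stated assumptions: the body of `fduty_present_sketch`, the `already_up_to_date` helper, and the version arguments are hypothetical. The PR names only `fduty_present` and says it is mode-aware, which is modeled here by passing in the staging directory for the current mode.

```shell
#!/bin/sh
# Assumption: "present" means an executable fduty exists in the directory
# that the current install mode stages into.
fduty_present_sketch() {
    [ -x "$1/fduty" ]
}

# Hypothetical helper: succeed only when the version matches AND fduty is
# already staged, so a re-run with a missing fduty falls through to install.
already_up_to_date() {
    installed_version=$1
    wanted_version=$2
    stage_dir=$3
    [ "$installed_version" = "$wanted_version" ] && fduty_present_sketch "$stage_dir"
}
```

With matching versions but no fduty in the staging directory, `already_up_to_date` returns non-zero, so the installer would continue to the staging step instead of reporting that there is nothing to do.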

Verification

  • Live: built the runner, ran run with fduty fully scrubbed from PATH + an empty bin dir → it logs the install hint and proceeds through workspace initialized → health server listening → connecting to Flashduty (only exits on the bogus WS URL, not on fduty). Before this change it aborted right after "starting".
  • Unit: new TestEnsureFdutyCLI_NeverFatalWhenUnprovisionable (deterministic missing-fduty branch via scrubbed PATH); extracted-function tests for stage_bundled_fduty + fduty_present (stage-to-INSTALL_DIR, stage-to-BIN_DIR, present/absent, no-fduty-member no-op).
  • go build ./..., go vet ./..., go test ./..., gofumpt -l, shellcheck -s sh install.sh — all green.

Notes

  • The building blocks provisionFduty / verifyFdutyOnPath keep their error-returning contracts (still unit-tested); only the ensureFdutyCLI integration point swallows the error into a log.
  • The systemd/service install path is unchanged in behavior.

…on binary-only installs

Two coupled defects made a runner refuse to start when the `fduty` CLI was
absent — most visibly on macOS, where it failed immediately with
"fduty CLI not ready" after install.sh reported "Already at vX, nothing to do".

1. Startup gate was fatal. `ensureFdutyCLI` returned an error that aborted
   run/serve when fduty couldn't be provisioned or verified. A runner can still
   do non-fduty work, and a missing CLI is far easier to fix on a running,
   loudly-logging runner than on one that won't boot. It is now best-effort:
   it still auto-stages and self-checks fduty, but on failure it logs an
   actionable manual-install hint and the runner starts anyway. Until the
   operator resolves it, fduty calls fail with exit status 127 (command not
   found): surfaced, not silently fatal.

2. install.sh never staged the bundled fduty on binary-only / darwin installs.
   `install_bundled_fduty` sat after the early `return` taken on the
   darwin/--no-service path, so the fduty shipped inside the release archive
   was extracted but never placed. Renamed to `stage_bundled_fduty(dir)` and
   now called on both paths: service installs stage into $BIN_DIR (the
   env-pinned runtime tools dir), binary-only/darwin into $INSTALL_DIR (on the
   user's PATH, and the dir darwin's bundledFdutyNextToExe() probes since
   os.Executable() there returns the /usr/local/bin symlink, not its target).
   The "already up to date" short-circuit now also requires `fduty_present`
   (mode-aware) so a re-run self-heals a missing fduty instead of declaring
   success while the CLI is absent.

Verified: runner boots through to the WS-connect stage with fduty fully absent
(no abort); install.sh staging + short-circuit logic unit-tested; go
build/vet/test, gofumpt, shellcheck all green.
@ysyneu ysyneu merged commit ef7d702 into feat/ai-sre Jun 15, 2026
7 of 8 checks passed
@ysyneu ysyneu deleted the feat/fduty-provision-nonfatal branch June 15, 2026 09:46
ysyneu added a commit that referenced this pull request Jun 15, 2026
release(v0.0.23): fduty startup gate non-fatal + install.sh binary-only staging (#71)