feat(knowledge): scope-bounded reconcile prune, atomic skill install, 16MiB envd unary cap #67
Merged
… 16MiB envd unary cap

Three stability fixes for the knowledge/skill sync layer (PR-3 of the knowledge-loading-stability series):

- ReconcileKnowledgeManifest orphan prune is now scope-bounded. Args gain staged_scopes (the manifest is authoritative inside them; orphans are pruned) and valid_scopes (all scopes account-wide; on-disk scopes outside it are deleted-pack residue and are pruned whole; valid-but-unstaged scopes belong to other sessions sharing the runner and are kept). Fixes the over-cap shared-BYOC cross-session prune thrash, where two sessions with different mounted teams evicted each other's files. Empty valid_scopes (older Safari, or a zero-pack account) keeps the legacy global prune.
- SyncSkill install is atomic: the archive + .checksum are extracted into a sibling .installing-* staging dir, then swapped in via RemoveAll+Rename, serialized by a per-environment mutex; orphaned staging dirs from hard crashes are swept before each install. A corrupt zip no longer destroys the previously installed version mid-RemoveAll.
- envd unary body cap raised from 8MiB to 16MiB (sync_skill zip_data at Safari's MaxSkillZipBytes of 10MiB is ~13.3MiB after base64; it could not fit the old frame, so >6MiB-raw skills were uninstallable on cloud sandboxes). Over-cap requests now fail with an explicit cap error instead of a silent LimitReader truncation that surfaced as a cryptic JSON decode failure.
…irTemp+Chmod

Installs are mutex-serialized and leftovers are swept before each install, so a random suffix bought nothing; CI's gosec also rejected the Chmod (G302) that MkdirTemp's 0700 mode forced.
PR-3 of the knowledge/skills loading-stability series (Safari side: flashcatcloud/fc-safari#170 merged, #172 open).
What
- Scope-bounded reconcile prune: `ReconcileKnowledgeManifestArgs` gains `staged_scopes` + `valid_scopes`. Orphans are pruned only inside scopes this manifest staged; on-disk scopes outside `valid_scopes` (deleted packs) are pruned whole; scopes valid account-wide but unstaged by this session are left untouched. Fixes the shared-BYOC cross-session prune thrash observed live during #172's over-cap rigor pass (two sessions with different mounted teams evicted each other's files on every reconcile). Empty `valid_scopes` (older Safari, or a zero-pack account) keeps the legacy global prune, so both directions of a mixed-version fleet are safe and no deploy ordering is required.
- Atomic skill install: `SyncSkill` extracts the archive + `.checksum` into a sibling `.installing-*` staging dir and swaps it in via `RemoveAll` + `Rename`, serialized by a per-environment mutex; orphaned staging dirs from hard crashes are swept before each install. A corrupt zip no longer destroys the previously installed version (old behavior: `RemoveAll` first, then extract into the live dir).
- envd unary cap 8 MiB → 16 MiB: `sync_skill` `zip_data` at Safari's `MaxSkillZipBytes` (10 MiB) is ~13.3 MiB after base64 and could not fit the old frame, so skills over 6 MiB raw were uninstallable on cloud sandboxes. Over-cap bodies now fail with an explicit cap error instead of silent `LimitReader` truncation surfacing as a cryptic JSON decode failure. A companion Safari commit pins the relationship with `TestSkillZipFitsEnvdUnaryFrame` and adds a cloud-dispatch pre-check.

Verification
- `go test ./...` green, `go vet` clean, gofumpt clean.
- `knowledge/team_999999/junk.md` (invalid scope) + `knowledge/account/orphan-live.md` → reconcile `kept=7 total_pruned=2 eager_staged=12`; both planted files gone, all real files intact.
- (… `account` only, valid=all 5 scopes): the planted orphan in a valid-but-unstaged team scope survived alongside all 13 team files (`total_pruned=1`, the account orphan only). The old global prune would have deleted all 14; that was the live thrash.

Rollout