
feat(audit): destructive-op JSONL trail for hard-deletes#1069

Open
vincedk-alt wants to merge 1 commit into garrytan:master from vincedk-alt:feat/destructive-op-audit
Conversation

@vincedk-alt

Summary

Every hard-delete of a source or page row now leaves a forensic JSONL trace at `~/.gbrain/audit/destructive-ops-YYYY-Www.jsonl` (ISO-week rotation, override via `GBRAIN_AUDIT_DIR`). The next "what deleted X?" investigation becomes one grep, not a forensic excavation across cron logs and shell history.

A sibling of `shell-audit.ts` (shell-job submissions) and `rerank-audit.ts` (reranker failures): same naming convention, same env override, same best-effort posture.

Why

When an operator runs `gbrain sources remove --confirm-destructive`, or any destructive code path fires (autopilot purge phase, `pages purge-deleted`, source-archive cascade), the only existing record of "what got deleted, when, from where" lives in agent session transcripts. That works when the action happens inside an agent harness. It fails — silently and permanently — when the action happens from a terminal, a cron job, or an MCP client that doesn't persist per-call.

Concrete trigger: a 2026-05-15 cleanup pass cascade-hard-deleted 229 pages via `sources_remove default --confirm-destructive`. Reconstructing "what just happened?" required manually grepping a downstream agent's session transcripts the next day. Had the operation happened from a terminal, the event would have been permanently unknowable.

This PR closes that bug class. Operator-facing CHANGELOG framing: "destructive ops now leave a JSONL trail so you can answer 'what did I delete last Tuesday?' without rebuilding from cron logs."

What changed

New `src/core/destructive-audit.ts` module (mirrors `shell-audit.ts` + `rerank-audit.ts`). Exports:

  • `computeDestructiveAuditFilename(now)` — pure ISO-8601 week naming
  • `resolveAuditDir()` — env-override-aware
  • `logDestructiveOp(event)` — best-effort append with a stderr warning on write failure
  • `readRecentDestructiveOps(days)` — newest-first reader, tolerates malformed JSONL
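
The ISO-week naming can be sketched as below. This is a minimal reimplementation for illustration, not the shipped code; the real `computeDestructiveAuditFilename` may differ in details:

```typescript
// Sketch of ISO-8601 week naming for the weekly audit file.
// ISO weeks start on Monday; week 1 is the week containing the year's
// first Thursday, so a late-December date can land in W01 of the NEXT
// year (the cross-year boundary the tests below exercise).
function computeDestructiveAuditFilename(now: Date): string {
  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const isoDay = d.getUTCDay() || 7;         // Mon=1 … Sun=7
  d.setUTCDate(d.getUTCDate() + 4 - isoDay); // jump to this week's Thursday
  const isoYear = d.getUTCFullYear();
  const yearStart = Date.UTC(isoYear, 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `destructive-ops-${isoYear}-W${String(week).padStart(2, "0")}.jsonl`;
}
```

Under this scheme 2026-05-16 falls in W20 and 2025-12-30 falls in 2026-W01, matching the W20 / cross-year-W01 expectations in the test section.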

Wired into five hard-delete sites:

| Site | Op kind |
| --- | --- |
| `pglite-engine.ts:deletePage` | raw delete primitive |
| `pglite-engine.ts:purgeDeletedPages` | autopilot purge phase + manual `pages purge-deleted` |
| `postgres-engine.ts:deletePage` | same on Postgres |
| `postgres-engine.ts:purgeDeletedPages` | same on Postgres |
| `destructive-guard.ts:purgeExpiredSources` | source-level cascade |

Intentionally NOT wired: `softDeletePage`. Soft-deletes are reversible within the 72h recovery window and don't lose data; auditing them would be operational noise.

`purgeDeletedPages` with zero rows purged also skips the audit line. The autopilot cycle runs the purge phase every cycle; writing "purged 0 pages" 24+ times per day on a clean brain is pure disk churn.
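
The best-effort posture could look like the following sketch. The function name echoes the export above, but the shape, return value, and fixed filename are illustrative assumptions, not the module's actual code:

```typescript
import { appendFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";

// Best-effort audit append. Callers skip the call entirely when a purge
// touched zero rows (the no-churn rule above); when called, a write
// failure warns on stderr but never throws, so the destructive op
// itself always completes. Returns whether a line was written.
function logDestructiveOp(event: Record<string, unknown>, dir: string): boolean {
  try {
    mkdirSync(dir, { recursive: true });
    const line = JSON.stringify({ ts: new Date().toISOString(), ...event }) + "\n";
    // The real module derives the weekly ISO-week filename; a fixed
    // name keeps this sketch self-contained.
    appendFileSync(join(dir, "destructive-ops.jsonl"), line);
    return true;
  } catch (err) {
    console.error(`destructive-audit: write failed: ${(err as Error).message}`);
    return false;
  }
}
```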

Page-slug truncation: when `page_slugs.length > 50`, the array is sliced to the first 50 and a `page_slugs_truncated: true` marker is added. The `pages_purged` count remains accurate as ground truth. This stops one bulk-delete of 10K rows from producing a 10K-string JSONL line.
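
That truncation rule can be sketched like this (illustrative types and helper name; the field names follow the sample output):

```typescript
interface PurgeAuditEvent {
  pages_purged: number;            // ground-truth count, never truncated
  page_slugs: string[];
  page_slugs_truncated?: boolean;  // present only when the list was cut
}

const SLUG_CAP = 50;

// Cap the slug list so one 10K-row purge can't emit a 10K-string line,
// while keeping pages_purged as the accurate total.
function capSlugList(event: PurgeAuditEvent): PurgeAuditEvent {
  if (event.page_slugs.length <= SLUG_CAP) return event;
  return {
    ...event,
    page_slugs: event.page_slugs.slice(0, SLUG_CAP),
    page_slugs_truncated: true,
  };
}
```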

Sample output

```jsonl
{"ts":"2026-05-16T05:30:12Z","op":"deletePage","engine":"pglite","slug":"wiki/people/alice","source_id":"default"}
{"ts":"2026-05-16T05:35:21Z","op":"purgeDeletedPages","engine":"pglite","older_than_hours":72,"pages_purged":7,"page_slugs":["wiki/a","wiki/b","..."]}
{"ts":"2026-05-16T05:40:01Z","op":"purgeExpiredSources","engine":"pglite","sources_purged":1,"source_ids":["default"]}
```

Operator forensic query: `tail ~/.gbrain/audit/destructive-ops-$(date +%Y-W%V).jsonl` answers "what got destroyed this week?" instantly.
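
The tolerant read path (newest-first ordering, skipping malformed lines from partial writes) might be sketched as follows; this is an assumed shape, not the shipped `readRecentDestructiveOps`:

```typescript
// Parse JSONL text newest-first, silently skipping malformed lines
// (e.g. a partial write from a crashed process). An audit reader
// should never crash the forensics it exists to enable.
function parseAuditJsonl(text: string): Array<Record<string, unknown>> {
  const events: Array<Record<string, unknown>> = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // tolerate garbage lines rather than aborting the whole read
    }
  }
  return events.reverse(); // the file is append-only, so reversed = newest-first
}
```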

Tests

`test/destructive-audit.test.ts` — 14 cases, 43 expect calls, all passing:

Pure helpers: ISO-week filename W20 + cross-year W01 boundary; `resolveAuditDir` env override + default.

Write+read roundtrip (tmpdir `GBRAIN_AUDIT_DIR`): single roundtrip; newest-first ordering; truncation at 50 with marker; ≤50 untruncated; malformed JSONL skipping; days-window filter.

Best-effort posture: unwritable audit dir → no throw (op continues).

End-to-end through the PGLite engine: `engine.deletePage` emits the expected line; `engine.purgeDeletedPages` emits the expected line with all slugs; a zero-row purge emits NO line (churn regression guard).

Regression: the existing 100 `pglite-engine.test.ts` + 35 `orphans.test.ts` + 24 `sources-ops.test.ts` cases all still pass.

Verification

  • `bun run typecheck` — clean
  • `bun run verify` — clean
  • `bun test test/destructive-audit.test.ts` — 14 pass / 0 fail
  • `bun test test/pglite-engine.test.ts` — 100 pass / 0 fail
  • `bun test test/orphans.test.ts` — 35 pass / 0 fail
  • `bun test test/sources-ops.test.ts` — 24 pass / 0 fail

Reviewer notes

  • Privacy: the audit log writes slugs but never page content, frontmatter, or chunk text. The slug is the smallest identifier that lets an operator reconstruct what happened.
  • Disk pressure: a single hard-delete is ~150 bytes of JSONL. Even at 1,000 deletes per week that's ~150 KB/week, under 8 MB/year per brain — trivial.
  • No new dependencies. Pure Node fs + path.
  • Sibling-pattern precedent: `shell-audit.ts` has been in production since v0.20.4 with this exact write/rotation shape; `rerank-audit.ts` joined in v0.35.0.0. This PR adds the third audit-log file using the same idiom.

Follow-ups (NOT in this PR)

  • Optional doctor check `destructive_ops_summary` that surfaces "N hard-deletes in last 7 days" — opt-in; the audit file alone is enough for operator forensics.
  • Caller-context fields (`auth.client_id`, remote flag) require threading `OperationContext` below the engine layer — separate concern. The current trace has enough information for the bug class motivating this PR.

🤖 Generated with Claude Code
