Skip to content

feat(pegboard): actor startup KV preloading#4452

Open
NathanFlurry wants to merge 39 commits intomainfrom
ralph/actor-startup-kv-preload
Open

feat(pegboard): actor startup KV preloading#4452
NathanFlurry wants to merge 39 commits intomainfrom
ralph/actor-startup-kv-preload

Conversation

@NathanFlurry
Copy link
Member

Summary

  • Eliminate KV round-trips during actor startup by pre-fetching required keys from FoundationDB and delivering them alongside the CommandStartActor protocol message (v8)
  • Preloads persist data, connections, inspector token, queue metadata, workflow data, and SQLite VFS chunks in a single FDB snapshot transaction
  • Adds PreloadMap interface with binary search for efficient subsystem consumption, WriteCollector for batching new-actor initialization writes, and #expectNoKvRoundTrips detection flag
  • Configurable per-deployment limits (1 MB global, 768 KB SQLite, 128 KB workflows, 64 KB connections) with per-actor overrides via actor config options

Implementation

Engine (Rust)

  • US-001: Preload config fields in engine/packages/config/src/config/pegboard.rs
  • US-002: v8 runner protocol schema with PreloadedKv type and v7/v8 converters
  • US-003: batch_preload() function in actor_kv/preload.rs with single FDB snapshot transaction
  • US-004: Populate preloaded KV at send time in InsertAndSendCommands activity (not persisted in workflow history)
  • US-005: Runner stores preload config from prepopulateActorNames metadata

TypeScript (RivetKit)

  • US-006: Parse preloaded KV in runner SDK from v8 protocol
  • US-007: Per-actor preload size overrides in actor config schema
  • US-008: PreloadMap with binary search (no hex string conversion)
  • US-009: Subsystems consume PreloadMap before issuing KV reads
  • US-010: SQLite VFS preload integration with partial preload support
  • US-011: Wire preloaded data into ActorInstance.start()
  • US-012: WriteCollector batches new-actor init writes into single kvBatchPut
  • US-013: #expectNoKvRoundTrips flag warns on unexpected KV reads during startup
  • US-014: Eliminate redundant persist data read in engine driver

Bugfixes (from adversarial review)

  • US-016: Fix requested_prefixes/requested_get_keys including skipped entries (critical: prevented silent data loss for actors exceeding preload budgets)
  • US-017: Fix entry_size to include metadata in budget calculation
  • US-018: Call clearPreload() on SQLite VFS store after migration (768 KB memory leak fix)
  • US-019: Remove dead actor_preload_configs field from runner conn
  • US-020: Avoid building PreloadMap twice from same data

Docs

  • US-015: Updated limits.mdx and configuration.mdx with preload limits

Test plan

  • File system driver tests pass (pnpm test driver-file-system)
  • Engine driver tests pass (pnpm test driver-engine)
  • cargo build -p pegboard succeeds
  • cargo build -p rivet-runner-protocol succeeds
  • cargo build -p rivet-pegboard-runner succeeds
  • Verify actor startup with preloaded data (new and existing actors)
  • Verify actor startup without preloaded data (v7 engine fallback)
  • Verify connections/workflows exceeding budget fall back to KV reads
  • Verify SQLite partial preload with KV fallback on miss

🤖 Generated with Claude Code

@railway-app
Copy link

railway-app bot commented Mar 18, 2026

🚅 Deployed to the rivet-pr-4452 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-inspector 😴 Sleeping (View Logs) Web Mar 22, 2026 at 1:09 am
frontend-cloud 😴 Sleeping (View Logs) Web Mar 18, 2026 at 6:42 am
website ❌ Build Failed (View Logs) Web Mar 18, 2026 at 6:22 am
mcp-hub ✅ Success (View Logs) Web Mar 18, 2026 at 6:21 am
ladle ❌ Build Failed (View Logs) Web Mar 18, 2026 at 6:20 am

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pkg-pr-new
Copy link

pkg-pr-new bot commented Mar 18, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4452

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4452

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4452

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4452

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4452

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4452

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4452

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4452

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4452

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4452

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4452

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4452

commit: 27824e8

@claude
Copy link

claude bot commented Mar 18, 2026

PR Review: feat(pegboard): actor startup KV preloading

This is a well-architected optimization. The three-way preload return semantics (Uint8Array / null / undefined), budget tracking in a single FDB snapshot transaction, and WriteCollector batching are solid design choices. The adversarial review bugfixes (US-016 through US-020) show thorough self-review. Some issues to address:


Bug: get-key budget check adds key to requested_get_keys before value-size check

File: engine/packages/pegboard/src/actor_kv/preload.rs

In the get-keys loop, a key is pushed to requested_get_keys unconditionally before checking whether the entry value fits within the budget. If a key exists in FDB but its value exceeds the remaining budget, the key lands in requested_get_keys but not in entries. On the TypeScript side, preload.get(KEY) returns null ("requested but not found"), causing the subsystem to treat it as absent instead of falling back to KV. For PERSIST_DATA specifically, this would start the actor with missing state.

Fix: only add to requested_get_keys after confirming the entry fits budget. Keys absent from FDB (builder stays None) should still be recorded since "not found" is a valid preloaded answer:

if let Some(b) = builder {
    let (k, v, m) = b.build()?;
    let size = entry_size(&k, &v, &m);
    if total_bytes + size <= max_total_bytes {
        total_bytes += size;
        requested_get_keys.push(key.clone()); // only after confirming it fits
        entries.push(...);
    }
    // else: not in requested_get_keys, actor falls back to KV (correct)
} else {
    // Key not found in FDB; record as scanned-but-absent.
    requested_get_keys.push(key.clone());
}

Minor: unwrap_or_default() silently uses empty actor name

File: engine/packages/pegboard/src/workflows/actor/runtime.rs, insert_and_send_commands

The .unwrap_or_default() call when extracting the actor name from commands silently falls back to an empty string. This is guarded by has_start_cmd above, so find_map should always succeed. An empty actor name produces wrong preload config with no error. Prefer an explicit error:

.ok_or_else(|| anyhow::anyhow!("CommandStartActor not found after has_start_cmd check"))?,

Nit: expectNoKvRoundTrips only warns once per startup

File: rivetkit-typescript/packages/rivetkit/src/actor/instance/mod.ts

The warnUnexpectedKvRoundTrip method resets the flag to false after the first warning, which silences all subsequent unexpected KV round-trips during startup. If multiple subsystems fall back to KV (e.g., after a preload regression), only the first will be visible in logs. Consider logging without resetting, or deduplicating by method name.


Nit: .ralph/ planning artifacts in the repo

Files: .ralph/prd.json, .ralph/progress.txt, scripts/ralph/progress.txt

These look like AI-assisted planning artifacts. Unless the team intentionally tracks them, they probably belong in .gitignore rather than committed to source.


Minor: prefix_range trailing-zero assumption

File: engine/packages/pegboard/src/actor_kv/preload.rs

The prefix_range function silently no-ops if the trailing 0 byte is absent when popping from the packed key. A silent no-op produces an incorrect range. If the null terminator is guaranteed by tuple encoding, an assertion would catch regressions immediately:

assert_eq!(start.pop(), Some(0), "expected null terminator from tuple encoding");

Overall

The implementation is solid. The critical item is the get-key budget / requested_get_keys tracking bug - the rest are minor. The version migration plumbing (v7 to v8 converters, backward-compat preloaded_kv: None for old runners) is correct, and the missed-commands reconnect path injecting fresh preloaded KV is an important correctness detail that's handled well.

@claude
Copy link

claude bot commented Mar 18, 2026

Code Review: feat(pegboard): actor startup KV preloading

This is a substantial and well-implemented feature. The adversarial review pass (US-016–US-020) already caught and fixed several critical correctness issues. Here are remaining observations.


Issues

1. Fragile prefix range construction in preload.rs

The prefix_range function silently no-ops if the last byte of the packed key is not 0:

if let Some(&0) = start.last() {
    start.pop();
}

If the tuple encoding changes or an empty prefix is passed, the computed range is wrong with no error or assertion. Consider asserting the byte is present or returning a Result.

2. Preloaded KV fetch can fail actor startup with no fallback

fetch_preloaded_kv is documented as "no silent fallback", so a transient FDB error during preloading fails the entire CommandStartActor. Since preloading is a pure optimization, falling back to the existing non-preloaded startup path on FDB error would improve resilience without risking correctness.

3. #expectNoKvRoundTrips silences after first warning

If a subsystem makes multiple unexpected KV reads, only the first is reported (the flag is cleared after the first hit). This makes the detection less useful when multiple regressions are introduced at once. Consider logging all occurrences during startup and clearing the flag only at the end of start().

4. preloaded_kv cloned per command in insert_and_send_commands

preloaded_kv can be up to 1 MB. The clone is correct (computed once and reused), but if multiple CommandStartActor commands were ever batched together this would clone 1 MB per command. A comment explaining that only one start command is expected per batch would help future reviewers.

5. Missed-command preload in conn.rs reads metadata once per command

fetch_preloaded_kv internally reads the actor name metadata from FDB before calling batch_preload. In conn.rs this runs inside a for loop over missed commands, so the metadata key is re-read once per missed CommandStartActor. Unlikely in practice but worth a comment explaining why this is acceptable.

6. clearPreload() not called on error paths in db/mod.ts and db/drizzle/mod.ts

kvStoreRef?.clearPreload() is called in onReady (after migrations) but not if createClient throws. If migrations fail, preloaded entries remain in memory until GC. Not a correctness issue but deterministic cleanup would be cleaner.


Positive observations

  • Three-way return semantics on PreloadMap.get() (Uint8Array / null / undefined) is the right design. The requested_get_keys/requested_prefixes tracking in Rust correctly ensures actors never mistake "not scanned" for "scanned and not found" -- the US-016 fix is critical.

  • partial: true for SQLite, partial: false for connections/workflows is semantically correct. SQLite VFS can handle individual cache misses; a partial connection list would give an inconsistent view.

  • WriteCollector cleanly batches 3 sequential init writes into 1. The fallback to individual kvBatchPut when no collector is provided maintains backward compatibility.

  • Protocol versioning looks correct: PROTOCOL_MK2_VERSION is bumped in both Rust and TypeScript, and the v7-to-v8 converters handle preloaded_kv: None for older runners properly.

  • Hardcoded byte prefixes in build_startup_preload_params are a maintenance risk. The CLAUDE.md guidance is thorough, but a compile-time or test-time assertion comparing them to the TypeScript key constants would catch out-of-sync regressions before they reach production.

  • Test coverage for PreloadMap and WriteCollector is solid. An integration test asserting #expectNoKvRoundTrips never fires during a standard actor startup would make the optimization regression-proof going forward.


Minor

  • preloadedKvStr in stringify.ts checks === null but the TypeScript type is PreloadedKv | undefined. At runtime (bare deserialization) both may appear; the check is practically correct but tightening the type would remove the ambiguity.
  • StreamingMode::WantAll for exact get-key lookups and StreamingMode::Iterator for prefix scans are the right choices for their respective access patterns.

- Add unit tests for PreloadMap (29 tests) and WriteCollector (4 tests)
- Use symbol for warnUnexpectedKvRoundTrip to keep it internal
- Add sync comments between Rust preload.rs and TS keys.ts
- Add CLAUDE.md rule about keeping prefix bytes in sync
- Remove ralph artifacts and unrelated error json
- Add .gitignore entries for ralph generated files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant