[Limiter] Persist rule book per partition via UpsertRuleBook log entry #4689

Open
tillrohrmann wants to merge 4 commits into restatedev:main from tillrohrmann:issues/4655-2

@tillrohrmann tillrohrmann commented May 5, 2026

Wires the cluster-global rule book into the partition processor state
machine so leader-driven distribution has somewhere to land:

  • Command::UpsertRuleBook(UpsertRuleBook { partition_key_range, rule_book: Bytes }) — new wal-protocol command. The payload is
    bilrost-encoded RuleBook carried as opaque Bytes (precedent:
    Command::VQSchedulerDecisions) so flexbuffers-based Envelope
    serde does not need to drag full serde derive through every
    limiter type.

  • ReadFsmTable::get_rule_book / WriteFsmTable::put_rule_book
    (both *Since v1.7.0*) backed by a new FSM slot RULE_BOOK = 9
    in the partition store. Each partition writes the same logical
    rule book; readback on PP boot gives leader transitions the right
    state without an extra metadata-store round trip.

  • RuleBook gains StorageEncode/StorageDecode (bilrost) plus
    bilrost_encode_to_bytes / bilrost_decode helpers.

  • StateMachine carries rule_book: RuleBook in-memory; loaded
    from FSM table at PP boot. The Command::UpsertRuleBook apply
    path bilrost-decodes the bytes, idempotency-checks the version
    (skips when not strictly newer), diffs against the previous
    in-memory book, persists via put_rule_book within the same
    transaction, updates in-memory state, and emits
    Action::RulesUpdated(Vec<RuleUpdate>) when non-empty.

  • Action::RulesUpdated is dispatched in leader_state via a new
    SchedulerService::on_rules_updated API that forwards the batch
    through the existing resource-manager mpsc to UserLimiter.
    Followers don't dispatch actions, matching the "only the leader's
    UserLimiter is live" design.

  • restate-wal-protocol does NOT depend on restate-limiter — the
    opaque-bytes payload keeps it isolated.
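
The ordering of the apply path described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `RuleBook`, `RuleUpdate`, `Action`, and `StateMachine` here are simplified stand-ins, the `persisted` field is a hypothetical substitute for the FSM-table write, and bilrost decoding is omitted entirely.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in: a book version plus a map of rule id -> per-rule version.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct RuleBook {
    pub version: u64,
    pub rules: BTreeMap<u64, u64>,
}

#[derive(Debug, PartialEq)]
pub enum RuleUpdate {
    Upsert { rule_id: u64, version: u64 },
    Remove { rule_id: u64 },
}

#[derive(Debug, PartialEq)]
pub enum Action {
    RulesUpdated(Vec<RuleUpdate>),
}

#[derive(Default)]
pub struct StateMachine {
    /// In-memory copy of the rule book.
    pub rule_book: RuleBook,
    /// Hypothetical stand-in for the persisted FSM slot.
    pub persisted: Option<RuleBook>,
    /// Collected actions; only a leader would dispatch them.
    pub actions: Vec<Action>,
}

impl StateMachine {
    /// Returns `true` if the incoming book was applied.
    pub fn on_upsert_rule_book(&mut self, incoming: RuleBook) -> bool {
        // Idempotency check: skip anything that is not strictly newer.
        if incoming.version <= self.rule_book.version {
            return false;
        }
        // Diff against the previous in-memory book before replacing it.
        let diff = diff(&self.rule_book, &incoming);
        // Persist (assumed to share the command's transaction), then update memory.
        self.persisted = Some(incoming.clone());
        self.rule_book = incoming;
        // Emit an action only when something changed for the runtime.
        if !diff.is_empty() {
            self.actions.push(Action::RulesUpdated(diff));
        }
        true
    }
}

/// Presence and per-rule version decide what the runtime needs to hear about.
fn diff(old: &RuleBook, new: &RuleBook) -> Vec<RuleUpdate> {
    let mut out = Vec::new();
    for (id, version) in &new.rules {
        if old.rules.get(id) != Some(version) {
            out.push(RuleUpdate::Upsert { rule_id: *id, version: *version });
        }
    }
    for id in old.rules.keys() {
        if !new.rules.contains_key(id) {
            out.push(RuleUpdate::Remove { rule_id: *id });
        }
    }
    out
}
```

Replaying the same log entry twice is harmless in this sketch: the second application fails the version check and leaves both the persisted and in-memory books untouched.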

This PR is based on #4677

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@tillrohrmann

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c4ea312adc

Comment on lines +753 to +754
if !diff.is_empty() {
    self.action_collector.push(Action::RulesUpdated(diff));

[P1] Seed limiter when leadership starts from persisted rule book

This only emits Action::RulesUpdated when a new UpsertRuleBook command is applied, but the commit also restores rule_book from FSM state at boot. In the common replay path, those rule-book updates are applied while the partition is follower/candidate (where actions are ignored), and when it later becomes leader the scheduler's UserLimiter remains empty unless another newer rule-book command arrives. That leaves existing limits unenforced after restart/failover until the next admin rule change; please seed the limiter from the restored book (for example diff_from_empty()) on leader activation.

Contributor Author

This is something that we are going to add in #4690.

@tillrohrmann tillrohrmann requested a review from AhmedSoliman May 5, 2026 10:08
@tillrohrmann tillrohrmann force-pushed the issues/4655-2 branch 2 times, most recently from 263f34f to 8680cdd Compare May 5, 2026 12:59
tillrohrmann and others added 4 commits May 5, 2026 16:22

Introduces the cluster-global rule book that backs the in-memory `Rules`
store of each partition processor's `UserLimiter`. This commit lands the
foundational data layer for issue restatedev#4655 (steps 1 and 2 of the plan):

  * `RuleBook` / `PersistedRule` / `PersistedUserLimits` in
    `restate-limiter::rule_book`, bilrost-encoded, `Versioned`. The
    `RuleBook` is keyed by `RuleId` (xxh3-64 of the rule pattern's
    canonical display form) rendered as `rul_…` resource ids. A
    deliberate 64-bit hash is used instead of the 128-bit norm for
    rendered-id brevity; the doc comment captures the trade-off.

  * Soft-tombstone semantics: `PersistedRule.disabled: bool` defaults
    to `false` so an active rule is bilrost's empty state and gets
    omitted from the wire.

  * Writer logic: `RuleBook::apply_change` for `Create` / `Patch` /
    `Delete`, with the version-bump contract — create/recreate uses
    the new book version, runtime-relevant patches advance the
    per-rule version, reason-only edits bump only `last_modified` and
    the book version, no-ops bump nothing. Hard cap on total rules
    (`MAX_RULES_PER_BOOK`) — configurable knob comes later.

  * `RuleBook::diff`: presence + per-rule version drives
    `Vec<RuleUpdate>` for the runtime, with `disabled` rules treated
    as absent. `diff_from_empty` for bootstrap consumers.
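
A minimal sketch of the diff contract described above, assuming a simplified `PersistedRule` with a hypothetical `limit` payload and plain `u64` rule ids. The real types in `restate-limiter::rule_book` carry more fields and may differ in shape; only the rules "presence + per-rule version" and "`disabled` counts as absent" are taken from the text.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a persisted rule (the `limit` field is hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub struct PersistedRule {
    pub version: u64,
    pub disabled: bool,
    pub limit: u32,
}

#[derive(Debug, PartialEq)]
pub enum RuleUpdate {
    Upsert { id: u64, limit: u32 },
    Remove { id: u64 },
}

pub type Rules = BTreeMap<u64, PersistedRule>;

/// A rule only counts as present when it exists and is not soft-tombstoned.
fn active<'a>(rules: &'a Rules, id: &u64) -> Option<&'a PersistedRule> {
    rules.get(id).filter(|r| !r.disabled)
}

pub fn diff(old: &Rules, new: &Rules) -> Vec<RuleUpdate> {
    let mut updates = Vec::new();
    // New or changed rules: active in `new`, and either absent from `old`
    // or carrying a different per-rule version.
    for (id, rule) in new.iter().filter(|(_, r)| !r.disabled) {
        match active(old, id) {
            Some(prev) if prev.version == rule.version => {}
            _ => updates.push(RuleUpdate::Upsert { id: *id, limit: rule.limit }),
        }
    }
    // Removed rules: active in `old`, but missing or disabled in `new`.
    for (id, _) in old.iter().filter(|(_, r)| !r.disabled) {
        if active(new, id).is_none() {
            updates.push(RuleUpdate::Remove { id: *id });
        }
    }
    updates
}

/// Bootstrap consumers diff against an empty book to get every active rule.
pub fn diff_from_empty(new: &Rules) -> Vec<RuleUpdate> {
    diff(&Rules::new(), new)
}
```

Under these assumptions, disabling a rule surfaces to the runtime as a `Remove`, and a rule that is created already disabled never surfaces at all.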

Supporting infra:

  * Adds `Rule("rul")` to `IdResourceType`; promotes
    `restate-types::id_util` and `base62_util` to public modules and
    `IdEncoder::{new,push_u64,push_u128}` to public so external crates
    can implement `ResourceId`.

  * Moves `UserLimits` and `RuleUpdate` from `restate-worker-api` into
    `restate-limiter`.

  * Adds `generic-array` to the workspace deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Changes the runtime channel contract from
`ResourceManagerUpdate::RulesUpdated(RuleUpdate)` to
`RulesUpdated(Vec<RuleUpdate>)` so the upcoming partition-log
state-machine apply path (Step 4) can deliver the full diff between
two rule-book versions in a single message rather than fanning out
one channel send per `RuleUpdate`.

Pure refactor:

  * `UserLimiter::apply_rule_update` becomes
    `apply_rule_updates`, iterating the batch and accumulating the
    union of vqueues to wake.
  * The `ResourceManager::poll_resources` consumer arm passes the
    `Vec` straight through.
  * Three test call sites wrap their single update in `vec![…]`.

`Vec` is the right shape over `SmallVec`: rule changes are cold-path
(admin CRUD or bulk seeding), the per-message size penalty of an
inline buffer would outweigh the one heap allocation we'd save on
batch=1, and bulk seeding spills anyway.
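
The batched shape can be illustrated with a small sketch. Everything here is a simplification made for the example: the string-keyed `limits` map, the `waiting` index, and the `block` helper are hypothetical, and the real `UserLimiter` tracks considerably more state. The point shown is only that one batch is folded into one deduplicated set of vqueues to wake.

```rust
use std::collections::{BTreeSet, HashMap};

pub type VQueueId = u32;

#[derive(Debug, Clone)]
pub enum RuleUpdate {
    Upsert { pattern: String, limit: u32 },
    Remove { pattern: String },
}

#[derive(Default)]
pub struct UserLimiter {
    /// Hypothetical rule store: pattern -> limit.
    limits: HashMap<String, u32>,
    /// Hypothetical index of vqueues blocked on a given rule pattern.
    waiting: HashMap<String, Vec<VQueueId>>,
}

impl UserLimiter {
    /// Registers a vqueue as blocked on `pattern` (illustrative helper).
    pub fn block(&mut self, pattern: &str, vqueue: VQueueId) {
        self.waiting.entry(pattern.to_owned()).or_default().push(vqueue);
    }

    pub fn limit_for(&self, pattern: &str) -> Option<u32> {
        self.limits.get(pattern).copied()
    }

    /// Applies one update and returns the vqueues that were blocked on it.
    fn apply_one(&mut self, update: RuleUpdate) -> Vec<VQueueId> {
        let pattern = match update {
            RuleUpdate::Upsert { pattern, limit } => {
                self.limits.insert(pattern.clone(), limit);
                pattern
            }
            RuleUpdate::Remove { pattern } => {
                self.limits.remove(&pattern);
                pattern
            }
        };
        self.waiting.remove(&pattern).unwrap_or_default()
    }

    /// Applies the whole batch and returns the union of vqueues to wake,
    /// so a vqueue touched by several updates is woken once.
    pub fn apply_rule_updates(&mut self, updates: Vec<RuleUpdate>) -> BTreeSet<VQueueId> {
        let mut to_wake = BTreeSet::new();
        for update in updates {
            to_wake.extend(self.apply_one(update));
        }
        to_wake
    }
}
```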

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wires the cluster-global rule book into the partition processor state
machine so leader-driven distribution (Step 5) has somewhere to land:

  * `Command::UpsertRuleBook(UpsertRuleBook { partition_key_range,
    rule_book: Bytes })` — new wal-protocol command. The payload is
    bilrost-encoded `RuleBook` carried as opaque `Bytes` (precedent:
    `Command::VQSchedulerDecisions`) so flexbuffers-based `Envelope`
    serde does not need to drag full serde derive through every
    limiter type.

  * `ReadFsmTable::get_rule_book` / `WriteFsmTable::put_rule_book`
    (both `*Since v1.7.0*`) backed by a new FSM slot `RULE_BOOK = 9`
    in the partition store. Each partition writes the same logical
    rule book; readback on PP boot gives leader transitions the right
    state without an extra metadata-store round trip.

  * `RuleBook` gains `StorageEncode`/`StorageDecode` (bilrost) plus
    `bilrost_encode_to_bytes` / `bilrost_decode` helpers.

  * `StateMachine` carries `rule_book: RuleBook` in-memory; loaded
    from FSM table at PP boot. The `Command::UpsertRuleBook` apply
    path bilrost-decodes the bytes, idempotency-checks the version
    (skips when not strictly newer), diffs against the previous
    in-memory book, persists via `put_rule_book` within the same
    transaction, updates in-memory state, and emits
    `Action::RulesUpdated(Vec<RuleUpdate>)` when non-empty.

  * `Action::RulesUpdated` is dispatched in `leader_state` via a new
    `SchedulerService::on_rules_updated` API that forwards the batch
    through the existing resource-manager mpsc to `UserLimiter`.
    Followers don't dispatch actions, matching the "only the leader's
    UserLimiter is live" design.

  * `restate-wal-protocol` does NOT depend on `restate-limiter` — the
    opaque-bytes payload keeps it isolated.
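
The leader-only dispatch can be sketched roughly as below. This assumes a `Role` enum and a `handle_action` function that are invented for illustration, and it uses `std::sync::mpsc` in place of whatever channel the resource manager actually uses. Only the names `Action::RulesUpdated`, `SchedulerService::on_rules_updated`, and `ResourceManagerUpdate::RulesUpdated` come from the commit messages.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug, Clone, PartialEq)]
pub struct RuleUpdate(pub String);

pub enum Action {
    RulesUpdated(Vec<RuleUpdate>),
}

pub enum ResourceManagerUpdate {
    RulesUpdated(Vec<RuleUpdate>),
}

/// Simplified stand-in holding the sending half of the resource-manager channel.
pub struct SchedulerService {
    tx: Sender<ResourceManagerUpdate>,
}

impl SchedulerService {
    pub fn new() -> (Self, Receiver<ResourceManagerUpdate>) {
        let (tx, rx) = channel();
        (Self { tx }, rx)
    }

    /// Forwards the whole batch as a single channel message.
    pub fn on_rules_updated(&self, updates: Vec<RuleUpdate>) {
        // A closed receiver means the consumer is gone; the sketch ignores that.
        let _ = self.tx.send(ResourceManagerUpdate::RulesUpdated(updates));
    }
}

/// Hypothetical role model: only a leader owns a live scheduler.
pub enum Role {
    Follower,
    Leader(SchedulerService),
}

pub fn handle_action(role: &Role, action: Action) {
    match (role, action) {
        (Role::Leader(scheduler), Action::RulesUpdated(updates)) => {
            scheduler.on_rules_updated(updates)
        }
        // Followers still apply the command to their state, but drop the action.
        (Role::Follower, _) => {}
    }
}
```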

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>