[Limiter] Persist rule book per partition via UpsertRuleBook log entry #4689
tillrohrmann wants to merge 4 commits into restatedev:main
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c4ea312adc
if !diff.is_empty() {
    self.action_collector.push(Action::RulesUpdated(diff));
Seed limiter when leadership starts from persisted rule book
This only emits Action::RulesUpdated when a new UpsertRuleBook command is applied, but the commit also restores rule_book from FSM state at boot. In the common replay path, those rule-book updates are applied while the partition is follower/candidate (where actions are ignored), and when it later becomes leader the scheduler's UserLimiter remains empty unless another newer rule-book command arrives. That leaves existing limits unenforced after restart/failover until the next admin rule change; please seed the limiter from the restored book (for example diff_from_empty()) on leader activation.
This is something that we are going to add in #4690.
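For readers following the thread, here is a minimal sketch of the seeding the review asks for. It is not the change planned for #4690: every type below is a simplified, hypothetical stand-in for the real `restate-limiter` types, and `on_become_leader` is an invented hook name. It only illustrates replaying a restored book into an empty limiter via `diff_from_empty`, with disabled rules skipped.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the persisted rule.
#[derive(Clone, Debug, PartialEq)]
pub struct PersistedRule {
    pub version: u64,
    pub disabled: bool,
    pub limit: u32,
}

/// Hypothetical stand-in for the runtime update type.
#[derive(Clone, Debug, PartialEq)]
pub enum RuleUpdate {
    Upsert { rule_id: u64, limit: u32 },
    Remove { rule_id: u64 },
}

#[derive(Clone, Debug, Default)]
pub struct RuleBook {
    pub version: u64,
    pub rules: BTreeMap<u64, PersistedRule>,
}

impl RuleBook {
    /// Updates that bring an empty limiter up to this book.
    /// Disabled (soft-tombstoned) rules are treated as absent.
    pub fn diff_from_empty(&self) -> Vec<RuleUpdate> {
        self.rules
            .iter()
            .filter(|(_, rule)| !rule.disabled)
            .map(|(id, rule)| RuleUpdate::Upsert { rule_id: *id, limit: rule.limit })
            .collect()
    }
}

/// Hypothetical leader-side limiter that only records active limits.
#[derive(Default)]
pub struct UserLimiter {
    pub limits: BTreeMap<u64, u32>,
}

impl UserLimiter {
    pub fn apply_rule_updates(&mut self, updates: Vec<RuleUpdate>) {
        for update in updates {
            match update {
                RuleUpdate::Upsert { rule_id, limit } => {
                    self.limits.insert(rule_id, limit);
                }
                RuleUpdate::Remove { rule_id } => {
                    self.limits.remove(&rule_id);
                }
            }
        }
    }
}

/// Hypothetical hook run on leader activation: seed the limiter from the
/// book restored at boot instead of waiting for the next UpsertRuleBook.
pub fn on_become_leader(restored: &RuleBook, limiter: &mut UserLimiter) {
    let seed = restored.diff_from_empty();
    if !seed.is_empty() {
        limiter.apply_rule_updates(seed);
    }
}
```

With this shape, a partition that replayed its rule-book updates as a follower would enforce the existing limits as soon as it becomes leader.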
263f34f to 8680cdd
Introduces the cluster-global rule book that backs the in-memory `Rules` store of each partition processor's `UserLimiter`. This commit lands the foundational data layer for issue restatedev#4655 (steps 1 and 2 of the plan):
* `RuleBook` / `PersistedRule` / `PersistedUserLimits` in `restate-limiter::rule_book`, bilrost-encoded, `Versioned`. The `RuleBook` is keyed by `RuleId` (xxh3-64 of the rule pattern's canonical display form) rendered as `rul_…` resource ids. A deliberate 64-bit hash is used instead of the 128-bit norm for rendered-id brevity; the doc comment captures the trade-off.
* Soft-tombstone semantics: `PersistedRule.disabled: bool` defaults to `false` so an active rule is bilrost's empty state and gets omitted from the wire.
* Writer logic: `RuleBook::apply_change` for `Create` / `Patch` / `Delete`, with the version-bump contract: create/recreate uses the new book version, runtime-relevant patches advance the per-rule version, reason-only edits bump only `last_modified` and the book version, no-ops bump nothing. Hard cap on total rules (`MAX_RULES_PER_BOOK`); a configurable knob comes later.
* `RuleBook::diff`: presence + per-rule version drives `Vec<RuleUpdate>` for the runtime, with `disabled` rules treated as absent. `diff_from_empty` for bootstrap consumers.
Supporting infra:
* Adds `Rule("rul")` to `IdResourceType`; promotes `restate-types::id_util` and `base62_util` to public modules and `IdEncoder::{new,push_u64,push_u128}` to public so external crates can implement `ResourceId`.
* Moves `UserLimits` and `RuleUpdate` from `restate-worker-api` into `restate-limiter`.
* Adds `generic-array` to the workspace deps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
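The version-bump contract described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code in `restate-limiter`: the field layout, the `Change` and `ChangeError` shapes, the cap value, and the choice to set a patched or deleted rule's version to the new book version are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Illustrative cap only; the real `MAX_RULES_PER_BOOK` value is not stated.
pub const MAX_RULES_PER_BOOK: usize = 4;

#[derive(Clone, Debug, PartialEq)]
pub struct PersistedRule {
    pub version: u64,
    pub last_modified: u64,
    pub limit: u32,
    pub reason: String,
    pub disabled: bool,
}

/// Hypothetical change shape; the real type is not shown in the PR text.
pub enum Change {
    Create { id: u64, limit: u32, reason: String },
    Patch { id: u64, limit: Option<u32>, reason: Option<String> },
    Delete { id: u64 },
}

#[derive(Debug, PartialEq)]
pub enum ChangeError {
    TooManyRules,
    NotFound,
    AlreadyExists,
}

#[derive(Default)]
pub struct RuleBook {
    pub version: u64,
    pub rules: BTreeMap<u64, PersistedRule>,
}

impl RuleBook {
    pub fn apply_change(&mut self, change: Change, now: u64) -> Result<(), ChangeError> {
        let next = self.version + 1;
        match change {
            Change::Create { id, limit, reason } => {
                if self.rules.get(&id).is_some_and(|r| !r.disabled) {
                    return Err(ChangeError::AlreadyExists);
                }
                // Assumption: tombstoned entries count toward the cap.
                if !self.rules.contains_key(&id) && self.rules.len() >= MAX_RULES_PER_BOOK {
                    return Err(ChangeError::TooManyRules);
                }
                // Create and recreate both take the new book version.
                let rule = PersistedRule { version: next, last_modified: now, limit, reason, disabled: false };
                self.rules.insert(id, rule);
            }
            Change::Patch { id, limit, reason } => {
                let rule = self.rules.get_mut(&id).filter(|r| !r.disabled).ok_or(ChangeError::NotFound)?;
                let limit_changed = limit.is_some_and(|l| l != rule.limit);
                let reason_changed = reason.as_ref().is_some_and(|r| *r != rule.reason);
                if !limit_changed && !reason_changed {
                    return Ok(()); // no-op: nothing is bumped
                }
                if let Some(l) = limit {
                    rule.limit = l;
                }
                if let Some(r) = reason {
                    rule.reason = r;
                }
                if limit_changed {
                    rule.version = next; // runtime-relevant: advance per-rule version
                }
                rule.last_modified = now; // reason-only edits stop here
            }
            Change::Delete { id } => {
                let rule = self.rules.get_mut(&id).filter(|r| !r.disabled).ok_or(ChangeError::NotFound)?;
                rule.disabled = true; // soft tombstone
                rule.version = next;
                rule.last_modified = now;
            }
        }
        self.version = next;
        Ok(())
    }
}
```

Keeping the per-rule version unchanged on reason-only edits is what lets a version-based diff skip rules whose runtime behaviour did not change.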
Changes the runtime channel contract from
`ResourceManagerUpdate::RulesUpdated(RuleUpdate)` to
`RulesUpdated(Vec<RuleUpdate>)` so the upcoming partition-log
state-machine apply path (Step 4) can deliver the full diff between
two rule-book versions in a single message rather than fanning out
one channel send per `RuleUpdate`.
Pure refactor:
* `UserLimiter::apply_rule_update` becomes
`apply_rule_updates`, iterating the batch and accumulating the
union of vqueues to wake.
* The `ResourceManager::poll_resources` consumer arm passes the
`Vec` straight through.
* Three test call sites wrap their single update in `vec![…]`.
`Vec` is the right shape over `SmallVec`: rule changes are cold-path
(admin CRUD or bulk seeding), the per-message size penalty of an
inline buffer would outweigh the one heap allocation we'd save on
batch=1, and bulk seeding spills anyway.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
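A rough sketch of the batched consumer described in this commit, assuming a much simpler limiter than the real one. The `waiters` index, the `VQueueId` alias, the string-keyed patterns, and the return type are hypothetical; the point is only that one call iterates the whole `Vec<RuleUpdate>` and returns the union of vqueues to wake, so a vqueue touched by several updates is woken once.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical update shape keyed by rule pattern.
#[derive(Clone, Debug)]
pub enum RuleUpdate {
    Upsert { pattern: String, limit: u32 },
    Remove { pattern: String },
}

/// Hypothetical vqueue identifier.
pub type VQueueId = u32;

#[derive(Default)]
pub struct UserLimiter {
    pub limits: HashMap<String, u32>,
    /// Hypothetical index: vqueues currently affected by each rule pattern.
    pub waiters: HashMap<String, Vec<VQueueId>>,
}

impl UserLimiter {
    pub fn register_waiter(&mut self, pattern: &str, vqueue: VQueueId) {
        self.waiters.entry(pattern.to_string()).or_default().push(vqueue);
    }

    /// Applies the whole batch, then reports the union of vqueues to wake.
    pub fn apply_rule_updates(&mut self, updates: Vec<RuleUpdate>) -> BTreeSet<VQueueId> {
        let mut to_wake = BTreeSet::new();
        for update in updates {
            let pattern = match update {
                RuleUpdate::Upsert { pattern, limit } => {
                    self.limits.insert(pattern.clone(), limit);
                    pattern
                }
                RuleUpdate::Remove { pattern } => {
                    self.limits.remove(&pattern);
                    pattern
                }
            };
            if let Some(vqueues) = self.waiters.get(&pattern) {
                // The set deduplicates vqueues hit by more than one update.
                to_wake.extend(vqueues.iter().copied());
            }
        }
        to_wake
    }
}
```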
Wires the cluster-global rule book into the partition processor state
machine so leader-driven distribution (Step 5) has somewhere to land:
* `Command::UpsertRuleBook(UpsertRuleBook { partition_key_range,
rule_book: Bytes })` — new wal-protocol command. The payload is
bilrost-encoded `RuleBook` carried as opaque `Bytes` (precedent:
`Command::VQSchedulerDecisions`) so flexbuffers-based `Envelope`
serde does not need to drag full serde derive through every
limiter type.
* `ReadFsmTable::get_rule_book` / `WriteFsmTable::put_rule_book`
(both `*Since v1.7.0*`) backed by a new FSM slot `RULE_BOOK = 9`
in the partition store. Each partition writes the same logical
rule book; readback on PP boot gives leader transitions the right
state without an extra metadata-store round trip.
* `RuleBook` gains `StorageEncode`/`StorageDecode` (bilrost) plus
`bilrost_encode_to_bytes` / `bilrost_decode` helpers.
* `StateMachine` carries `rule_book: RuleBook` in-memory; loaded
from FSM table at PP boot. The `Command::UpsertRuleBook` apply
path bilrost-decodes the bytes, idempotency-checks the version
(skips when not strictly newer), diffs against the previous
in-memory book, persists via `put_rule_book` within the same
transaction, updates in-memory state, and emits
`Action::RulesUpdated(Vec<RuleUpdate>)` when non-empty.
* `Action::RulesUpdated` is dispatched in `leader_state` via a new
`SchedulerService::on_rules_updated` API that forwards the batch
through the existing resource-manager mpsc to `UserLimiter`.
Followers don't dispatch actions, matching the "only the leader's
UserLimiter is live" design.
* `restate-wal-protocol` does NOT depend on `restate-limiter` — the
opaque-bytes payload keeps it isolated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
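The apply path listed above can be approximated with the sketch below. It is a simplified model, not the state machine code: the types are hypothetical stand-ins, the bilrost decode step is omitted (the function takes an already decoded book) to keep the example dependency-free, and `InMemoryFsm` is an invented test double for the partition store transaction.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
pub struct PersistedRule {
    pub version: u64,
    pub limit: u32,
    pub disabled: bool,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct RuleBook {
    pub version: u64,
    pub rules: BTreeMap<u64, PersistedRule>,
}

#[derive(Clone, Debug, PartialEq)]
pub enum RuleUpdate {
    Upsert { rule_id: u64, limit: u32 },
    Remove { rule_id: u64 },
}

#[derive(Debug, PartialEq)]
pub enum Action {
    RulesUpdated(Vec<RuleUpdate>),
}

impl RuleBook {
    fn get_active(&self, id: &u64) -> Option<&PersistedRule> {
        self.rules.get(id).filter(|r| !r.disabled)
    }

    /// Updates that move a runtime from `self` to `newer`, driven by
    /// presence and per-rule version; disabled rules count as absent.
    pub fn diff(&self, newer: &RuleBook) -> Vec<RuleUpdate> {
        let mut out = Vec::new();
        for (id, rule) in newer.rules.iter().filter(|(_, r)| !r.disabled) {
            match self.get_active(id) {
                Some(old) if old.version == rule.version => {}
                _ => out.push(RuleUpdate::Upsert { rule_id: *id, limit: rule.limit }),
            }
        }
        for (id, _) in self.rules.iter().filter(|(_, r)| !r.disabled) {
            if newer.get_active(id).is_none() {
                out.push(RuleUpdate::Remove { rule_id: *id });
            }
        }
        out
    }
}

/// Hypothetical, reduced version of the FSM write trait.
pub trait WriteFsmTable {
    fn put_rule_book(&mut self, book: &RuleBook);
}

/// Invented in-memory test double for the partition store transaction.
#[derive(Default)]
pub struct InMemoryFsm {
    pub stored: Option<RuleBook>,
}

impl WriteFsmTable for InMemoryFsm {
    fn put_rule_book(&mut self, book: &RuleBook) {
        self.stored = Some(book.clone());
    }
}

pub struct StateMachine {
    pub rule_book: RuleBook,
}

impl StateMachine {
    pub fn on_upsert_rule_book<T: WriteFsmTable>(
        &mut self,
        incoming: RuleBook,
        txn: &mut T,
        actions: &mut Vec<Action>,
    ) {
        // Idempotency check: skip anything that is not strictly newer.
        if incoming.version <= self.rule_book.version {
            return;
        }
        let diff = self.rule_book.diff(&incoming);
        txn.put_rule_book(&incoming); // persist in the same transaction
        self.rule_book = incoming; // update in-memory state
        if !diff.is_empty() {
            actions.push(Action::RulesUpdated(diff));
        }
    }
}
```

The strict version comparison is what makes log replay safe: applying the same `UpsertRuleBook` entry twice neither rewrites the FSM slot nor emits a second action.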
8680cdd to 779eb09
Wires the cluster-global rule book into the partition processor state
machine so leader-driven distribution has somewhere to land:
* `Command::UpsertRuleBook(UpsertRuleBook { partition_key_range, rule_book: Bytes })`: new wal-protocol command. The payload is bilrost-encoded `RuleBook` carried as opaque `Bytes` (precedent: `Command::VQSchedulerDecisions`) so flexbuffers-based `Envelope` serde does not need to drag full serde derive through every limiter type.
* `ReadFsmTable::get_rule_book` / `WriteFsmTable::put_rule_book` (both `*Since v1.7.0*`) backed by a new FSM slot `RULE_BOOK = 9` in the partition store. Each partition writes the same logical rule book; readback on PP boot gives leader transitions the right state without an extra metadata-store round trip.
* `RuleBook` gains `StorageEncode`/`StorageDecode` (bilrost) plus `bilrost_encode_to_bytes` / `bilrost_decode` helpers.
* `StateMachine` carries `rule_book: RuleBook` in-memory; loaded from FSM table at PP boot. The `Command::UpsertRuleBook` apply path bilrost-decodes the bytes, idempotency-checks the version (skips when not strictly newer), diffs against the previous in-memory book, persists via `put_rule_book` within the same transaction, updates in-memory state, and emits `Action::RulesUpdated(Vec<RuleUpdate>)` when non-empty.
* `Action::RulesUpdated` is dispatched in `leader_state` via a new `SchedulerService::on_rules_updated` API that forwards the batch through the existing resource-manager mpsc to `UserLimiter`. Followers don't dispatch actions, matching the "only the leader's UserLimiter is live" design.
* `restate-wal-protocol` does NOT depend on `restate-limiter`; the opaque-bytes payload keeps it isolated.
This PR is based on #4677