Upcoming Mantis 5.x: Reservation-Based Scheduling for Resource Clusters #827
Andyz26 announced in Announcements
Mantis 5.x: Reservation-Based Scheduling for Resource Clusters
Hey everyone,
We're sharing an early heads-up on a significant scheduling change landing in Mantis 5.x. This post covers what's changing, why, and what you should be aware of when upgrading.
TL;DR
The resource cluster scheduler is moving from a fire-and-forget allocation model to a reservation-based, priority-queued model. This is enabled by default in 5.x.
What's Changing
Today, when a job needs workers, the `JobActor` asks the scheduler to immediately find and assign a `TaskExecutor`. If nothing is available, the request fails, and retries are driven by heartbeat timeouts. This works, but it breaks down at scale.

The new model
In 5.x, scheduling requests become reservations that enter a prioritized queue per resource SKU. A `ReservationRegistryActor` processes these queues on a periodic tick (default: 1s), dispatching batch allocation requests to an `ExecutorStateManagerActor` one constraint group at a time.

Key properties:
- Priority ordering: `REPLACE` (highest) > `SCALE` > `NEW_JOB` (lowest), with job tier and FIFO as tiebreakers. Worker replacements always get resources before new low-priority jobs.
- The autoscaler now sees `effectiveIdleCount = idle - pendingReservations`, preventing premature scale-down and enabling proactive scale-up.

Impact and Upgrade Notes
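To make the priority semantics concrete before the upgrade details: the ordering described above could be modeled as a comparator feeding a priority queue. This is an illustrative sketch, not Mantis code; only the three priority levels and the tiebreaker order come from this post, while the `Reservation` record, its field names, and the queue wiring are assumptions.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of the reservation ordering: REPLACE > SCALE > NEW_JOB,
// then job tier, then FIFO. Names below are illustrative assumptions.
public class ReservationOrdering {
    enum Priority { REPLACE, SCALE, NEW_JOB } // lowest ordinal = highest priority

    record Reservation(Priority priority, int jobTier, long enqueueSeq) {}

    // REPLACE before SCALE before NEW_JOB; job tier and FIFO as tiebreakers.
    static final Comparator<Reservation> ORDER =
        Comparator.comparingInt((Reservation r) -> r.priority().ordinal())
                  .thenComparingInt(Reservation::jobTier)
                  .thenComparingLong(Reservation::enqueueSeq);

    public static void main(String[] args) {
        PriorityQueue<Reservation> queue = new PriorityQueue<>(ORDER);
        queue.add(new Reservation(Priority.NEW_JOB, 0, 1));
        queue.add(new Reservation(Priority.REPLACE, 2, 2));
        queue.add(new Reservation(Priority.SCALE, 1, 3));
        // A worker replacement is dispatched first, even though it was
        // enqueued after the new job.
        System.out.println(queue.poll().priority()); // REPLACE
    }
}
```

The effect: a burst of new-job submissions can never starve worker replacements, because the comparator sorts every `REPLACE` reservation ahead of all `SCALE` and `NEW_JOB` entries regardless of arrival order.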
Feature flag (opt-out fallback)
Reservation scheduling is controlled by `MasterConfiguration.isReservationSchedulingEnabled()` and defaults to `true` in 5.x. To fall back to legacy behavior, set the corresponding configuration property to `false`.

When disabled, the system uses the previous `ResourceClusterAwareScheduler` with direct scheduling. The flag is global (it applies to all resource clusters). However, because of the large scope of the changed components, legacy mode does not guarantee exactly the same behavior as 4.x, so we don't recommend running legacy mode in production: stay on 4.x until you are ready to upgrade to the new mode. We recommend running with the flag enabled in a staging/test environment before rolling out to production.

Breaking changes
- `MantisScheduler` interface: the new `ResourceClusterReservationAwareScheduler` reports `schedulerHandlesAllocationRetries() = true`. If you have custom `MantisScheduler` implementations, note that `scheduleWorkers()` and `unscheduleWorker()` now throw `UnsupportedOperationException` in the reservation-aware path. The entry points are `upsertReservation()` and `cancelReservation()` instead.
- `ResourceClusterActor` refactored: executor state management, disabled executor tracking, and assignment logic have been extracted into child actors (`ExecutorStateManagerActor`, `AssignmentHandlerActor`). If you have custom code that directly interacts with `ResourceClusterActor` internals or extends its message protocol, you'll need to adapt.
- Cluster usage queries: the scaler path now uses `GetReservationAwareClusterUsageRequest` instead of `GetClusterUsageRequest`. Custom scaler integrations that rely on the old request/response types should be updated.

No breaking changes if...
Recovery behavior change
Reservation state is in-memory only (not persisted). On master failover, each `JobActor` scans its stages for workers in the `Accepted` state and re-submits them via `upsertReservation()`, rebuilding the queue.

This means there is a brief window after failover where pending reservations don't exist yet. In practice, this window is bounded by leader election time plus job cluster re-initialization time. If you have tight SLAs around worker replacement latency during master failover, test this path.
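A rough sketch of that re-submission pass, under a deliberately simplified worker model: only `upsertReservation()` and the `Accepted` state are from this post; the `Worker` record, `WorkerState` enum, and `ReservationScheduler` interface below are invented for illustration.

```java
import java.util.List;

// Sketch of the failover recovery path: on re-initialization, a job scans
// its stage workers and re-submits a reservation for every worker that was
// still pending allocation when the previous master died.
public class FailoverRecoverySketch {
    enum WorkerState { ACCEPTED, LAUNCHED, STARTED } // simplified lifecycle

    record Worker(int workerNumber, WorkerState state) {}

    // Stand-in for the reservation-aware scheduler entry point.
    interface ReservationScheduler {
        void upsertReservation(int workerNumber);
    }

    static int resubmitPending(List<Worker> stageWorkers, ReservationScheduler scheduler) {
        int resubmitted = 0;
        for (Worker w : stageWorkers) {
            if (w.state() == WorkerState.ACCEPTED) { // still waiting for a TaskExecutor
                scheduler.upsertReservation(w.workerNumber());
                resubmitted++;
            }
        }
        return resubmitted;
    }

    public static void main(String[] args) {
        List<Worker> workers = List.of(
            new Worker(1, WorkerState.ACCEPTED),
            new Worker(2, WorkerState.STARTED),
            new Worker(3, WorkerState.ACCEPTED));
        int n = resubmitPending(workers, wn ->
            System.out.println("upsertReservation(worker " + wn + ")"));
        System.out.println(n + " reservations re-submitted"); // 2
    }
}
```

Because `upsertReservation()` is an upsert rather than a blind insert, replaying the same worker twice during a messy failover should be harmless in this model.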
Autoscaler tuning
If you've tuned your `ResourceClusterScalerActor` thresholds (`maxIdleToKeep`, `minIdleToKeep`, cooldown periods), be aware:

- Scale-down decisions now use `effectiveIdleCount = idle - pendingReservations`. Machines about to be used by pending reservations won't be scaled down.
- Scale-up sizing accounts for pending demand: `step = pendingReservations + minIdleToKeep - actualIdleCount`.

You may want to revisit your idle thresholds after enabling reservation scheduling, as the scaler is now more conservative about scale-down and more aggressive about scale-up.
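A worked example of the two formulas above. The expressions are quoted from this post; wrapping them in helper methods is purely for illustration.

```java
// Worked example of the scaler arithmetic described above.
public class ScalerMath {
    // Idle machines minus those about to be claimed by pending reservations.
    static int effectiveIdleCount(int idle, int pendingReservations) {
        return idle - pendingReservations;
    }

    // Scale-up step: cover pending demand while keeping minIdleToKeep spare.
    static int scaleUpStep(int pendingReservations, int minIdleToKeep, int actualIdleCount) {
        return pendingReservations + minIdleToKeep - actualIdleCount;
    }

    public static void main(String[] args) {
        // 5 machines idle, but 3 reservations are about to claim them,
        // so only 2 are truly idle and safe to scale down:
        System.out.println(effectiveIdleCount(5, 3)); // 2
        // With minIdleToKeep = 4, scale up by 3 + 4 - 5 = 2 machines:
        System.out.println(scaleUpStep(3, 4, 5));     // 2
    }
}
```

Note how pending reservations pull in both directions: they shrink the idle count the scale-down path sees and add to the machine count the scale-up path requests.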
Known Limitations (5.x initial release)
- In-memory state: reservations survive master failover only via `JobActor` re-initialization (see above). A future release may add durable reservation state.
- Transient staleness: `ReservationRegistryActor` and `ExecutorStateManagerActor` can transiently see stale data. This is mitigated by the conservative scale-down cooldown.
- No reservation TTL: if a job never sends `CancelReservation`, its reservations remain in memory indefinitely. A TTL/sweep mechanism is planned for a follow-up.

New Components at a Glance
- `ReservationRegistryActor` — owns the per-SKU priority queues and dispatches batch allocations on each tick
- `ExecutorStateManagerActor` — child actor that tracks executor state (extracted from `ResourceClusterActor`)
- `AssignmentHandlerActor` — child actor that handles assignment logic (extracted from `ResourceClusterActor`)
- `ResourceClusterReservationAwareScheduler` — `MantisScheduler` impl that delegates to the reservation APIs

Feedback Welcome
We'd love to hear from anyone running Mantis at scale. Please share your thoughts in this thread, and if you run into issues during the upgrade, open an issue or use this thread to let us know.
Thanks!