Closed
Description
This is a massive feature but it would be incredible to implement.
The idea here is a standard master/slave system: you would deploy a Distribox slave instance on another machine, reference it in your master node, and then deploy VMs on that slave from the master.
Distribox Master/Slave Architecture Specification
This document outlines the design and implementation plan for transitioning Distribox from a single-node setup to a
distributed Master/Slave architecture.
1. Architecture Overview
The system will be split into two main components:
A. Master Node (Orchestrator)
- Role: Central point of truth and user interaction.
- Responsibilities:
- User authentication and authorization (RBAC).
- Storing the global state of all Slaves and VMs.
- Scheduling: Deciding which Slave should host a new VM.
- Proxying commands to Slaves.
- Aggregating resource usage and telemetry.
- Exposing the API for the Frontend.
- Dependencies: PostgreSQL (via SQLModel/SQLAlchemy), FastAPI.
B. Slave Node (Agent)
- Role: Local resource provider and VM executor.
- Responsibilities:
- Direct interaction with local libvirt and qemu-img.
- Managing local VM images and VM disk files.
- Reporting local resource availability (CPU, RAM, Disk) to the Master.
- Executing VM lifecycle commands (Create, Start, Stop, Delete) requested by the Master.
- Dependencies: FastAPI (lightweight version of the current backend), libvirt, qemu-utils.
2. Database Changes
New: SlaveORM
Stores information about registered slave nodes.
- id: UUID (Primary Key)
- name: String (Friendly name)
- hostname: String (IP or Domain to reach the slave)
- port: Integer (API port)
- api_key: String (Secure token for Master-Slave communication)
- status: String (Online, Offline, Maintenance)
- last_heartbeat: DateTime
- total_cpu: Integer (Total vCPUs)
- total_mem: Integer (Total RAM in MB)
- total_disk: Integer (Total Disk in GB)
Updated: VmORM
- slave_id: UUID (Foreign Key to SlaveORM.id, nullable for legacy or "unassigned" state)
- host_id: (Already exists? No, so we need to ensure we track which slave each VM belongs to)
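To make the two models easier to picture, here is a minimal sketch using plain dataclasses as a stand-in for the SQLModel classes. The class names Slave and Vm, the Vm.name field, and the default status of "Offline" are assumptions for illustration; only the field names listed above come from this spec.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Slave:
    """Plain stand-in for SlaveORM; fields mirror the list in this spec."""
    name: str                 # friendly name
    hostname: str             # IP or domain used to reach the slave
    port: int                 # API port
    api_key: str              # token for Master-Slave communication
    total_cpu: int            # total vCPUs
    total_mem: int            # total RAM in MB
    total_disk: int           # total disk in GB
    status: str = "Offline"   # assumption: a slave is Offline until its first heartbeat
    last_heartbeat: Optional[datetime] = None
    id: UUID = field(default_factory=uuid4)


@dataclass
class Vm:
    """Stand-in for VmORM, reduced to the new column (name is hypothetical)."""
    name: str
    slave_id: Optional[UUID] = None   # None means legacy or "unassigned"
    id: UUID = field(default_factory=uuid4)


# Usage: a legacy VM starts unassigned and is later attached to a slave.
node = Slave(name="node-1", hostname="10.0.0.5", port=8080, api_key="secret",
             total_cpu=16, total_mem=65536, total_disk=500)
legacy_vm = Vm(name="legacy-vm")
legacy_vm.slave_id = node.id
```

In the real models slave_id would be a nullable foreign key column rather than a plain attribute.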
3. API Design
Master API (New/Modified Endpoints)
Slave Management
- GET /slaves: List all registered slaves and their current status.
- POST /slaves: Register a new slave (manually or via a token).
- GET /slaves/{id}: Detailed info for a specific slave.
- DELETE /slaves/{id}: Unregister a slave.
VM Management (Modified)
- POST /vms: Now includes an optional slave_id. If omitted, the Master's Scheduler picks the best slave.
- Other VM operations (/vms/{id}/start, etc.) will now look up the slave_id and proxy the request to the corresponding Slave API.
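The lookup-then-proxy step could be sketched roughly as below. This only resolves the target URL; sending the request is left out. The SlaveAddress type, the slave_url_for helper, the in-memory dict of slaves, and the plain http scheme are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SlaveAddress:
    hostname: str
    port: int


def slave_url_for(vm_slave_id: Optional[str],
                  slaves: Dict[str, SlaveAddress],
                  vm_id: str,
                  action: str) -> str:
    """Resolve the Slave API URL that a Master VM operation should be proxied to."""
    if vm_slave_id is None:
        # Legacy or unassigned VM: there is nowhere to proxy the request.
        raise LookupError(f"VM {vm_id} is not assigned to a slave")
    slave = slaves.get(vm_slave_id)
    if slave is None:
        raise LookupError(f"slave {vm_slave_id} is not registered")
    # Assumption: plain HTTP; a real deployment would likely want TLS.
    return f"http://{slave.hostname}:{slave.port}/vms/{vm_id}/{action}"


registry = {"s1": SlaveAddress(hostname="10.0.0.5", port=8080)}
print(slave_url_for("s1", registry, "vm-42", "start"))
# prints http://10.0.0.5:8080/vms/vm-42/start
```

Raising on a missing slave_id keeps the "unassigned" state explicit instead of silently falling back to the local host.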
Slave API (Internal)
The Slave will run a stripped-down version of the current backend:
- POST /vms: Create a VM locally.
- GET /vms/{vm_id}: Get local VM status.
- POST /vms/{vm_id}/start: Start local VM.
- POST /vms/{vm_id}/stop: Stop local VM.
- DELETE /vms/{vm_id}: Delete local VM.
- GET /host/info: Return local resource usage (CPU, RAM, Disk).
4. Scheduling Strategy
When a user creates a VM without specifying a slave, the Master will use a Scheduler:
- Least Loaded: Pick the slave with the most available RAM/CPU.
- Round Robin: Cycle through online slaves.
- Sticky: Attempt to group VMs for a specific user/project (optional).
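A rough sketch of the first two strategies, to make the trade-off concrete. The SlaveLoad type and its free_mem/free_cpu fields are assumptions (the spec only stores totals, so free capacity would have to come from heartbeats), and ranking by free RAM first with free CPU as a tie-breaker is one possible reading of "most available RAM/CPU".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveLoad:
    name: str
    status: str     # "Online", "Offline" or "Maintenance"
    free_mem: int   # MB currently available (hypothetical field)
    free_cpu: int   # vCPUs currently available (hypothetical field)


def pick_least_loaded(slaves: List[SlaveLoad],
                      required_mem: int = 0,
                      required_cpu: int = 0) -> Optional[SlaveLoad]:
    """Return the online slave with the most free RAM, then most free CPU."""
    candidates = [
        s for s in slaves
        if s.status == "Online"
        and s.free_mem >= required_mem
        and s.free_cpu >= required_cpu
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (s.free_mem, s.free_cpu))


class RoundRobin:
    """Cycle through online slaves, one per call."""

    def __init__(self) -> None:
        self._next = 0

    def pick(self, slaves: List[SlaveLoad]) -> Optional[SlaveLoad]:
        online = [s for s in slaves if s.status == "Online"]
        if not online:
            return None
        chosen = online[self._next % len(online)]
        self._next += 1
        return chosen


fleet = [
    SlaveLoad("a", "Online", free_mem=4096, free_cpu=2),
    SlaveLoad("b", "Online", free_mem=16384, free_cpu=8),
    SlaveLoad("c", "Maintenance", free_mem=65536, free_cpu=32),
]
print(pick_least_loaded(fleet).name)  # prints b (c is skipped: not Online)
```

Both functions return None when no slave qualifies, so the caller can decide whether to reject the request or queue it.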
5. Security
- Master -> Slave: The Master will include an X-Slave-Token in the header for all requests to a Slave.
- Slave -> Master (Heartbeat): The Slave will periodically send a heartbeat to POST /slaves/heartbeat with its status and current load, authenticated by its api_key.
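Both checks boil down to comparing a presented secret against a stored one. A minimal sketch, assuming the slave record is a plain dict and the heartbeat payload carries a status key (neither is fixed by this spec), might look like this. hmac.compare_digest is used so that the comparison time does not leak how much of the token matched.

```python
import hmac
from datetime import datetime, timezone
from typing import Mapping, Optional


def token_matches(presented: Optional[str], expected: str) -> bool:
    """Constant-time comparison of a presented token against the stored one."""
    if presented is None:
        return False
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def check_master_request(headers: Mapping[str, str], slave_token: str) -> bool:
    """Slave side: accept a request only if X-Slave-Token is correct.

    Note: a plain dict is case-sensitive; real HTTP header lookups are not.
    """
    return token_matches(headers.get("X-Slave-Token"), slave_token)


def accept_heartbeat(slave: dict, presented_key: Optional[str], payload: dict) -> bool:
    """Master side: validate the api_key, then record status and timestamp."""
    if not token_matches(presented_key, slave["api_key"]):
        return False
    slave["status"] = payload.get("status", "Online")   # assumed payload shape
    slave["last_heartbeat"] = datetime.now(timezone.utc)
    return True


record = {"api_key": "s3cret", "status": "Offline", "last_heartbeat": None}
print(accept_heartbeat(record, "s3cret", {"status": "Online"}))  # prints True
print(accept_heartbeat(record, "wrong", {"status": "Online"}))   # prints False
```

How the api_key travels with the heartbeat (header, body, or bearer token) is left open here, as the spec does not say.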
6. Implementation TODO List
Phase 1: Master Preparation
- Create SlaveORM model and run migrations.
- Add slave_id to VmORM.
- Implement SlaveService on the Master for CRUD and heartbeat handling.
- Implement SlaveRouter for management endpoints.
Phase 2: Slave Agent Development
- Extract current VM/Host logic into a new "Slave Mode" or a separate slimmed-down package.
- Implement simple API Key authentication for the Slave.
- Add a background task on the Slave to send periodic heartbeats to the Master.
Phase 3: Integration & Orchestration
- Update VmService on the Master to become a "Proxy Service".
- Implement the Scheduler logic (Basic "Least Loaded").
- Update VM creation to pull images from a central source or handle local image distribution.
Phase 4: Frontend Updates
- Add "Slaves" management page.
- Update "Provision VM" page to allow selecting a specific host (optional).
- Display which Slave a VM is running on in the VM list/detail view.
7. Additional Dependencies
- httpx: For the Master to perform asynchronous HTTP requests to Slaves.
- apscheduler: For the Slave to handle heartbeat tasks (or simple asyncio loop).
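For comparison with apscheduler, the "simple asyncio loop" option could be as small as the sketch below. The heartbeat_loop name, the injected send coroutine, and the stop event are assumptions; in the real agent, send would perform the POST /slaves/heartbeat call.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("slave.heartbeat")


async def heartbeat_loop(send: Callable[[], Awaitable[None]],
                         interval_seconds: float,
                         stop: asyncio.Event) -> None:
    """Call send() every interval_seconds until stop is set."""
    while not stop.is_set():
        try:
            await send()
        except Exception:
            # A failed heartbeat should not kill the agent; log and retry later.
            logger.exception("heartbeat failed")
        try:
            # Sleep for the interval, but wake up early if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass


async def main() -> int:
    stop = asyncio.Event()
    sent = []

    async def fake_send() -> None:
        # Stand-in for the real HTTP call to the Master.
        sent.append("beat")
        if len(sent) == 3:
            stop.set()

    await heartbeat_loop(fake_send, interval_seconds=0.01, stop=stop)
    return len(sent)


print(asyncio.run(main()))  # prints 3
```

Waiting on the stop event instead of calling asyncio.sleep lets the agent shut down promptly rather than finishing a full interval first.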