
Distribox Slave #80

@lg-epitech

Description


This is a massive feature, but it would be incredible to implement.

The idea is a standard master/slave system: you would deploy a Distribox slave instance on another machine, register it with your master node, and then deploy VMs onto that slave.

Distribox Master/Slave Architecture Specification

This document outlines the design and implementation plan for transitioning Distribox from a single-node setup to a
distributed Master/Slave architecture.

1. Architecture Overview

The system will be split into two main components:

A. Master Node (Orchestrator)

  • Role: Central point of truth and user interaction.
  • Responsibilities:
    • User authentication and authorization (RBAC).
    • Storing the global state of all Slaves and VMs.
    • Scheduling: Deciding which Slave should host a new VM.
    • Proxying commands to Slaves.
    • Aggregating resource usage and telemetry.
    • Exposing the API for the Frontend.
  • Dependencies: PostgreSQL (via SQLModel/SQLAlchemy), FastAPI.

B. Slave Node (Agent)

  • Role: Local resource provider and VM executor.
  • Responsibilities:
    • Direct interaction with local libvirt and qemu-img.
    • Managing local VM images and VM disk files.
    • Reporting local resource availability (CPU, RAM, Disk) to the Master.
    • Executing VM lifecycle commands (Create, Start, Stop, Delete) requested by the Master.
  • Dependencies: FastAPI (lightweight version of the current backend), libvirt, qemu-utils.

2. Database Changes

New: SlaveORM

Stores information about registered slave nodes.

  • id: UUID (Primary Key)
  • name: String (Friendly name)
  • hostname: String (IP or Domain to reach the slave)
  • port: Integer (API port)
  • api_key: String (Secure token for Master-Slave communication)
  • status: String (Online, Offline, Maintenance)
  • last_heartbeat: DateTime
  • total_cpu: Integer (Total vCPUs)
  • total_mem: Integer (Total RAM in MB)
  • total_disk: Integer (Total Disk in GB)

Updated: VmORM

  • slave_id: UUID (Foreign Key to SlaveORM.id, nullable for legacy or "unassigned" state)
  • host_id: Not currently present. Whatever field is used, each VM must record which slave it belongs to (slave_id above serves this purpose).
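The two tables above can be sketched as model classes. In the backend these would be SQLModel classes declared with `table=True`; the version below uses plain dataclasses so it runs without a database. The class names `Slave`, `Vm` and `SlaveStatus`, the `name` field on `Vm`, and the choice to start a new slave as `Offline` until its first heartbeat are assumptions rather than part of the spec.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class SlaveStatus(str, Enum):
    # Status values listed for SlaveORM.status.
    ONLINE = "Online"
    OFFLINE = "Offline"
    MAINTENANCE = "Maintenance"


@dataclass
class Slave:
    """Plain-Python stand-in for the proposed SlaveORM table."""

    name: str          # friendly name
    hostname: str      # IP or domain used to reach the slave
    port: int          # slave API port
    api_key: str       # shared secret for Master <-> Slave calls
    total_cpu: int     # vCPUs
    total_mem: int     # MB
    total_disk: int    # GB
    # Assumption: a freshly registered slave is Offline until it sends a heartbeat.
    status: SlaveStatus = SlaveStatus.OFFLINE
    last_heartbeat: Optional[datetime] = None
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Vm:
    """Only the new VmORM column is shown; `name` stands in for existing fields."""

    name: str
    # Nullable foreign key to Slave.id: None means legacy or not yet scheduled.
    slave_id: Optional[uuid.UUID] = None
    id: uuid.UUID = field(default_factory=uuid.uuid4)
```

Assigning a VM is then just `vm.slave_id = slave.id`; keeping the column nullable means existing single-node VMs stay valid after the migration.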

3. API Design

Master API (New/Modified Endpoints)

Slave Management

  • GET /slaves: List all registered slaves and their current status.
  • POST /slaves: Register a new slave (manually or via a token).
  • GET /slaves/{id}: Detailed info for a specific slave.
  • DELETE /slaves/{id}: Unregister a slave.

VM Management (Modified)

  • POST /vms: Now includes an optional slave_id. If omitted, the Master's Scheduler picks the best slave.
  • Other VM operations (/vms/{id}/start, etc.) will now look up the VM's slave_id and proxy the request to the
    corresponding Slave API.
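The proxying step boils down to: find the VM's slave, translate the action into the matching Slave API route, and attach the token header. The sketch below only builds the request (method, URL, headers) so it can be tested without a network. The `SLAVE_ROUTES` table mirrors the Slave API listed in this issue; `SlaveTarget`, `build_slave_request`, `route_vm_action`, the use of `api_key` as the header value, and the plain `http://` scheme are assumptions (a real deployment would likely want TLS).

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple

# Master-side action -> (HTTP method, Slave API path), from the Slave API list.
SLAVE_ROUTES: Dict[str, Tuple[str, str]] = {
    "create": ("POST", "/vms"),
    "status": ("GET", "/vms/{vm_id}"),
    "start": ("POST", "/vms/{vm_id}/start"),
    "stop": ("POST", "/vms/{vm_id}/stop"),
    "delete": ("DELETE", "/vms/{vm_id}"),
}


@dataclass
class SlaveTarget:
    hostname: str
    port: int
    api_key: str


def build_slave_request(
    slave: SlaveTarget, action: str, vm_id: Optional[str] = None
) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, url, headers) for forwarding `action` to `slave`."""
    method, path = SLAVE_ROUTES[action]
    if "{vm_id}" in path:
        if vm_id is None:
            raise ValueError(f"action {action!r} requires a vm_id")
        path = path.format(vm_id=vm_id)
    url = f"http://{slave.hostname}:{slave.port}{path}"
    # Assumption: the slave's api_key is the value sent as X-Slave-Token.
    headers = {"X-Slave-Token": slave.api_key}
    return method, url, headers


def route_vm_action(
    vm_slave_id: Optional[uuid.UUID],
    slaves: Mapping[uuid.UUID, SlaveTarget],
    action: str,
    vm_id: str,
) -> Tuple[str, str, Dict[str, str]]:
    """Look up the VM's slave and build the proxied request."""
    if vm_slave_id is None:
        raise LookupError("VM is not assigned to any slave")
    return build_slave_request(slaves[vm_slave_id], action, vm_id)
```

With httpx, the Master would then send the request with something like `await client.request(method, url, headers=headers)` on an `httpx.AsyncClient`.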

Slave API (Internal)

The Slave will run a stripped-down version of the current backend:

  • POST /vms: Create a VM locally.
  • GET /vms/{vm_id}: Get local VM status.
  • POST /vms/{vm_id}/start: Start local VM.
  • POST /vms/{vm_id}/stop: Stop local VM.
  • DELETE /vms/{vm_id}: Delete local VM.
  • GET /host/info: Return local resource usage (CPU, RAM, Disk).
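For `GET /host/info`, the Slave can report its capacity with the standard library alone. This is a minimal, Unix-only sketch (`os.sysconf` and `os.getloadavg` are not available on Windows); the function name, the response keys, and the use of the 1-minute load average as the usage signal are assumptions. Units follow SlaveORM (RAM in MB, disk in GB).

```python
import os
import shutil
from typing import Dict, Union


def host_info(disk_path: str = "/") -> Dict[str, Union[int, float]]:
    """Collect local capacity and load for GET /host/info (Unix only)."""
    # Physical RAM = page size * number of physical pages.
    total_mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # disk_path should point at the filesystem holding VM disks,
    # e.g. libvirt's default /var/lib/libvirt/images.
    disk = shutil.disk_usage(disk_path)
    return {
        "total_cpu": os.cpu_count() or 0,
        "total_mem": total_mem_bytes // (1024 * 1024),  # MB
        "total_disk": disk.total // (1024 ** 3),        # GB
        "free_disk": disk.free // (1024 ** 3),          # GB
        "load_1m": os.getloadavg()[0],                  # 1-minute load average
    }
```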

4. Scheduling Strategy

When a user creates a VM without specifying a slave, the Master will use a Scheduler:

  1. Least Loaded: Pick the slave with the most available RAM/CPU.
  2. Round Robin: Cycle through online slaves.
  3. Sticky: Attempt to group VMs for a specific user/project (optional).
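The first two strategies are small enough to sketch directly. `least_loaded` filters to online slaves that can fit the request and, as an assumption about what "most available RAM/CPU" means, ranks by free RAM first and free vCPUs second; `RoundRobin` cycles over whichever slaves are online at call time. `SlaveLoad` and both names are hypothetical, and the optional Sticky strategy is left out.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveLoad:
    name: str
    online: bool
    free_mem: int  # MB
    free_cpu: int  # vCPUs


def least_loaded(
    slaves: List[SlaveLoad], need_mem: int, need_cpu: int
) -> Optional[SlaveLoad]:
    """Pick the online slave with the most free RAM (then CPU) that fits."""
    candidates = [
        s for s in slaves
        if s.online and s.free_mem >= need_mem and s.free_cpu >= need_cpu
    ]
    if not candidates:
        return None  # no slave has capacity; the caller should reject the request
    return max(candidates, key=lambda s: (s.free_mem, s.free_cpu))


class RoundRobin:
    """Cycle through online slaves, one per call."""

    def __init__(self) -> None:
        self._counter = itertools.count()

    def pick(self, slaves: List[SlaveLoad]) -> Optional[SlaveLoad]:
        online = [s for s in slaves if s.online]
        if not online:
            return None
        return online[next(self._counter) % len(online)]
```

Round robin as written ignores capacity, so in practice it would probably be combined with the same fit check that `least_loaded` applies.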

5. Security

  • Master -> Slave: The Master will send an X-Slave-Token header with every request to a Slave.
  • Slave -> Master (Heartbeat): The Slave will periodically send a heartbeat to POST /slaves/heartbeat with its
    status and current load, authenticated by its api_key.
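On the Slave side, checking `X-Slave-Token` should use a constant-time comparison so the token cannot be recovered character by character from response timing; Python's `hmac.compare_digest` does this. The sketch also shows generating a key with `secrets.token_urlsafe`. The function names and the 32-byte key length are assumptions; the same check could be reused on the Master when validating heartbeats authenticated with the slave's `api_key`.

```python
import hmac
import secrets
from typing import Mapping

TOKEN_HEADER = "X-Slave-Token"


def new_api_key() -> str:
    """Generate a random URL-safe key to store in SlaveORM.api_key."""
    return secrets.token_urlsafe(32)


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Return True if the request carries the expected X-Slave-Token."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    received = lowered.get(TOKEN_HEADER.lower())
    if received is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(received.encode(), expected_token.encode())
```

In FastAPI this check would typically live in a dependency attached to the Slave's router, returning 401 when it fails.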

6. Implementation TODO List

Phase 1: Master Preparation

  • Create SlaveORM model and run migrations.
  • Add slave_id to VmORM.
  • Implement SlaveService on the Master for CRUD and heartbeat handling.
  • Implement SlaveRouter for management endpoints.
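For heartbeat handling, one simple approach is to record `last_heartbeat` on every heartbeat and derive `Offline` when it is too old, rather than relying on the Slave to announce that it is going down. The 30-second timeout and the function names are hypothetical, and `Maintenance` is treated as a manual state that heartbeats do not override.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Hypothetical: a few missed heartbeats before a slave counts as Offline.
HEARTBEAT_TIMEOUT = timedelta(seconds=30)


def utc_now() -> datetime:
    # Store heartbeats as timezone-aware UTC so comparisons are unambiguous.
    return datetime.now(timezone.utc)


def apply_heartbeat(slave: Dict[str, Any], now: datetime) -> None:
    """Update a slave record when POST /slaves/heartbeat arrives."""
    slave["last_heartbeat"] = now
    if slave["status"] != "Maintenance":
        slave["status"] = "Online"


def effective_status(
    stored_status: str,
    last_heartbeat: Optional[datetime],
    now: datetime,
    timeout: timedelta = HEARTBEAT_TIMEOUT,
) -> str:
    """Status the Master should report, accounting for missed heartbeats."""
    if stored_status == "Maintenance":
        return "Maintenance"
    if last_heartbeat is None or now - last_heartbeat > timeout:
        return "Offline"
    return "Online"
```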

Phase 2: Slave Agent Development

  • Extract current VM/Host logic into a new "Slave Mode" or a separate slimmed-down package.
  • Implement simple API Key authentication for the Slave.
  • Add a background task on the Slave to send periodic heartbeats to the Master.
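The Slave's background heartbeat can be a plain asyncio loop, the alternative to apscheduler mentioned under Additional Dependencies. In this sketch `send` and `collect` are injected so the loop can be tested without a Master; in the Slave, `send` might wrap `httpx.AsyncClient.post` to `POST /slaves/heartbeat` with the token header, and `collect` might return the `/host/info` payload. The names and the error policy (log and keep going) are assumptions.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict

log = logging.getLogger("slave.heartbeat")


async def heartbeat_loop(
    send: Callable[[Dict[str, Any]], Awaitable[None]],
    collect: Callable[[], Dict[str, Any]],
    interval: float,
    stop: asyncio.Event,
) -> None:
    """Send a heartbeat every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            await send(collect())
        except Exception:
            # The Master may be briefly unreachable; log and retry next tick.
            log.exception("heartbeat failed")
        try:
            # Sleep for `interval`, but wake immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

The loop would be started with `asyncio.create_task(...)` when the Slave app starts (for example in a FastAPI lifespan handler) and stopped by setting the event on shutdown.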

Phase 3: Integration & Orchestration

  • Update VmService on the Master to become a "Proxy Service".
  • Implement the Scheduler logic (Basic "Least Loaded").
  • Update VM creation to pull images from a central source or handle local image distribution.

Phase 4: Frontend Updates

  • Add "Slaves" management page.
  • Update "Provision VM" page to allow selecting a specific host (optional).
  • Display which Slave a VM is running on in the VM list/detail view.

7. Additional Dependencies

  • httpx: For the Master to perform asynchronous HTTP requests to Slaves.
  • apscheduler: For the Slave to run heartbeat tasks (or a simple asyncio loop instead).
