
feat(compute): add network connectivity enhancement#667

Draft
scotwells wants to merge 8 commits into main from feat/compute-network-connectivity

Conversation

@scotwells
Contributor

Summary

  • Defines a single integration point in the compute layer for connecting workloads to the Galactic VPC (gVPC) network fabric
  • Proposes a workload-agnostic connectivity model where product offerings (functions, databases, containers) consume network attachment rather than building their own
  • Covers proxyless wake-on-traffic via eBPF/NFQUEUE on BlueField DPU ARM cores for scale-from-zero workloads
  • Includes inter-workload connectivity and demand-driven regional placement

Test plan

  • Review architecture diagrams (context, container, traffic flows, inter-workload) for accuracy
  • Validate technical assumptions around BlueField DPU capabilities (XDP native mode, eSwitch steering, SRv6 via DPL)
  • Validate NFQUEUE/AF_XDP packet hold approach for sub-20ms cold start
  • Discuss open questions: minimal gVPC milestone, VF passthrough per runtime, NFQUEUE vs AF_XDP, SRv6 timeline
  • Review with Shelby and Peter on DPU integration model
  • Review with Julia (Unikraft) on SR-IOV/VF passthrough feasibility

🤖 Generated with Claude Code

scotwells and others added 8 commits March 19, 2026 16:14
Define a single integration point in the compute layer for connecting
workloads to the Galactic VPC network fabric. This enhancement proposes
a workload-agnostic connectivity model where all workload types attach
to gVPC at the network interface layer through a shared platform
capability, rather than each product offering building its own
connectivity.

Includes C4 architecture diagrams covering system context, container
architecture, steady-state traffic flow, cold-start (scale-from-zero)
traffic flow, and inter-workload connectivity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace C4 container diagrams with sequence diagrams for the two traffic
flow views (steady-state and cold-start). Sequence diagrams better
represent the temporal ordering, parallel operations, and
request/response patterns in these flows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…raints

Introduce a NetworkInterfaceClass abstraction that lets consumers choose
between high-density (OVS-offloaded virtio/TAP) and near-line-rate
(SR-IOV VF passthrough) network attachment. Reframe the architecture
around OVS offload as the primary data path for high-density workloads,
with SR-IOV reserved for latency-sensitive use cases.

Key additions:
- NetworkInterfaceClass design with standard/performance tiers
- SR-IOV scalability constraints (252 max VFs, 64-128 practical)
- OVS offload as primary path, scaling to thousands of interfaces per node
- Updated risks, open questions, and UNRESOLVED blocks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
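The two-tier selection described in this commit could be sketched as follows. This is a hypothetical illustration, not the enhancement's actual API: the type names, the `Allocate` helper, and the `vfCapacity` budget are all assumptions; only the standard/performance split, the OVS-vs-SR-IOV mapping, and the 252/64-128 VF numbers come from the commit message.

```go
package main

import "fmt"

// NetworkInterfaceClass names a tier of gVPC attachment.
// Class names and fields here are illustrative only.
type NetworkInterfaceClass string

const (
	// Standard: OVS-offloaded virtio/TAP; high density,
	// scaling to thousands of interfaces per node.
	Standard NetworkInterfaceClass = "standard"
	// Performance: SR-IOV VF passthrough; near line rate, but
	// bounded by the hardware VF ceiling (252 max, 64-128 practical).
	Performance NetworkInterfaceClass = "performance"
)

// vfCapacity models a per-node VF budget at the practical upper bound.
const vfCapacity = 128

// Allocate picks an attachment class for a workload, falling back
// from performance to standard when the VF budget is exhausted.
func Allocate(requested NetworkInterfaceClass, vfsInUse int) NetworkInterfaceClass {
	if requested == Performance && vfsInUse < vfCapacity {
		return Performance
	}
	return Standard
}

func main() {
	fmt.Println(Allocate(Performance, 10))  // performance
	fmt.Println(Allocate(Performance, 128)) // standard: VF budget exhausted
	fmt.Println(Allocate(Standard, 0))      // standard
}
```

The fallback direction (performance degrading to standard rather than failing) is one possible policy; whether the real scheduler rejects or downgrades when VFs run out is an open question in the document.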
- Remove titles from all diagrams (redundant with document headings)
- Collapse Datum Edge and gVPC Fabric to external systems in container
  diagram, focusing on Worker Node internals
- Remove VMM as separate box, wake daemon connects directly to instance
- Simplify inter-workload diagram to workload-to-fabric-to-workload
- Remove BlueField DPU from inter-workload (implementation detail)
- Remove WireGuard from edge node (fabric implementation detail)
- Rename "Workload" to "Instance" in container diagram

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ikraft API context

- Remove "Go" technology label from wake daemon (implementation TBD)
- Rename "Workload" to "Instance" in cold-start sequence diagram
- Remove Firecracker/VMM as intermediary in cold-start flow
- Add Unikraft's proposed C library API for instance wake-up to the
  UNRESOLVED block, with open questions on integration approach

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace eBPF/XDP/NFQUEUE/AF_XDP wake-on-traffic mechanism with OVS
flow miss handling. Since OVS is already the primary data path for the
standard NetworkInterfaceClass, using its existing flow miss mechanism
for scale-from-zero is simpler and avoids introducing eBPF complexity.

When a packet arrives for a sleeping instance, there is no OVS flow
rule. The packet enters the OVS slow path, the wake daemon triggers
the instance boot, and once ready, OVS installs and offloads a flow
rule to the DPU eSwitch hardware.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
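The flow-miss wake sequence in this commit can be sketched as a toy control loop. Everything here is an illustrative assumption (the `node` type, map-backed flow table, and path labels are invented); only the logic is taken from the commit: no flow rule for a sleeping instance, so the first packet takes the slow path and wakes it, after which an installed and offloaded rule keeps later packets on the fast path.

```go
package main

import "fmt"

// node models one worker node's view of wake-on-traffic state.
type node struct {
	flows   map[string]bool // dst -> offloaded flow rule installed
	running map[string]bool // dst -> instance booted
}

// deliver routes one packet to dst and reports which path it took.
func (n *node) deliver(dst string) string {
	if n.flows[dst] {
		// Flow rule exists: traffic is handled in hardware.
		return "fast path (DPU eSwitch)"
	}
	// No flow rule: the packet enters the OVS slow path.
	if !n.running[dst] {
		// Wake daemon triggers the instance boot.
		n.running[dst] = true
	}
	// Once the instance is ready, OVS installs and offloads a
	// flow rule so subsequent packets bypass the slow path.
	n.flows[dst] = true
	return "slow path (flow miss, instance woken)"
}

func main() {
	n := &node{flows: map[string]bool{}, running: map[string]bool{}}
	fmt.Println(n.deliver("10.0.0.5")) // first packet: miss + wake
	fmt.Println(n.deliver("10.0.0.5")) // later packets: offloaded
}
```

In the real system the slow-path step is asynchronous (the first packet is held or retransmitted while the instance boots); this sketch collapses that into a single synchronous call for clarity.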
Audit pass to ensure all sections reflect the shift from eBPF to OVS
flow miss for wake-on-traffic:

- Architecture overview: DPU + OVS as node layer, remove eBPF reference
- Steady-state flow: describe both standard (OVS/TAP) and performance
  (VF) paths instead of assuming VF-only
- Inter-workload: neutral about attachment type (OVS/TAP or VF)
- Julia UNRESOLVED: fix stale eBPF reference to OVS flow miss
- Steady-state sequence diagram: show class-neutral delivery
- Remove orphaned link references (af-xdp, ebpf, nfqueue, xdp)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clean up leftover mentions of eBPF, XDP, NFQUEUE, and AF_XDP from
the document body. The only remaining eBPF reference is in the Koyeb
blog post title in the references section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>