
[Draft] Add multi-cluster setup scripts and manifests for local development of FederatedRayCluster #4735

Draft
seanlaii wants to merge 3 commits into ray-project:master from seanlaii:frc-setup


@seanlaii
Contributor

Why are these changes needed?

This PR adds a local development environment for the upcoming Federated RayCluster feature. It creates 3 interconnected kind clusters using Cilium ClusterMesh, providing the full bidirectional Pod-to-Pod IP connectivity that Ray requires for cross-cluster head-to-worker and worker-to-worker communication.
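As a rough sketch of what each kind cluster needs for this to work, assuming the bootstrap script follows the usual Cilium-on-kind pattern: kind's default CNI is disabled so Cilium can take over Pod networking, and each cluster gets its own non-overlapping Pod subnet, which ClusterMesh requires in order to route Pod IPs between clusters. The /16 subnets are assumptions inferred from the Pod IPs in the smoke-test output (10.10.x, 10.20.x, 10.30.x); the helper name `kind_config` and the node layout are hypothetical and not taken from the PR.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a kind cluster config with the default CNI
# (kindnet) disabled and a cluster-specific, non-overlapping Pod subnet.
# The node layout (one control plane, one worker) is an assumption.
# Usage: kind_config <pod-subnet>
kind_config() {
  local pod_subnet="$1"
  cat <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "${pod_subnet}"
nodes:
  - role: control-plane
  - role: worker
EOF
}
```

For example, `kind_config 10.20.0.0/16 | kind create cluster --name frc-member-a --config=-` would create one member cluster. Its nodes stay NotReady until Cilium is installed, and for ClusterMesh each cluster's Cilium install also needs a distinct `cluster.name` and `cluster.id`.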

Ray's networking model requires more than exposing the head's GCS port: the head must connect back to workers (for task scheduling, placement groups, and node draining), and workers must communicate directly with each other (for object transfer via ray.get() and for actor calls). Cilium ClusterMesh provides this connectivity transparently across kind clusters and accommodates Ray's dynamic port usage without per-port configuration.
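To put a rough number on the dynamic-port point: with Ray's documented `ray start` defaults, worker processes bind ports between `--min-worker-port` (10002) and `--max-worker-port` (19999), and the node-manager and object-manager ports are chosen at random. The arithmetic below is only a sketch; a KubeRay deployment may override these values through its start parameters.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Ray's default worker-port range for `ray start`; deployments may override it.
MIN_WORKER_PORT=10002
MAX_WORKER_PORT=19999

# Number of ports in an inclusive range [min, max].
worker_port_count() {
  echo $(( $2 - $1 + 1 ))
}

count="$(worker_port_count "$MIN_WORKER_PORT" "$MAX_WORKER_PORT")"
echo "worker ports per node: ${count} (plus random node-manager and object-manager ports)"
```

With close to ten thousand candidate ports per node, plus randomly chosen ones, exposing individual ports through Services or NodePorts is impractical, which is why flat Pod-to-Pod routing is the simpler fit.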

What's included:

  • hack/bootstrap-federated-ray-lab.sh: one-command setup that creates 3 kind clusters, installs Cilium, enables ClusterMesh, and connects all clusters into a full mesh
  • hack/smoke-test.sh: verifies all 6 cross-cluster Pod-to-Pod connectivity paths (both directions between each pair of clusters)
  • hack/cleanup-federated-ray-lab.sh: tears everything down
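For three clusters, a full mesh needs one `cilium clustermesh connect` per unordered pair of clusters, i.e. 3 × 2 / 2 = 3 connections, since the Cilium CLI establishes each connection in both directions. The sketch below only prints the commands instead of running them; it assumes kind's default `kind-<name>` kubeconfig context naming and the cluster names seen in the smoke-test output, and `print_mesh_commands` is a hypothetical helper, not part of the PR.

```bash
#!/usr/bin/env bash
set -euo pipefail

CLUSTERS=(frc-primary frc-member-a frc-member-b)

# Hypothetical helper: print one connect command per unordered cluster pair
# (i < j), which is enough for a full mesh because connections are bidirectional.
print_mesh_commands() {
  local n=${#CLUSTERS[@]} i j
  for (( i = 0; i < n; i++ )); do
    for (( j = i + 1; j < n; j++ )); do
      echo "cilium clustermesh connect --context kind-${CLUSTERS[i]} --destination-context kind-${CLUSTERS[j]}"
    done
  done
}

print_mesh_commands
```

The same loop scales to N clusters with N × (N − 1) / 2 connections, which is one reason a scripted setup is preferable to connecting clusters by hand.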

This environment can be used for developing and testing the FederatedRayCluster CRD, federation controller, and spec.workerOnly support described in the design proposal.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@seanlaii seanlaii changed the title Add federated RayCluster setup scripts and manifests for local development [Draft] Add federated RayCluster setup scripts and manifests for local development Apr 18, 2026
@seanlaii
Contributor Author

Result from smoke-test.sh:

$ ./hack/smoke-test.sh

=== Deploying echo pods ===
pod/echo-primary created
pod/echo-member-a created
pod/echo-member-b created
=== Waiting for pods to be ready ===
pod/echo-primary condition met
pod/echo-member-a condition met
pod/echo-member-b condition met
=== Getting Pod IPs ===
  echo-primary  (frc-primary):  10.10.2.146
  echo-member-a (frc-member-a): 10.20.1.37
  echo-member-b (frc-member-b): 10.30.2.7

=== Testing cross-cluster Pod-to-Pod connectivity ===
  frc-primary -> frc-member-a (10.20.1.37:5678): OK (hello-from-member-a)
  frc-primary -> frc-member-b (10.30.2.7:5678): OK (hello-from-member-b)
  frc-member-a -> frc-primary (10.10.2.146:5678): OK (hello-from-primary)
  frc-member-a -> frc-member-b (10.30.2.7:5678): OK (hello-from-member-b)
  frc-member-b -> frc-primary (10.10.2.146:5678): OK (hello-from-primary)
  frc-member-b -> frc-member-a (10.20.1.37:5678): OK (hello-from-member-a)

=== Results: 6 passed, 0 failed ===

SMOKE TEST PASSED: All cross-cluster connectivity checks succeeded.
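The six checks above are every ordered (source, destination) pair of the three clusters, 3 × 2 = 6, so each direction of each link is exercised separately. A minimal sketch of that loop follows, using the Pod IPs and port from this run's output. How the real script reaches the echo pod from each source cluster is not shown here, so `probe` is a hypothetical stub (it always succeeds) that would need to be replaced with a real check, for example a `kubectl exec` in the source cluster that fetches from the target IP.

```bash
#!/usr/bin/env bash
set -euo pipefail

CLUSTERS=(frc-primary frc-member-a frc-member-b)
# Pod IPs from the run above, indexed like CLUSTERS; they change on every run.
POD_IPS=(10.10.2.146 10.20.1.37 10.30.2.7)
PORT=5678

# Hypothetical probe: should succeed if <src-cluster> can reach <ip>:<port>.
# This stub always succeeds; replace it with a real check.
probe() { return 0; }

# Check every ordered cluster pair; return non-zero if any check fails.
run_checks() {
  local i j src dst target passed=0 failed=0
  for (( i = 0; i < ${#CLUSTERS[@]}; i++ )); do
    for (( j = 0; j < ${#CLUSTERS[@]}; j++ )); do
      (( i == j )) && continue   # skip same-cluster pairs
      src="${CLUSTERS[i]}"
      dst="${CLUSTERS[j]}"
      target="${POD_IPS[j]}:${PORT}"
      if probe "$src" "${POD_IPS[j]}" "$PORT"; then
        echo "  ${src} -> ${dst} (${target}): OK"
        passed=$((passed + 1))
      else
        echo "  ${src} -> ${dst} (${target}): FAIL"
        failed=$((failed + 1))
      fi
    done
  done
  echo "=== Results: ${passed} passed, ${failed} failed ==="
  (( failed == 0 ))
}

run_checks
```

Testing ordered pairs rather than unordered ones matters here: a routing or policy problem can break only one direction, which would leave, say, head-to-worker calls working while worker-to-head calls fail.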

@seanlaii seanlaii changed the title [Draft] Add federated RayCluster setup scripts and manifests for local development [Draft] Add multi-cluster setup scripts and manifests for local development of FederatedRayCluster Apr 19, 2026
@seanlaii seanlaii force-pushed the frc-setup branch 2 times, most recently from 6fdd39e to 52140a2, April 19, 2026 02:56
…pment

Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>
@EagleLo
Contributor

EagleLo commented Apr 19, 2026

I confirmed that the setup script hack/bootstrap-federated-ray-lab.sh runs successfully, and verified the results with hack/smoke-test.sh.

[Screenshots: 2026-04-19 at 3:11:21 PM and 3:12:16 PM]

