Description
Your project is of interest to us.
Before trying to set it up on our side, I would like to make sure it can support our use case.
We are developing a performance-portable code, and therefore target many different hardware platforms.
I will list our constraints below. Thanks for looking at them!
No service allowed on clusters
The clusters we have access to do not allow us to run a service on them. We therefore need to be able to submit Slurm commands from a server that can connect to these clusters.
My plan would be to have a server in our lab that listens for the webhook and submits commands to the clusters.
Would something like that be possible?
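To make the question concrete, here is a minimal sketch of the submission side only, assuming the lab server can reach each cluster's login node over SSH with key-based authentication. The host name, script path, and function names are hypothetical and not part of your project; the webhook handling itself is left out.

```python
import shlex
import subprocess


def build_remote_sbatch_command(login_host, script_path, sbatch_args=()):
    """Build the local `ssh` command that runs `sbatch` on a remote login node."""
    # Quote the remote command so arguments survive the remote shell intact.
    remote = shlex.join(["sbatch", "--parsable", *sbatch_args, script_path])
    return ["ssh", login_host, remote]


def submit(login_host, script_path, sbatch_args=()):
    """Submit a job from the lab server and return the Slurm job ID as a string."""
    cmd = build_remote_sbatch_command(login_host, script_path, sbatch_args)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    # With --parsable, sbatch prints "jobid" or "jobid;cluster".
    return result.stdout.strip().split(";")[0]
```

The webhook listener on the lab server would call something like `submit("user@cluster-a.example.org", "ci/job.sh", ["--partition=gpu"])`, so nothing has to run permanently on the cluster itself.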
Many clusters at the same time
We have pipelines that run many jobs for a variety of backends (CUDA, HIP or CPU) and for different architecture families (Hopper or Blackwell, for instance). No single cluster provides all of this hardware, so we have to rely on multiple clusters to get all these jobs running.
Is that possible?
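As an illustration of the routing we would need, here is a small sketch that maps a (backend, family) pair to the cluster that should run the job. The table contents and host names are made up for the example; the real mapping would depend on our allocations.

```python
# Hypothetical routing table: (backend, architecture family) -> cluster login host.
# A family of None means "any family for this backend".
CLUSTER_FOR_TARGET = {
    ("cuda", "hopper"): "cluster-a.example.org",
    ("cuda", "blackwell"): "cluster-b.example.org",
    ("hip", None): "cluster-c.example.org",
    ("cpu", None): "cluster-a.example.org",
}


def pick_cluster(backend, family=None):
    """Return the login host for a job, preferring an exact family match."""
    key = (backend.lower(), family.lower() if family else None)
    if key in CLUSTER_FOR_TARGET:
        return CLUSTER_FOR_TARGET[key]
    # Fall back to a backend-wide entry when no family-specific one exists.
    fallback = (key[0], None)
    if fallback in CLUSTER_FOR_TARGET:
        return CLUSTER_FOR_TARGET[fallback]
    raise LookupError(f"no cluster configured for {backend!r} / {family!r}")
```

A single pipeline would then fan out its jobs to several clusters at once, each job being submitted to whatever host `pick_cluster` returns for its target.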
Summary
We would use a server in our lab to listen for the webhooks and submit Slurm jobs to several different clusters.