Erbium is Project Neura's internal compute platform. It is designed for mid-scale organizations and is 100% open source to avoid concerns about backdoors. We do not aim to provide a user system, because it would become too complex and needs vary greatly across organizations. Our main goal is to build a general solution for job scheduling and orchestration. We bring these together using C++ integrated with Python, running in Docker containers, with services exposed via APIs.
Currently, we only consider hosts with NVIDIA GPUs, as they are the most developer-friendly and widely used. To support other hardware, you can fork this repository and replace the driver-related code with your own.
- SSH Tunneling
- JupyterLab
- Job Scheduling
- File History Backup
Some facts:
- The SSH tunnel defaults to the home directory (~), not "/workspace"
- JupyterLab can only explore files in "/workspace", but its terminal can access anything
- Only the output folder "/workspace/output" is backed up every 12 hours
- Jobs that reach their requested duration will be killed
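Because a job is killed once it reaches its requested duration and only "/workspace/output" is backed up, a job script may want to checkpoint into that folder and stop on its own shortly before the limit. The following is a minimal sketch of that idea, not part of Erbium itself: `save_checkpoint`, `run_with_deadline`, the safety margin, and the JSON checkpoint format are all hypothetical choices made for illustration.

```python
import json
import time
from pathlib import Path

# The only folder the facts above describe as backed up (every 12 hours).
OUTPUT_DIR = Path("/workspace/output")


def save_checkpoint(state, output_dir=OUTPUT_DIR, name="checkpoint.json"):
    """Write `state` as JSON into the backed-up folder (hypothetical helper)."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / name
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(state))
    # Rename into place so an interrupted write does not clobber the last good file.
    tmp.replace(target)
    return target


def run_with_deadline(steps, requested_seconds, margin_seconds=60.0,
                      output_dir=OUTPUT_DIR):
    """Run callables from `steps`, stopping `margin_seconds` before the limit.

    `requested_seconds` is assumed to match the duration requested when the
    job was scheduled. Returns the number of completed steps.
    """
    deadline = time.monotonic() + requested_seconds - margin_seconds
    done = 0
    for step in steps:
        if time.monotonic() >= deadline:
            break  # leave before the scheduler kills the job
        step()
        done += 1
        save_checkpoint({"completed_steps": done}, output_dir)
    return done
```

A training loop could pass one callable per epoch and, on the next job, read "checkpoint.json" to resume from the recorded step.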
The username is always "access". You need to schedule a job to set the SSH password.
For example, to access the main node, visit https://main-erbium.projectneura.org and you will see the web SSH interface.
Adding the "node-" prefix, as in https://node-main-erbium.projectneura.org, takes you to the job scheduling page.
Adding the "jupyter-" prefix, as in https://jupyter-main-erbium.projectneura.org, takes you to JupyterLab. Note that whenever a new job initializes, you will need to log into the web SSH interface and run "jupyter server list" to get the token.
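If you fetch the token often, the output of "jupyter server list" can be parsed rather than copied by hand. The sketch below assumes the usual output shape of recent Jupyter Server releases, one "URL :: directory" line per running server with the token in the URL's query string; `extract_tokens` is a hypothetical helper, not part of Erbium.

```python
from urllib.parse import parse_qs, urlparse


def extract_tokens(server_list_output: str) -> list[str]:
    """Return the token of every server listed in `jupyter server list` output.

    Assumes lines of the form "http://host:port/?token=... :: /some/dir".
    Lines that do not start with a URL (such as the header) are skipped.
    """
    tokens = []
    for line in server_list_output.splitlines():
        url = line.split(" :: ", 1)[0].strip()
        if not url.startswith(("http://", "https://")):
            continue
        query = parse_qs(urlparse(url).query)
        tokens.extend(query.get("token", []))
    return tokens


# Example with made-up output; the token value is a placeholder.
sample = (
    "Currently running servers:\n"
    "http://localhost:8888/?token=abc123 :: /workspace\n"
)
print(extract_tokens(sample))
```

Inside the web SSH session, the real output can be captured with `subprocess.run(["jupyter", "server", "list"], capture_output=True, text=True)` and passed to the same function.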
To use other nodes, simply replace "main" with the name of the node.
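The naming scheme above can be captured in a few lines. This is only an illustrative sketch: `node_urls` is a hypothetical helper, the "jupyter-" prefix is applied as the text describes it, and "gpu1" is a made-up node name.

```python
DOMAIN = "projectneura.org"


def node_urls(node: str = "main") -> dict[str, str]:
    """Build the three service URLs for a node, following the prefixes above."""
    base = f"{node}-erbium.{DOMAIN}"
    return {
        "ssh": f"https://{base}",                # web SSH interface
        "scheduler": f"https://node-{base}",     # job scheduling page
        "jupyter": f"https://jupyter-{base}",    # JupyterLab
    }


print(node_urls()["scheduler"])      # the main node
print(node_urls("gpu1")["jupyter"])  # "gpu1" is a hypothetical node name
```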
git clone https://github.com/ProjectNeura/Erbium
You need to save the Cloudflare Tunnel token locally as "cloudflared_tunnel_token.txt" in the root directory of the project.
Then, run the following commands to start the tunnel:
python -m erbium docker init
python -m erbium docker create -n SERVICE_NAME -p SSH_PASSWORD INPUT_DIR OUTPUT_DIR ./docker-compose.yaml
python -m erbium docker run ./docker-compose.yaml SERVICE_NAME
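If you repeat this setup on several hosts, the three commands can be wrapped in a small script. The sketch below only re-issues the commands shown above through `subprocess`; `build_commands` and `launch` are hypothetical helper names, and the check for the token file is an added convenience rather than something Erbium is known to require in this form.

```python
import subprocess
import sys
from pathlib import Path

# File name taken from the instructions above.
TOKEN_FILE = "cloudflared_tunnel_token.txt"


def build_commands(service_name, ssh_password, input_dir, output_dir,
                   compose_file="./docker-compose.yaml"):
    """Return the init, create, and run commands as argument lists."""
    # sys.executable stands in for "python" so the same interpreter is reused.
    base = [sys.executable, "-m", "erbium", "docker"]
    return [
        base + ["init"],
        base + ["create", "-n", service_name, "-p", ssh_password,
                input_dir, output_dir, compose_file],
        base + ["run", compose_file, service_name],
    ]


def launch(service_name, ssh_password, input_dir, output_dir,
           compose_file="./docker-compose.yaml", project_root="."):
    """Run the three commands in order from the project root."""
    root = Path(project_root)
    if not (root / TOKEN_FILE).is_file():
        raise FileNotFoundError(f"{TOKEN_FILE} not found in {root.resolve()}")
    for cmd in build_commands(service_name, ssh_password, input_dir,
                              output_dir, compose_file):
        # check=True stops at the first failing command.
        subprocess.run(cmd, check=True, cwd=root)
```

Calling `launch("SERVICE_NAME", "SSH_PASSWORD", "INPUT_DIR", "OUTPUT_DIR")` from the repository root would then run the same sequence as the commands above, with the placeholders replaced by your own values.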