Follow the [nvkind prerequisites and setup guide](https://github.com/NVIDIA/nvkind#prerequisites) to install the NVIDIA driver, nvidia-container-toolkit, and nvkind on your host. Once `nvkind` is on `$PATH`, `make up` handles the rest.
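
Before running `make up`, a quick preflight check can confirm the tools are actually on `$PATH`. This is a hypothetical helper, not part of the repository: only `nvkind` is named in this README, and `kind`, `kubectl`, `nvidia-smi` and `nvidia-ctk` (the nvidia-container-toolkit CLI) are assumptions about what a typical nvkind host needs.

```python
import shutil

# Only `nvkind` is named in the README; the other entries are assumptions
# about what an nvkind host typically needs.
REQUIRED_TOOLS = ("nvkind", "kind", "kubectl", "nvidia-smi", "nvidia-ctk")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that shutil.which() cannot find on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Call `missing_tools()` before `make up`; an empty list means every listed tool resolved on `$PATH`.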

**What `make up` does**: `kind-config.yaml` labels the worker nodes `inference` and `train`; the train node also gets `extraMounts` that signal GPU presence to nvkind. The cluster creation step runs nvkind, which installs the NVIDIA container toolkit inside the node and configures containerd. The infra step creates the `nvidia` RuntimeClass, labels the GPU node, and deploys the device plugin.
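
The node layout and RuntimeClass described above can be sketched as plain data. This is a minimal illustration under stated assumptions, not the repository's `kind-config.yaml`: it assumes a kind release that supports per-node `labels`, reuses the `<domain>/role` label key from the label-domain scheme this README describes, and takes the `/dev/null` to `/var/run/nvidia-container-devices/all` mount from nvkind's example configs (an assumption about what "extraMounts that signal GPU presence" means here). The RuntimeClass follows the standard `node.k8s.io/v1` schema with an assumed handler name of `nvidia`.

```python
import json

# Assumed prefix; the real kind-config.yaml is templated with ${FAIR_LABEL_DOMAIN}.
DOMAIN = "fair-dev.hotosm.org"

# Assumption based on nvkind's examples: mounting /dev/null at this path marks
# a kind node as one that should see all GPUs.
GPU_MOUNT = {
    "hostPath": "/dev/null",
    "containerPath": "/var/run/nvidia-container-devices/all",
}


def kind_config(domain: str = DOMAIN) -> dict:
    """Build a kind Cluster config with an inference worker and a GPU train worker."""
    return {
        "kind": "Cluster",
        "apiVersion": "kind.x-k8s.io/v1alpha4",
        "nodes": [
            {"role": "control-plane"},
            {"role": "worker", "labels": {f"{domain}/role": "inference"}},
            {
                "role": "worker",
                "labels": {f"{domain}/role": "train"},
                "extraMounts": [dict(GPU_MOUNT)],
            },
        ],
    }


def nvidia_runtime_class() -> dict:
    """RuntimeClass that sends pods to containerd's (assumed) `nvidia` handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": "nvidia"},
        "handler": "nvidia",
    }


# JSON is a subset of YAML 1.2, so the printed config is also valid YAML.
print(json.dumps(kind_config(), indent=2))
```

Only the train node carries the GPU mount, which is what keeps GPU setup confined to that node while the inference worker stays a plain kind node.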

**Caveats**:

- `PatchProcDriverNvidia` may fail on non-MIG single-GPU hosts. This is non-critical; the Makefile tolerates the failure.
- nvkind restarts containerd on the GPU node, briefly disrupting colocated pods.

Node labels and taints share a configurable domain prefix (default `fair-dev.hotosm.org`). Override it with the `FAIR_LABEL_DOMAIN` environment variable:

```bash
export FAIR_LABEL_DOMAIN=fair-dev.hotosm.org # dev
make up
```
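
`kind-config.yaml` and `stacks/k8s.yaml` are resolved via `envsubst`. Python's `string.Template` uses the same `${NAME}` placeholder syntax, so the substitution step can be sketched as follows. The YAML fragment is hypothetical, and the fallback mirrors the `fair-dev.hotosm.org` default stated above.

```python
import os
from string import Template
from typing import Mapping, Optional

DEFAULT_DOMAIN = "fair-dev.hotosm.org"

# Hypothetical fragment in the style of kind-config.yaml / stacks/k8s.yaml.
TEMPLATE = Template(
    "labels:\n"
    "  ${FAIR_LABEL_DOMAIN}/role: train\n"
)


def render(template: Template, env: Optional[Mapping[str, str]] = None) -> str:
    """Substitute ${FAIR_LABEL_DOMAIN} like envsubst, falling back to the default."""
    values = dict(os.environ if env is None else env)
    values.setdefault("FAIR_LABEL_DOMAIN", DEFAULT_DOMAIN)
    # safe_substitute leaves unknown ${VARS} untouched, whereas envsubst
    # replaces unset variables with an empty string.
    return template.safe_substitute(values)
```

The difference noted in the comment matters in practice: with `envsubst`, an unset `FAIR_LABEL_DOMAIN` would produce a bare `/role` key unless a default is exported first, which is why the `export` line above runs before `make up`.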

The prefix is consumed in three places:

- **`kind-config.yaml`**: node labels (`${FAIR_LABEL_DOMAIN}/role`) and taints (`${FAIR_LABEL_DOMAIN}/workload`), resolved via `envsubst` at cluster creation.
- **`stacks/k8s.yaml`**: pod `node_selectors` and `tolerations`, resolved via `envsubst` at stack registration.
- **`fair/zenml/config.py`**: reads `FAIR_LABEL_DOMAIN` at runtime (default `fair.hotosm.org`) to schedule pipeline pods.
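
The runtime lookup in `fair/zenml/config.py` can be approximated as below. The function names are hypothetical; the `<domain>/role` and `<domain>/workload` keys come from the list above, while using the workload name as the label and taint value and `NoSchedule` as the taint effect are assumptions. The returned dict mirrors the pod `node_selectors` and `tolerations` fields listed for `stacks/k8s.yaml`. Reading the variable at call time, rather than at import time, means a changed `FAIR_LABEL_DOMAIN` takes effect without reloading the module.

```python
import os
from typing import Optional


def label_domain() -> str:
    """Read FAIR_LABEL_DOMAIN at call time, defaulting to fair.hotosm.org."""
    return os.environ.get("FAIR_LABEL_DOMAIN", "fair.hotosm.org")


def pod_scheduling(workload: str, domain: Optional[str] = None) -> dict:
    """Build a node selector and matching toleration for a workload such as "train".

    Assumption: the label/taint value equals the workload name and the taint
    effect is NoSchedule; the real config.py may differ.
    """
    domain = domain or label_domain()
    return {
        "node_selectors": {f"{domain}/role": workload},
        "tolerations": [
            {
                "key": f"{domain}/workload",
                "operator": "Equal",
                "value": workload,
                "effect": "NoSchedule",
            }
        ],
    }
```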

`@uv run --with minio python -c "from pathlib import Path; from minio import Minio; c = Minio('localhost:9000', 'minioadmin', 'minioadmin', secure=False); root = Path('../../data/sample'); files = [f for f in root.rglob('*') if f.is_file()]; print(f'Uploading {len(files)} files to fair-data/sample/'); [c.fput_object('fair-data', f'sample/{f.relative_to(root)}', str(f)) for f in files]; print('Done')"`
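
The one-line upload recipe above is dense; the same logic reads more clearly as a small script. This is a sketch, not the repository's code: it keeps the recipe's endpoint, bucket, prefix, and default `minioadmin` credentials, passes keyword arguments to the `minio` client, sorts files for a deterministic order, and separates planning from uploading so the object names can be checked without a running MinIO server.

```python
from pathlib import Path
from typing import List, Tuple

# Values copied from the recipe: local MinIO with default dev credentials.
ENDPOINT = "localhost:9000"
BUCKET = "fair-data"
PREFIX = "sample"


def plan_uploads(root: Path, prefix: str = PREFIX) -> List[Tuple[str, Path]]:
    """Map every file under `root` to the object name `<prefix>/<relative path>`."""
    return [
        (f"{prefix}/{path.relative_to(root).as_posix()}", path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]


def upload(root: Path) -> None:
    """Upload every file under `root` to BUCKET, as the recipe does."""
    # Imported lazily so plan_uploads() works without the minio package installed.
    from minio import Minio

    client = Minio(
        ENDPOINT, access_key="minioadmin", secret_key="minioadmin", secure=False
    )
    uploads = plan_uploads(root)
    print(f"Uploading {len(uploads)} files to {BUCKET}/{PREFIX}/")
    for object_name, path in uploads:
        client.fput_object(
            bucket_name=BUCKET, object_name=object_name, file_path=str(path)
        )
    print("Done")
```

Calling `upload(Path("../../data/sample"))` from the same working directory as the recipe should reproduce its behavior.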