## Description

When running the `opendronemap/odm:gpu` image in a Kubernetes environment (specifically Google Kubernetes Engine) with standard GPU tolerations, the ODM pipeline fails to utilize the GPU and crashes during the `openmvs` stage.

The pipeline reports `[INFO] No nvidia-smi detected`, passes `--cuda-device -2` to OpenMVS, and subsequently crashes with `error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory`:
```
[2026-03-10, 09:29:09 UTC] [INFO] Estimating depthmaps
[2026-03-10, 09:29:09 UTC] [INFO] No nvidia-smi detected
[2026-03-10, 09:29:09 UTC] [INFO] running "/code/SuperBuild/install/bin/OpenMVS/DensifyPointCloud" [...] -v 0 --cuda-device -2
[2026-03-10, 09:29:09 UTC] /code/SuperBuild/install/bin/OpenMVS/DensifyPointCloud: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
[2026-03-10, 09:29:09 UTC] Child returned 127
```

(Log truncated for brevity)
## To Reproduce

1. Deploy `opendronemap/odm:gpu` in a Kubernetes cluster requesting `nvidia.com/gpu: 1`.
2. Run standard ODM pipeline arguments (e.g., `--dsm --dtm --pc-quality high`).
3. Observe the logs during the `openmvs` stage.
4. The pipeline fails with a `Child returned 127` `SubprocessException`.
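For reference, a minimal Pod spec sketch matching the steps above (the pod/container names, volume layout, and exact ODM arguments are illustrative placeholders, not taken from the project):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: odm-gpu            # placeholder name
spec:
  restartPolicy: Never
  tolerations:             # standard GPU toleration used on GKE GPU node pools
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: odm
      image: opendronemap/odm:gpu
      args: ["--dsm", "--dtm", "--pc-quality", "high"]  # illustrative args
      resources:
        limits:
          nvidia.com/gpu: 1
```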
## Expected Behavior

The container should detect the mounted GPU via `nvidia-smi`, correctly load the NVIDIA shared libraries, and execute the OpenMVS stage using `--cuda-device -1` (or the appropriate GPU ID) without crashing.
## Root Cause & Workaround

Unlike `docker run --gpus all` (which actively alters the container's environment variables at runtime to inject NVIDIA paths), Kubernetes device plugins simply mount the driver files into `/usr/local/nvidia` and rely on the image's `ENV` instructions to make them discoverable.

Currently, `gpu.Dockerfile` causes issues in Kubernetes for two reasons:

- **The `$PATH` issue:** `/usr/local/nvidia/bin` is missing from the system `$PATH`. When `run.py` uses `subprocess.run` to call `nvidia-smi` directly, the call fails, causing the pipeline to assume no GPU exists.
- **The `$LD_LIBRARY_PATH` issue:** In `gpu.Dockerfile`, the path is set via `ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/code/SuperBuild/install/lib"`. While this correctly appends the ODM paths to the CUDA base image paths at build time, it leaves the dynamic linker unable to resolve `libcuda.so.1` or `libnvidia-ml.so`, which the Kubernetes device plugin mounts at `/usr/local/nvidia/lib64`.
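To illustrate the `$PATH` issue, here is a toy Python sketch (not ODM's actual code; `cuda_device_flag` is a hypothetical stand-in for the probe in `run.py`) showing how a plain PATH lookup of `nvidia-smi` fails when its directory is absent from `$PATH`, even though the binary exists on disk:

```python
import os
import shutil
import tempfile

def cuda_device_flag(path_env: str) -> int:
    """Hypothetical stand-in for ODM's GPU probe: if nvidia-smi cannot
    be resolved via the given PATH, fall back to CPU (-2)."""
    return -1 if shutil.which("nvidia-smi", path=path_env) else -2

# Simulate the Kubernetes layout: nvidia-smi exists under a directory that
# stands in for /usr/local/nvidia/bin, but that directory is not on PATH.
nvidia_bin = tempfile.mkdtemp()   # stands in for /usr/local/nvidia/bin
system_bin = tempfile.mkdtemp()   # stands in for the default PATH entries
fake_smi = os.path.join(nvidia_bin, "nvidia-smi")
with open(fake_smi, "w") as f:
    f.write("#!/bin/sh\nexit 0\n")
os.chmod(fake_smi, 0o755)

print(cuda_device_flag(system_bin))                     # not found -> CPU fallback
print(cuda_device_flag(nvidia_bin + ":" + system_bin))  # found after prepending
```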
I successfully worked around this by manually overriding the environment variables in the Kubernetes Pod spec to explicitly include the NVIDIA mount paths:

```yaml
env:
  - name: NVIDIA_DRIVER_CAPABILITIES
    value: "compute,utility"
  - name: LD_LIBRARY_PATH
    value: "/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/code/SuperBuild/install/lib"
  - name: PATH
    value: "/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
```
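The ordering in that `LD_LIBRARY_PATH` override matters: the dynamic linker searches its directories left to right, so the Kubernetes mount path must appear for `libcuda.so.1` to resolve at all. A toy model of that search (an illustration of the lookup order, not `ld.so` itself; the ODM library name is illustrative):

```python
import os
from typing import Optional

def toy_ld_search(lib: str, ld_library_path: str, files_on_disk: set) -> Optional[str]:
    """Toy model of the LD_LIBRARY_PATH portion of ld.so's search:
    the first directory containing the library wins."""
    for d in ld_library_path.split(":"):
        candidate = os.path.join(d, lib)
        if d and candidate in files_on_disk:
            return candidate
    return None

# What is actually on disk in the K8s pod:
disk = {"/usr/local/nvidia/lib64/libcuda.so.1",          # mounted by the device plugin
        "/code/SuperBuild/install/lib/libopensfm.so"}    # illustrative ODM library

# Build-time value only -> libcuda.so.1 is unresolvable (the crash in the log):
print(toy_ld_search("libcuda.so.1", "/code/SuperBuild/install/lib", disk))
# With the workaround's value -> resolved from the mount path:
print(toy_ld_search("libcuda.so.1",
                    "/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/code/SuperBuild/install/lib",
                    disk))
```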
## Proposed Solution

Could the specific NVIDIA runtime paths be explicitly prepended to the `ENV` definitions inside `gpu.Dockerfile`? For example:

```dockerfile
ENV PATH="/usr/local/nvidia/bin:$PATH" \
    LD_LIBRARY_PATH="/usr/local/nvidia/lib64:/usr/local/nvidia/lib:$LD_LIBRARY_PATH:/code/SuperBuild/install/lib"
```

This would make the image compatible out-of-the-box with Kubernetes/cloud deployments, without users having to manually map environment variables.
## A Note on Docker Tags

I noticed that the `opendronemap/odm:gpu` tag effectively acts as a "latest" tag and is automatically updated with commits to the master branch. This recently caused our automated pipelines to break unexpectedly (likely related to commit 44e3ff6, which appears to have changed the underlying CUDA base image, altering the default system paths that previously worked). Would it be possible to publish versioned GPU tags (e.g., `odm:3.6.0-gpu`) on Docker Hub so that we can pin to stable releases in production environments?
Thank you to all the contributors for the incredible work on this project!