Self Checks
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
OS: Arch Linux
CPU: Intel i9-9900k
GPU: NVIDIA RTX 4090
RAM: 16 GB
Python: 3.12
Steps to Reproduce
- Install dependencies for cu129
- Download fishaudio/s2-pro model
- Run
python fish_speech/models/text2semantic/inference.py ... with or without --compile flag
- Get OOM
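A quick sanity check that the cu129 wheel actually shipped with CUDA support (this is a generic PyTorch check, not specific to fish-speech):

```python
import torch

# The wheel version string ends in the CUDA build tag, e.g. "+cu129";
# a CPU-only wheel would report torch.version.cuda as None.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```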
✔️ Expected Behavior
The model is loaded into VRAM and runs completely on GPU
❌ Actual Behavior
After running the script, the model tries to load into RAM instead of VRAM; since 16 GB of RAM is too small for the model, this results in an OOM crash. I cannot figure out how to make the model use CUDA and load into VRAM. The --device cuda flag changes nothing. If I run torch.cuda.is_available() in a Python REPL, I get True. nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.58.03 Driver Version: 595.58.03 CUDA Version: 13.2 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:01:00.0 On | Off |
| 0% 36C P8 9W / 450W | 17MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+