Self Checks
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
OS: Arch Linux
CPU: Intel i9-9900k
GPU: NVIDIA RTX 4090
RAM: 16 GB
Python: 3.12
Steps to Reproduce
- Install dependencies for cu129
- Download fishaudio/s2-pro model
- Run
python fish_speech/models/text2semantic/inference.py ... with or without --compile flag
- Get OOM
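A quick sanity check that the cu129 wheel actually shipped with CUDA support (this is a generic PyTorch check, not specific to fish-speech):

```python
import torch

# The wheel version string ends in the CUDA build tag, e.g. "+cu129";
# a CPU-only wheel would report torch.version.cuda as None.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```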
✔️ Expected Behavior
The model is loaded into VRAM and runs completely on GPU
❌ Actual Behavior
After running the script, the model tries to load into RAM instead of VRAM; since 16 GB of RAM is too small for the model, this results in an OOM crash. I cannot figure out how to make the model use CUDA and load into VRAM. The --device cuda flag changes nothing. If I run torch.cuda.is_available() in a Python REPL, I get True. nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.58.03 Driver Version: 595.58.03 CUDA Version: 13.2 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:01:00.0 On | Off |
| 0% 36C P8 9W / 450W | 17MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+