Add fused GatedDeltaNet decode Triton kernel #501
| Job | Run time |
|---|---|
| | 12m 23s |
| | 35m 31s |
| | 33m 1s |
| | 35m 44s |
| | 29m 18s |
| | 9m 31s |
| | 9m 35s |
| | 10m 29s |
| | 13m 3s |
| | 11m 46s |
| | 10m 34s |
| | 11m 0s |
| | 11m 22s |
| | 10m 44s |
| | 11m 7s |
| | 10m 40s |
| | 10m 37s |
| | 10m 55s |
| | 10m 56s |
| | 11m 18s |
| | 10m 25s |
| **Total** | 5h 19m 59s |
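The final value in the table, 5h 19m 59s, equals the sum of the 21 per-job run times listed above it. The Python sketch below checks this. It parses durations written as `Xh Ym Zs` and adds them up. The helper names `parse_duration` and `format_duration` are hypothetical and are not part of the PR or of any CI tool. The job names were not captured, so the sketch keeps only the times.

```python
import re

# Per-job run times, in table order (job names were not captured).
JOB_TIMES = [
    "12m 23s", "35m 31s", "33m 1s", "35m 44s", "29m 18s",
    "9m 31s", "9m 35s", "10m 29s", "13m 3s", "11m 46s",
    "10m 34s", "11m 0s", "11m 22s", "10m 44s", "11m 7s",
    "10m 40s", "10m 37s", "10m 55s", "10m 56s", "11m 18s",
    "10m 25s",
]

# Seconds per unit suffix used in the table.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert a duration like '5h 19m 59s' or '33m 1s' to whole seconds."""
    parts = re.findall(r"(\d+)([hms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def format_duration(seconds: int) -> str:
    """Render seconds back into the 'Xh Ym Zs' style used by the table."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s" if hours else f"{minutes}m {secs}s"


total = sum(parse_duration(t) for t in JOB_TIMES)
print(format_duration(total))  # 5h 19m 59s
```

The sum comes to 19,199 seconds (319 minutes and 59 seconds). This matches the last row, so that row is the total across all jobs and not the run time of a separate job.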