This repository was archived by the owner on Mar 21, 2026. It is now read-only.
Hello @Rictus, thanks for starting this discussion! With AI/LLM integrations, vector DBs, and agent frameworks, quirks like this can usually be traced back to a handful of specific moving parts.

If you are still blocked, sharing a minimal reproducible snippet or the raw request/response payload (with secrets scrubbed) usually helps pinpoint the failing layer much faster. Hope this points you in the right direction. Let me know if you make any progress!
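For the "payload scrubbed of secrets" part, something like this works as a starting point (a minimal sketch; the key list and payload shape are made up, adjust them to whatever your client actually sends):

```python
import json

# Keys whose values should never reach the logs (illustrative list).
SECRET_KEYS = {"api_key", "authorization", "token", "secret"}

def scrub(payload):
    """Return a copy of the payload with secret-looking values masked."""
    if isinstance(payload, dict):
        return {
            k: "***" if k.lower() in SECRET_KEYS else scrub(v)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [scrub(v) for v in payload]
    return payload

# Example request body (hypothetical fields) logged safely:
request = {"model": "my-model", "api_key": "sk-live-123", "inputs": "Hello"}
print(json.dumps(scrub(request)))
```

That way you can paste the full request/response into an issue without leaking credentials.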
On certain cloud providers, sharing a GPU across multiple containers is not supported, so running several TGI instances on the same GPU (one per model) is not feasible. One potential solution would be to enable a single TGI container to serve multiple models. What would be the implications of implementing such a change in the TGI code base?
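From the outside, such a change might look like a single container running one back-end per model plus a thin router that dispatches by model name (a minimal sketch only, not actual TGI code; the model names and ports below are made up):

```python
# Hypothetical routing table inside one container: each model is served
# by its own back-end process, all sharing the same GPU.
BACKENDS = {
    "llama-7b": "http://127.0.0.1:3000",
    "mistral-7b": "http://127.0.0.1:3001",
}

def route(model_name: str) -> str:
    """Return the back-end URL serving the requested model."""
    try:
        return BACKENDS[model_name]
    except KeyError:
        raise ValueError(f"unknown model: {model_name}") from None

print(route("llama-7b"))
```

The harder implications are presumably not in the routing but in GPU memory: each resident model permanently claims a slice of VRAM and KV-cache budget, so the scheduler would need to partition memory across models rather than assume it owns the whole device.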