This repository was archived by the owner on Mar 21, 2026. It is now read-only.
Hello @Rictus, thanks for starting this discussion! With AI/LLM integrations, vector DBs, and agent frameworks, quirks like this can usually be traced back to a handful of specific moving parts.

If you are still blocked, sharing a minimal reproducible snippet or the raw request/response payload (with secrets scrubbed) usually helps pinpoint the failing layer much faster. Hope this points you in the right direction. Let me know if you make any progress!
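For the "payload scrubbed of secrets" part, something like this works as a starting point (a minimal sketch; the key list and payload shape are made up, adjust them to whatever your client actually sends):

```python
import json

# Keys whose values should never reach the logs (illustrative list).
SECRET_KEYS = {"api_key", "authorization", "token", "secret"}

def scrub(payload):
    """Return a copy of the payload with secret-looking values masked."""
    if isinstance(payload, dict):
        return {
            k: "***" if k.lower() in SECRET_KEYS else scrub(v)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [scrub(v) for v in payload]
    return payload

# Example request body (hypothetical fields) logged safely:
request = {"model": "my-model", "api_key": "sk-live-123", "inputs": "Hello"}
print(json.dumps(scrub(request)))
```

That way you can paste the full request/response into an issue without leaking credentials.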
On certain cloud providers, sharing a GPU across multiple containers is not supported, so running several TGI instances on the same GPU (one per model) is not feasible. One potential solution would be to enable a single TGI container to serve multiple models. What would be the implications of implementing such a change in the TGI code base?
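From the outside, such a change might look like a single container running one back-end per model plus a thin router that dispatches by model name (a minimal sketch only, not actual TGI code; the model names and ports below are made up):

```python
# Hypothetical routing table inside one container: each model is served
# by its own back-end process, all sharing the same GPU.
BACKENDS = {
    "llama-7b": "http://127.0.0.1:3000",
    "mistral-7b": "http://127.0.0.1:3001",
}

def route(model_name: str) -> str:
    """Return the back-end URL serving the requested model."""
    try:
        return BACKENDS[model_name]
    except KeyError:
        raise ValueError(f"unknown model: {model_name}") from None

print(route("llama-7b"))
```

The harder implications are presumably not in the routing but in GPU memory: each resident model permanently claims a slice of VRAM and KV-cache budget, so the scheduler would need to partition memory across models rather than assume it owns the whole device.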