Refresh TeaCache when num_inference_steps=None #2240
Conversation
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Thanks for the PR. A quick question: if I understand it correctly, this PR is only a quick fix. It force-sets the number of inference steps to 0: not None, but still falsy. This passes the check during cache refreshing, and also yields to pipeline-specific overrides. A more complete fix is at your cache_refresh branch.
Hey @fhfuih! No worries 😆 but yes. My understanding of the flow is:
For TeaCache, the refresh does not depend on the timesteps and is just resetting the TeaCache state. Since the refresh is not being called currently, the state is stale from the last execute-model call, so instead of creating a new state on the first timestep, it reuses the old one, and we fall through this check. So this is okay as a short-term fix for TeaCache behavior, but the other branch will fix it more correctly by passing the actual num_inference_steps.
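To make the truthiness subtlety concrete, here is a minimal sketch of the behavior being described: a refresh guard that skips when the step count is None, so forcing it to 0 (not None, but falsy) both triggers the state reset and still defers to pipeline defaults. All names here are hypothetical stand-ins, not the actual vLLM code.

```python
class TeaCacheState:
    """Hypothetical container for the state reused between timesteps."""
    def __init__(self):
        self.previous_modulated_input = None
        self.accumulated_distance = 0.0


class TeaCache:
    """Hypothetical wrapper; refresh is just a state reset."""
    def __init__(self):
        self.state = TeaCacheState()

    def refresh(self, num_inference_steps):
        # Assumed guard: when steps is None the refresh is skipped,
        # leaving stale state from the previous execute-model call.
        if num_inference_steps is None:
            return False
        self.state = TeaCacheState()
        return True


cache = TeaCache()
cache.state.previous_modulated_input = "stale-from-last-request"

assert not cache.refresh(None)  # refresh skipped; stale state survives
assert cache.state.previous_modulated_input == "stale-from-last-request"

assert cache.refresh(0)  # 0 is not None, so the state is reset
assert cache.state.previous_modulated_input is None

# 0 is still falsy, so a pipeline-specific default can override it:
steps = 0 or 50
assert steps == 50
```

This illustrates why 0 works as a sentinel here while None does not: the identity check (`is None`) and the truthiness check (`or default`) disagree on 0.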
Purpose
Related to #2194
The proper fix for the above issue is to merge the sampling params to get the correct num_inference_steps, but this PR adds a short-term workaround for TeaCache, which doesn't depend on num_inference_steps. It also adds logging if the cache fails to reset, for now, while I am working on the more general fix. This is needed because the warmup initializes TeaCache, which replaces forward(), and can cause bad behavior when running TTI on models that accept image inputs, e.g., Flux2Klein:
Not refreshing before entering the forward pass will blow up because the new modulated inputs don't have an image component, while the previous (stale) ones do.
This PR allows TeaCache to refresh in this case, and adds a log if we can't refresh the cache, until the more correct fix lands.
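The "log if we can't refresh" behavior might look roughly like the sketch below. The helper and cache stub are hypothetical (not the PR's actual code); the point is that a failed refresh leaves a trace instead of silently reusing stale state.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("teacache")


class DummyCache:
    """Hypothetical stand-in: refresh succeeds only when steps is not None."""
    def refresh(self, num_inference_steps):
        return num_inference_steps is not None


def maybe_refresh(cache, num_inference_steps):
    # Try to refresh; warn on failure so stale state is at least
    # visible until the general sampling-params fix is in place.
    if cache.refresh(num_inference_steps):
        return True
    logger.warning(
        "Failed to refresh TeaCache (num_inference_steps=%s); "
        "stale state from the previous request may be reused.",
        num_inference_steps,
    )
    return False


assert maybe_refresh(DummyCache(), 0)        # falsy but not None: refreshes
assert not maybe_refresh(DummyCache(), None)  # logs a warning instead
```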
@Gaohan123 @wtomin @fhfuih could you please take a look?