Add z-image support with cfg-parallel #666
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces support for the original Z-Image model, enhancing the system's capability to run and evaluate different image generation models. A significant improvement is the implementation of parallel processing for classifier-free guidance within the Z-Image transformer, which optimizes performance by distributing the computational load. These changes enable more efficient and flexible image generation workflows, particularly when using guidance scales greater than one.
Code Review
This pull request adds support for the Z-Image model and enables CFG-parallel for it, which is a great enhancement for performance. The implementation of CFG-parallel by splitting the batch across ranks in the transformer and gathering the results is well done. I've found one potential issue in the prompt handling for the new model which could lead to incorrect behavior when a single string prompt is used. My specific comment provides a fix for this, suggesting a more readable if/else structure as per repository guidelines.
```python
        return pipe

    def _run_pipe(self, input_args: dict) -> DiffusionOutput:
        prompt = list(input_args["prompt"])
```
The current implementation of converting `prompt` to a list can lead to incorrect behavior when a single prompt is provided as a string. If `input_args["prompt"]` is a string like `"A cute cat"`, `list(input_args["prompt"])` will produce a list of characters `['A', ' ', 'c', 'u', 't', 'e', ' ', 'c', 'a', 't']`, which is not the intended behavior for the diffusion pipeline.
To handle both single string prompts and lists/tuples of prompts correctly, it's clearer to use an explicit if/else block rather than a conditional expression, as per the guideline to avoid complex one-liners for readability.
```python
if isinstance(input_args["prompt"], str):
    prompt = [input_args["prompt"]]
else:
    prompt = list(input_args["prompt"])
```

References
- Avoid using a conditional expression to select and call a function in a single line if it results in a long or complex statement. A more verbose if/else block can be clearer and more readable.
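The character-splitting behavior flagged in the review is easy to reproduce in plain Python, independent of any pipeline code:

```python
# list() on a string splits it into characters rather than wrapping it
# in a one-element list -- the bug the review comment describes.
single_prompt = "A cute cat"

broken = list(single_prompt)
print(broken)  # ['A', ' ', 'c', 'u', 't', 'e', ' ', 'c', 'a', 't']

# The suggested if/else fix handles both a single string and a sequence:
if isinstance(single_prompt, str):
    prompt = [single_prompt]
else:
    prompt = list(single_prompt)
print(prompt)  # ['A cute cat']
```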
What
Enables Z-Image as an xDiT runner model and modifies the transformer to allow for CFG-parallel.
Why
Having the original Z-image model makes comparisons with the distilled Z-image-turbo easier.
How
- Registers `Tongyi-MAI/Z-Image` as a new model in `z_image.py`.
- In the original diffusers pipeline, CFG is active when `guidance_scale > 1`. It doubles the batch by stacking conditional and unconditional inputs before passing them into the transformer.
- `xFuserZImageTransformer2DWrapper.forward()` makes it parallelizable by splitting the work across CFG ranks.

Tests
Running the above command produces identical outputs when compared to running without `--use_cfg_parallel` and with `--ulysses_degree 1 --use_cfg_parallel`, and improves performance over both by several seconds.
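The batch doubling and split/gather pattern described under How can be sketched as follows. This is a minimal, hedged illustration only: plain Python lists stand in for tensors, a toy function stands in for the Z-Image transformer, and the rank loop runs sequentially where xDiT would run each shard on its own CFG rank. None of the names below are the actual diffusers or xDiT API.

```python
# Sketch of CFG batch doubling plus the CFG-parallel split/gather idea.

def transformer_forward(inputs):
    # Toy stand-in for the Z-Image transformer forward pass.
    return [x * 2 for x in inputs]

def cfg_parallel_forward(uncond_inputs, cond_inputs, world_size=2):
    # CFG doubles the batch: unconditional half first, conditional half second.
    doubled = uncond_inputs + cond_inputs
    shard_len = len(doubled) // world_size
    # Split: each CFG rank takes one shard of the doubled batch
    # (simulated here with a sequential loop).
    shards = [doubled[i * shard_len:(i + 1) * shard_len]
              for i in range(world_size)]
    outputs = [transformer_forward(shard) for shard in shards]
    # Gather: concatenate shard outputs back into full-batch order.
    return [x for shard in outputs for x in shard]

def cfg_combine(full_output, guidance_scale):
    # Classic classifier-free guidance: uncond + scale * (cond - uncond).
    half = len(full_output) // 2
    uncond, cond = full_output[:half], full_output[half:]
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

full = cfg_parallel_forward(uncond_inputs=[0.5, 0.5], cond_inputs=[1.0, 2.0])
guided = cfg_combine(full, guidance_scale=3.0)
print(full)   # identical to transformer_forward on the undivided doubled batch
print(guided)
```

The point of the sketch is the invariant the PR's tests check: splitting the doubled batch across ranks and gathering the results must produce exactly the same output as running the full doubled batch through the transformer on one rank.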