fix sp for HunyuanVideo-1.5#656

Open
cszhouY wants to merge 3 commits intoxdit-project:mainfrom
cszhouY:fix-sp

Conversation


@cszhouY cszhouY commented Mar 3, 2026

What?

Fixes issue #655, primarily by modifying the sequence-parallel (SP) logic in transformers_hunyuan_video15.py.

How?

In the original implementation, both encoder_hidden_states and encoder_attention_mask were processed by _chunk_and_pad_sequence. However, USP does not accept an attention_mask argument, so the positions that encoder_attention_mask marks as padding incorrectly participated in the attention computation.

Following the SP implementation in transformers_hunyuan_video.py, I fixed the issue by first truncating the padded portions of encoder_hidden_states, and then using USP with joint_strategy when the sequence length of encoder_hidden_states is not divisible by ulysses_degree.
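As a rough sketch of this control flow (truncate_padded and usp_text_strategy are hypothetical names for illustration, not the actual xDiT code):

```python
def truncate_padded(tokens, mask):
    """Keep only the token positions the attention mask marks as valid."""
    return [tok for tok, keep in zip(tokens, mask) if keep]

def usp_text_strategy(text_len, ulysses_degree):
    """If the truncated text length divides evenly across SP ranks, the
    text embeddings can be split in SP; otherwise they are kept whole
    and attached to the image tokens via joint_strategy="rear"."""
    if text_len % ulysses_degree == 0:
        return "split_text_embed_in_sp"
    return 'joint_strategy="rear"'
```

The real code operates on torch tensors and the xDiT runtime state; this only mirrors the branching decision.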

Test

The test code is the same as in #655.

width height frames steps seed
1280 720 81 50 42

prompt: A cat holding a paper with words "Hello, world!"

  • Diffusers
hunyuan_video_15_base.mp4
  • xDiT
hunyuan_video_15_t2v_result_ulysses4_ring1_tc_False_720x1280.mp4

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue in the sequence parallelism implementation for the HunyuanVideo-1.5 model, specifically concerning how attention masks and encoder hidden states are managed during distributed processing. The changes ensure that padded tokens do not interfere with attention calculations and streamline the sequence parallel operations by leveraging direct PyTorch and distributed group functionalities, ultimately fixing issue #655.

Highlights

  • Corrected Sequence Parallelism (SP) for Attention Masks: Addressed a bug where USP (Ulysses Sequence Parallelism) incorrectly processed padded portions of encoder_attention_mask, leading to erroneous attention computations.
  • Refined Encoder Hidden States Handling: Modified the processing of encoder_hidden_states to truncate padded regions based on the attention mask before passing them to USP, ensuring only relevant tokens participate in attention.
  • Conditional USP Strategy: Implemented a conditional strategy for USP to use joint_strategy="rear" when the sequence length of encoder_hidden_states is not divisible by the sequence parallel world size, improving flexibility and correctness for text embeddings.
  • Simplified SP Utility Methods: Removed custom _chunk_and_pad_sequence and _gather_and_unpad methods, opting for direct torch.chunk and get_sp_group().all_gather operations for sequence parallelism.
Changelog
  • xfuser/model_executor/models/transformers/transformer_hunyuan_video15.py
    • Imported additional distributed utility functions, including get_runtime_state and get_cfg_group.
    • Removed the _chunk_and_pad_sequence and _gather_and_unpad helper methods from the xFuserHunyuanVideo15Transformer class.
    • Modified the xFuserHunyuanVideo15AttnProcessor.__call__ method to dynamically determine num_encoder_hidden_states_tokens and num_query_tokens.
    • Introduced conditional logic within xFuserHunyuanVideo15AttnProcessor.__call__ to handle sequence parallelism for attention, splitting query, key, and value tensors and using USP with joint_strategy="rear" when split_text_embed_in_sp is false.
    • Updated the xFuserHunyuanVideo15Transformer.forward method to directly chunk hidden_states for sequence parallelism.
    • Filtered encoder_hidden_states based on encoder_attention_mask to remove padded tokens before further processing.
    • Added logic to set get_runtime_state().split_text_embed_in_sp based on whether encoder_hidden_states length is divisible by sp_world_size, and then conditionally chunk encoder_hidden_states.
    • Directly chunked image_rotary_emb components (freqs_cos, freqs_sin) for sequence parallelism.
    • Replaced the encoder_attention_mask argument with None when calling transformer blocks.
    • Replaced the call to _gather_and_unpad with get_sp_group().all_gather for the final hidden states.
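The chunk/gather pattern the changelog describes can be sketched as a single-process simulation; chunk_sequence and all_gather_concat are hypothetical pure-Python stand-ins for torch.chunk and get_sp_group().all_gather, not the xDiT API:

```python
def chunk_sequence(seq, world_size):
    """Split seq into world_size contiguous, equal-sized chunks; assumes
    len(seq) is divisible by world_size (the sequence is padded first)."""
    assert len(seq) % world_size == 0
    n = len(seq) // world_size
    return [seq[i * n:(i + 1) * n] for i in range(world_size)]

def all_gather_concat(local_chunks):
    """Stand-in for all_gather along the sequence dimension: concatenate
    every rank's local chunk, in rank order, to restore the sequence."""
    gathered = []
    for chunk in local_chunks:
        gathered.extend(chunk)
    return gathered
```

The round trip chunk-then-gather is lossless exactly when every rank holds a chunk of the same length, which is what the divisibility/padding discussion below is about.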
Activity
  • The author identified and addressed issue #655 ("Outputs of the Hunyuan Video 1.5 in xDiT is different from Diffusers"), which involved incorrect handling of attention masks in sequence parallelism for HunyuanVideo-1.5.
  • The author provided detailed explanations of the problem and the implemented solution in the pull request description.
  • Test results, including visual comparisons (Diffusers vs. xDiT) with specific parameters (width, height, frames, steps, seed), were provided to demonstrate the fix's effectiveness.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix an issue with sequence parallelism (SP) for HunyuanVideo-1.5 by correctly modifying how encoder_hidden_states are handled, specifically by removing padding and using USP with a joint_strategy when the sequence length is not divisible by the SP world size. However, a critical security concern has been identified: the current batch truncation implementation is flawed as it relies on the mask of the first item in the batch. This can lead to incorrect processing or data loss for other items in the batch with different prompt lengths, particularly in multi-tenant environments where requests are batched together. Furthermore, the implementation introduces a critical regression by removing necessary padding and unpadding logic for hidden_states and image_rotary_emb, which will cause errors during the all_gather operation if the sequence length is not divisible by the SP world size. Minor import and style issues also require attention.

encoder_attention_mask = torch.stack(new_encoder_attention_mask)

# sequence parallel
hidden_states = torch.chunk(hidden_states, sp_world_size, dim=1)[sp_world_rank]
Contributor

critical

The padding for hidden_states has been removed. If hidden_states.shape[1] is not divisible by sp_world_size, torch.chunk will create uneven chunks across sequence parallel ranks. This will cause the all_gather operation at the end of the forward pass to fail. The padding logic from the previous version should be restored to ensure correctness. You'll also need to store the pad_amount to use for unpadding later.

        hidden_states_pad_amount = (sp_world_size - hidden_states.shape[1] % sp_world_size) % sp_world_size
        if hidden_states_pad_amount > 0:
            pad_shape = list(hidden_states.shape)
            pad_shape[1] = hidden_states_pad_amount
            padding = torch.zeros(pad_shape, dtype=hidden_states.dtype, device=hidden_states.device)
            hidden_states = torch.cat([hidden_states, padding], dim=1)
        hidden_states = torch.chunk(hidden_states, sp_world_size, dim=1)[sp_world_rank]
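To see why the padding matters, a pure-Python mirror of torch.chunk's sizing rule (chunks of ceil(n / world_size) elements, with the last chunk shorter or absent) shows how non-divisible lengths leave ranks with mismatched, or even missing, chunks; torch_chunk_sizes is a hypothetical helper for illustration:

```python
import math

def torch_chunk_sizes(seq_len, world_size):
    """Mirror torch.chunk's sizing along one dimension: each chunk has
    ceil(seq_len / world_size) elements, so the last chunk may be
    shorter, and fewer than world_size chunks may be produced."""
    step = math.ceil(seq_len / world_size)
    return [min(step, seq_len - i) for i in range(0, seq_len, step)]
```

For seq_len=9 and world_size=4 only three chunks exist, so one SP rank has no data at all, and a shape-uniform all_gather cannot work.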

Author

hidden_states.shape[1] represents the sequence length of hidden_states, i.e., the number of image tokens. Since HunyuanVideo-1.5 typically generates videos at resolutions like 480p and 720p, this sequence length is usually divisible by common sp_world_size values (e.g., 2, 4, 8, ...).

Comment on lines +306 to +313
    encoder_attention_mask = encoder_attention_mask[0].to(torch.bool)
    encoder_hidden_states_indices = torch.arange(
        encoder_hidden_states.shape[1],
        device=encoder_hidden_states.device)
    encoder_hidden_states_indices = encoder_hidden_states_indices[
        encoder_attention_mask]
    encoder_hidden_states = encoder_hidden_states[
        ..., encoder_hidden_states_indices, :]
Contributor

security-medium medium

The truncation of encoder_hidden_states based on encoder_attention_mask[0] assumes that all items in a batch have the same number of valid tokens. In a multi-tenant or shared environment where requests with different prompt lengths are batched together, this logic will lead to either truncation of valid tokens for some users (if the first item has a shorter prompt) or inclusion of padding tokens for others (if the first item has a longer prompt). This violates the integrity of the model's output and can be used to influence other users' results.

Specifically, at lines 306-313, the code uses the mask of the first item in the batch to determine the indices of valid tokens for the entire batch. If subsequent items in the batch have more valid tokens than the first item, those tokens will be lost. If they have fewer, padding tokens will be included in the attention computation, which the PR explicitly aimed to avoid.

Collaborator

Yeah this now breaks support for BS > 1 where the prompts aren't identical in length.

For HunyuanVideo, we are using this:

        encoder_attention_mask = encoder_attention_mask.to(torch.bool).any(dim=0)
        encoder_hidden_states = encoder_hidden_states[:, encoder_attention_mask, :]

this should do the same trick while still retaining support for BS > 1.
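A plain-Python mirror of this batch-safe truncation (union_mask and truncate_batch are hypothetical helpers) illustrates what any(dim=0) does: a token position is kept if any item in the batch marks it valid, so no prompt loses tokens.

```python
def union_mask(batch_masks):
    """Mirror of mask.to(torch.bool).any(dim=0): keep a position if
    ANY batch item marks it as a valid token."""
    return [any(column) for column in zip(*batch_masks)]

def truncate_batch(batch_tokens, batch_masks):
    """Truncate every batch item with the shared union mask."""
    keep = union_mask(batch_masks)
    return [[tok for tok, k in zip(tokens, keep) if k]
            for tokens in batch_tokens]
```

Note the trade-off: items with shorter prompts still retain a few of their padding positions (up to the longest prompt in the batch), but no valid token is ever dropped.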

Comment on lines 10 to 20
    from xfuser.core.distributed import (
        get_runtime_state,
        get_classifier_free_guidance_world_size,
        get_classifier_free_guidance_rank,
        get_cfg_group,
        get_sequence_parallel_world_size,
        get_sequence_parallel_rank,
        get_sp_group,
        get_sequence_parallel_world_size,
        get_pipeline_parallel_world_size
    )
Contributor

medium

There are a couple of issues with the imports in this block:

  • get_sequence_parallel_world_size is imported twice.
  • get_classifier_free_guidance_world_size, get_classifier_free_guidance_rank, and get_cfg_group are imported but are not used anywhere in this file.

Please clean up the imports to remove the duplicate and the unused functions for better code maintainability.

from xfuser.core.distributed import (
    get_runtime_state,
    get_sequence_parallel_world_size,
    get_sequence_parallel_rank,
    get_sp_group,
    get_pipeline_parallel_world_size,
)

    encoder_hidden_states = encoder_hidden_states[
        ..., encoder_hidden_states_indices, :]

    if encoder_hidden_states.shape[1] % sp_world_size!= 0:
Contributor

medium

For better readability and to follow common Python style guides (like PEP 8), please add spaces around the != operator.

        if encoder_hidden_states.shape[1] % sp_world_size != 0:
References
  1. PEP 8 recommends using spaces around binary operators for better readability. (link)

@jcaraban jcaraban requested a review from avjves March 3, 2026 09:11
@avjves
Collaborator

avjves commented Mar 4, 2026

Thanks for the fix!

Yeah, the padding we do does have an effect on the output, so this is a great fix for that :)

However, I'm not sure the assumption that we can remove padding from hidden_states always holds. While the model behaves best with 480p/720p videos, it can generate other resolutions as well, especially for the i2v task, where the resolution should match the input image. Non-standard resolutions would then crash in the all2all step if the chunks have different shapes. Does the above fix still work if we re-introduce the padding for hidden_states?

@cszhouY
Author

cszhouY commented Mar 7, 2026

@avjves

Thank you for your comment.

Padding is necessary when the sequence length is not a multiple of sp_world_size. I've added padding in the new commit. Please note that the padded portion will participate in the attention computation, thereby affecting the numerical results. However, the impact is minimal, as the maximum padding length equals sp_world_size - 1, which is much smaller than the sequence length.
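The bound on the padding's impact can be checked directly; pad_amount is a hypothetical helper mirroring the padding arithmetic, where the amount added is (sp - len % sp) % sp and is therefore at most sp_world_size - 1, negligible next to a long video-token sequence:

```python
def pad_amount(seq_len, sp_world_size):
    """Tokens of zero-padding needed to make seq_len divisible by
    sp_world_size; always in [0, sp_world_size - 1]."""
    return (sp_world_size - seq_len % sp_world_size) % sp_world_size
```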

I retested the following video generation resolutions, where the sequence length is no longer a multiple of sp_world_size. Additionally, all attention backends were set to spda_efficient to avoid numerical discrepancies caused by different attention backends. As shown, the videos are generated correctly and are nearly identical to those generated on a single GPU using Diffusers (minor differences still exist due to the reason mentioned above).

width height frames steps seed ulysses_degree
976 592 81 50 42 4
  • xDiT
hunyuan_video_15_t2v_result_ulysses4_ring1_tc_False_592x976.mp4
  • Diffusers
hunyuan_video1.5_592x976_81_steps50.mp4
