Mingyang Wu1 Ashirbad Mishra2 Soumik Dey2 Shuo Xing1 Naveen Ravipati2
Hansi Wu2 Binbin Li2 Zhengzhong Tu1†
1Texas A&M University 2eBay Inc.
†Corresponding Author
- 2026.03: Our code and dataset are under internal review.
- 2026.02: Our paper has been accepted to CVPR 2026.
We will release resources in stages after internal review. Please stay tuned.
- ConsID-Gen inference/training code
- ConsIDVid dataset: https://huggingface.co/datasets/mingyang-wu/ConsIDVid
- Model checkpoints
```shell
conda create -n considgen python=3.10
conda activate considgen
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

Use this section to download the model weights required by `run_inference_considgen.py`.
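Before downloading any weights, you can verify that the core packages installed above are importable. A minimal stdlib-only sketch (it checks importability only, not CUDA support):

```python
import importlib.util

# Check that each package can be found without actually importing it.
def check_deps(names=("torch", "torchvision")):
    return {name: importlib.util.find_spec(name) is not None for name in names}

# A missing package maps to False, e.g. {'torch': True, 'torchvision': False}.
print(check_deps())
```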
```shell
# 1) Install the Hugging Face CLI
pip install "huggingface_hub[cli]"

# 2) Download Wan2.1-Fun-1.3B-InP
mkdir -p ./models/PAI/Wan2.1-Fun-1.3B-InP
hf download "alibaba-pai/Wan2.1-Fun-1.3B-InP" --local-dir "./models/PAI/Wan2.1-Fun-1.3B-InP"

# 3) Download VGGT-1B
mkdir -p ./models/VGGT-1B
hf download "facebook/VGGT-1B" --local-dir "./models/VGGT-1B"

# 4) Download ConsID-Gen
hf download "mingyang-wu/ConsID-Gen" --local-dir "./models/ConsID-Gen/checkpoints"
```

A Google Drive mirror for the ConsID-Gen checkpoints is also available.
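Once the downloads finish, a quick way to confirm the expected local layout is in place. `missing_dirs` is a hypothetical helper (not part of the released code), and the paths assume the `--local-dir` values used above:

```python
from pathlib import Path

# Directories created by the download commands above.
EXPECTED = [
    "models/PAI/Wan2.1-Fun-1.3B-InP",
    "models/VGGT-1B",
    "models/ConsID-Gen/checkpoints",
]

def missing_dirs(root="."):
    """Return the expected model directories that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).is_dir()]

# An empty list means all three downloads are in place.
print(missing_dirs())
```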
Run single-image-conditioned generation with a fine-tuned checkpoint:

```shell
python run_inference_considgen.py \
    --input_image_path /path/to/input_image.jpg \
    --image_dir /path/to/multi_view_images_dir \
    --prompt "A product-style close-up video with stable lighting and clean background." \
    --output_dir ./tmp \
    --checkpoint_path models/train/ConsID-Gen/model.safetensors
```

Prepare the training dataset.
The training metadata should be a JSON list. Each sample needs:
- `video`: path to the training video
- `prompt`: text description for the video
- `image_list`: list of multi-view reference image paths
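A minimal stdlib-only sketch that checks these fields before launching training; `validate_metadata` is a hypothetical helper, not part of the released code:

```python
REQUIRED_KEYS = {"video", "prompt", "image_list"}

def validate_metadata(entries):
    """Return (index, problem) pairs; an empty list means the metadata looks well-formed."""
    if not isinstance(entries, list):
        return [(-1, "metadata must be a JSON list")]
    problems = []
    for i, item in enumerate(entries):
        missing = REQUIRED_KEYS - set(item)
        if missing:
            problems.append((i, f"missing keys: {sorted(missing)}"))
        elif not isinstance(item["image_list"], list) or not item["image_list"]:
            problems.append((i, "image_list must be a non-empty list"))
    return problems

sample = [{"video": "/path/to/video.mp4",
           "prompt": "A short description.",
           "image_list": ["/path/to/view_1.jpg"]}]
print(validate_metadata(sample))  # -> []
```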
Example metadata structure:
```json
[
    {
        "dataset": "example_dataset_name",
        "video": "/path/to/video.mp4",
        "prompt": "A short text description of the target video.",
        "image_list": [
            "/path/to/view_1.jpg",
            "/path/to/view_2.jpg"
        ]
    }
]
```

Launch training (this example pins the run to GPU 3 and disables Weights & Biases logging):

```shell
CUDA_VISIBLE_DEVICES=3 python run_train_considgen.py \
    --dataset_metadata_path ./example_metadata.json \
    --model_paths '["models/PAI/Wan2.1-Fun-1.3B-InP/diffusion_pytorch_model.safetensors","models/PAI/Wan2.1-Fun-1.3B-InP/models_t5_umt5-xxl-enc-bf16.pth","models/PAI/Wan2.1-Fun-1.3B-InP/Wan2.1_VAE.pth","models/PAI/Wan2.1-Fun-1.3B-InP/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth"]' \
    --vggt_model_path models/VGGT-1B \
    --tokenizer_path models/PAI/Wan2.1-Fun-1.3B-InP/google/umt5-xxl \
    --trainable_models dit,considgen_adapter \
    --output_path models/ConsID-Gen/example \
    --num_epochs 1 \
    --dataset_num_workers 0 \
    --wandb_mode disabled
```

If you find our work useful, please cite:

```bibtex
@misc{wu2026considgenviewconsistentidentitypreservingimagetovideo,
    title={ConsID-Gen: View-Consistent and Identity-Preserving Image-to-Video Generation},
    author={Mingyang Wu and Ashirbad Mishra and Soumik Dey and Shuo Xing and Naveen Ravipati and Hansi Wu and Binbin Li and Zhengzhong Tu},
    year={2026},
    eprint={2602.10113},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2602.10113},
}
```