
ConsID-Gen

View-Consistent and Identity-Preserving Image-to-Video Generation

Mingyang Wu1   Ashirbad Mishra2   Soumik Dey2   Shuo Xing1   Naveen Ravipati2
Hansi Wu2   Binbin Li2   Zhengzhong Tu1

1Texas A&M University    2eBay Inc.
Corresponding Author

Accepted to CVPR 2026



Updates

  • 2026.03: Our code and dataset are under internal review.
  • 2026.02: Our paper was accepted to CVPR 2026.

Open-Source Plan

We will release resources in stages after internal review. Please stay tuned.

Code: ConsID-Gen + Wan2.1 (Inference & Training)

Setup Environment

conda create -n considgen python=3.10
conda activate considgen
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt

Model Download

Download the model weights required by run_inference_considgen.py:

# 1) Install Hugging Face CLI
pip install "huggingface_hub[cli]"

# 2) Download Wan2.1-Fun-1.3B-InP
mkdir -p ./models/PAI/Wan2.1-Fun-1.3B-InP
hf download "alibaba-pai/Wan2.1-Fun-1.3B-InP" --local-dir "./models/PAI/Wan2.1-Fun-1.3B-InP"

# 3) Download VGGT-1B
mkdir -p ./models/VGGT-1B
hf download "facebook/VGGT-1B" --local-dir "./models/VGGT-1B"

# 4) Download ConsID-Gen
hf download "mingyang-wu/ConsID-Gen" --local-dir "./models/ConsID-Gen/checkpoints"

A Google Drive mirror of the ConsID-Gen checkpoints is also available.

Inference Example

Run single-image conditioned generation with a finetuned checkpoint:

python run_inference_considgen.py \
  --input_image_path /path/to/input_image.jpg \
  --image_dir /path/to/multi_view_images_dir \
  --prompt "A product-style close-up video with stable lighting and clean background." \
  --output_dir ./tmp \
  --checkpoint_path models/train/ConsID-Gen/model.safetensors
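For batch runs over a folder of input images, the documented flags can be wrapped in a small driver script. A sketch under the assumption that one output folder per image is desired; `build_cmd` and `run_batch` are illustrative helpers, not part of the repo:

```python
import subprocess
from pathlib import Path

def build_cmd(input_image, image_dir, prompt, output_dir, checkpoint):
    # Assemble the CLI call using the flags documented above.
    return [
        "python", "run_inference_considgen.py",
        "--input_image_path", str(input_image),
        "--image_dir", str(image_dir),
        "--prompt", prompt,
        "--output_dir", str(output_dir),
        "--checkpoint_path", str(checkpoint),
    ]

def run_batch(input_dir, image_dir, prompt, output_root, checkpoint):
    # One inference call per .jpg, each writing to its own output folder.
    for img in sorted(Path(input_dir).glob("*.jpg")):
        subprocess.run(
            build_cmd(img, image_dir, prompt, Path(output_root) / img.stem, checkpoint),
            check=True,
        )
```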

Training

Prepare the training dataset.

The training metadata should be a JSON list. Each sample needs:

  • video: path to the training video
  • prompt: text description for the video
  • image_list: list of multi-view reference image paths

Example metadata structure:

[
  {
    "dataset": "example_dataset_name",
    "video": "/path/to/video.mp4",
    "prompt": "A short text description of the target video.",
    "image_list": [
      "/path/to/view_1.jpg",
      "/path/to/view_2.jpg"
    ]
  }
]
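A metadata file in this shape can also be generated programmatically. A minimal sketch, where the `make_sample` helper and its defaults are illustrative:

```python
import json

REQUIRED_KEYS = ("dataset", "video", "prompt", "image_list")

def make_sample(video, prompt, image_list, dataset="example_dataset_name"):
    # One training sample with the required fields described above.
    if not image_list:
        raise ValueError("each sample needs at least one multi-view reference image")
    return {
        "dataset": dataset,
        "video": video,
        "prompt": prompt,
        "image_list": list(image_list),
    }

samples = [
    make_sample(
        video="/path/to/video.mp4",
        prompt="A short text description of the target video.",
        image_list=["/path/to/view_1.jpg", "/path/to/view_2.jpg"],
    )
]

# The training script expects a JSON list, one object per sample.
with open("example_metadata.json", "w") as f:
    json.dump(samples, f, indent=2)
```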
Launch training (set CUDA_VISIBLE_DEVICES to your GPU):

CUDA_VISIBLE_DEVICES=3 python run_train_considgen.py \
  --dataset_metadata_path ./example_metadata.json \
  --model_paths '["models/PAI/Wan2.1-Fun-1.3B-InP/diffusion_pytorch_model.safetensors","models/PAI/Wan2.1-Fun-1.3B-InP/models_t5_umt5-xxl-enc-bf16.pth","models/PAI/Wan2.1-Fun-1.3B-InP/Wan2.1_VAE.pth","models/PAI/Wan2.1-Fun-1.3B-InP/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth"]' \
  --vggt_model_path models/VGGT-1B \
  --tokenizer_path models/PAI/Wan2.1-Fun-1.3B-InP/google/umt5-xxl \
  --trainable_models dit,considgen_adapter \
  --output_path models/ConsID-Gen/example \
  --num_epochs 1 \
  --dataset_num_workers 0 \
  --wandb_mode disabled

Citation

@misc{wu2026considgenviewconsistentidentitypreservingimagetovideo,
  title={ConsID-Gen: View-Consistent and Identity-Preserving Image-to-Video Generation},
  author={Mingyang Wu and Ashirbad Mishra and Soumik Dey and Shuo Xing and Naveen Ravipati and Hansi Wu and Binbin Li and Zhengzhong Tu},
  year={2026},
  eprint={2602.10113},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.10113},
}
