The official codebase for FastDiSS: Few-step Match Many-step Diffusion Language Model on Sequence-to-Sequence Generation (Accepted to ACL Findings 2026).
Our implementation is based on Difformer, with a few small modifications to the training process. The following commands create a Conda environment (Python 3.9) and install the dependencies and this package into it:
conda create -n fastdiss python=3.9 pip=23.0
conda activate fastdiss
pip install -r requirements.txt
We follow the Fairseq instructions to preprocess the translation datasets. To binarize the distilled and tokenized datasets, run the following command (using the IWSLT14 De-En dataset as an example):
fairseq-preprocess \
--source-lang de --target-lang en \
--trainpref {PATH-TO-YOUR-DATASET}/train \
--validpref {PATH-TO-YOUR-DATASET}/valid \
--testpref {PATH-TO-YOUR-DATASET}/test \
--destdir data-bin/iwslt14_de_en \
--workers 20
We provide the pre-processed datasets here: Kaggle
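If you binarize your own data rather than using the pre-processed datasets, it can help to confirm the input files are in place first. Given `--trainpref {PATH-TO-YOUR-DATASET}/train` with `--source-lang de --target-lang en`, fairseq-preprocess reads `train.de` and `train.en` from that directory, and likewise for `valid` and `test`. The following is a minimal sketch of such a check; the function name `check_parallel_files` is hypothetical and is not part of this repository.

```shell
# Hypothetical helper (not shipped with this repository): verify that the
# parallel text files fairseq-preprocess will read exist before binarizing.
# Usage: check_parallel_files <data_dir> <source_lang> <target_lang>
check_parallel_files() {
  local data_dir="$1" src="$2" tgt="$3"
  local missing=0
  local split lang
  for split in train valid test; do
    for lang in "$src" "$tgt"; do
      # Each prefix (e.g. <data_dir>/train) is expected with a language suffix.
      if [ ! -f "$data_dir/$split.$lang" ]; then
        echo "missing: $data_dir/$split.$lang" >&2
        missing=1
      fi
    done
  done
  # Returns 0 when all six files are present, 1 otherwise.
  return "$missing"
}
```

One possible use is to call `check_parallel_files {PATH-TO-YOUR-DATASET} de en` before the fairseq-preprocess command above and only proceed if it succeeds.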
All training and evaluation scripts are in the ./scripts directory. For example, to train Difformer on the IWSLT14 De-En dataset, set the save path and data path in the script, then run:
bash scripts/iwslt14_de_en/train.sh
We do not apply checkpoint averaging for evaluation. To evaluate FastDiSS on the IWSLT14 De-En dataset, set the model path and gen path in the script, then run:
bash scripts/iwslt14_de_en/evaluate.sh
Copyright © 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
All material, including source code and pre-trained models, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This work builds heavily on the code from difformer, improved-diffusion, and Fairseq.
@inproceedings{nichol2021improved,
title={Improved denoising diffusion probabilistic models},
author={Nichol, Alexander Quinn and Dhariwal, Prafulla},
booktitle={International conference on machine learning},
pages={8162--8171},
year={2021},
organization={PMLR}
}
@article{gao2022difformer,
title={Empowering Diffusion Model on Embedding Space for Text Generation},
author={Gao, Zhujin and Guo, Junliang and Tan, Xu and Zhu, Yongxin and Zhang, Fang and Bian, Jiang and Xu, Linli},
journal={arXiv preprint arXiv:2212.09412},
year={2022}
}
This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.
