The recent Stable Diffusion 2.0 (https://github.com/Stability-AI/stablediffusion) and Hugging Face diffusers (https://github.com/huggingface/diffusers) both use the memory-efficient attention from xformers (https://github.com/facebookresearch/xformers). Should we try adopting it in our diffusion models (or even in our transformers)?
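For context, here is a rough sketch of the general idea behind memory-efficient attention as I understand it: keys and values are processed in chunks with a running softmax, so the full attention matrix is never materialized. This is a pure-Python illustration only, not the xformers implementation (which, as far as I know, relies on fused GPU kernels). The function names and `chunk_size` are made up for the example.

```python
import math


def naive_attention(q, k, v):
    """Reference version: builds the full row of scores for every query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v)) / total
            for d in range(dim_v)
        ])
    return out


def chunked_attention(q, k, v, chunk_size=2):
    """Same result, but only chunk_size scores are held at a time per query.

    Hypothetical helper for illustration; chunk_size is an assumed parameter.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf   # largest score seen so far
        running_sum = 0.0         # softmax denominator, relative to running_max
        acc = [0.0] * dim_v       # unnormalized weighted sum of values
        for start in range(0, len(k), chunk_size):
            k_chunk = k[start:start + chunk_size]
            v_chunk = v[start:start + chunk_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_chunk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum.
            # On the first chunk this is exp(-inf) == 0.0.
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            acc = [
                a * correction + sum(w * vj[d] for w, vj in zip(weights, v_chunk))
                for d, a in enumerate(acc)
            ]
            running_sum = running_sum * correction + sum(weights)
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Both functions should agree up to floating-point error; the chunked one trades some extra arithmetic for a much smaller peak memory footprint, which is the property that matters for large attention maps in diffusion models.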