MorphFader: Enabling Fine-Grained Semantic Control for Text-to-Audio Morphing through Fader-like Interactions
This project is part of a paper titled "MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models" accepted for publication at ICASSP 2025.
In this paper, we present an interactive method for morphing between sounds generated from two text prompts. Building on existing pre-trained text-to-audio diffusion models, we interpolate between the components of their cross-attention layers for the two prompts and show that this produces smooth and novel interpolated (morphed) sounds.
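The interpolation idea can be sketched for a single cross-attention layer. This is a minimal NumPy illustration, not the repository's implementation. It assumes that the keys and values computed from the two prompt embeddings are blended linearly with a fader weight `alpha`, and that both prompts are padded to the same token length. `MorphingCrossAttention` and its random projection matrices are hypothetical stand-ins; in a real model the projections come from the pre-trained network.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the token axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MorphingCrossAttention:
    """Hypothetical stand-in for one cross-attention layer of a text-to-audio LDM.

    Instead of attending to one prompt, it intercepts the key/value projections
    of two prompt embeddings and blends them linearly (an assumption) with alpha.
    """

    def __init__(self, w_q, w_k, w_v):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        self.alpha = 0.0  # fader position: 0 -> prompt A only, 1 -> prompt B only

    def __call__(self, latent, ctx_a, ctx_b):
        q = latent @ self.w_q                           # (n_latent, d)
        k_a, v_a = ctx_a @ self.w_k, ctx_a @ self.w_v   # (n_tokens, d)
        k_b, v_b = ctx_b @ self.w_k, ctx_b @ self.w_v   # same n_tokens (padded)
        # Intercept the text-derived components and interpolate them.
        k = (1 - self.alpha) * k_a + self.alpha * k_b
        v = (1 - self.alpha) * v_a + self.alpha * v_b
        scores = q @ k.T / np.sqrt(q.shape[-1])         # scaled dot-product
        return softmax(scores) @ v                      # (n_latent, d)


# Toy usage with random matrices in place of pre-trained weights.
rng = np.random.default_rng(0)
d_model, d_ctx, d = 8, 6, 4
layer = MorphingCrossAttention(
    rng.normal(size=(d_model, d)), rng.normal(size=(d_ctx, d)), rng.normal(size=(d_ctx, d))
)
latent = rng.normal(size=(5, d_model))
ctx_a = rng.normal(size=(7, d_ctx))  # embedding of prompt A (7 padded tokens)
ctx_b = rng.normal(size=(7, d_ctx))  # embedding of prompt B (7 padded tokens)
for alpha in (0.0, 0.5, 1.0):
    layer.alpha = alpha
    print(alpha, layer(latent, ctx_a, ctx_b).shape)
```

Sweeping `alpha` from 0 to 1 plays the role of the fader: `alpha = 0` reproduces attention to prompt A alone and `alpha = 1` to prompt B alone, with blended conditioning in between.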
The morphing method outlined in this paper can work with AudioLDM, TANGO, or Stable Audio (or any latent diffusion model that uses cross-attention for text-based control). For now, we demonstrate morphing by intercepting and interpolating the attention components of AudioLDM ('audioldm_16k_crossattn_t5'). Morphing with the TANGO and Stable Audio models is still a work in progress; please check back later.
- Clone this repo
- Install dependencies (from the original AudioLDM repo, as shown below) by creating a new conda environment called interactive-audio-morphing:
conda create -n interactive-audio-morphing python=3.8; conda activate interactive-audio-morphing
cd audio-morphing-with-text  # root project directory
git clone https://github.com/pkamath2/audioldm2-formorphing audioldm2
pip3 install -r requirements.txt
Register the newly created environment as a Jupyter kernel:
python -m ipykernel install --user --name interactive-audio-morphing
The notebooks show how to semantically weight adjective or verb descriptors in the text prompts (see the paper) and how to morph between two text prompts.
- Notebook to demonstrate semantic word weighting
- Notebook to demonstrate audio morphing between text prompts
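Semantic word weighting can be pictured as scaling how strongly the latent attends to one descriptor token in the prompt. The sketch below is a minimal NumPy illustration in the style of Prompt-to-Prompt attention reweighting; whether the notebooks scale the attention maps in exactly this way is an assumption, and `weighted_cross_attention`, `token_index`, and `weight` are hypothetical names.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the token axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def weighted_cross_attention(q, k, v, token_index, weight):
    """Cross-attention in which one prompt token's attention column is scaled.

    weight > 1 emphasises the token (e.g. an adjective such as "loud"),
    weight < 1 de-emphasises it, and weight == 1 leaves attention unchanged.
    """
    probs = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_latent, n_tokens)
    probs[:, token_index] *= weight  # rescale only the chosen token's column
    return probs @ v, probs


# Toy usage: emphasise token 2 of a 7-token prompt by a factor of 1.5.
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 4))  # queries from the audio latent
k = rng.normal(size=(7, 4))  # keys from the prompt tokens
v = rng.normal(size=(7, 4))  # values from the prompt tokens
out, probs = weighted_cross_attention(q, k, v, token_index=2, weight=1.5)
print(out.shape)
```

As in Prompt-to-Prompt, the rescaled attention is not renormalized, so raising the weight strengthens that token's contribution without taking attention away from the other tokens.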
Alternatively, Google Colab versions of the notebooks are available on our demo webpage.
To demonstrate the interactivity of our methods, we developed two proof-of-concept interfaces using Streamlit.
To run the interfaces:
cd interface
streamlit run app.py
The interfaces can then be accessed in a browser (preferably Chrome or Firefox) at the following URLs:
- For semantic word weighting http://localhost:8501/sound_design_ldm/?app=weight
- For audio morphing http://localhost:8501/sound_design_ldm/?app=morph
If you use this code for your research, please cite as:
@inproceedings{kamath2025morphfader,
title={MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models},
author={Kamath, Purnima and Gupta, Chitralekha and Nanayakkara, Suranga},
booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2025},
organization={IEEE}
}
