MorphFader: Enabling Fine-Grained Semantic Control for Text-to-Audio Morphing through Fader-like Interactions

This project is part of a paper titled "MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models" accepted for publication at ICASSP 2025.

Paper | Demo Webpage | Citation

In this paper, we outline an interactive method to morph between sounds generated from two text prompts. We leverage existing pre-trained text-to-audio models. By interpolating between the components of the cross-attention layers of the diffusion model for the two prompts, we show that we can generate smooth and novel interpolated (morphed) sounds.
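As a rough illustration of the idea, the sketch below blends the keys and values that two prompts would contribute to a single cross-attention layer, then runs ordinary scaled dot-product attention on the blend. It is a minimal pure-Python sketch under stated assumptions, not the code used in this repo: the function names (`lerp`, `attention`, `morphed_attention`), the choice to interpolate keys and values specifically, and the toy matrices are all hypothetical, and both prompts are assumed to be padded to the same token length.

```python
import math

def lerp(a, b, alpha):
    """Element-wise linear interpolation between two equally shaped matrices."""
    return [[(1 - alpha) * x + alpha * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d)
                  for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

def morphed_attention(q, k_a, v_a, k_b, v_b, alpha):
    """Attention over keys/values blended between prompt A (alpha=0) and prompt B (alpha=1).

    Assumption: blending keys and values is one plausible reading of
    "interpolating the components of the cross-attention layers".
    """
    return attention(q, lerp(k_a, k_b, alpha), lerp(v_a, v_b, alpha))

# Toy example: 2 latent (query) positions, 2 text tokens, feature size 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k_a, v_a = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
k_b, v_b = [[0.0, 1.0], [1.0, 0.0]], [[2.0, 2.0], [4.0, 0.0]]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, morphed_attention(q, k_a, v_a, k_b, v_b, alpha))
```

Sweeping `alpha` from 0 to 1 plays the role of the fader: at the endpoints the layer behaves exactly as it would for one prompt alone, and intermediate values give a mixture of the two.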

The morphing method outlined in this paper can work with AudioLDM, TANGO, or Stable Audio or, more generally, with any latent diffusion model (LDM) that uses cross-attention for text-based control. For now, we demonstrate morphing by intercepting and interpolating the attention components of AudioLDM ('audioldm_16k_crossattn_t5'). Morphing with the TANGO and Stable Audio models is still a work in progress; please check back later.

Setup

Clone this repo.
Create a new conda environment called interactive-audio-morphing and install the dependencies (from the original AudioLDM repo), as shown below.

conda create -n interactive-audio-morphing python=3.8
conda activate interactive-audio-morphing
cd audio-morphing-with-text  # root project directory
git clone https://github.com/pkamath2/audioldm2-formorphing audioldm2
pip3 install -r requirements.txt

Add the newly created environment to Jupyter as a kernel:

python -m ipykernel install --user --name interactive-audio-morphing

Notebooks

The notebooks show how to semantically weight adjective or verb descriptors in the text prompts (see paper) and how to morph between two text prompts.

Alternatively, please see our webpage for Google Colab notebooks.
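To give a feel for what weighting a descriptor can mean, the sketch below scales the attention that every latent position pays to one chosen text token, in the style of attention reweighting used in prompt-based editing. This is an assumption made for illustration and may differ from what the notebooks do: `reweight_token`, the toy attention map, and the choice not to renormalize the rows afterwards are all hypothetical.

```python
def reweight_token(attn_rows, token_index, weight):
    """Scale the attention weight on one text token in every row.

    attn_rows: list of rows, one per latent position, each holding one
               attention weight per text token.
    token_index: position of the adjective or verb to emphasize or suppress.
    weight: fader value; > 1 emphasizes the word, < 1 suppresses it.
    Rows are left unnormalized on purpose (an assumption of this sketch).
    """
    return [[w * weight if j == token_index else w for j, w in enumerate(row)]
            for row in attn_rows]

# Toy attention map for a 3-token prompt; token 1 stands in for the descriptor.
attn = [[0.2, 0.3, 0.5],
        [0.6, 0.1, 0.3]]

print(reweight_token(attn, token_index=1, weight=2.0))
```

Moving `weight` continuously, rather than editing the prompt text, is what makes the control feel like a fader.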

Interfaces

We demonstrate the interactivity of our algorithms with two proof-of-concept interfaces built using Streamlit.

To run the interfaces:

cd interfaces
streamlit run app.py

You can then access the interfaces in a browser (preferably Chrome or Firefox) at the following URLs:

For semantic word weighting: http://localhost:8501/sound_design_ldm/?app=weight
For audio morphing: http://localhost:8501/sound_design_ldm/?app=morph
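Both URLs point at the same Streamlit server and differ only in the `app` query parameter. The sketch below shows, using only the Python standard library, how such a parameter can be read from a URL to decide which interface to show. It illustrates the URL scheme only and is not taken from app.py: the helper name `selected_app` and the fallback value are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def selected_app(url, default="weight"):
    """Return the value of the `app` query parameter, or `default` if it is absent.

    The default is an assumption of this sketch, not documented behavior.
    """
    params = parse_qs(urlparse(url).query)
    return params.get("app", [default])[0]

print(selected_app("http://localhost:8501/sound_design_ldm/?app=weight"))
print(selected_app("http://localhost:8501/sound_design_ldm/?app=morph"))
```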

Citation

If you use this code for your research, please cite as:

@inproceedings{kamath2025morphfader,
  title={MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models},
  author={Kamath, Purnima and Gupta, Chitralekha and Nanayakkara, Suranga},
  booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
