MorphFader: Enabling Fine-Grained Semantic Control for Text-to-Audio Morphing through Fader-like Interactions

This project is part of a paper titled "MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models" accepted for publication at ICASSP 2025.

Paper | Demo Webpage | Citation

In this paper, we outline an interactive method for morphing between sounds generated from two text prompts, leveraging existing pre-trained text-to-audio models. By interpolating between the components of the cross-attention layers of the diffusion model for the two prompts, we show that we can generate smooth, novel interpolated or morphed sounds.
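The core idea can be illustrated with a minimal NumPy sketch. This is not the repository's implementation: the function names are illustrative, and the exact interpolation point within the attention computation follows the paper, but the sketch shows the shape of the technique, namely computing cross-attention against each prompt's text embeddings and blending the results with a morphing factor `alpha`.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    """Standard scaled dot-product cross-attention.
    q: (n_query, d) latent queries; k, v: (n_tokens, d) text-prompt keys/values."""
    d = q.shape[-1]
    scores = softmax(q @ k.T / np.sqrt(d))
    return scores @ v

def morphed_attention(q, k1, v1, k2, v2, alpha):
    """Blend the cross-attention outputs of two prompts.
    alpha=0.0 reproduces prompt 1; alpha=1.0 reproduces prompt 2;
    intermediate values yield interpolated (morphed) attention outputs."""
    a1 = cross_attention(q, k1, v1)
    a2 = cross_attention(q, k2, v2)
    return (1.0 - alpha) * a1 + alpha * a2
```

In the actual diffusion model this interpolation would run inside every cross-attention layer at every denoising step, so the blend acts on the generation process itself rather than on finished audio.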

The morphing method outlined in this paper works with AudioLDM, TANGO, or Stable Audio (or any latent diffusion model that uses cross-attention for text-based control). For now, we demonstrate morphing by intercepting and interpolating the attention components of AudioLDM ('audioldm_16k_crossattn_t5'). Morphing with TANGO and Stable Audio is still a work in progress; please check back later.

Setup

  • Clone this repo
  • Install dependencies (from the original AudioLDM repo as shown below) by creating a new conda environment called interactive-audio-morphing
conda create -n interactive-audio-morphing python=3.8; conda activate interactive-audio-morphing
cd audio-morphing-with-text #root project dir
git clone https://github.com/pkamath2/audioldm2-formorphing audioldm2
pip3 install -r requirements.txt

Add the newly created environment to Jupyter Notebooks

python -m ipykernel install --user --name interactive-audio-morphing

Notebooks

The notebooks show how to semantically weight adjective or verb descriptors in a text prompt (see paper), and how to morph between two text prompts.

Alternatively, please see our webpage for Google Colab notebooks.
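To convey what "semantically weighting a descriptor" means, here is a hedged NumPy sketch (not the repository's code; the function and argument names are illustrative): the attention given to a chosen prompt token, such as an adjective, is scaled before renormalization, so that token contributes more or less to the attention output.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reweighted_attention(q, k, v, token_weights):
    """Cross-attention with per-token reweighting.
    token_weights: (n_tokens,) multipliers; a weight > 1 emphasizes a
    descriptor token, a weight < 1 de-emphasizes it. Weights of all ones
    recover standard attention."""
    d = q.shape[-1]
    probs = softmax(q @ k.T / np.sqrt(d))      # (n_query, n_tokens)
    probs = probs * token_weights               # scale per-token attention
    probs = probs / probs.sum(axis=-1, keepdims=True)  # renormalize rows
    return probs @ v
```

Applied inside the model's cross-attention layers, this kind of scaling is what lets a fader-style control strengthen or soften a single word's influence on the generated sound.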

Interfaces

We demonstrate the interactivity of our algorithms with two proof-of-concept interfaces built using Streamlit.

To run the interfaces:

cd interface
streamlit run app.py

You will be able to access the interfaces in a browser (preferably Chrome or Firefox) at the URL Streamlit prints on startup (by default, http://localhost:8501).

Citation

If you use this code for your research, please cite as:

@inproceedings{kamath2025morphfader,
  title={MorphFader: Enabling Fine-grained Controllable Morphing with Text-to-Audio Models},
  author={Kamath, Purnima and Gupta, Chitralekha and Nanayakkara, Suranga},
  booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
