ComfyUI custom nodes for Foundation-1 — Structured Text-to-Sample Diffusion for Music Production
Foundation-1 is a structured text-to-sample diffusion model for music production. It understands instrument identity, timbre, FX, musical notation, BPM, bar count, and key as separate composable controls — enabling precise, predictable synthesis of musical loops.
This ComfyUI wrapper provides native node-based integration with:
- Structured prompting with instrument, timbre, FX, and notation tags
- Tempo-synced generation with BPM and bar count controls
- Key-aware synthesis with full western key support
- Native progress bars and interruption support
Companion Video: Watch the Foundation-1 overview and design philosophy
- Structured Text-to-Sample — Generate musical loops from structured text prompts
- Audio-to-Audio Variations — Connect any audio input to create variations/interpretations guided by your prompt
- Tempo-Synced Duration — Automatic duration calculation from BPM and bar count
- 24 Musical Keys — Full western key support (major and minor)
- Native ComfyUI Integration — AUDIO noodle outputs, progress bars, interruption support
- Optimized Performance — Support for SDPA, FlashAttention 2, SageAttention
- Smart Auto-Download — Model weights auto-downloaded from HuggingFace on first use
- Smart Caching — Optional model offloading to CPU RAM between runs
- GPU: NVIDIA GPU with 8GB VRAM minimum (CUDA required)
- Typical VRAM usage: ~7GB during generation
- Generation speed: ~20 it/s (iterations per second) with default sampler
- CPU/MPS: Not supported — Foundation-1 uses Flash Attention which is CUDA-only
- Python: 3.10+
- CUDA: 11.8+
- Flash Attention: Required (comes with PyTorch 2.0+ SDPA)
- SageAttention: Optional but recommended (tested on 2.2.0)
> [!NOTE]
> Attention Requirements:
>
> - Minimum: Flash Attention 2 (built into PyTorch 2.0+ SDPA backend)
> - Recommended: SageAttention 2.2.0+ for better performance
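The `auto` attention mode described above can be sketched as a simple import-probe fallback. This is an illustrative helper, not the node pack's actual code; the function name `pick_attention` is hypothetical:

```python
import importlib.util

def pick_attention(requested: str = "auto") -> str:
    """Resolve 'auto' to the best available attention backend.

    Preference order mirrors the note above: SageAttention if the
    package is importable, otherwise PyTorch's built-in SDPA backend
    (which provides Flash Attention 2 on supported GPUs).
    """
    if requested != "auto":
        return requested  # explicit choice wins
    if importlib.util.find_spec("sageattention") is not None:
        return "sageattention"
    return "sdpa"
```

For example, `pick_attention("auto")` returns `"sageattention"` only when the package is installed, and falls back to `"sdpa"` otherwise.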
> [!IMPORTANT]
> First Run Requires Internet
>
> - The T5 text encoder (~900MB) is downloaded automatically from HuggingFace on first use
> - Model weights (~3GB) are also downloaded on first use
> - Subsequent runs work offline once everything is cached
Click to expand installation methods
- Open ComfyUI Manager
- Search for "Foundation-1"
- Click Install
- Restart ComfyUI
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/saganaki22/ComfyUI-Foundation-1.git
cd ComfyUI-Foundation-1
python install.py
```

Note: The `install.py` script handles all dependency installation. See the Dependency Details section below for what gets installed and why.
If the Manager install button still does nothing after the fixes in v0.1.4, the nuclear option is:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/ComfyUI-Foundation-1.git
cd ComfyUI-Foundation-1
python install.py
```

Click to expand dependency installation details
All dependencies are automatically installed at ComfyUI startup. You do not need to run pip install manually.
These packages are typically already present in ComfyUI environments:

```
torch
torchaudio        # Required for Foundation-1 audio processing
numpy
safetensors
transformers      # T5 text encoder
huggingface_hub   # Model downloads
```
If torchaudio is missing for some reason, install it manually:
```bash
pip install torchaudio
```

These packages are installed normally:

```
einops>=0.7.0
alias-free-torch>=0.0.6
ema-pytorch>=0.2.3
einops-exts>=0.0.3
```
These packages require special handling:
| Package | Install Command | Reason |
|---|---|---|
| `stable-audio-tools` | `pip install stable-audio-tools --no-deps` | Avoids `pandas==2.0.2`, which has no Python 3.13 wheel and fails to build from source |
| `k-diffusion` | `pip install k-diffusion==0.1.1 --no-deps --target ./k_diffusion_files/` | Installed to a private directory to avoid conflicts with ComfyUI's bundled `k_diffusion` and the `clip`→`pkg_resources` import chain issue |
> [!WARNING]
> Do NOT run these commands:
>
> ```bash
> pip install stable-audio-tools  # WRONG - will pull pandas==2.0.2
> pip install k-diffusion         # WRONG - will conflict with ComfyUI's version
> ```
>
> These are handled automatically at startup with the correct flags.
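The startup installer's special-case handling can be sketched roughly as follows. This is a simplified illustration under the assumptions stated in the comments; the real `install.py` may differ, and `ensure_special_deps` is a hypothetical name:

```python
import importlib.util
import subprocess
import sys

# (importable module name, pip arguments) - flags mirror the table above
SPECIAL_INSTALLS = [
    ("stable_audio_tools", ["stable-audio-tools", "--no-deps"]),
    ("k_diffusion", ["k-diffusion==0.1.1", "--no-deps",
                     "--target", "./k_diffusion_files/"]),
]

def ensure_special_deps(dry_run: bool = False) -> list:
    """Install the special-cased packages only if missing, always with
    the required flags. Returns the pip commands it ran (or would run)."""
    cmds = []
    for module, pip_args in SPECIAL_INSTALLS:
        if importlib.util.find_spec(module) is None:
            cmd = [sys.executable, "-m", "pip", "install", *pip_args]
            cmds.append(cmd)
            if not dry_run:
                subprocess.check_call(cmd)
    return cmds
```

Checking `find_spec` first keeps startup fast once the environment is set up, and building the command from `sys.executable` guarantees the packages land in the same Python that runs ComfyUI.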
`sageattention` — install manually for better performance:

```bash
pip install sageattention
```

Tested and working with SageAttention 2.2.0.
1. **Add Model Loader**
   - Add a `Foundation-1 Model Loader` node
   - The model auto-downloads from RoyalCities/Foundation-1 on first use
   - Select the attention type (auto/sdpa/flash/sageattention)

2. **Add Generator**
   - Add a `Foundation-1 Generate` node
   - Connect the model output from the loader
   - Enter tags: `Synth Lead, Warm, Bright, Melody`
   - Select BPM, bars, and key

3. **Run!**
   - Execute the workflow
   - The audio output is ready for ComfyUI audio nodes
Loads a Foundation-1 checkpoint and prepares it for generation.
Inputs:

| Parameter | Type | Description |
|---|---|---|
| `model` | dropdown | Foundation-1 checkpoint (auto-downloaded on first run) |
| `attention` | dropdown | Attention mechanism: `auto`, `sdpa`, `flash_attention_2`, `sageattention` |

Outputs:

| Output | Type | Description |
|---|---|---|
| `model` | `FOUNDATION1_MODEL` | Loaded model for the generator node |
Generates a tempo-synced musical loop. Optionally accepts an audio input for variation generation.
Required Inputs:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `FOUNDATION1_MODEL` | — | Connect from Model Loader |
| `tags` | STRING | `Synth Lead, Warm, ...` | Instrument, timbre, FX, notation tags |
| `bpm` | dropdown | `140 BPM` | Tempo (100-150 BPM options) |
| `bars` | dropdown | `8 Bars` | Loop length (4 or 8 bars) |
| `key` | dropdown | `E minor` | Musical key (24 options) |
| `steps` | INT | 250 | Diffusion steps (10-500) |
| `cfg_scale` | FLOAT | 7.0 | Classifier-free guidance (1.0-15.0) |
| `seed` | INT | 0 | Generation seed |
| `sampler_type` | dropdown | `dpmpp-3m-sde` | Diffusion sampler |
| `sigma_min` | FLOAT | 0.3 | Minimum noise level |
| `sigma_max` | FLOAT | 500.0 | Maximum noise level |
| `unload_after_generate` | BOOLEAN | False | Offload to CPU RAM after generation |
| `torch_compile` | BOOLEAN | False | Enable torch.compile (first run slower) |
Optional Inputs (Audio Variation):

| Parameter | Type | Default | Description |
|---|---|---|---|
| `audio` | AUDIO | None | Input audio for variation — connect from LoadAudio, a previous generation, etc. |
| `init_noise_level` | FLOAT | 0.7 | Variation strength (0.01–1.0). Lower = closer to input, higher = more creative |

Outputs:

| Output | Type | Description |
|---|---|---|
| `audio` | AUDIO | Generated audio waveform |
How Audio Variation Works
Connect any AUDIO output (e.g., from a LoadAudio node, or a previous Foundation-1 Generate output) to the optional audio input. The model will use this as a starting point and create a variation guided by your prompt tags, BPM, bars, and key.
init_noise_level controls the variation strength:
- 0.1–0.3 — Output stays close to the input audio
- 0.5–0.75 — Balanced musical variations (recommended)
- 0.9–1.0 — Maximum creative freedom, output may differ significantly from input
Leave the audio input disconnected for standard text-to-audio generation.
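Img2img-style variation in diffusion models is typically implemented by partially re-noising the input rather than starting from pure noise; a toy sketch of that idea (illustrative only, using Python lists instead of tensors — not the model's actual code):

```python
import random

def renoise(latent, init_noise_level=0.7, seed=0):
    """Toy img2img-style starting point: blend the input latent with
    Gaussian noise. init_noise_level near 1.0 approaches pure noise
    (maximum variation); small values stay close to the input."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, 1.0) * init_noise_level for x in latent]
```

Sampling then denoises from this blended state, so low `init_noise_level` values leave most of the input structure intact while high values let the prompt dominate.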
Click to expand tag reference
Foundation-1 uses structured tags for precise control over generation. Tags should describe:
- **Instrument** — e.g., `Synth Lead`, `Piano`, `Guitar`, `Drums`
- **Timbre** — e.g., `Warm`, `Bright`, `Dark`, `Rich`, `Clean`
- **FX** — e.g., `Reverb`, `Delay`, `Distortion`, `Chorus`
- **Notation** — e.g., `Arp`, `Chord`, `Melody`, `Bassline`
- **Character** — e.g., `Spacey`, `Intimate`, `Wide`, `Thick`
Example prompts:

```
Synth Lead, Warm, Wide, Bright, Clean, Melody
Piano, Soft, Intimate, Reverb, Chord Progression
Drums, Punchy, Tight, Kick, Snare, Hi-Hat
Bass, Deep, Sub, Rolling, Groove
```
Note: BPM, Bars, and Key are controlled via dropdowns — do not include them in the tags field.
For the complete list of supported tags, see the Master Tag Reference Sheet.
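Since BPM, bars, and key must stay out of the tags field, a small guard helps when building prompts programmatically. This is a hypothetical helper, not part of the node pack:

```python
def build_tags(*tags: str) -> str:
    """Join tags into the comma-separated string the Generate node
    expects, rejecting values that belong in the dropdowns instead."""
    forbidden = ("bpm", "bars", "major", "minor")
    for tag in tags:
        if any(word in tag.lower() for word in forbidden):
            raise ValueError(
                f"{tag!r} looks like a BPM/bars/key value - "
                "use the dropdowns for those controls"
            )
    return ", ".join(tags)
```

For example, `build_tags("Synth Lead", "Warm", "Bright", "Melody")` returns `"Synth Lead, Warm, Bright, Melody"`, while `build_tags("140 BPM")` raises a `ValueError`.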
Click to expand tag distribution charts
Click to expand supported keys
Major Keys: C major, C# major, D major, Eb major, E major, F major, F# major, G major, Ab major, A major, Bb major, B major
Minor Keys: C minor, C# minor, D minor, D# minor, E minor, F minor, F# minor, G minor, G# minor, A minor, Bb minor, B minor
Duration is automatically calculated from BPM and bars:
`duration (seconds) = round(bars × 4 / BPM × 60)`
Examples:
| BPM | Bars | Duration |
|---|---|---|
| 100 | 8 | 19s |
| 120 | 4 | 8s |
| 140 | 8 | 14s |
| 150 | 4 | 6s |
Maximum duration: 20 seconds (model limit)
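The formula above can be checked directly: each bar contributes 4 beats, dividing by BPM converts beats to minutes, and multiplying by 60 converts to seconds.

```python
def loop_duration_seconds(bpm: int, bars: int) -> int:
    """Tempo-synced duration: bars * 4 beats, at `bpm` beats per minute."""
    return round(bars * 4 / bpm * 60)

# Reproduces the table above:
assert loop_duration_seconds(100, 8) == 19
assert loop_duration_seconds(120, 4) == 8
assert loop_duration_seconds(140, 8) == 14
assert loop_duration_seconds(150, 4) == 6
```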
```
ComfyUI/
├── models/
│   └── stable_audio/
│       └── Foundation-1/              # Auto-downloaded
│           ├── Foundation_1.safetensors
│           └── model_config.json
└── custom_nodes/
    └── ComfyUI-Foundation-1/
        ├── __init__.py
        ├── nodes/
        │   ├── __init__.py
        │   ├── loader_node.py
        │   ├── generate_node.py
        │   └── model_cache.py
        ├── k_diffusion_files/         # Private k-diffusion install
        ├── pyproject.toml
        ├── requirements.txt
        └── README.md
```
Click to expand parameter details
| Parameter | Description | Recommended |
|---|---|---|
| attention | Attention mechanism | auto (SageAttention if available, else SDPA) |
| steps | Diffusion steps | 250 (training default), 100-150 for faster results |
| cfg_scale | Classifier-free guidance | 7.0 (training default), 6-8 for balance |
| sampler_type | Diffusion sampler | dpmpp-3m-sde (recommended, best quality), k-dpm-fast (fastest, needs fewer steps) |
| sigma_min | Min noise level | 0.3 (default) |
| sigma_max | Max noise level | 500.0 (default) — note: when using audio variation, this is internally overridden by init_noise_level |
| audio | Optional input audio for variations | Connect any AUDIO output, or leave disconnected for text-to-audio |
| init_noise_level | Variation strength | 0.5-0.75 (balanced), 0.1-0.3 (close to input), 1.0 (max variation) |
| unload_after_generate | Offload to CPU RAM | True to free VRAM between runs |
| torch_compile | torch.compile optimization | True (first run slow, subsequent faster) |
Click to expand troubleshooting guide
This means stable-audio-tools was not installed. This can happen if you cloned the repo manually without running install.py, or if your pip environment is different from ComfyUI's.
Fix:
```bash
pip install stable-audio-tools --no-deps
```

Then restart ComfyUI.
> [!WARNING]
> You must use `--no-deps`. Running `pip install stable-audio-tools` without it will pull in `pandas==2.0.2`, which breaks on Python 3.13+.
Manually download from RoyalCities/Foundation-1:
```bash
pip install -U huggingface_hub
huggingface-cli download RoyalCities/Foundation-1 --local-dir ComfyUI/models/stable_audio/Foundation-1
```

Only these two files are required:

- `Foundation_1.safetensors` (~3GB model weights)
- `model_config.json` (model configuration)
The __init__.py auto-installs dependencies at startup. If it fails, install manually:
Normal pip installs (note the quotes around the version specifier, so the shell does not treat `>=` as a redirect):

```bash
pip install "einops>=0.7.0"
pip install alias-free-torch
pip install ema-pytorch
pip install einops-exts
```

Special installs with `--no-deps` (required!):

These packages MUST be installed with `--no-deps` or they will break your ComfyUI environment:

```bash
# stable-audio-tools: --no-deps avoids pandas==2.0.2 (no Python 3.13 wheel)
pip install stable-audio-tools --no-deps

# k-diffusion must go to a private folder (avoids conflict with ComfyUI's bundled version)
pip install k-diffusion==0.1.1 --no-deps --target ComfyUI/custom_nodes/ComfyUI-Foundation-1/k_diffusion_files/
```

> [!WARNING]
> Do NOT run:
>
> ```bash
> pip install stable-audio-tools  # WRONG - pulls pandas==2.0.2
> pip install k-diffusion         # WRONG - conflicts with ComfyUI
> ```
The k_diffusion_files/ folder is created automatically by the auto-installer. It contains a private copy of k-diffusion that's loaded at runtime via importlib — this prevents conflicts with ComfyUI's own bundled k_diffusion and avoids the clip→pkg_resources import chain issue.
If this folder is missing or corrupted, the node will re-download k-diffusion==0.1.1 automatically on next startup.
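Loading a package from a private directory like this is usually done by putting the folder ahead of `site-packages` on the import path. A generic sketch of the pattern (not the node pack's exact code; `import_private` is an illustrative name):

```python
import importlib
import sys
from pathlib import Path

def import_private(package: str, private_dir: str):
    """Import `package` from a private directory, shadowing any copy
    in site-packages (e.g. ComfyUI's bundled k_diffusion)."""
    sys.path.insert(0, str(Path(private_dir).resolve()))
    try:
        sys.modules.pop(package, None)   # drop any already-loaded copy
        return importlib.import_module(package)
    finally:
        sys.path.pop(0)                  # keep sys.path clean afterwards
```

Because the private directory is removed from `sys.path` after the import, the bundled version stays untouched for the rest of ComfyUI.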
To reduce VRAM usage:

- Enable `unload_after_generate=True` to offload to CPU RAM
- Reduce `steps` (100-150 still gives good results)
- Close other GPU applications

To speed up generation:

- Install SageAttention: `pip install sageattention`
- Enable `torch_compile=True` (first run is slower, subsequent runs faster)
- Use the `dpmpp-2m-sde` sampler (slightly faster than `dpmpp-3m-sde`)
Foundation-1 installs k-diffusion to a private directory (k_diffusion_files/) to avoid conflicts with ComfyUI's bundled version. Never install k-diffusion to site-packages manually.
- Model: RoyalCities/Foundation-1
- Inference Engine: Stability-AI/stable-audio-tools
- Companion Video: Foundation-1 Overview
This model is licensed under the Stability AI Community License:
- ✅ Non-commercial use — permitted
- ✅ Limited commercial use — entities with annual revenues below USD $1M
- ⚠️ Revenue exceeding USD $1M — refer to the repository license file for full terms
Model weights from RoyalCities/Foundation-1 are subject to the same license.
Foundation-1 is intended for music production, creative applications, and other legitimate purposes. Please use it responsibly and ethically; the authors accept no responsibility for illegal use. Refer to your local laws regarding generated content.