ComfyUI-Foundation-1

ComfyUI custom nodes for Foundation-1: Structured Text-to-Sample Diffusion for Music Production

Overview

Foundation-1 is a structured text-to-sample diffusion model for music production. It understands instrument identity, timbre, FX, musical notation, BPM, bar count, and key as separate composable controls — enabling precise, predictable synthesis of musical loops.

This ComfyUI wrapper provides native node-based integration with:

Structured prompting with instrument, timbre, FX, and notation tags
Tempo-synced generation with BPM and bar count controls
Key-aware synthesis with full western key support
Native progress bars and interruption support

Companion Video: Watch the Foundation-1 overview and design philosophy

Features

Structured Text-to-Sample — Generate musical loops from structured text prompts
Audio-to-Audio Variations — Connect any audio input to create variations/interpretations guided by your prompt
Tempo-Synced Duration — Automatic duration calculation from BPM and bar count
24 Musical Keys — Full western key support (major and minor)
Native ComfyUI Integration — AUDIO noodle outputs, progress bars, interruption support
Optimized Performance — Support for SDPA, FlashAttention 2, SageAttention
Smart Auto-Download — Model weights auto-downloaded from HuggingFace on first use
Smart Caching — Optional model offloading to CPU RAM between runs

Requirements

GPU: NVIDIA GPU with 8GB VRAM minimum (CUDA required)
- Typical VRAM usage: ~7GB during generation
- Generation speed: ~20 it/s (iterations per second) with default sampler
CPU/MPS: Not supported. Foundation-1 uses Flash Attention, which is CUDA-only
Python: 3.10+
CUDA: 11.8+
Flash Attention: Required (provided by the PyTorch 2.0+ SDPA backend, no separate install needed)
SageAttention: Optional but recommended (tested on 2.2.0)

Note

Attention Requirements:

Minimum: Flash Attention via the PyTorch SDPA backend (PyTorch 2.0+; PyTorch 2.2+ ships the FlashAttention-2 kernel)
Recommended: SageAttention 2.2.0+ for better performance

Important

First Run Requires Internet

The T5 text encoder (~900MB) is downloaded automatically from HuggingFace on first use
Model weights (~3GB) are also downloaded on first use
Subsequent runs work offline once everything is cached

Installation

Method 1: ComfyUI Manager (Recommended)

Open ComfyUI Manager
Search for "Foundation-1"
Click Install
Restart ComfyUI

Method 2: Manual Installation

cd ComfyUI/custom_nodes
git clone https://github.com/saganaki22/ComfyUI-Foundation-1.git
cd ComfyUI-Foundation-1
python install.py

Note: The install.py script handles all dependency installation. See the Dependency Details section below for what gets installed and why.

ComfyUI Manager "Install" Button Not Working?

If the Manager install button still does nothing after the fixes in v0.1.4, fall back to a manual install (the same steps as Method 2):

cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/ComfyUI-Foundation-1.git
cd ComfyUI-Foundation-1
python install.py

Dependency Details

All dependencies are automatically installed at ComfyUI startup. You do not need to run pip install manually.

Already included in ComfyUI

These packages are typically already present in ComfyUI environments:

torch
torchaudio      # Required for Foundation-1 audio processing
numpy
safetensors
transformers    # T5 text encoder
huggingface_hub # Model downloads

If torchaudio is missing for some reason, install it manually:

pip install torchaudio

Normal pip installs (auto-installed)

These packages are installed normally:

einops>=0.7.0
alias-free-torch>=0.0.6
ema-pytorch>=0.2.3
einops-exts>=0.0.3

Special installs with --no-deps

These packages require special handling:

Package	Install Command	Reason
`stable-audio-tools`	`pip install stable-audio-tools --no-deps`	Avoids `pandas==2.0.2` which has no Python 3.13 wheel and fails to build from source
`k-diffusion`	`pip install k-diffusion==0.1.1 --no-deps --target ./k_diffusion_files/`	Installed to private directory to avoid conflicts with ComfyUI's bundled k_diffusion and the `clip->pkg_resources` import chain issue

What NOT to install manually

[!WARNING] Do NOT run these commands:
pip install stable-audio-tools      # WRONG - will pull pandas==2.0.2
pip install k-diffusion             # WRONG - will conflict with ComfyUI's version
These are handled automatically at startup with the correct flags.

Optional packages

sageattention — Install manually for better performance: pip install sageattention

Installing SageAttention (Recommended)

pip install sageattention

Tested and working with SageAttention 2.2.0.

Quick Start

Basic Workflow

Add Model Loader
- Add the Foundation-1 Model Loader node
- The model auto-downloads from RoyalCities/Foundation-1 on first use
- Select the attention type (`auto`, `sdpa`, `flash_attention_2`, or `sageattention`)
Add Generator
- Add the Foundation-1 Generate node
- Connect the model output from the loader
- Enter tags, e.g. Synth Lead, Warm, Bright, Melody
- Select BPM, bars, and key
Run
- Execute the workflow
- The audio output can be connected to any ComfyUI audio node

Node Reference

Foundation-1 Model Loader

Loads a Foundation-1 checkpoint and prepares it for generation.

Inputs:

Parameter	Type	Description
`model`	dropdown	Foundation-1 checkpoint (auto-downloaded on first run)
`attention`	dropdown	Attention mechanism: `auto`, `sdpa`, `flash_attention_2`, `sageattention`

Outputs:

Output	Type	Description
`model`	FOUNDATION1_MODEL	Loaded model for generator node

Foundation-1 Generate

Generates a tempo-synced musical loop. Optionally accepts an audio input for variation generation.

Required Inputs:

Parameter	Type	Default	Description
`model`	FOUNDATION1_MODEL	—	Connect from Model Loader
`tags`	STRING	`Synth Lead, Warm, ...`	Instrument, timbre, FX, notation tags
`bpm`	dropdown	`140 BPM`	Tempo (100-150 BPM options)
`bars`	dropdown	`8 Bars`	Loop length (4 or 8 bars)
`key`	dropdown	`E minor`	Musical key (24 options)
`steps`	INT	250	Diffusion steps (10-500)
`cfg_scale`	FLOAT	7.0	Classifier-free guidance (1.0-15.0)
`seed`	INT	0	Generation seed
`sampler_type`	dropdown	`dpmpp-3m-sde`	Diffusion sampler
`sigma_min`	FLOAT	0.3	Minimum noise level
`sigma_max`	FLOAT	500.0	Maximum noise level
`unload_after_generate`	BOOLEAN	False	Offload to CPU RAM after generation
`torch_compile`	BOOLEAN	False	Enable torch.compile (first run slower)

Optional Inputs (Audio Variation):

Parameter	Type	Default	Description
`audio`	AUDIO	None	Input audio for variation — connect from LoadAudio, previous generation, etc.
`init_noise_level`	FLOAT	0.7	Variation strength (0.01–1.0). Lower = closer to input, higher = more creative

Outputs:

Output	Type	Description
`audio`	AUDIO	Generated audio waveform
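
The ranges listed in the input tables can be checked before a workflow is queued. The following is a minimal sketch rather than the node's own validation code: the `GenerateSettings` class is hypothetical, its field names simply mirror the parameters above, and the BPM check assumes any integer from 100 to 150 is acceptable, which may be broader than what the dropdown actually offers.

```python
from dataclasses import dataclass


@dataclass
class GenerateSettings:
    """Hypothetical container mirroring the documented Generate inputs."""
    bpm: int = 140
    bars: int = 8
    steps: int = 250
    cfg_scale: float = 7.0
    init_noise_level: float = 0.7

    def problems(self) -> list[str]:
        """Return human-readable range violations (empty list if none)."""
        issues = []
        if not 100 <= self.bpm <= 150:
            issues.append(f"bpm {self.bpm} is outside 100-150")
        if self.bars not in (4, 8):
            issues.append(f"bars {self.bars} must be 4 or 8")
        if not 10 <= self.steps <= 500:
            issues.append(f"steps {self.steps} is outside 10-500")
        if not 1.0 <= self.cfg_scale <= 15.0:
            issues.append(f"cfg_scale {self.cfg_scale} is outside 1.0-15.0")
        if not 0.01 <= self.init_noise_level <= 1.0:
            issues.append(
                f"init_noise_level {self.init_noise_level} is outside 0.01-1.0"
            )
        return issues
```

With the documented defaults, `GenerateSettings().problems()` returns an empty list; `GenerateSettings(bpm=90, steps=5).problems()` reports two violations.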

How Audio Variation Works

Connect any AUDIO output (e.g., from a LoadAudio node, or a previous Foundation-1 Generate output) to the optional audio input. The model will use this as a starting point and create a variation guided by your prompt tags, BPM, bars, and key.

init_noise_level controls the variation strength:

0.1–0.3 — Output stays close to the input audio
0.5–0.75 — Balanced musical variations (recommended)
0.9–1.0 — Maximum creative freedom, output may differ significantly from input

Leave the audio input disconnected for standard text-to-audio generation.

Prompt Tags

Foundation-1 uses structured tags for precise control over generation. Tags should describe:

Instrument — e.g., Synth Lead, Piano, Guitar, Drums
Timbre — e.g., Warm, Bright, Dark, Rich, Clean
FX — e.g., Reverb, Delay, Distortion, Chorus
Notation — e.g., Arp, Chord, Melody, Bassline
Character — e.g., Spacey, Intimate, Wide, Thick

Example prompts:

Synth Lead, Warm, Wide, Bright, Clean, Melody
Piano, Soft, Intimate, Reverb, Chord Progression
Drums, Punchy, Tight, Kick, Snare, Hi-Hat
Bass, Deep, Sub, Rolling, Groove

Note: BPM, Bars, and Key are controlled via dropdowns — do not include them in the tags field.
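
When prompts are assembled programmatically, the tag categories above can be joined into the comma-separated string the tags field expects. This is only an illustrative sketch: `build_tags` is a hypothetical helper, the category order follows the list above (the README does not say whether order matters to the model), and the tempo check only catches tags that mention "BPM".

```python
def build_tags(instrument, timbre=(), fx=(), notation=(), character=()):
    """Join tag categories into one comma-separated prompt string."""
    parts = [instrument, *timbre, *fx, *notation, *character]
    # Drop empty entries and surrounding whitespace.
    tags = [p.strip() for p in parts if p and p.strip()]
    for tag in tags:
        # Tempo belongs in the BPM dropdown, not in the tags field.
        if "bpm" in tag.lower():
            raise ValueError(f"set tempo via the BPM dropdown, not a tag: {tag!r}")
    return ", ".join(tags)


prompt = build_tags(
    "Piano",
    timbre=["Soft"],
    fx=["Reverb"],
    notation=["Chord Progression"],
    character=["Intimate"],
)
```

Here `prompt` is the string `Piano, Soft, Reverb, Chord Progression, Intimate`.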

📋 Full Tag Reference

For the complete list of supported tags, see the Master Tag Reference Sheet (Master_Tag_Reference.md in this repository).

Tag Distribution Charts

Charts (images): Instrument Sub-Family Coverage; Timbre Descriptor Coverage; FX Descriptor Coverage

Musical Keys

Major Keys: C major, C# major, D major, Eb major, E major, F major, F# major, G major, Ab major, A major, Bb major, B major

Minor Keys: C minor, C# minor, D minor, D# minor, E minor, F minor, F# minor, G minor, G# minor, A minor, Bb minor, B minor
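
Note that the two lists spell some roots differently (Eb and Ab for major, D# and G# for minor). A small lookup, sketched below, makes that easy to check in scripts. It assumes the dropdown strings match the names above exactly; `SUPPORTED_KEYS` and `is_supported_key` are hypothetical names.

```python
# Root spellings copied from the two lists above.
MAJOR_ROOTS = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
MINOR_ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "Bb", "B"]

SUPPORTED_KEYS = (
    [f"{root} major" for root in MAJOR_ROOTS]
    + [f"{root} minor" for root in MINOR_ROOTS]
)


def is_supported_key(name: str) -> bool:
    """True if the key name appears in the documented list (exact match)."""
    return name in SUPPORTED_KEYS
```

For example, `is_supported_key("D# minor")` is true, while `is_supported_key("Eb minor")` is false because the minor list uses the sharp spelling.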

Duration Calculation

Duration is calculated automatically from BPM and bar count, assuming 4 beats per bar:

duration (seconds) = round(bars × 4 / BPM × 60)

Examples:

BPM	Bars	Duration
100	8	19s
120	4	8s
140	8	14s
150	4	6s

Maximum duration: 20 seconds (model limit)
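
The formula and the table can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical function names, not the node's implementation. The README does not say whether the node rejects or clamps combinations that exceed 20 seconds, so the sketch only reports whether a combination fits.

```python
BEATS_PER_BAR = 4      # the "x 4" term in the formula (4/4 time)
MAX_DURATION_S = 20    # model limit stated above


def loop_duration_seconds(bpm: int, bars: int) -> int:
    """Rounded loop length in seconds for a BPM / bar-count pair."""
    if bpm <= 0 or bars <= 0:
        raise ValueError("bpm and bars must be positive")
    return round(bars * BEATS_PER_BAR / bpm * 60)


def fits_model_limit(bpm: int, bars: int) -> bool:
    """True if the rounded duration does not exceed the 20-second limit."""
    return loop_duration_seconds(bpm, bars) <= MAX_DURATION_S


# Reproduce the table rows: (BPM, bars) -> seconds
table = {
    (bpm, bars): loop_duration_seconds(bpm, bars)
    for bpm, bars in [(100, 8), (120, 4), (140, 8), (150, 4)]
}
```

The resulting `table` maps the four rows to 19, 8, 14, and 6 seconds, matching the values listed above.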

File Structure

ComfyUI/
├── models/
│   └── stable_audio/
│       └── Foundation-1/              # Auto-downloaded
│           ├── Foundation_1.safetensors
│           └── model_config.json
└── custom_nodes/
    └── ComfyUI-Foundation-1/
        ├── __init__.py
        ├── nodes/
        │   ├── __init__.py
        │   ├── loader_node.py
        │   ├── generate_node.py
        │   └── model_cache.py
        ├── k_diffusion_files/         # Private k-diffusion install
        ├── pyproject.toml
        ├── requirements.txt
        └── README.md
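
To confirm that the auto-download produced the layout above, the two model files can be checked with `pathlib`. This is a hedged sketch: `missing_model_files` is a hypothetical helper, and it assumes the ComfyUI root is passed in explicitly.

```python
from pathlib import Path

# File names taken from the tree above.
MODEL_FILES = ("Foundation_1.safetensors", "model_config.json")


def missing_model_files(comfy_root: Path) -> list[str]:
    """Return the expected model files that are absent under comfy_root."""
    model_dir = Path(comfy_root) / "models" / "stable_audio" / "Foundation-1"
    return [name for name in MODEL_FILES if not (model_dir / name).is_file()]
```

An empty list from `missing_model_files(Path("ComfyUI"))` means both files are in place.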

Parameters Explained

Parameter	Description	Recommended
attention	Attention mechanism	`auto` (SageAttention if available, else SDPA)
steps	Diffusion steps	`250` (training default), `100-150` for faster results
cfg_scale	Classifier-free guidance	`7.0` (training default), `6-8` for balance
sampler_type	Diffusion sampler	`dpmpp-3m-sde` (recommended, best quality), `k-dpm-fast` (fastest, needs fewer steps)
sigma_min	Min noise level	`0.3` (default)
sigma_max	Max noise level	`500.0` (default) — note: when using audio variation, this is internally overridden by `init_noise_level`
audio	Optional input audio for variations	Connect any AUDIO output, or leave disconnected for text-to-audio
init_noise_level	Variation strength	`0.5-0.75` (balanced), `0.1-0.3` (close to input), `1.0` (max variation)
unload_after_generate	Offload to CPU RAM	`True` to free VRAM between runs
torch_compile	torch.compile optimization	`True` (first run slow, subsequent faster)
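
The `auto` behaviour described for the attention parameter (SageAttention if available, else SDPA) amounts to a simple fallback. The sketch below illustrates that logic only; `resolve_attention` is a hypothetical function, and detecting SageAttention via `importlib.util.find_spec` is an assumption about how availability could be tested, not a description of the node's code.

```python
import importlib.util

ATTENTION_CHOICES = ("auto", "sdpa", "flash_attention_2", "sageattention")


def resolve_attention(choice: str = "auto") -> str:
    """Map the dropdown value to a concrete attention backend name."""
    if choice not in ATTENTION_CHOICES:
        raise ValueError(f"unknown attention type: {choice!r}")
    if choice != "auto":
        return choice  # an explicit choice is passed through unchanged
    # "auto": prefer SageAttention when the package is importable.
    sage_installed = importlib.util.find_spec("sageattention") is not None
    return "sageattention" if sage_installed else "sdpa"
```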

Troubleshooting

"No module named 'stable_audio_tools'"?

This means stable-audio-tools was not installed. This can happen if you cloned the repo manually without running install.py, or if your pip environment is different from ComfyUI's.

Fix:

pip install stable-audio-tools --no-deps

Then restart ComfyUI.

[!WARNING] You must use --no-deps. Running pip install stable-audio-tools without it will pull in pandas==2.0.2 which breaks on Python 3.13+.
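
Because this error often comes from installing into a different Python environment than the one ComfyUI runs, it can help to check which modules the ComfyUI interpreter can actually see. The sketch below is a hypothetical diagnostic, not part of the node; run it with the same Python executable that launches ComfyUI.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The interpreter path shows which environment pip needs to target.
    print("Interpreter:", sys.executable)
    print("Missing:", missing_modules(["stable_audio_tools", "einops", "torchaudio"]))
```

If `stable_audio_tools` is reported missing, install it with `--no-deps` using that interpreter's pip (for example `python -m pip ...` with the path printed above).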

Model Not Downloading?

Manually download from RoyalCities/Foundation-1:

pip install -U huggingface_hub
huggingface-cli download RoyalCities/Foundation-1 --local-dir ComfyUI/models/stable_audio/Foundation-1

Only these two files are required:

Foundation_1.safetensors (~3GB model weights)
model_config.json (model configuration)
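
To fetch only those two files instead of the whole repository, the `huggingface_hub` Python API can be used in place of the CLI. This is a sketch under the assumption that the file names in the repository match the ones listed above exactly; `download_required_files` is a hypothetical helper.

```python
from pathlib import Path

REPO_ID = "RoyalCities/Foundation-1"
REQUIRED_FILES = ("Foundation_1.safetensors", "model_config.json")


def download_required_files(target_dir: str) -> list[str]:
    """Download the two required files into target_dir and return their paths."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target_dir)
        for name in REQUIRED_FILES
    ]


# Example call (needs internet access and roughly 3GB of disk space):
# download_required_files("ComfyUI/models/stable_audio/Foundation-1")
```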

Dependency Installation Failed?

The __init__.py auto-installs dependencies at startup. If it fails, install manually:

Normal pip installs:

pip install "einops>=0.7.0"
pip install "alias-free-torch>=0.0.6"
pip install "ema-pytorch>=0.2.3"
pip install "einops-exts>=0.0.3"

Special installs with --no-deps (required!):

These packages MUST be installed with --no-deps or they will break your ComfyUI environment:

# stable-audio-tools --no-deps avoids pandas==2.0.2 (no Python 3.13 wheel)
pip install stable-audio-tools --no-deps

# k-diffusion must go to private folder (avoids conflict with ComfyUI's bundled version)
pip install k-diffusion==0.1.1 --no-deps --target ComfyUI/custom_nodes/ComfyUI-Foundation-1/k_diffusion_files/

[!WARNING] Do NOT run:

pip install stable-audio-tools    # WRONG - pulls pandas==2.0.2
pip install k-diffusion           # WRONG - conflicts with ComfyUI

What Goes in k_diffusion_files/?

The k_diffusion_files/ folder is created automatically by the auto-installer. It contains a private copy of k-diffusion that's loaded at runtime via importlib — this prevents conflicts with ComfyUI's own bundled k_diffusion and avoids the clip→pkg_resources import chain issue.

If this folder is missing or corrupted, the node will re-download k-diffusion==0.1.1 automatically on next startup.
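
For readers curious how a package can be loaded from a private folder without touching `site-packages`, the general `importlib` technique looks roughly like the sketch below. It is an illustration of the approach, not the node's actual loader: `load_private_package` and the alias name are hypothetical, and the technique only works cleanly when the package uses relative imports internally.

```python
import importlib.util
import sys
from pathlib import Path


def load_private_package(package_dir: Path, alias: str):
    """Import the package at package_dir under a private module name."""
    package_dir = Path(package_dir)
    spec = importlib.util.spec_from_file_location(
        alias,
        package_dir / "__init__.py",
        # Marks the module as a package so its submodules can be found.
        submodule_search_locations=[str(package_dir)],
    )
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load package from {package_dir}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so relative imports inside the package resolve.
    sys.modules[alias] = module
    spec.loader.exec_module(module)
    return module
```

Because the module is registered under the alias rather than its usual name, a different copy of the same package elsewhere on `sys.path` remains untouched.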

Out of Memory?

Enable unload_after_generate=True to offload to CPU RAM
Reduce steps (100-150 still gives good results)
Close other GPU applications

Slow Generation?

Install SageAttention: pip install sageattention
Enable torch_compile=True (first run is slower, subsequent runs faster)
Use dpmpp-2m-sde sampler (slightly faster than dpmpp-3m-sde)

k_diffusion Conflicts?

Foundation-1 installs k-diffusion to a private directory (k_diffusion_files/) to avoid conflicts with ComfyUI's bundled version. Never install k-diffusion to site-packages manually.

🔗 Important Links

🤗 HuggingFace: RoyalCities/Foundation-1
📄 Code: https://github.com/saganaki22/ComfyUI-Foundation-1

📄 License

This model is licensed under the Stability AI Community License:

✅ Non-commercial use — permitted
✅ Limited commercial use — entities with annual revenues below USD $1M
⚠️ Revenue exceeding USD $1M — refer to the repository license file for full terms

Model weights from RoyalCities/Foundation-1 are subject to the same license.

⚠️ Usage Disclaimer

Foundation-1 is intended for music production, creative applications, and other legitimate purposes. Please use it responsibly and ethically. We accept no responsibility for illegal use. Refer to your local laws regarding generated content.
