minimal audio generation with codec + transformer. basically a tiny version of what AudioLM/MusicGen do.
the idea is:
- train a codec to compress audio into discrete tokens (like how JPEG compresses images, but learnable and for audio)
- train a transformer to predict the next token
- generate new audio by sampling from the transformer and decoding
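the sample-and-decode step boils down to a standard autoregressive loop. here's a minimal sketch (names like `lm`, `codec`, and the exact shapes are assumptions, not the actual inference.py API):

```python
import torch

@torch.no_grad()
def generate(lm, codec, prompt, n_new, temperature=1.0):
    """sample n_new codec tokens from the lm, then decode to a waveform."""
    tokens = prompt                                   # (1, seq) of token ids
    for _ in range(n_new):
        logits = lm(tokens)[:, -1] / temperature      # logits for the next token
        nxt = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        tokens = torch.cat([tokens, nxt], dim=1)      # append and keep going
    return codec.decode(tokens)                       # token ids -> waveform
```

temperature just rescales the logits before sampling: higher = more random audio, lower = closer to greedy decoding.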
the codec uses residual vector quantization (RVQ) - 8 codebooks with 1024 codes each. at 16khz with 320x downsampling that's 50 frames per second, and with 8 codebooks per frame you get 400 tokens per second of audio.
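the mechanics of RVQ are simple: each codebook quantizes whatever residual the previous codebook left behind, and decoding sums the picked code vectors. a toy sketch with untrained random codebooks (the latent dim here is made up, and this is not the actual codec.py code):

```python
import torch

num_codebooks, codebook_size, dim = 8, 1024, 64    # dim is a made-up latent size

# untrained random codebooks, just to show the mechanics
codebooks = [torch.randn(codebook_size, dim) for _ in range(num_codebooks)]

def rvq_encode(z):                                  # z: (frames, dim) encoder output
    residual, codes = z, []
    for cb in codebooks:
        idx = torch.cdist(residual, cb).argmin(dim=-1)  # nearest code per frame
        codes.append(idx)
        residual = residual - cb[idx]               # next codebook sees the residual
    return torch.stack(codes)                       # (num_codebooks, frames)

def rvq_decode(codes):
    # reconstruction = sum of the selected code vectors across all codebooks
    return sum(cb[idx] for cb, idx in zip(codebooks, codes))

z = torch.randn(50, dim)                            # 1s of audio: 16000 / 320 = 50 frames
codes = rvq_encode(z)
print(codes.shape)                                  # torch.Size([8, 50]) -> 400 tokens/s
```

in the real codec the codebooks are learned (and typically updated with EMA), so each level actually shrinks the residual instead of just relabeling it.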
- transformer.py - the transformer, has RoPE and GQA
- codec.py - encoder/decoder + RVQ, based on encodec/soundstream
- audio_data.py - dataloaders for librispeech
- train_codec.py - trains the codec
- train_audio_lm.py - tokenizes audio then trains transformer on the tokens
- inference.py - generates audio
- main.py - sanity check that the transformer works (just memorizes a sequence)
pip install torch torchaudio
# train codec (will download librispeech, it's ~6gb)
python train_codec.py --epochs 100
# tokenize the dataset
python train_audio_lm.py --mode tokenize --codec-path ./checkpoints/codec_epoch_100.pt
# train the LM
python train_audio_lm.py --mode train
# generate
python inference.py --mode generate --codec-path ... --transformer-path ...
you probably want a gpu for this. cpu is painfully slow.
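the tokenize pass is conceptually just running each clip through the frozen codec and saving the codes. a hypothetical sketch (assumed API and token ordering - the real logic lives in train_audio_lm.py):

```python
import torch

def tokenize_clip(codec, wav):         # wav: (1, samples) mono audio at 16khz
    """encode one clip to a flat stream of codec token ids for LM training."""
    with torch.no_grad():
        codes = codec.encode(wav)      # (num_codebooks, frames) of token ids
    # frame-major interleaving (all 8 codes for frame 0, then frame 1, ...);
    # the ordering is a design choice, not something the codec dictates
    return codes.t().flatten()
```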