
optim: split factory registry and strat logics for factory api #6

Open

art-test-stack wants to merge 7 commits into master from optim2


@art-test-stack (Owner)

No description provided.

Copilot AI left a comment


Pull request overview

This PR refactors the optimizer implementation into a registry/strategy architecture, separating optimizer validation, local/distributed execution backends, concrete optimizer strategies, and fused kernels.

Changes:

  • Replaces the monolithic optimizer factory with registered optimizer specs and strategy-based AdamW/Muon execution.
  • Adds new optimizer kernel modules, including relocated AdamW/Muon kernels and additional Shampoo/Adahessian implementations.
  • Updates README optimization documentation to describe the new architecture and extension flow.
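The Muon strategy listed above follows the published Muon idea: it accumulates momentum for a 2-D weight and then steps along an orthogonalized version of that momentum, computed with a Newton-Schulz iteration. The sketch below is not the PR's compiled kernel. It is a pure-Python illustration that assumes small list-of-lists matrices and uses the classic cubic iteration `X <- 1.5 X - 0.5 X X^T X`, whereas Muon implementations commonly use a tuned quintic. `muon_update` is a hypothetical helper that also omits details such as Nesterov momentum and shape-dependent update scaling.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain triple-loop product; fine for the tiny matrices used here.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def newton_schulz_orthogonalize(g: Matrix, steps: int = 20) -> Matrix:
    """Approximate the orthogonal polar factor U V^T of g, where g = U S V^T.

    The cubic iteration converges when every singular value of the
    starting X lies in (0, sqrt(3)).
    """
    norm = math.sqrt(sum(v * v for row in g for v in row))
    if norm == 0.0:
        raise ValueError("cannot orthogonalize a zero matrix")
    # Dividing by the Frobenius norm bounds the spectral norm by 1.
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxt_x[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def muon_update(weight: Matrix, grad: Matrix, momentum_buf: Matrix,
                lr: float = 0.02, beta: float = 0.95) -> Matrix:
    # Hypothetical helper: plain momentum, then a step along the orthogonalized buffer.
    for i, row in enumerate(grad):
        for j, gij in enumerate(row):
            momentum_buf[i][j] = beta * momentum_buf[i][j] + gij
    direction = newton_schulz_orthogonalize(momentum_buf)
    for i, row in enumerate(direction):
        for j, dij in enumerate(row):
            weight[i][j] -= lr * dij
    return weight
```

For a symmetric positive-definite input such as `[[3, 1], [1, 2]]`, the left and right singular vectors coincide, so the orthogonalized result is close to the identity. Every direction therefore receives an update of similar magnitude, regardless of how skewed the raw gradient's spectrum was.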

Reviewed changes

Copilot reviewed 7 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `src/gpt_lab/optim/factory.py` | Registers built-in optimizers and delegates stepping to local or distributed backends. |
| `src/gpt_lab/optim/registry.py` | Adds optimizer spec registry and parameter-group validation. |
| `src/gpt_lab/optim/strategy.py` | Adds scalar cache, strategy interface, and local/distributed backend orchestration. |
| `src/gpt_lab/optim/strategies/__init__.py` | Exports concrete optimizer strategies. |
| `src/gpt_lab/optim/strategies/adamw.py` | Adds AdamW local and distributed strategy implementation. |
| `src/gpt_lab/optim/strategies/muon.py` | Adds Muon local and distributed strategy implementation. |
| `src/gpt_lab/optim/kernels/adamw.py` | Adds compiled fused AdamW step kernel. |
| `src/gpt_lab/optim/kernels/muon.py` | Adds compiled fused Muon step kernel. |
| `src/gpt_lab/optim/kernels/shampoo.py` | Adds Shampoo optimizer implementation. |
| `src/gpt_lab/optim/kernels/adahessian.py` | Adds Adahessian optimizer implementation. |
| `src/gpt_lab/optim/kernels/__init__.py` | Adds optimizer kernels package marker. |
| `README.md` | Documents the registry/strategy optimizer architecture and usage examples. |
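A fused AdamW step kernel has to match the reference AdamW arithmetic, so it helps to keep an unfused version on hand to compare against. The sketch below is not the code in `kernels/adamw.py`. It is an element-wise pure-Python version over flat lists, assuming the usual AdamW formulation with bias correction and decoupled weight decay (the same arithmetic as `torch.optim.AdamW` without `amsgrad`). A compiled kernel would perform these same updates on whole tensors at once.

```python
import math


def adamw_step(params, grads, exp_avg, exp_avg_sq, step, lr=1e-3,
               betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW update over flat lists of floats; lists are modified in place.

    `step` is the 1-based step count used for bias correction.
    """
    beta1, beta2 = betas
    bias1 = 1.0 - beta1 ** step
    bias2 = 1.0 - beta2 ** step
    for i, g in enumerate(grads):
        # Decoupled weight decay: shrink the weight directly instead of adding it to the gradient.
        params[i] *= 1.0 - lr * weight_decay
        # Exponential moving averages of the gradient and squared gradient.
        exp_avg[i] = beta1 * exp_avg[i] + (1.0 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1.0 - beta2) * g * g
        # Bias-corrected second moment in the denominator, bias-corrected first moment on top.
        denom = math.sqrt(exp_avg_sq[i] / bias2) + eps
        params[i] -= lr * (exp_avg[i] / bias1) / denom
    return params
```

On the first step the bias corrections cancel the moving-average warm-up, so with `weight_decay=0` each parameter moves by roughly `lr` in the direction opposite to its gradient's sign.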


Comment thread README.md

#### Adding a New Optimizer

To add a new optimizer (e.g., Aurora), you only need to edit **one place**:
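The "one place" extension flow can be pictured as a decorator that records an optimizer's name, default hyperparameters, and update rule together. The actual registration API in `registry.py` is not shown in this PR page, so `OptimizerSpec`, `register_optimizer`, and `build` below are hypothetical names. Aurora's real update rule is not described here either, so the registered function is a stand-in: plain heavy-ball momentum SGD, labeled as such.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class OptimizerSpec:
    # Hypothetical spec shape: name, default hyperparameters, and an update rule.
    name: str
    defaults: Dict[str, float]
    update: Callable[[list, list, dict, dict], None]


OPTIMIZERS: Dict[str, OptimizerSpec] = {}


def register_optimizer(name: str, **defaults: float):
    """Decorator that records an update rule under `name` with its defaults."""
    def wrap(fn):
        if name in OPTIMIZERS:
            raise KeyError(f"optimizer {name!r} is already registered")
        OPTIMIZERS[name] = OptimizerSpec(name, dict(defaults), fn)
        return fn
    return wrap


# Stand-in for "aurora": heavy-ball momentum SGD, NOT Aurora's actual rule.
@register_optimizer("aurora", lr=1e-4, momentum=0.9, weight_decay=0.0)
def aurora_update(params, grads, state, hp):
    buf = state.setdefault("momentum_buffer", [0.0] * len(params))
    for i, g in enumerate(grads):
        g = g + hp["weight_decay"] * params[i]
        buf[i] = hp["momentum"] * buf[i] + g
        params[i] -= hp["lr"] * buf[i]


def build(name: str, **overrides: float):
    """Merge config overrides into the registered defaults and return a step function."""
    spec = OPTIMIZERS[name]
    unknown = set(overrides) - set(spec.defaults)
    if unknown:
        raise ValueError(f"unknown hyperparameters for {name}: {sorted(unknown)}")
    hp = {**spec.defaults, **overrides}
    state: dict = {}
    return lambda params, grads: spec.update(params, grads, state, hp)
```

Because the defaults live next to the update rule, adding an optimizer touches only the decorated function. The factory code that looks names up in `OPTIMIZERS` never changes.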
Comment thread README.md
The optimizer system uses a **registry-based, strategy-pattern architecture** that decouples optimizer logic from execution mode (single-GPU vs. distributed). Optimizers are built and configured by calling `DenseTransformer.build_optimizer` in the `Trainer` class (available in [`gpt_lab.train.trainer`](./src/gpt_lab/train/trainer.py)) using the optimizer configuration from [`configs/optim.yaml`](./configs/optim.yaml), which can specify mixed optimizer groups.
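The decoupling described above can be illustrated with a small strategy-pattern sketch. A strategy holds only the update math, and a backend decides how gradients reach it. The class names are hypothetical, not the ones in `strategy.py`. The "distributed" backend is simulated in a single process by averaging per-rank gradients, which is what an all-reduce with averaging would produce. Real backends may shard state instead, for example with reduce-scatter plus all-gather.

```python
from abc import ABC, abstractmethod
from typing import List


class Strategy(ABC):
    """Optimizer math only; knows nothing about how many processes exist."""

    @abstractmethod
    def update(self, params: List[float], grads: List[float]) -> None: ...


class SGD(Strategy):
    def __init__(self, lr: float) -> None:
        self.lr = lr

    def update(self, params: List[float], grads: List[float]) -> None:
        for i, g in enumerate(grads):
            params[i] -= self.lr * g


class LocalBackend:
    """Single process: hand gradients straight to the strategy."""

    def step(self, strategy: Strategy, params: List[float], grads: List[float]) -> None:
        strategy.update(params, grads)


class SimulatedDistributedBackend:
    """Single-process stand-in for a multi-GPU backend.

    Averages gradients across ranks (the result of an averaging all-reduce),
    then applies the same update on every replica so they stay in sync.
    The shared strategy instance is fine here only because SGD is stateless.
    """

    def step(self, strategy: Strategy, params_per_rank: List[List[float]],
             grads_per_rank: List[List[float]]) -> None:
        world = len(grads_per_rank)
        avg = [sum(col) / world for col in zip(*grads_per_rank)]
        for params in params_per_rank:
            strategy.update(params, avg)
```

The same `SGD` object works under both backends unchanged. This is the property the registry/strategy split aims for: the optimizer logic stays the same whether training runs on one GPU or many.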

> [!WARNING]
> This is probably the most critical part of the library for model training, and it is also the part I implemented the least myself. I relied heavily on external repositories as a code baseline and went back and forth with LLMs to improve it. My goal was to make it work while making it more modular. However, my understanding of optimization algorithms, especially combined with `torch.compile` and distributed training, is quite limited. I therefore encourage you to check the code in [`gpt_lab.optim.factory`](./src/gpt_lab/optim/factory.py) and the corresponding subfolders for the different optimizers.
Comment thread README.md
Comment on lines +505 to +509
optimizer:
  - opt: aurora
    lr: 1e-4
    momentum: 0.9
    weight_decay: 0.0
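A config entry like the one above has to be checked against what each optimizer accepts before any parameter groups are built. The sketch below validates a list of such entries after YAML loading. `ALLOWED` and `validate_groups` are hypothetical, and so are the accepted keys for each optimizer. One practical detail: PyYAML follows YAML 1.1 float rules, which require a decimal point, so `lr: 1e-4` loads as the string `"1e-4"` unless the value is written as `1.0e-4`. The sketch therefore coerces numeric strings to `float`.

```python
from typing import Any, Dict, List

# Hypothetical table of accepted hyperparameters per registered optimizer.
ALLOWED: Dict[str, set] = {
    "adamw": {"lr", "betas", "eps", "weight_decay"},
    "muon": {"lr", "momentum", "weight_decay"},
    "aurora": {"lr", "momentum", "weight_decay"},
}


def _as_float(key: str, value: Any) -> float:
    # PyYAML reads `1e-4` (no decimal point) as the string "1e-4", so accept numeric strings.
    try:
        return float(value)
    except (TypeError, ValueError):
        raise ValueError(f"{key}={value!r} is not a number") from None


def validate_groups(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Check each optimizer group entry and normalize its numeric fields."""
    groups = []
    for idx, entry in enumerate(entries):
        entry = dict(entry)  # do not mutate the caller's config
        name = entry.pop("opt", None)
        if name not in ALLOWED:
            raise ValueError(f"group {idx}: unknown optimizer {name!r}")
        unknown = set(entry) - ALLOWED[name]
        if unknown:
            raise ValueError(f"group {idx}: unexpected keys {sorted(unknown)} for {name}")
        # `betas` is a pair, so it is passed through; every other value must be numeric.
        hp = {k: (v if k == "betas" else _as_float(k, v)) for k, v in entry.items()}
        groups.append({"opt": name, **hp})
    return groups
```

Failing at config-validation time gives a clear error naming the group index, rather than a type error deep inside a compiled kernel on the first training step.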
@art-test-stack (Owner, Author)
@tanguyguyot
review plz


2 participants