optim: split factory registry and strat logics for factory api #6
Open
art-test-stack wants to merge 7 commits into
Pull request overview
This PR refactors the optimizer implementation into a registry/strategy architecture, separating optimizer validation, local/distributed execution backends, concrete optimizer strategies, and fused kernels.
Changes:
- Replaces the monolithic optimizer factory with registered optimizer specs and strategy-based AdamW/Muon execution.
- Adds new optimizer kernel modules, including relocated AdamW/Muon kernels and additional Shampoo/Adahessian implementations.
- Updates README optimization documentation to describe the new architecture and extension flow.
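The registry/strategy split described in this overview can be sketched in a few lines. This is a minimal illustration, not the code in this PR: `OptimizerStrategy`, `OptimizerSpec`, `register_optimizer`, `run_step`, and the toy `sgd` strategy are all hypothetical names, parameters are plain floats instead of tensors, and the "distributed" path simply averages per-rank gradients in place of a real all-reduce.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


class OptimizerStrategy(ABC):
    """Hypothetical strategy interface: one update rule, two execution modes."""

    @abstractmethod
    def step_local(self, params: List[float], grads: List[float], lr: float) -> List[float]:
        """Apply the update rule on a single device."""

    def step_distributed(
        self, params: List[float], grads_per_rank: List[List[float]], lr: float
    ) -> List[float]:
        # Stand-in for an all-reduce: average each gradient across ranks,
        # then reuse the local update rule.
        world_size = len(grads_per_rank)
        averaged = [sum(g) / world_size for g in zip(*grads_per_rank)]
        return self.step_local(params, averaged, lr)


@dataclass(frozen=True)
class OptimizerSpec:
    """Hypothetical registry entry: a name plus a way to build its strategy."""

    name: str
    strategy_factory: Callable[[], OptimizerStrategy]


_REGISTRY: Dict[str, OptimizerSpec] = {}


def register_optimizer(name: str):
    """Class decorator that records a strategy under `name`."""

    def decorator(cls):
        if name in _REGISTRY:
            raise ValueError(f"optimizer {name!r} is already registered")
        _REGISTRY[name] = OptimizerSpec(name=name, strategy_factory=cls)
        return cls

    return decorator


@register_optimizer("sgd")
class SGDStrategy(OptimizerStrategy):
    """Toy update rule used only to exercise the registry."""

    def step_local(self, params, grads, lr):
        return [p - lr * g for p, g in zip(params, grads)]


def run_step(name: str, params: List[float], grads_per_rank: List[List[float]], lr: float):
    """Look up the strategy by name and pick the backend from the rank count."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown optimizer {name!r}; registered: {sorted(_REGISTRY)}")
    strategy = _REGISTRY[name].strategy_factory()
    if len(grads_per_rank) == 1:
        return strategy.step_local(params, grads_per_rank[0], lr)
    return strategy.step_distributed(params, grads_per_rank, lr)
```

In this sketch the caller never branches on the optimizer name: adding a new optimizer means registering one more class, and the choice between the local and distributed path stays in `run_step`.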
Reviewed changes
Copilot reviewed 7 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/gpt_lab/optim/factory.py` | Registers built-in optimizers and delegates stepping to local or distributed backends. |
| `src/gpt_lab/optim/registry.py` | Adds optimizer spec registry and parameter-group validation. |
| `src/gpt_lab/optim/strategy.py` | Adds scalar cache, strategy interface, and local/distributed backend orchestration. |
| `src/gpt_lab/optim/strategies/__init__.py` | Exports concrete optimizer strategies. |
| `src/gpt_lab/optim/strategies/adamw.py` | Adds AdamW local and distributed strategy implementation. |
| `src/gpt_lab/optim/strategies/muon.py` | Adds Muon local and distributed strategy implementation. |
| `src/gpt_lab/optim/kernels/adamw.py` | Adds compiled fused AdamW step kernel. |
| `src/gpt_lab/optim/kernels/muon.py` | Adds compiled fused Muon step kernel. |
| `src/gpt_lab/optim/kernels/shampoo.py` | Adds Shampoo optimizer implementation. |
| `src/gpt_lab/optim/kernels/adahessian.py` | Adds Adahessian optimizer implementation. |
| `src/gpt_lab/optim/kernels/__init__.py` | Adds optimizer kernels package marker. |
| `README.md` | Documents the registry/strategy optimizer architecture and usage examples. |
#### Adding a New Optimizer

To add a new optimizer (e.g., Aurora), you only need to edit **one place**:

The optimizer system uses a **registry-based, strategy-pattern architecture** that decouples optimizer logic from execution mode (single-GPU vs. distributed). Optimizers are built and configured by calling `DenseTransformer.build_optimizer` in the `Trainer` class (available in [`gpt_lab.train.trainer`](./src/gpt_lab/train/trainer.py)) using the optimizer configuration from [`configs/optim.yaml`](./configs/optim.yaml), which can specify mixed optimizer groups.

> [!WARNING]
> This is perhaps the most critical part of the library for model training, and it is also the part I implemented least myself. I relied heavily on external repositories as a code baseline and went back and forth with LLMs to improve it. My goal was to make it work while keeping it more modular. However, my understanding of optimization algorithms, combined with `torch.compile` and distributed training, is quite limited, so I encourage you to check the code in [`gpt_lab.optim.factory`](./src/gpt_lab/optim/factory.py) and the corresponding subfolders for the different optimizers.
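The README text above mentions mixed optimizer groups but this excerpt does not show how parameters are assigned to them. As a rough illustration only, the sketch below assumes one common convention for Muon setups: 2-D hidden weight matrices go to Muon, while embeddings, the output head, and 1-D parameters go to AdamW. The function name `split_param_groups`, the parameter names, and the name-based matching rule are all hypothetical and are not taken from `gpt_lab`.

```python
from typing import Dict, List, Tuple


def split_param_groups(shapes: Dict[str, Tuple[int, ...]]) -> Dict[str, List[str]]:
    """Route parameter names to an optimizer group based on shape and name.

    Assumed rule (not from the PR): 2-D weights that are neither embeddings
    nor the output head go to "muon"; everything else goes to "adamw".
    """
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in shapes.items():
        is_matrix = len(shape) == 2
        is_embedding_like = "embed" in name or "lm_head" in name
        target = "muon" if is_matrix and not is_embedding_like else "adamw"
        groups[target].append(name)
    return groups


# Hypothetical parameter names and shapes for a tiny transformer.
example_shapes = {
    "embed.weight": (100, 16),
    "blocks.0.attn.qkv.weight": (48, 16),
    "blocks.0.norm.weight": (16,),
    "lm_head.weight": (100, 16),
}
example_groups = split_param_groups(example_shapes)
```

With a real model the same rule could be driven by `named_parameters()` and each tensor's `ndim`; the shape dictionary here only keeps the example free of dependencies.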
Comment on lines +505 to +509
    optimizer:
      - opt: aurora
        lr: 1e-4
        momentum: 0.9
        weight_decay: 0.0
Comment from the PR author (Owner): @tanguyguyot
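A config entry like the one quoted in this comment (lines +505 to +509) could be checked against a registered spec before any optimizer is built. The sketch below is an assumption about what such parameter-group validation might look like, not the code in `registry.py`: `HyperparamSpec`, `SPECS`, `validate_group`, and the set of keys accepted for `aurora` are hypothetical. It also coerces values with `float()`, because some YAML loaders (PyYAML's default resolver, for example) read `1e-4` as a string rather than a float.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class HyperparamSpec:
    """Hypothetical description of the keys an optimizer group may contain."""

    required: FrozenSet[str]
    optional: FrozenSet[str] = frozenset()


# Hypothetical spec; the real hyperparameters of "aurora" are not shown in the PR.
SPECS: Dict[str, HyperparamSpec] = {
    "aurora": HyperparamSpec(
        required=frozenset({"lr"}),
        optional=frozenset({"momentum", "weight_decay"}),
    ),
}


def validate_group(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Check one config entry and return a cleaned copy with float values."""
    entry = dict(entry)  # do not mutate the caller's config
    name = entry.pop("opt", None)
    if name not in SPECS:
        raise KeyError(f"unknown optimizer {name!r}; known: {sorted(SPECS)}")
    spec = SPECS[name]

    keys = set(entry)
    missing = spec.required - keys
    unknown = keys - spec.required - spec.optional
    if missing:
        raise ValueError(f"{name}: missing required keys {sorted(missing)}")
    if unknown:
        raise ValueError(f"{name}: unknown keys {sorted(unknown)}")

    cleaned: Dict[str, Any] = {"opt": name}
    for key, value in entry.items():
        # float() accepts both real floats and strings such as "1e-4".
        cleaned[key] = float(value)
    return cleaned


# Mirrors the quoted YAML entry; "1e-4" is kept as a string on purpose.
cleaned_group = validate_group(
    {"opt": "aurora", "lr": "1e-4", "momentum": 0.9, "weight_decay": 0.0}
)
```

Failing early on missing or unknown keys keeps typos in the config from silently falling back to defaults, which is the main reason to validate at registration or build time rather than inside the step function.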
No description provided.