[Feature Request] Support for DeepSeek-V3.2-Speciale Model Training

Hello Megatron-Bridge Team,

I noticed that the PR adding [DeepSeek-V3.2-Speciale](https://github.com/NVIDIA/Megatron-LM/pull/2154) training support was recently merged into the dev branch of Megatron-LM.

Given the significance of the DeepSeek-V3 series, could the Megatron-Bridge team provide the corresponding recipes and configuration examples for this model?

Native support in Megatron-Bridge would help the community adapt and train this model efficiently, and would be greatly appreciated.

Thank you for your hard work!


[Feature Request] Support for DeepSeek-V3.2-Speciale Model Training #1875
