Hello Megatron-Bridge Team,
I noticed that the PR adding DeepSeek-V3.2-Speciale training support was recently merged into the dev branch of Megatron-LM.
Given the significance of the DeepSeek-V3 series, could the Megatron-Bridge team provide the corresponding recipes and configuration examples for this model?
Native support in Megatron-Bridge would be greatly appreciated, as it would let the community adapt and train the model efficiently.
Thank you for your hard work!