[finetune_hf_llama example] hf2megads_weight_converter.py fails to save due to AttributeError #482

I have been trying to run the finetune_hf_llama example. The dataset and model are downloaded, and the Llama 2 model is now in HF format. I am trying to convert the weights to mega-ds format with tools/hf2megads_weight_converter.py, but I keep getting `AttributeError: 'DummyOptim' object has no attribute 'state_dict'`, raised from DeepSpeed's runtime/bf16_optimizer.py. The error occurs when the checkpoint is saved, in the last part of the convert_ckpt() function.

Command used: `deepspeed ../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py --hf-ckpt-num-shards 3 --load-mode auto --hf-ckpt-dir $HF_PATH --save $MGDS_PATH --tensor-model-parallel-size 4 --pipeline-model-parallel-size 2 --lr-warmup-iters 2000 --weight-decay 0.1 --clip-grad 1 --num-layers 32 --hidden-size 4096 --num-attention-heads 32 --ffn-hidden-size 11008 --attention-dropout 0 --hidden-dropout 0 --no-query-key-layer-scaling --disable-bias-linear --normalization rmsnorm --use-rotary-position-embeddings --untie-embeddings-and-output-weights --swiglu --seq-length 512 --max-position-embeddings 512 --micro-batch-size 16 --global-batch-size 256 --train-iters 3500 --lr 2e-5 --tensorboard-dir tensorboard_output --lr-decay-iters 320000 --lr-decay-style cosine --log-interval 1 --eval-iters 100 --eval-interval 100 --data-path $DATA_PATH --save-interval 1500 --split 100,0,0 --bf16 --zero-stage 0 --tokenizer-type HFTokenizer --tokenizer-model $HF_PATH --deepspeed_config $DS_CONFIG --deepspeed --distributed-backend nccl --num-workers 0 --no-masked-softmax-fusion --no-bias-gelu-fusion --no-bias-dropout-fusion --no-gradient-accumulation-fusion --repeated-dataloader`

Output snippet with error:
```
before deepspeed init
after deepspeed init
mega-ds checkpoint will be saved in /fsx/deepspeed/Llama2-7b-mega-ds-T4P2
saving checkpoint at iteration       0 to /fsx/deepspeed/Llama2-7b-mega-ds-T4P2
[rank5]: Traceback (most recent call last):
[rank5]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank5]:     convert_ckpt()
[rank5]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank5]:     save_checkpoint(0, [ds_engine], None, None)
[rank5]:   File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank5]:     model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank5]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank5]:     self._save_zero_checkpoint(save_dir, tag)
[rank5]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank5]:     zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank5]:                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank5]:     state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank5]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
[rank2]: Traceback (most recent call last):
[rank2]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank2]:     convert_ckpt()
[rank2]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank2]:     save_checkpoint(0, [ds_engine], None, None)
[rank2]:   File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank2]:     model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank2]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank2]:     self._save_zero_checkpoint(save_dir, tag)
[rank2]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank2]:     zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank2]:                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank2]:     state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank2]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
[rank7]: Traceback (most recent call last):
[rank7]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank7]:     convert_ckpt()
[rank7]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank7]:     save_checkpoint(0, [ds_engine], None, None)
[rank7]:   File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank7]:     model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank7]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank7]:     self._save_zero_checkpoint(save_dir, tag)
[rank7]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank7]:     zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank7]:                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank7]:     state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank7]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
[rank4]: Traceback (most recent call last):
[rank4]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank4]:     convert_ckpt()
[rank4]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank4]:     save_checkpoint(0, [ds_engine], None, None)
[rank4]:   File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank4]:     model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank4]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank4]:     self._save_zero_checkpoint(save_dir, tag)
[rank4]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank4]:     zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank4]:                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank4]:     state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank4]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
[rank1]: Traceback (most recent call last):
[rank1]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank1]:     convert_ckpt()
[rank1]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank1]:     save_checkpoint(0, [ds_engine], None, None)
[rank1]:   File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank1]:     model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank1]:     self._save_zero_checkpoint(save_dir, tag)
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank1]:     zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank1]:                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank1]:     state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank1]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
[rank0]: Traceback (most recent call last):
[rank0]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank0]:     convert_ckpt()
[rank0]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank0]:     save_checkpoint(0, [ds_engine], None, None)
[rank0]:   File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank0]:     model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank0]:     self._save_zero_checkpoint(save_dir, tag)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank0]:     zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank0]:                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank0]:     state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank0]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
[rank3]: Traceback (most recent call last):
[rank3]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank3]:     convert_ckpt()
[rank3]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank3]:     save_checkpoint(0, [ds_engine], None, None)
[rank3]:   File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank3]:     model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank3]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank3]:     self._save_zero_checkpoint(save_dir, tag)
[rank3]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank3]:     zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank3]:                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank3]:     state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank3]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
[rank6]: Traceback (most recent call last):
[rank6]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank6]:     convert_ckpt()
[rank6]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank6]:     save_checkpoint(0, [ds_engine], None, None)
[rank6]:   File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank6]:     model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank6]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank6]:     self._save_zero_checkpoint(save_dir, tag)
[rank6]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank6]:     zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank6]:                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank6]:     state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank6]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
[rank2]:[W924 21:50:42.843264906 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W924 21:50:43.855046828 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank3]:[W924 21:50:44.280302802 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank5]:[W924 21:50:44.281271757 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank6]:[W924 21:50:44.352306456 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank4]:[W924 21:50:44.353046354 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank1]:[W924 21:50:44.405097815 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank7]:[W924 21:50:44.479813044 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
```
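My reading of the last two frames (an assumption, not checked against the DeepSpeed source) is that the BF16 optimizer wrapper delegates `state_dict()` to a base optimizer that is a `DummyOptim` placeholder with no such method. The sketch below reproduces only that call pattern without DeepSpeed. Every class and function in it is a hypothetical stand-in, not the real DeepSpeed or Accelerate class, and the subclass with an empty `state_dict()` is just one possible direction, which I have not verified against the converter.

```python
# Hypothetical stand-ins: these are NOT the DeepSpeed classes, only the same call pattern.
class DummyOptim:
    """Placeholder optimizer that holds parameters but defines no state_dict()."""

    def __init__(self, params):
        self.params = params


class WrapperOptimizer:
    """Wraps a base optimizer and delegates state_dict() to it,
    mirroring the last frame of the traceback."""

    def __init__(self, base_optimizer):
        self.optimizer = base_optimizer

    def state_dict(self):
        return {"base_optimizer_state": self.optimizer.state_dict()}


class DummyOptimWithState(DummyOptim):
    """Same placeholder plus an empty state_dict(), so the delegation succeeds."""

    def state_dict(self):
        return {}


def try_save(base_optimizer):
    """Return the wrapper's state dict, or the error text if delegation fails."""
    try:
        return WrapperOptimizer(base_optimizer).state_dict()
    except AttributeError as exc:
        return str(exc)


print(try_save(DummyOptim(params=[])))           # the AttributeError message
print(try_save(DummyOptimWithState(params=[])))  # a dict with an empty base state
```

If this reading is right, the save step would need either a base optimizer that implements `state_dict()` or a save path that skips optimizer state.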

Versions: 
```
accelerate                 1.10.1
deepspeed                  0.17.6
torch                      2.7.0a0+7c8ec84dab.nv25.3
```
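For anyone comparing environments, a version list like the one above can be collected with a short script. This is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_versions` is my own, and a package that is not installed is reported as `None`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Print in roughly the same two-column layout as `pip list`.
for name, ver in installed_versions(["accelerate", "deepspeed", "torch"]).items():
    print(f"{name:<26} {ver or 'not installed'}")
```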

What is causing the `AttributeError: 'DummyOptim' object has no attribute 'state_dict'`? How do I solve this to successfully convert to mega-ds and then finetune? 
