[finetune_hf_llama example] hf2megads_weight_converter.py fails to save due to AttributeError #482

@aravneelaws

Description

I have been trying to run the finetune_hf_llama example. The dataset and model are downloaded, and the Llama 2 model is in HF format. I am trying to convert the weights to mega-ds with tools/hf2megads_weight_converter.py, but I keep getting this error from DeepSpeed's runtime/bf16_optimizer.py: AttributeError: 'DummyOptim' object has no attribute 'state_dict'. The error occurs when saving the checkpoint, in the last part of convert_ckpt().
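To make the failure pattern concrete, here is a minimal sketch that reproduces the same kind of error outside DeepSpeed. The classes below are hypothetical stand-ins written for illustration, not DeepSpeed's actual code: the assumption is only that a wrapper optimizer delegates state_dict() to an inner placeholder object that never defines it, which is what the traceback below suggests.

```python
class DummyOptim:
    """Hypothetical stand-in for a placeholder optimizer: it holds param groups and nothing else."""

    def __init__(self, params):
        self.param_groups = [{"params": list(params)}]


class WrappingOptimizer:
    """Hypothetical stand-in for a wrapper that delegates state_dict() to its inner optimizer."""

    def __init__(self, inner):
        self.optimizer = inner

    def state_dict(self):
        # Same delegation shape as in the traceback: the inner object is
        # assumed to expose state_dict(), which the placeholder does not.
        return {"base_optimizer_state": self.optimizer.state_dict()}


def try_collect_state(wrapper):
    """Return the AttributeError message raised while collecting state, or None on success."""
    try:
        wrapper.state_dict()
    except AttributeError as exc:
        return str(exc)
    return None


message = try_collect_state(WrappingOptimizer(DummyOptim(params=[])))
print(message)  # 'DummyOptim' object has no attribute 'state_dict'
```

If this reading is right, the save path fails because the engine was initialized with a placeholder instead of a real optimizer, not because of anything in the converted weights.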

Command used:

deepspeed ../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py \
    --hf-ckpt-num-shards 3 --load-mode auto \
    --hf-ckpt-dir $HF_PATH --save $MGDS_PATH \
    --tensor-model-parallel-size 4 --pipeline-model-parallel-size 2 \
    --lr-warmup-iters 2000 --weight-decay 0.1 --clip-grad 1 \
    --num-layers 32 --hidden-size 4096 --num-attention-heads 32 --ffn-hidden-size 11008 \
    --attention-dropout 0 --hidden-dropout 0 \
    --no-query-key-layer-scaling --disable-bias-linear --normalization rmsnorm \
    --use-rotary-position-embeddings --untie-embeddings-and-output-weights --swiglu \
    --seq-length 512 --max-position-embeddings 512 \
    --micro-batch-size 16 --global-batch-size 256 \
    --train-iters 3500 --lr 2e-5 --tensorboard-dir tensorboard_output \
    --lr-decay-iters 320000 --lr-decay-style cosine \
    --log-interval 1 --eval-iters 100 --eval-interval 100 \
    --data-path $DATA_PATH --save-interval 1500 --split 100,0,0 \
    --bf16 --zero-stage 0 \
    --tokenizer-type HFTokenizer --tokenizer-model $HF_PATH \
    --deepspeed_config $DS_CONFIG --deepspeed \
    --distributed-backend nccl --num-workers 0 \
    --no-masked-softmax-fusion --no-bias-gelu-fusion --no-bias-dropout-fusion \
    --no-gradient-accumulation-fusion --repeated-dataloader

Output snippet with error:

before deepspeed init
after deepspeed init
mega-ds checkpoint will be saved in /fsx/deepspeed/Llama2-7b-mega-ds-T4P2
saving checkpoint at iteration       0 to /fsx/deepspeed/Llama2-7b-mega-ds-T4P2
[rank5]: Traceback (most recent call last):
[rank5]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank5]:     convert_ckpt()
[rank5]:   File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank5]:     save_checkpoint(0, [ds_engine], None, None)
[rank5]:   File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank5]:     model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank5]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank5]:     self._save_zero_checkpoint(save_dir, tag)
[rank5]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank5]:     zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank5]:                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank5]:     state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank5]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
[Ranks 0, 1, 2, 3, 4, 6, and 7 print the same traceback as rank 5, ending in the same AttributeError.]
[rank2]:[W924 21:50:42.843264906 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W924 21:50:43.855046828 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank3]:[W924 21:50:44.280302802 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank5]:[W924 21:50:44.281271757 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank6]:[W924 21:50:44.352306456 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank4]:[W924 21:50:44.353046354 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank1]:[W924 21:50:44.405097815 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank7]:[W924 21:50:44.479813044 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

Versions:

accelerate                 1.10.1
deepspeed                  0.17.6
torch                      2.7.0a0+7c8ec84dab.nv25.3

What is causing the AttributeError: 'DummyOptim' object has no attribute 'state_dict'? How do I solve this to successfully convert to mega-ds and then finetune?
