Description
I have been trying to run the finetune_hf_llama example. I have downloaded the dataset and the Llama 2 model, and the model is now in HF format. When I try to convert the weights to Mega-DS format using the tools/hf2megads_weight_converter.py script, I keep getting the error "AttributeError: 'DummyOptim' object has no attribute 'state_dict'", raised from DeepSpeed's runtime bf16_optimizer.py. The error occurs while saving the checkpoint, in the last part of the convert_ckpt() function.
Command used: deepspeed ../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py --hf-ckpt-num-shards 3 --load-mode auto --hf-ckpt-dir $HF_PATH --save $MGDS_PATH --tensor-model-parallel-size 4 --pipeline-model-parallel-size 2 --lr-warmup-iters 2000 --weight-decay 0.1 --clip-grad 1 --num-layers 32 --hidden-size 4096 --num-attention-heads 32 --ffn-hidden-size 11008 --attention-dropout 0 --hidden-dropout 0 --no-query-key-layer-scaling --disable-bias-linear --normalization rmsnorm --use-rotary-position-embeddings --untie-embeddings-and-output-weights --swiglu --seq-length 512 --max-position-embeddings 512 --micro-batch-size 16 --global-batch-size 256 --train-iters 3500 --lr 2e-5 --tensorboard-dir tensorboard_output --lr-decay-iters 320000 --lr-decay-style cosine --log-interval 1 --eval-iters 100 --eval-interval 100 --data-path $DATA_PATH --save-interval 1500 --split 100,0,0 --bf16 --zero-stage 0 --tokenizer-type HFTokenizer --tokenizer-model $HF_PATH --deepspeed_config $DS_CONFIG --deepspeed --distributed-backend nccl --num-workers 0 --no-masked-softmax-fusion --no-bias-gelu-fusion --no-bias-dropout-fusion --no-gradient-accumulation-fusion --repeated-dataloader
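For completeness, the $DS_CONFIG passed via --deepspeed_config needs to agree with the CLI flags above. A minimal sketch of such a config, expressed as a Python dict (the exact file I used may contain more keys; the values below are simply the ones implied by --bf16, --zero-stage 0, --micro-batch-size 16, and --global-batch-size 256):

```python
import json

# Minimal DeepSpeed config consistent with the command-line flags above.
# This is an illustrative sketch, not the exact config file from my run.
ds_config = {
    "train_batch_size": 256,               # matches --global-batch-size
    "train_micro_batch_size_per_gpu": 16,  # matches --micro-batch-size
    "bf16": {"enabled": True},             # matches --bf16
    "zero_optimization": {"stage": 0},     # matches --zero-stage 0
    "steps_per_print": 1,
}

# Write it out the way deepspeed expects to read it.
print(json.dumps(ds_config, indent=2))
```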
Output snippet with error:
before deepspeed init
after deepspeed init
mega-ds checkpoint will be saved in /fsx/deepspeed/Llama2-7b-mega-ds-T4P2
saving checkpoint at iteration 0 to /fsx/deepspeed/Llama2-7b-mega-ds-T4P2
[rank5]: Traceback (most recent call last):
[rank5]: File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 562, in <module>
[rank5]: convert_ckpt()
[rank5]: File "../Megatron-DeepSpeed/tools/hf2megads_weight_converter.py", line 555, in convert_ckpt
[rank5]: save_checkpoint(0, [ds_engine], None, None)
[rank5]: File "examples_megatron_deepspeed/Megatron-DeepSpeed/megatron/checkpointing.py", line 303, in save_checkpoint
[rank5]: model[0].save_checkpoint(checkpoint_name, client_state=state_dict)
[rank5]: File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3457, in save_checkpoint
[rank5]: self._save_zero_checkpoint(save_dir, tag)
[rank5]: File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/engine.py", line 3839, in _save_zero_checkpoint
[rank5]: zero_sd = dict(optimizer_state_dict=self.optimizer.state_dict(), ds_config=self.config, ds_version=version)
[rank5]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/usr/local/lib/python3.12/dist-packages/deepspeed/runtime/bf16_optimizer.py", line 470, in state_dict
[rank5]: state_dict[BASE_OPTIMIZER_STATE] = self.optimizer.state_dict()
[rank5]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: AttributeError: 'DummyOptim' object has no attribute 'state_dict'
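In case it helps triage: the failure mode in the traceback can be reproduced in miniature without DeepSpeed. The classes below are simplified stand-ins of my own (not the real DeepSpeed implementations); they only mirror the shape of the call chain, where a bf16 wrapper tries to serialize an inner placeholder optimizer that has no state_dict method:

```python
# Simplified stand-ins mirroring the traceback's call chain; these are NOT
# the real DeepSpeed classes, just the minimal shape needed to reproduce
# the AttributeError.
class DummyOptim:
    """Placeholder optimizer that holds params but defines no state_dict()."""
    def __init__(self, params):
        self.param_groups = [{"params": params}]

class Bf16Wrapper:
    """Stand-in for a bf16 optimizer wrapper that serializes its inner optim."""
    def __init__(self, inner):
        self.optimizer = inner

    def state_dict(self):
        # This is the line that fails when the inner optimizer is a placeholder.
        return {"base_optimizer_state": self.optimizer.state_dict()}

try:
    Bf16Wrapper(DummyOptim([])).state_dict()
except AttributeError as e:
    print(e)  # prints the AttributeError message from the missing state_dict
```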
(identical traceback repeated on ranks 0, 1, 2, 3, 4, 6, and 7)
[rank2]:[W924 21:50:42.843264906 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(same destroy_process_group() warning repeated for the other seven ranks)
Versions:
accelerate 1.10.1
deepspeed 0.17.6
torch 2.7.0a0+7c8ec84dab.nv25.3
What is causing the "AttributeError: 'DummyOptim' object has no attribute 'state_dict'", and how do I fix it so that the conversion to Mega-DS format completes and I can then finetune?
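For reference, one workaround I have been considering but have not verified: give the placeholder optimizer no-op serialization hooks before save_checkpoint() is called in convert_ckpt(). The attribute path below (engine.optimizer.optimizer) is my guess from the traceback, not a confirmed DeepSpeed API, so please correct me if this is the wrong approach:

```python
# Untested workaround sketch: attach no-op (de)serialization hooks to the
# inner placeholder optimizer so the bf16 wrapper can serialize it.
# `engine` would be the object returned by deepspeed.initialize() in
# hf2megads_weight_converter.py; the attribute path is an assumption
# based on the traceback, not a documented API.
def patch_dummy_optim(engine):
    """If the wrapped inner optimizer lacks state_dict, add no-op hooks."""
    inner = getattr(engine.optimizer, "optimizer", None)
    if inner is not None and not hasattr(inner, "state_dict"):
        inner.state_dict = lambda: {}            # nothing to save for a dummy
        inner.load_state_dict = lambda sd: None  # nothing to restore either
```

The idea is that a weight-conversion run has no real optimizer state worth checkpointing, so serializing an empty dict should be harmless; whether the rest of the checkpoint loads cleanly afterwards is exactly what I am unsure about.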