`OptimizerParamScheduler.step()` updates `self.num_tokens` and `self.num_steps` in two different ways:
```python
if token_num is None:
    args = get_args()
    token_num = args.consumed_train_tokens
self.num_tokens = token_num
self.num_steps += increment
```
Then, in `get_lr()`, the lr is computed from either `self.num_steps` or `self.num_tokens`, depending on whether `self.lr_warmup_tokens` is `None`:
```python
# Use linear warmup for the initial part.
if self.lr_warmup_tokens is None:
    if self.lr_warmup_steps > 0 and self.num_steps <= self.lr_warmup_steps:
        if self.num_steps == self.lr_warmup_steps and \
                self.lr_decay_tokens is not None:
            # The case of step/sample-wise warmup + token-wise decay
            self.lr_warmup_tokens = self.num_tokens
        return self.max_lr * float(self.num_steps) / \
            float(self.lr_warmup_steps)
else:
    if self.lr_warmup_tokens > 0 and self.num_tokens <= self.lr_warmup_tokens:
        return self.max_lr * float(self.num_tokens) / \
            float(self.lr_warmup_tokens)
```
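To make the token-based branch concrete, here is a quick numeric illustration (the warmup values below are made up, not taken from the repo):

```python
# Hypothetical warmup settings, just to evaluate the token-based branch:
max_lr = 1e-4
lr_warmup_tokens = 1_000_000

for num_tokens in (0, 1024, 500_000, 1_000_000):
    lr = max_lr * float(num_tokens) / float(lr_warmup_tokens)
    print(num_tokens, lr)  # lr stays at 0.0 as long as num_tokens is 0
```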
However, `args.consumed_train_tokens` is only updated after the train step, so when `OptimizerParamScheduler.step()` executes, `args.consumed_train_tokens` has not been updated yet. As a result, `self.num_tokens` is not increased, and the second optimizer step still uses the old lr. For example, at the beginning of training with `self.lr_warmup_tokens` specified, the optimizer uses lr = 0 for both the first and the second step.
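A minimal, runnable sketch of the reported ordering (names like `TinyScheduler` and `tokens_per_step` are hypothetical stand-ins, not Megatron code):

```python
class TinyScheduler:
    """Hypothetical, simplified stand-in for OptimizerParamScheduler."""
    def __init__(self, max_lr, lr_warmup_tokens):
        self.max_lr = max_lr
        self.lr_warmup_tokens = lr_warmup_tokens
        self.num_tokens = 0
        self.lr = self.get_lr()  # lr the optimizer will apply next

    def get_lr(self):
        # Token-based warmup branch from get_lr() above (simplified).
        if self.num_tokens <= self.lr_warmup_tokens:
            return self.max_lr * float(self.num_tokens) / float(self.lr_warmup_tokens)
        return self.max_lr

    def step(self, consumed_train_tokens):
        # Mirrors the real step(): num_tokens is read from the global counter.
        self.num_tokens = consumed_train_tokens
        self.lr = self.get_lr()


consumed_train_tokens = 0   # stands in for args.consumed_train_tokens
tokens_per_step = 1024      # made-up number
sched = TinyScheduler(max_lr=1e-4, lr_warmup_tokens=1_000_000)

for it in range(3):
    print(f"step {it}: optimizer applies lr = {sched.lr}")
    # The scheduler steps BEFORE the token counter is advanced,
    # so it reads a stale value ...
    sched.step(consumed_train_tokens)
    # ... and only afterwards is the counter updated by the train loop.
    consumed_train_tokens += tokens_per_step

# Prints lr = 0.0 for step 0 AND step 1; only step 2 sees a non-zero lr.
```

If the counter were advanced before the scheduler step (or the fresh count were passed in as `token_num`), step 1 would already see a non-zero lr.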
Is this design based on some special consideration, or is it a possible bug?

Thanks