[QUESTION] lr schedule (OptimizerParamScheduler) is not aligned when using token and sample as argument #475

@xrrain

Description

OptimizerParamScheduler.step() updates self.num_tokens and self.num_steps as follows:

if token_num is None:
    args = get_args()
    token_num = args.consumed_train_tokens
self.num_tokens = token_num
self.num_steps += increment

Then, in get_lr, the lr is computed from either self.num_steps or self.num_tokens, depending on whether self.lr_warmup_tokens is None:

        # Use linear warmup for the initial part.
        if self.lr_warmup_tokens is None:
            if self.lr_warmup_steps > 0 and self.num_steps <= self.lr_warmup_steps:
                if self.num_steps == self.lr_warmup_steps and \
                    self.lr_decay_tokens is not None:
                    # The case of step/sample-wise warmup + token-wise decay
                    self.lr_warmup_tokens = self.num_tokens
                return self.max_lr * float(self.num_steps) / \
                    float(self.lr_warmup_steps)
        else:
            if self.lr_warmup_tokens > 0 and self.num_tokens <= self.lr_warmup_tokens:
                return self.max_lr * float(self.num_tokens) / \
                    float(self.lr_warmup_tokens)

However, args.consumed_train_tokens is only updated after the train step. So when OptimizerParamScheduler.step() is executed, args.consumed_train_tokens has not been updated yet, self.num_tokens does not increase, and the second optimizer step ends up using the old lr. For example, at the beginning of training, when self.lr_warmup_tokens is specified, the lr used by the optimizer is zero for both the first and the second step.
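
To make the ordering concrete, here is a minimal toy reproduction. It is only a sketch of the behaviour I described above, not the repo's actual code: ToyTokenWarmupScheduler, simulate, tokens_per_step and the update_counter_first flag are all names I made up for illustration, and only the token-wise warmup branch is modelled.

```python
class ToyTokenWarmupScheduler:
    """Hypothetical stand-in that models only token-wise linear warmup."""

    def __init__(self, max_lr, lr_warmup_tokens):
        self.max_lr = max_lr
        self.lr_warmup_tokens = lr_warmup_tokens
        self.num_tokens = 0

    def get_lr(self):
        # Linear warmup on tokens, then hold max_lr (decay is omitted here).
        if self.lr_warmup_tokens > 0 and self.num_tokens <= self.lr_warmup_tokens:
            return self.max_lr * float(self.num_tokens) / float(self.lr_warmup_tokens)
        return self.max_lr

    def step(self, token_num):
        # Mirrors "self.num_tokens = token_num" from the snippet above.
        self.num_tokens = token_num
        return self.get_lr()


def simulate(num_steps, tokens_per_step, update_counter_first,
             max_lr=1.0, lr_warmup_tokens=1000):
    """Return the lr that each optimizer step would use."""
    scheduler = ToyTokenWarmupScheduler(max_lr, lr_warmup_tokens)
    consumed_train_tokens = 0  # plays the role of args.consumed_train_tokens
    lr = scheduler.get_lr()    # lr before any step, zero with zero tokens
    lrs_used = []
    for _ in range(num_steps):
        lrs_used.append(lr)  # the optimizer step uses the current lr
        if update_counter_first:
            # Assumed alternative ordering: count tokens, then step the scheduler.
            consumed_train_tokens += tokens_per_step
            lr = scheduler.step(consumed_train_tokens)
        else:
            # Ordering described in this issue: the scheduler reads a stale counter.
            lr = scheduler.step(consumed_train_tokens)
            consumed_train_tokens += tokens_per_step
    return lrs_used


stale = simulate(4, 100, update_counter_first=False)
fresh = simulate(4, 100, update_counter_first=True)
print("stale counter:", stale)
print("fresh counter:", fresh)
```

With the stale counter, the first two entries are both zero and the warmup lags by one step compared to the other ordering. Whether updating the counter first is the right fix for the real code is something I am not sure about.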

Is this design intentional for some special reason, or is it a possible bug?

Thanks
