Training loss does not converge #8

@xiaohutuxian99

Description
Hello! I sincerely appreciate your open-source efforts.

I'm attempting to reproduce the Dasheng training, but I've run into an issue: the reconstruction loss doesn't decline as the number of epochs increases. I used the default configuration in the current code and experimented with a smaller dataset. Have you encountered this problem before?

I'm also curious about an appropriate target for the reconstruction loss: at what value can training be considered successfully completed?

I've tried training the 1.2B model, and I'm uncertain whether its rate of loss reduction is within the normal range. With the 86M model, training does not appear to converge at all.

Could you offer some suggestions on how to address these issues?

86M model training:

[screenshot: 86M training loss curve]

1.2B model training:

[screenshot: 1.2B training loss curve]
