Description
Hello! I sincerely appreciate your open-source efforts.
I'm attempting to reproduce the Dasheng training, but I've run into an issue: the reconstruction loss doesn't decline as the number of epochs increases. I've used the default configuration in the current code and also experimented with a smaller dataset. Have you encountered this problem before?
I'm also curious about the appropriate threshold for the reconstruction loss. At what value can we consider the training to be successfully completed?
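In the absence of an official target value, one common heuristic is to treat training as done when the smoothed loss stops improving. Below is a minimal, framework-agnostic sketch of such a plateau check; the function name, window size, and tolerance are my own assumptions, not anything from the Dasheng codebase:

```python
def has_converged(losses, window=50, rel_tol=0.01):
    """Heuristic plateau check (not from the Dasheng repo).

    Compares the mean loss over the most recent `window` steps with the
    mean over the preceding `window` steps; returns True when the
    relative improvement falls below `rel_tol`.
    """
    if len(losses) < 2 * window:
        # Not enough history to compare two full windows yet.
        return False
    prev = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return (prev - recent) < rel_tol * prev
```

A check like this is usually more robust than a fixed loss threshold, since the achievable reconstruction loss depends on model size, dataset, and masking configuration.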
I've tried training with the 1.2B model, and I'm uncertain whether the loss reduction is within the normal range.
Moreover, when I used the 86M model, the training did not appear to converge at all.
Could you offer some suggestions on how to address these issues?
86M model training: (screenshot attached)
1.2B model training: (screenshot attached)
