Description
Hello! I sincerely appreciate your open-source efforts.
I'm attempting to reproduce the Dasheng training, but I've run into an issue: the reconstruction loss doesn't decline as the number of epochs increases. I've used the default configuration in the current code and also experimented with a smaller dataset. Have you encountered this problem before?
I'm also curious about the appropriate threshold for the reconstruction loss. At what value can we consider the training to be successfully completed?
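In the absence of an official target value, one common heuristic is to treat training as done when the smoothed loss stops improving. Below is a minimal, framework-agnostic sketch of such a plateau check; the function name, window size, and tolerance are my own assumptions, not anything from the Dasheng codebase:

```python
def has_converged(losses, window=50, rel_tol=0.01):
    """Heuristic plateau check (not from the Dasheng repo).

    Compares the mean loss over the most recent `window` steps with the
    mean over the preceding `window` steps; returns True when the
    relative improvement falls below `rel_tol`.
    """
    if len(losses) < 2 * window:
        # Not enough history to compare two full windows yet.
        return False
    prev = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return (prev - recent) < rel_tol * prev
```

A check like this is usually more robust than a fixed loss threshold, since the achievable reconstruction loss depends on model size, dataset, and masking configuration.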
I've tried training with the 1.2B model, and I'm uncertain whether the loss reduction is within the normal range.
Moreover, when I used the 86M model, the training did not appear to converge at all.
Could you offer some suggestions on how to address these issues?
86M model training: (screenshot attached)
1.2B model training: (screenshot attached)
