Because of https://github.com/TencentARC/Open-MAGVIT2/issues/17, I turned off the sanity check in the Trainer code to avoid running validation. Training finished, and I got some tfevent log files that look correct, but:

1. Validation failed, as expected, because of https://github.com/TencentARC/Open-MAGVIT2/issues/17.
2. I find nothing in my checkpoint path (it's an existing directory on my machine, set here: https://github.com/TencentARC/Open-MAGVIT2/blob/main/configs/imagenet_lfqgan_128_B.yaml#L15).

The total number of training steps is around 16000.

The potential causes I can think of are:

1. Validation failed, and checkpoint saving is part of validation or depends on it.
2. The total number of steps is not enough to trigger a checkpoint.
3. I'm not configuring the checkpoint save path correctly.

Has anyone encountered this error?
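
For anyone who wants to narrow this down, here is a minimal sketch of how causes 2 and 3 could be checked with the standard library only. It does not use the project's code. The helper names `find_checkpoints` and `expected_step_saves`, the `.ckpt` extension, the placeholder directory, and the interval of 5000 steps are all assumptions for illustration; the real values would come from the config linked above.

```python
from pathlib import Path


def find_checkpoints(ckpt_dir):
    """Return every *.ckpt file found under ckpt_dir, searched recursively.

    Returns an empty list if the directory does not exist.
    """
    root = Path(ckpt_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.ckpt"))


def expected_step_saves(total_steps, every_n_steps):
    """Number of saves a purely step-based interval would produce."""
    if every_n_steps <= 0:
        return 0
    return total_steps // every_n_steps


if __name__ == "__main__":
    # Hypothetical values: replace with the configured checkpoint directory
    # and whatever save interval the checkpoint callback actually uses.
    ckpt_dir = "path/to/checkpoint_dir"
    print("checkpoints found:", find_checkpoints(ckpt_dir))
    print("expected saves:", expected_step_saves(total_steps=16000, every_n_steps=5000))
```

If the second number is 0 for the real interval, cause 2 would explain the empty directory. If it is positive and the recursive search still finds nothing, the save path or a dependency on validation (causes 1 and 3) looks more likely.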