Commit 96ea1d1

Authored by ko3n1gsbak5 and a co-author
cp: fix: async_utils: explicit GC in persistent checkpoint worker loop (3591) into core_r0.16.0 (#3628)
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Seonmyeong Bak <sbak@nvidia.com>
1 parent bce706c commit 96ea1d1
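Why an *explicit* `gc.collect()` can matter in a long-running worker: objects caught in a reference cycle are not freed by reference counting when their last external reference is dropped; they wait for the cyclic collector. CPython triggers that collector by allocation counts, not bytes, so a few large cyclic objects can linger, and if automatic collection is disabled they linger indefinitely. The sketch below is illustrative only: `Payload` and `leaked_after_iteration` are hypothetical names, and it *assumes* automatic GC is off to make the effect deterministic (it does not claim the Megatron worker disables GC).

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for per-save state held by one loop iteration."""

    def __init__(self) -> None:
        self.self_ref = self  # a reference cycle: refcounting alone cannot free it


def leaked_after_iteration(explicit_gc: bool) -> bool:
    """Return True if a cyclic Payload is still alive after one iteration."""
    was_enabled = gc.isenabled()
    gc.disable()  # assumption for the demo: automatic collection is off
    try:
        item = Payload()
        probe = weakref.ref(item)  # observes the object without keeping it alive
        del item  # drops our reference; the self-cycle still keeps it alive
        if explicit_gc:
            gc.collect()  # the cycle detector reclaims the unreachable cycle
        return probe() is not None
    finally:
        if was_enabled:
            gc.enable()  # restore the interpreter's previous GC state
```

`leaked_after_iteration(False)` returns `True` (the cycle survives the `del`), while `leaked_after_iteration(True)` returns `False`.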

1 file changed (+3, -0 lines)

megatron/core/dist_checkpointing/strategies/async_utils.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -505,6 +505,9 @@ def async_loop(
                 logger.debug(f"{rank} has completed saving {item.call_idx}")
                 comp_q.put(item.call_idx)
                 queue.task_done()
+                del async_fn_args
+                del item
+                gc.collect()
 
         logger.debug(f"PersistentAsyncCaller: persistent ckpt worker for {rank} has terminated")
 
```
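The three added lines can be illustrated with a simplified, self-contained worker loop. This is a sketch, not the Megatron implementation: the real worker runs in a separate process, while this one uses a thread and `queue.Queue` so it runs anywhere. `Request`, `worker_loop`, `run_demo`, and the `None` stop sentinel are hypothetical; only `call_idx`, `async_fn_args`, `comp_q`, and the `task_done()` call mirror names from the diff. The point it shows: while the loop blocks on the next `get()`, the locals `item` and `async_fn_args` still reference the previous request, so `gc.collect()` alone could not reclaim it. Deleting them first lets the collector free that request and any reference cycles it belongs to.

```python
import gc
import queue
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Request:
    """Hypothetical request type; call_idx and async_fn_args mirror the diff."""
    call_idx: int
    async_fn: Callable[..., Any]
    async_fn_args: Tuple[Any, ...] = field(default_factory=tuple)


def worker_loop(work_q: "queue.Queue[Optional[Request]]", comp_q: "queue.Queue[int]") -> None:
    """Persistent worker: runs requests until it receives None (hypothetical sentinel)."""
    while True:
        item = work_q.get()
        if item is None:
            work_q.task_done()
            break
        async_fn_args = list(item.async_fn_args)
        item.async_fn(*async_fn_args)
        comp_q.put(item.call_idx)  # report completion by index
        work_q.task_done()
        # Drop this iteration's references before blocking on the next get(),
        # then collect any reference cycles those objects were part of.
        del async_fn_args
        del item
        gc.collect()


def run_demo() -> Tuple[List[int], List[int]]:
    """Run three requests through the worker and return (results, completed)."""
    work_q: "queue.Queue[Optional[Request]]" = queue.Queue()
    comp_q: "queue.Queue[int]" = queue.Queue()
    worker = threading.Thread(target=worker_loop, args=(work_q, comp_q), daemon=True)
    worker.start()
    results: List[int] = []
    for i in range(3):
        work_q.put(Request(call_idx=i, async_fn=results.append, async_fn_args=(i * 10,)))
    work_q.put(None)  # ask the worker to stop after draining the queue
    work_q.join()  # wait until every item has been marked task_done()
    worker.join()
    completed = [comp_q.get() for _ in range(3)]
    return results, completed
```

`run_demo()` returns `([0, 10, 20], [0, 1, 2])`: each request's function ran once and its `call_idx` was reported on `comp_q` in order. Collecting after every request costs some CPU time per iteration; for a checkpoint worker, whose iterations are infrequent and heavy, that trade for bounded memory is likely cheap.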
