Fix flaky checkpoint continuation: sys__id not preserved when copying partial output table #1681

@ilongin

Description

When a mapper UDF crashes mid-execution and the user re-runs with a fix, _continue_udf copies the parent partial output table into a new table. During this copy, insert_into strips sys__id and auto-generates new values. If the copied rows end up with sys__id values that differ from the originals, downstream steps break because:

  • The mapper explicitly preserves input sys__id in its output (1:1 mapping)
  • create_result_query joins the output table with the input table on sys__id
  • calculate_unprocessed_rows uses sys__id to determine which inputs were already processed

When the copied IDs don't match, the join produces wrong result-to-input pairings: values get swapped, duplicated, or lost.
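
The failure mode can be reproduced outside the project with a toy model. The sketch below is not the project's code: it uses an in-memory SQLite database, and the table names (`input`, `partial_output`, `copied`) and helper functions are hypothetical, chosen only for illustration. Only the `sys__id` column name comes from this issue. It assumes a crash happened after inputs 1 and 3 were processed, then copies the partial output either with or without the original IDs.

```python
import sqlite3


def build_db():
    """Toy setup: three inputs, of which rows 1 and 3 were processed before the crash."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE input (sys__id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO input VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
    )
    conn.execute(
        "CREATE TABLE partial_output (sys__id INTEGER PRIMARY KEY, result TEXT)"
    )
    conn.executemany(
        "INSERT INTO partial_output VALUES (?, ?)", [(1, "A"), (3, "C")]
    )
    return conn


def copy_partial(conn, preserve_ids):
    """Copy partial_output into a fresh table, keeping or regenerating sys__id."""
    conn.execute("DROP TABLE IF EXISTS copied")
    conn.execute("CREATE TABLE copied (sys__id INTEGER PRIMARY KEY, result TEXT)")
    if preserve_ids:
        conn.execute(
            "INSERT INTO copied (sys__id, result) "
            "SELECT sys__id, result FROM partial_output ORDER BY sys__id"
        )
    else:
        # sys__id is omitted, so SQLite assigns fresh IDs 1, 2, ...
        conn.execute(
            "INSERT INTO copied (result) "
            "SELECT result FROM partial_output ORDER BY sys__id"
        )


def joined(conn):
    """Pair each copied result with its input row by joining on sys__id."""
    return conn.execute(
        "SELECT i.name, c.result FROM input i "
        "JOIN copied c ON i.sys__id = c.sys__id ORDER BY i.sys__id"
    ).fetchall()


def unprocessed(conn):
    """Input IDs that have no matching row in the copied output."""
    rows = conn.execute(
        "SELECT sys__id FROM input "
        "WHERE sys__id NOT IN (SELECT sys__id FROM copied) ORDER BY sys__id"
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_db()

    copy_partial(conn, preserve_ids=False)
    print(joined(conn))       # [('a', 'A'), ('b', 'C')]  -> 'b' wrongly gets 'C'
    print(unprocessed(conn))  # [3]  -> input 2 is never processed

    copy_partial(conn, preserve_ids=True)
    print(joined(conn))       # [('a', 'A'), ('c', 'C')]
    print(unprocessed(conn))  # [2]
```

In this toy case the regenerated IDs happen to be 1 and 2, so input 2 is paired with the result for input 3, and input 3 is reported as the only unprocessed row. Copying sys__id explicitly keeps both the join and the unprocessed-row check consistent.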

The bug has existed since UDF checkpoints were introduced.
