[FLINK-39566][runtime-web] Add checkpoint duration Gantt view#28060

Draft
spuru9 wants to merge 3 commits into apache:master from spuru9:feature/checkpoint-gantt
spuru9 commented Apr 28, 2026

What is the purpose of the change

Adds a Gantt chart view to the job's Checkpoints tab so operators can see at a glance why a checkpoint was slow. The same information already exists in the per-subtask table, but it's spread across many rows of numbers; this view makes stragglers and stuck phases visually obvious instead of requiring a manual scan.

Brief change log

  • New Gantt tab on the job Checkpoints page (frontend only, no backend changes)
  • Recent Checkpoints strip: last 60 checkpoints as colored bars (completed / savepoint / in-progress / failed); click a bar to drill in
  • Per-checkpoint Gantt: one row per subtask with stacked bars for the four checkpoint phases, sorted by duration so the slowest subtasks rise to the top
  • Pin / Follow newest toggle so a chosen checkpoint stays put for analysis, and PNG export for incident reports
  • Auto-refresh aligned to checkpoint cadence
  • Suppresses the global error toast for the per-checkpoint details endpoint while a checkpoint is still in progress (the endpoint returns 404 until the checkpoint is acknowledged), so users don't see spurious notifications
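
Two of the items above, the duration-sorted stacked rows and the 404 suppression, can be illustrated with a minimal sketch. This is not the code in this PR: the phase names, the `SubtaskTiming` shape, the helper names `buildGanttRows` and `shouldSuppressErrorToast`, and the URL pattern are all assumptions made for illustration.

```typescript
// Hypothetical phase names: the description says "four checkpoint phases"
// without listing them, so these are placeholders.
type Phase = 'startDelay' | 'alignment' | 'sync' | 'async';
const PHASES: readonly Phase[] = ['startDelay', 'alignment', 'sync', 'async'];

// Assumed input shape: per-subtask duration of each phase, in milliseconds.
interface SubtaskTiming {
  subtask: number;
  durations: Record<Phase, number>;
}

interface Segment {
  phase: Phase;
  start: number; // offset from the start of the row, in ms
  end: number;
}

interface GanttRow {
  subtask: number;
  total: number;
  segments: Segment[];
}

// Stack the phases end to end for each subtask, then sort rows so the
// subtask with the longest total duration comes first.
function buildGanttRows(timings: SubtaskTiming[]): GanttRow[] {
  const rows: GanttRow[] = timings.map(t => {
    let offset = 0;
    const segments = PHASES.map(phase => {
      const start = offset;
      offset += t.durations[phase];
      return { phase, start, end: offset };
    });
    return { subtask: t.subtask, total: offset, segments };
  });
  return rows.sort((a, b) => b.total - a.total);
}

// Decide whether a failed request should skip the global error toast.
// Assumption: the per-checkpoint details URL contains
// "/checkpoints/details/<numeric id>"; only a 404 on that URL is suppressed.
function shouldSuppressErrorToast(url: string, status: number): boolean {
  return status === 404 && /\/checkpoints\/details\/\d+/.test(url);
}
```

Keeping both helpers as pure functions would let the sort order and the suppression rule be unit tested without rendering a chart or issuing HTTP requests.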

Verifying this change

This change is frontend-only and is verified manually:

  • Run a Flink cluster with a streaming job that has periodic checkpoints
  • Open the job's Checkpoints tab and switch to the new Gantt view
  • Confirm the recent-checkpoint strip populates and updates as new checkpoints complete
  • Click a bar — the per-subtask Gantt renders for that checkpoint and the view stays pinned
  • Click Follow newest — auto-tracking resumes
  • Click Export PNG — image downloads
  • Trigger a savepoint and a failed checkpoint and confirm they render with the right colors

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no (uses @antv/g2, already a project dependency)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not applicable (UI-only feature, discoverable from the Checkpoints tab)

Was generative AI tooling used to co-author this PR?
  • Yes (Claude Code)

spuru9 changed the title from "[runtime-web] Add checkpoint duration Gantt view" to "[FLINK-39566][runtime-web] Add checkpoint duration Gantt view" on Apr 28, 2026
flinkbot commented Apr 28, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build
