Feature(#3816): Make sure MCP pod image is up to date #4331

Open
mchugunov wants to merge 37 commits into archestra-ai:main from mchugunov:feat/3816-mcp-server-probe-pod-image-update

@mchugunov mchugunov commented May 3, 2026

This PR adds automatic freshness checks for installed local MCP servers.

In scope:

  • The Edit MCP server UI shows the current update status and provides per-server toggles to enable or disable freshness checks and automatic reinstall
  • A periodic check task identifies the latest available image digest
  • Rollout state is persisted (success, or failure with details)
  • A follow-up task verifies that each reinstall completed successfully

User-facing changes

When editing a local MCP server, the user can choose whether the server is included in the freshness check, and whether it is reinstalled automatically once the check task finds a newer version of the image.

By default, checks and auto-reinstall are off for existing servers and on for newly created ones.

In the MCP server configuration modal, the user sees the update check status, the last success and failure, the last error, the last reinstallation timestamp, and both the running digest and the newest available digest.

HTTP API: a new PATCH /api/mcp-server/:id endpoint is exposed to support toggle updates.
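As a rough illustration of the toggle defaults and of a partial update through the PATCH endpoint, here is a minimal Python sketch. The field names `check_for_image_updates` and `reinstall_automatically`, the `UpdateToggles` type, and the validation rules are assumptions made for this example; the PR description does not state the request schema.

```python
from dataclasses import dataclass, replace
from typing import Any, Mapping


@dataclass(frozen=True)
class UpdateToggles:
    # Hypothetical field names; the real schema is not given in the PR text.
    check_for_image_updates: bool
    reinstall_automatically: bool


def default_toggles(is_newly_created: bool) -> UpdateToggles:
    """Both toggles are on for new servers and off for existing ones."""
    return UpdateToggles(
        check_for_image_updates=is_newly_created,
        reinstall_automatically=is_newly_created,
    )


def apply_patch(current: UpdateToggles, body: Mapping[str, Any]) -> UpdateToggles:
    """Apply a partial update: only the fields present in the body change."""
    allowed = {"check_for_image_updates", "reinstall_automatically"}
    unknown = set(body) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    for key, value in body.items():
        if not isinstance(value, bool):
            raise ValueError(f"{key} must be a boolean")
    return replace(current, **body)
```

A PATCH body that names only one toggle leaves the other untouched, which is the usual reason to prefer PATCH over PUT for this kind of setting.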

Backend flow

A new task type, check_mcp_image_updates, detects and processes incoming image updates. By default it runs every 15 minutes; the interval can be configured via an environment variable.
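One way such a configurable interval could be resolved is sketched below. The variable name `MCP_IMAGE_UPDATE_CHECK_INTERVAL_MINUTES`, the unit (minutes), and the fallback to the default on invalid input are all assumptions for illustration; only the 15-minute default comes from the description above.

```python
import os
from typing import Mapping, Optional

DEFAULT_INTERVAL_MINUTES = 15
# Hypothetical variable name; the PR text does not name the real one.
INTERVAL_ENV_VAR = "MCP_IMAGE_UPDATE_CHECK_INTERVAL_MINUTES"


def check_interval_seconds(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the check interval in seconds, using the default when the
    variable is unset, empty, non-numeric, or not positive."""
    env = os.environ if env is None else env
    default_seconds = DEFAULT_INTERVAL_MINUTES * 60
    raw = env.get(INTERVAL_ENV_VAR, "").strip()
    if not raw:
        return default_seconds
    try:
        minutes = int(raw)
    except ValueError:
        return default_seconds
    if minutes <= 0:
        return default_seconds
    return minutes * 60
```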

A server is eligible for the freshness check only if all of the following hold:

  • it is a local, Docker-enabled server
  • its "check for image updates" toggle is on
  • it is in the running (installed) state, that is, not disabled and not in a pending or error state
  • it is not marked for manual reinstall
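The eligibility rule can be read as a single predicate. The sketch below is illustrative only: the `McpServer` type, its field names, and the status strings are assumptions, not the project's actual model.

```python
from dataclasses import dataclass


@dataclass
class McpServer:
    # Hypothetical fields modelling the conditions listed in the description.
    is_local_docker: bool
    check_for_image_updates: bool
    status: str  # assumed values: "running", "pending", "error", "disabled"
    marked_for_manual_reinstall: bool


def is_eligible_for_freshness_check(server: McpServer) -> bool:
    """True only when every condition from the description holds."""
    return (
        server.is_local_docker
        and server.check_for_image_updates
        # Requiring "running" also excludes disabled, pending, and error.
        and server.status == "running"
        and not server.marked_for_manual_reinstall
    )
```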

Freshness checks are guarded by database-backed per-server locks. Once a lock is acquired, the system spins up a probe pod that follows the current Kubernetes settings of the original MCP server (same namespace, scheduling constraints, service account, image pull secrets, etc.). The probe pod is always cleaned up, whether or not it started successfully. If the probe manages to start, the system either marks the server as "update available" or proceeds with automatic reinstallation, depending on the value of the "reinstall automatically" toggle.
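The control flow described here (lock, probe, unconditional cleanup, then a decision based on the toggle) can be sketched as nested try/finally blocks. Everything below is a simplified stand-in: `InMemoryLocks` replaces the database-backed locks, `start_probe` and `delete_probe` are hypothetical callables in place of the Kubernetes calls, and the return values "skipped", "check_failed", and "update_available" are labels chosen for this example. The digest comparison is also an assumption about how "newer" is decided.

```python
from typing import Callable, Set


class InMemoryLocks:
    """Stand-in for the database-backed per-server locks."""

    def __init__(self) -> None:
        self._held: Set[str] = set()

    def try_acquire(self, server_id: str) -> bool:
        if server_id in self._held:
            return False
        self._held.add(server_id)
        return True

    def release(self, server_id: str) -> None:
        self._held.discard(server_id)


def run_freshness_check(
    server_id: str,
    running_digest: str,
    reinstall_automatically: bool,
    locks: InMemoryLocks,
    start_probe: Callable[[str], str],    # returns the probe's image digest
    delete_probe: Callable[[str], None],  # removes the probe pod
) -> str:
    if not locks.try_acquire(server_id):
        return "skipped"  # another worker is already checking this server
    try:
        try:
            latest_digest = start_probe(server_id)
        finally:
            # Cleanup runs whether the probe started or failed.
            delete_probe(server_id)
    except Exception:
        return "check_failed"
    finally:
        locks.release(server_id)

    if latest_digest == running_digest:
        return "up_to_date"
    return "reinstalling" if reinstall_automatically else "update_available"
```

Putting the cleanup in the inner `finally` keeps it independent of the lock release in the outer one, so a failed probe still leaves neither a stray pod nor a held lock.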

An automatic reinstall is not considered complete until a follow-up task confirms that the rollout started the image with the target digest. Depending on the outcome of the follow-up, the state becomes up_to_date or rollout_failed, or it remains reinstalling if the reinstall is still in progress, in which case it is checked again after an exponential backoff timeout.
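The follow-up decision and its retry delay might look roughly like the sketch below. The state names up_to_date, rollout_failed, and reinstalling come from the description; the 30-second base, the 15-minute cap, and the `rollout_failed` boolean input are assumptions for illustration, since the PR text does not give the backoff parameters or say how a failed rollout is detected.

```python
from typing import Optional, Tuple

BASE_DELAY_SECONDS = 30   # assumed base delay
MAX_DELAY_SECONDS = 900   # assumed cap on the delay


def next_backoff_seconds(attempt: int) -> int:
    """Exponential backoff: base * 2^attempt, capped at the maximum."""
    return min(MAX_DELAY_SECONDS, BASE_DELAY_SECONDS * 2 ** attempt)


def follow_up(
    target_digest: str,
    running_digest: Optional[str],
    rollout_failed: bool,
    attempt: int,
) -> Tuple[str, Optional[int]]:
    """Return (new_state, seconds_until_next_check or None)."""
    if rollout_failed:
        return ("rollout_failed", None)
    if running_digest == target_digest:
        return ("up_to_date", None)
    # Still rolling out: stay in reinstalling and re-check later.
    return ("reinstalling", next_backoff_seconds(attempt))
```

With these assumed parameters, the delays for attempts 0 through 3 are 30, 60, 120, and 240 seconds, and the cap is reached from attempt 5 onward.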

State machine for the update state
[Image: state machine]

Flowchart of the update process
[Image: Untitled Diagram drawio (1)]

mchugunov added 30 commits May 3, 2026 22:17

CLAassistant commented May 3, 2026

CLA assistant check
All committers have signed the CLA.
