HTTP AIO dispatcher stalls after initial startup batch (v0.8.2) #1029

@flossypurse

Description


Summary

The Resonate server v0.8.2 HTTP AIO dispatcher processes __invoke tasks present at startup, then indefinitely stops dispatching tasks created after startup. The server process stays alive and logs no errors.

Environment

  • Server version: v0.8.2
  • Store: SQLite (default)
  • Deployment: Linux systemd service on Ubuntu
  • Worker type: Serverless (Supabase Edge Functions via HTTP-push)

Steps to Reproduce

  1. Start the Resonate server with auth enabled:
    resonate serve --system-url https://your-server.io --api-auth-public-key /etc/resonate/public_key.pem
    
  2. Create a promise via POST /promises with a resonate:invoke tag pointing to an HTTP function URL (using beginRpc() from @resonatehq/sdk)
  3. Server dispatches the promise and the function executes (works correctly)
  4. Restart the server; all pending tasks are dispatched from the DB on startup (works)
  5. Create a new promise after startup; it stays PENDING indefinitely
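For reference, step 2 can be reproduced without the SDK by POSTing a promise with a resonate:invoke tag directly. The sketch below only builds the request body; the field names and the Supabase URL are assumptions based on this report, not a definitive schema.

```python
import json

# Hypothetical payload for POST /promises. The "resonate:invoke" tag key is
# taken from this report; the other field names and the function URL are
# illustrative assumptions, not a confirmed API schema.
payload = {
    "id": "sms-agent.1",
    "timeout": 60_000,  # ms until the promise times out if unresolved
    "tags": {
        # The server is expected to HTTP-POST the task to this URL.
        "resonate:invoke": "https://<project>.supabase.co/functions/v1/smsAgent",
    },
}

body = json.dumps(payload)
print(body)
```

Sending this body to POST /promises after startup should reproduce the stall described in step 5.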

Observed Behavior

  • New __invoke tasks are created in the SQLite DB with state=1, expires_at=0, attempt=0
  • The server never attempts to dispatch them (no HTTP call to the resonate:invoke URL)
  • Server logs are completely silent after the initial startup burst (no errors, no warnings)
  • The server responds to API calls (GET /promises, POST /promises) correctly
  • Metrics show coroutines_in_flight{type="EnqueueTasks"} 1 — one EnqueueTasks coroutine is alive but apparently not picking up new tasks

Metrics at Time of Stall

aio_worker_submissions_in_flight{type="sender:http",worker="0"} 1
aio_submissions_in_flight{type="sender:http"} 1
coroutines_in_flight{type="EnqueueTasks"} 1
tasks_total{state="created"} 1

The one in-flight submission on worker 0 corresponds to the last task dispatched at startup. New tasks created after startup are never enqueued to the sender channel.
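The stall signature above can be checked mechanically. A minimal sketch, assuming the server exposes these counters in Prometheus text format (the sample text is the metrics block from this report; the threshold logic is my own heuristic, not an official health check):

```python
# Detect the stall signature from Prometheus-format metrics text
# (e.g. fetched from the server's metrics endpoint).
SAMPLE = """\
aio_worker_submissions_in_flight{type="sender:http",worker="0"} 1
aio_submissions_in_flight{type="sender:http"} 1
coroutines_in_flight{type="EnqueueTasks"} 1
tasks_total{state="created"} 1
"""

def parse_metrics(text: str) -> dict[str, float]:
    """Map each full metric line (name plus labels) to its value."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        key, _, value = line.rpartition(" ")
        metrics[key] = float(value)
    return metrics

def looks_stalled(metrics: dict[str, float]) -> bool:
    # Heuristic from this report: a sender submission stuck in flight
    # while created tasks exist but are never dispatched.
    stuck_sender = metrics.get('aio_submissions_in_flight{type="sender:http"}', 0) >= 1
    created_tasks = metrics.get('tasks_total{state="created"}', 0) >= 1
    return stuck_sender and created_tasks

print(looks_stalled(parse_metrics(SAMPLE)))  # True for the metrics above
```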

Workaround

Restarting the server dispatches new pending tasks again, but the stall recurs after the first batch:

# Check for stuck tasks
sqlite3 /var/lib/resonate/resonate.db "SELECT count(*) FROM tasks WHERE state=1 AND expires_at=0;"

# Clear old stuck tasks (if any)
sqlite3 /var/lib/resonate/resonate.db "UPDATE tasks SET state=8 WHERE state=1 AND expires_at=0 AND created_on < {cutoff_ms};"

# Restart
systemctl restart resonate

Root Cause Hypothesis

The EnqueueTasks coroutine appears to run continuously but stops signaling the AIO HTTP sender after the startup scan. Possible causes:

  1. The AIO submission channel (--aio-sender-size, default 100) is blocked after the startup burst, and the EnqueueTasks coroutine is stuck trying to send to a full channel
  2. A goroutine panic in an HTTP worker is silently swallowed, leaving the sender in a bad state
  3. The EnqueueTasks coroutine uses a one-shot scan at startup rather than a ticker, and new task creation events don't trigger re-evaluation
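Hypothesis 1 can be illustrated with a simplified model. The server is Go and presumably uses a buffered channel; the Python queue below is only an analogue, and the size and names mirror --aio-sender-size (default 100) for illustration, not the server's actual code:

```python
import queue

# Simplified model of hypothesis 1: a bounded submission channel that the
# startup burst fills. A producer doing a blocking put() with no consumer
# draining the channel would then stall forever.
SENDER_SIZE = 3  # small for demonstration; the real default is 100
channel = queue.Queue(maxsize=SENDER_SIZE)

# Startup burst fills the channel.
for task in range(SENDER_SIZE):
    channel.put_nowait(f"task-{task}")

# A task created after startup: a blocking put() here would hang the
# enqueue coroutine indefinitely. put_nowait() makes the failure visible.
try:
    channel.put_nowait("task-after-startup")
    enqueued = True
except queue.Full:
    enqueued = False

print(enqueued)  # False: the post-startup task is never enqueued
```

If this is the mechanism, the fix would be on the consumer side (the worker completing its in-flight submission and draining the channel), which matches the single stuck aio_worker_submissions_in_flight observed above.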

The --api-http-task-frequency flag (default 1m0s) does not appear to trigger periodic re-scans for new tasks.

Additional Context

  • resonate:invoke HTTP-push mode: beginRpc() sets resonate:invoke: <function-URL> in promise tags. Server should HTTP-POST to that URL on new task dispatch.
  • The pattern previously worked reliably: smsAgent workflows consistently resolved. The stall appears to be triggered when a batch of tasks is processed at startup and the final state of the sender channel prevents new enqueues.
