Cluster Bus I/O Offload #3361

@hpatro

Problem

Today, cluster bus packet semantics must run on the main thread because they mutate shared clusterNode / clusterState state that the command-processing path also reads. The main thread should not, however, also have to spend CPU time on raw network I/O, packet framing, send-buffer writes, or TLS handshake steps.

In larger clusters with heavy gossip, failover traffic, pub/sub propagation, or TLS-enabled links, synchronous cluster bus I/O competes directly with client command processing.

Proposal

Offload cluster bus read, write, and TLS accept work to the existing I/O-thread pool introduced in PR #3324, reusing its shared SPMC request queue and MPSC completion queue. No new threading primitives are introduced.

Read path

  • I/O thread: connRead() into link->rcvbuf, validate packet header/length, and snapshot the complete-packet prefix already present at the front of rcvbuf
  • Main thread: on completion, drain the queued snapshot in place from rcvbuf and compact any unread tail bytes
  • Queued inbound bytes remain in rcvbuf, and receive-side backlog is accounted as part of the existing per-link buffer limit
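To make the read-path split concrete, here is a minimal sketch of the two halves: finding the complete-packet prefix on the I/O thread and compacting the unread tail on the main thread. The framing is an assumption made for illustration (a 4-byte signature followed by a 4-byte big-endian total length that includes the header), and `complete_prefix_len()` and `compact_tail()` are hypothetical names, not functions from the codebase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed, simplified framing: 4-byte signature + 4-byte big-endian total
 * packet length (header included). The real header has more fields. */
#define SK_HDR_LEN 8

/* I/O-thread side: return how many bytes at the front of buf form whole,
 * valid packets. Sets *bad to 1 if a malformed header is encountered. */
size_t complete_prefix_len(const unsigned char *buf, size_t len, int *bad) {
    size_t off = 0;
    *bad = 0;
    while (len - off >= SK_HDR_LEN) {
        const unsigned char *p = buf + off;
        if (memcmp(p, "RCmb", 4) != 0) { *bad = 1; break; }
        uint32_t totlen = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
                          ((uint32_t)p[6] << 8) | (uint32_t)p[7];
        if (totlen < SK_HDR_LEN) { *bad = 1; break; }
        if (totlen > len - off) break; /* partial packet: wait for more bytes */
        off += totlen;
    }
    return off;
}

/* Main-thread side: after draining `drained` bytes of whole packets, move
 * the unread tail to the front of the buffer. Returns the new length. */
size_t compact_tail(unsigned char *buf, size_t len, size_t drained) {
    assert(drained <= len);
    size_t tail = len - drained;
    memmove(buf, buf + drained, tail);
    return tail;
}
```

Because the snapshot is only a length, the main thread can process packets in place without copying them into a second buffer; only the leftover partial packet is moved.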

Write path

  • Keep a single send_msg_queue
  • Main thread appends new outbound messages to send_msg_queue
  • On dispatch, the main thread snapshots:
    • the current head offset
    • the last queue node visible to this write job (io_last_send_block)
  • I/O thread performs connWrite() against the queue only up to the snapshotted boundary
  • Main thread pops fully sent head nodes and updates the head offset on completion
  • Messages appended after dispatch remain queued for a later write job
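The snapshot boundary can be sketched as follows. This is a single-threaded model of the idea under stated assumptions, not the proposed implementation: `msg_node`, `send_queue`, `write_job`, and all function names are hypothetical, and `io_write()` simulates `connWrite()` with a byte budget so that partial writes can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct msg_node { struct msg_node *next; size_t len; } msg_node;

typedef struct send_queue {
    msg_node *head, *tail;
    size_t head_off;            /* bytes of the head node already sent */
} send_queue;

typedef struct write_job {
    msg_node *first;            /* queue head at dispatch time */
    size_t first_off;           /* snapshotted head offset */
    msg_node *last;             /* boundary node (io_last_send_block in the proposal) */
    size_t sent;                /* bytes written by the I/O thread */
} write_job;

/* Main thread: append a message of `len` bytes. Returns 0 on success. */
int queue_append(send_queue *q, size_t len) {
    msg_node *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->next = NULL;
    n->len = len;
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    return 0;
}

/* Main thread: snapshot the head offset and the last visible node. */
write_job dispatch_write(const send_queue *q) {
    write_job job = { q->head, q->head_off, q->tail, 0 };
    return job;
}

/* I/O thread: "write" up to `budget` bytes, never walking past job->last.
 * It never reads last->next, the only link a later append may modify. */
size_t io_write(write_job *job, size_t budget) {
    size_t off = job->first_off;
    for (msg_node *n = job->first; n != NULL; n = n->next) {
        size_t avail = n->len - off;
        size_t w = avail < budget ? avail : budget;
        job->sent += w;
        budget -= w;
        if (w < avail || n == job->last) break;
        off = 0;
    }
    return job->sent;
}

/* Main thread: pop fully sent head nodes and update the head offset. */
void complete_write(send_queue *q, const write_job *job) {
    size_t sent = job->sent;
    while (sent > 0 && q->head != NULL) {
        assert(q->head_off <= q->head->len);
        size_t remaining = q->head->len - q->head_off;
        if (sent < remaining) { q->head_off += sent; break; }
        sent -= remaining;
        msg_node *done = q->head;
        q->head = done->next;
        if (q->head == NULL) q->tail = NULL;
        q->head_off = 0;
        free(done);
    }
}
```

In this model a message appended after `dispatch_write()` is not touched by `io_write()`, even when the write budget would allow it; it is picked up by the next job.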

TLS accept path

  • Accepted cluster connections are marked as cluster-owned before any offload retry path
  • I/O thread performs connAccept() on the bare connection *
  • Main thread completion clears CONN_FLAG_ACCEPT_OFFLOAD_PENDING
  • If the connection is still in CONN_STATE_ACCEPTING, the handshake remains in progress and the next TLS accept step is offloaded again on a later event
  • If the connection reaches a terminal state, the main thread finalizes it via clusterConnAcceptHandler()
  • CONN_FLAG_ACCEPT_OFFLOAD_PENDING serializes accept offload so only one accept job is in flight per connection
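The serialization rule can be illustrated with a small state sketch. The `sketch_*` types, the `SK_*` constants, and both function names are stand-ins invented for this example; they only mirror the roles of CONN_STATE_ACCEPTING and CONN_FLAG_ACCEPT_OFFLOAD_PENDING described above, and the handshake result is passed in rather than produced by a real TLS library.

```c
#include <assert.h>

/* Stand-ins for the connection states and the pending flag. */
enum sketch_state { SK_ACCEPTING, SK_CONNECTED, SK_ERROR };
enum sketch_action { SK_RETRY_ON_NEXT_EVENT, SK_FINALIZE };
#define SK_FLAG_ACCEPT_OFFLOAD_PENDING (1u << 0)

typedef struct sketch_conn {
    enum sketch_state state;
    unsigned flags;
} sketch_conn;

/* Main thread: try to hand one accept step to an I/O thread.
 * Returns 1 if dispatched, 0 if a job is already in flight. */
int accept_try_dispatch(sketch_conn *c) {
    if (c->flags & SK_FLAG_ACCEPT_OFFLOAD_PENDING) return 0;
    c->flags |= SK_FLAG_ACCEPT_OFFLOAD_PENDING;
    return 1;
}

/* Main thread, on completion: clear the flag, record the state the
 * handshake step reached, and decide what happens next. */
enum sketch_action accept_on_completion(sketch_conn *c, enum sketch_state result) {
    assert(c->flags & SK_FLAG_ACCEPT_OFFLOAD_PENDING);
    c->flags &= ~SK_FLAG_ACCEPT_OFFLOAD_PENDING;
    c->state = result;
    /* Still accepting: offload the next step on a later event.
     * Terminal state (connected or error): finalize on the main thread. */
    return result == SK_ACCEPTING ? SK_RETRY_ON_NEXT_EVENT : SK_FINALIZE;
}
```

Since the flag is set and cleared only on the main thread in this sketch, it needs no atomic operations; the I/O thread never inspects it.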

Expected Benefits

  • Reduced main-thread CPU time spent on cluster bus syscalls and TLS handshake work
  • Lower client latency under heavy cluster traffic, especially in large clusters
  • Faster and less disruptive inbound TLS cluster handshakes
  • Transparent behavior for users: semantic packet handling stays on the main thread, while raw I/O offloads opportunistically and falls back to synchronous execution if threads are unavailable
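The opportunistic fallback in the last point could look roughly like the sketch below. `io_pool` here is a stub that only counts submissions, and `pool_try_submit()`, `run_or_offload()`, and `count_call()` are hypothetical names; the actual I/O-thread pool interface is not shown in this issue.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*io_fn)(void *arg);

/* Stub pool: tracks only whether a job could be accepted. */
typedef struct io_pool {
    int active_threads;
    size_t queued, capacity;
} io_pool;

/* Returns 1 if the job was accepted by the pool, 0 otherwise. A real pool
 * would enqueue fn/arg; this stub just bumps a counter. */
int pool_try_submit(io_pool *p, io_fn fn, void *arg) {
    (void)fn; (void)arg;
    if (p == NULL || p->active_threads == 0 || p->queued == p->capacity) return 0;
    p->queued++;
    return 1;
}

/* Offload when possible; otherwise run synchronously on the caller.
 * Returns 1 if offloaded, 0 if executed inline. */
int run_or_offload(io_pool *p, io_fn fn, void *arg) {
    assert(fn != NULL);
    if (pool_try_submit(p, fn, arg)) return 1;
    fn(arg); /* synchronous fallback */
    return 0;
}

/* Example job used to observe inline execution. */
void count_call(void *arg) { ++*(int *)arg; }
```

With this shape the caller does not need to know which path ran; the return value is only useful for statistics.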
