Commit c1b3b95

wu-sheng and claude committed
Fix Armeria HTTP event loop sizing and document gRPC/HTTP thread models
Code change: fix the shared event loop formula from max(5, cores/4) to min(5, cores). The previous formula treated 5 as a floor rather than a cap: with integer division it gives 5 threads on a 2-core host (more threads than cores) and 6 on a 24-core host, defeating the intent to cap at 5. min(5, cores) gives 2→2, 4→4, 8+→5, matching the design goal.

Documentation: add comprehensive class-level javadoc to both GRPCServer and HTTPServer explaining their thread models, why we keep the framework default executor pools on JDK <25 (extensions/handlers may block on long I/O), and how virtual threads replace them on JDK 25+.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 33f529b commit c1b3b95

3 files changed: +167 -6 lines changed

docs/en/changes/changes.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@
 | L2 Persistence (OAL + MAL) | 3 (DataCarrier) | 4 (BatchQueue) | Unified OAL + MAL |
 | TopN Persistence | 4 (DataCarrier) | 1 (BatchQueue) | |
 | gRPC Remote Client | 1 (DataCarrier) | 1 (BatchQueue) | Per peer |
-| Armeria HTTP event loop | 20 | 5 | `max(5, cores/4)` shared group |
+| Armeria HTTP event loop | 20 | 5 | `min(5, cores)` shared group |
 | Armeria HTTP handler | on-demand platform(increasing with payload) | - | Virtual threads on JDK 25+ |
 | gRPC event loop | 10 | 10 | Unchanged |
 | gRPC handler | on-demand platform(increasing with payload)| - | Virtual threads on JDK 25+ |
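
The effect of the formula change in the `Armeria HTTP event loop` row can be checked with a small standalone sketch. The class and method names (`EventLoopSizing`, `oldThreads`, `newThreads`) are hypothetical and not part of the commit; only the two expressions are taken from the diff.

```java
// Illustrative only: EventLoopSizing and its method names are hypothetical,
// not part of the SkyWalking code base. The two formulas come from the diff.
final class EventLoopSizing {
    // Previous formula: 5 acts as a floor, so small hosts get more threads than cores.
    static int oldThreads(int cores) {
        return Math.max(5, cores / 4); // integer division
    }

    // New formula: 5 acts as a cap, and small hosts get one thread per core.
    static int newThreads(int cores) {
        return Math.min(5, cores);
    }

    public static void main(String[] args) {
        for (int cores : new int[] {2, 4, 8, 10, 24}) {
            System.out.printf("cores=%d old=%d new=%d%n",
                cores, oldThreads(cores), newThreads(cores));
        }
    }
}
```

Under these assumptions the old expression never drops below 5 and exceeds it once `cores / 4` is greater than 5, while the new expression never exceeds either 5 or the core count.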

oap-server/server-library/library-server/src/main/java/org/apache/skywalking/oap/server/library/server/grpc/GRPCServer.java

Lines changed: 89 additions & 0 deletions
@@ -40,6 +40,82 @@
 import org.apache.skywalking.oap.server.library.server.pool.CustomThreadFactory;
 import org.apache.skywalking.oap.server.library.util.VirtualThreads;
 
+/**
+ * gRPC server backed by Netty. Used by up to 4 OAP server endpoints (core-grpc,
+ * receiver-grpc, ebpf-grpc, als-grpc). gRPC is the primary telemetry ingestion path.
+ *
+ * <h3>Thread model</h3>
+ * gRPC-netty uses a three-tier thread model:
+ * <ol>
+ *   <li><b>Boss event loop</b> — 1 thread. Accepts TCP connections, creates Netty channels,
+ *       then hands them off to worker event loop. Shared across all gRPC servers.</li>
+ *   <li><b>Worker event loop</b> — non-blocking I/O multiplexing (HTTP/2 framing, read/write,
+ *       TLS). gRPC defaults to {@code cores} threads (halves Netty's {@code cores * 2}),
+ *       shared across all servers via {@code SharedResourcePool}. Must never block —
+ *       a few threads can serve thousands of connections.</li>
+ *   <li><b>Application executor</b> — where gRPC service methods actually run
+ *       ({@code onMessage}, {@code onHalfClose}, {@code onComplete}). gRPC dispatches
+ *       callbacks from the event loop to this executor via
+ *       {@code JumpToApplicationThreadServerStreamListener}. For streaming RPCs, the
+ *       thread is held only during each individual callback, not for the entire stream —
+ *       between messages the thread returns to the pool.</li>
+ * </ol>
+ *
+ * <h3>Application executor</h3>
+ * gRPC's default application executor is an <b>unbounded {@code CachedThreadPool}</b>
+ * ({@code Executors.newCachedThreadPool()}, named {@code grpc-default-executor}).
+ * gRPC chose this for safety — application code may block (JDBC, file I/O, synchronized),
+ * and blocking the event loop would freeze all connections. The {@code CachedThreadPool}
+ * never rejects work but grows unboundedly: each burst creates new threads (expensive),
+ * idle threads die after 60s, then the next burst creates them again.
+ *
+ * <p>While benchmarks show {@code CachedThreadPool} is <b>2x slower</b> than a fixed pool
+ * (see <a href="https://github.com/grpc/grpc-java/issues/7381">grpc-java#7381</a>),
+ * we keep the default on JDK &lt;25 because SkyWalking extensions may register gRPC handlers
+ * that perform long-blocking I/O (on-demand queries, external calls). A bounded pool would
+ * risk starving other gRPC services. On JDK 25+, virtual threads replace this pool —
+ * each callback gets its own virtual thread, combining unbounded concurrency with
+ * minimal resource overhead.
+ *
+ * <p>Using {@code directExecutor()} is unsafe for SkyWalking because some handlers call
+ * {@code BatchQueue.produce()} with {@code BLOCKING} strategy which can block the thread
+ * — that would freeze the event loop and stall all connections.
+ *
+ * <h3>Thread policies</h3>
+ * <pre>
+ *                gRPC default                    SkyWalking
+ * Boss EL:       1, shared                       (unchanged)
+ * Worker EL:     cores, shared                   (unchanged)
+ * App executor:  CachedThreadPool (unbounded)    JDK 25+: virtual threads
+ *                                                JDK &lt;25: gRPC default (unchanged)
+ * </pre>
+ *
+ * <h4>Worker event loop: {@code cores}, shared by gRPC (default, unchanged)</h4>
+ * <pre>
+ * cores:    2   4   8  10  24
+ * threads:  2   4   8  10  24
+ * </pre>
+ * Non-blocking I/O multiplexing — a few threads handle thousands of connections.
+ * gRPC's internal {@code SharedResourcePool} already shares one event loop group across
+ * all {@code NettyServerBuilder} instances that use the default. No custom configuration
+ * needed.
+ *
+ * <h3>Comparison with HTTP (Armeria)</h3>
+ * <pre>
+ *                    gRPC                                    HTTP (Armeria)
+ * Event loop:        cores, shared (gRPC default)            min(5, cores), shared
+ * Handler/blocking:  JDK 25+: virtual threads                JDK 25+: virtual threads
+ *                    JDK &lt;25: CachedThreadPool (default)   JDK &lt;25: Armeria default cached pool
+ * </pre>
+ * Both gRPC and HTTP keep their framework's default unbounded pool on JDK &lt;25 because
+ * handlers may block on long I/O (storage queries, extension callbacks). On JDK 25+,
+ * virtual threads replace both pools.
+ *
+ * <h3>User-configured thread pool</h3>
+ * When {@code threadPoolSize > 0} is set via config, it overrides the default with a
+ * per-server fixed pool of that size. On JDK 25+ it is ignored — virtual threads
+ * are always used.
+ */
 @Slf4j
 public class GRPCServer implements Server {
 
@@ -91,6 +167,16 @@ public GRPCServer(String host, int port, String certChainFile, String privateKey
         this.trustedCAsFile = trustedCAsFile;
     }
 
+    /**
+     * Build the gRPC server with optional TLS and handler executor.
+     *
+     * <p>Handler executor assignment:
+     * <ul>
+     *   <li>JDK 25+: virtual-thread-per-task executor (ignores threadPoolSize)</li>
+     *   <li>JDK &lt;25, threadPoolSize &gt; 0: per-server fixed pool (legacy config)</li>
+     *   <li>JDK &lt;25, threadPoolSize == 0: gRPC default CachedThreadPool (unbounded)</li>
+     * </ul>
+     */
     @Override
     public void initialize() {
         InetSocketAddress address = new InetSocketAddress(host, port);
@@ -102,6 +188,9 @@ public void initialize() {
         if (maxMessageSize > 0) {
             nettyServerBuilder.maxInboundMessageSize(maxMessageSize);
         }
+        // JDK 25+: virtual threads for all servers (threadPoolSize ignored)
+        // JDK <25, threadPoolSize > 0: per-server fixed pool (legacy config override)
+        // JDK <25, threadPoolSize == 0: gRPC default CachedThreadPool (safe for extensions)
         final ExecutorService executor = VirtualThreads.createExecutor(
             threadPoolName,
             () -> {
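
The three-way executor choice described in the comments above can be summarized as a small decision function. This is a minimal sketch, not SkyWalking's `VirtualThreads` helper: `HandlerExecutorPolicy`, `Kind`, and `choose` are hypothetical names, and treating a negative `threadPoolSize` like 0 is an assumption the diff does not confirm.

```java
// Illustrative only: HandlerExecutorPolicy is a hypothetical name. It mirrors the
// three cases listed in the diff comments, not the real VirtualThreads helper.
final class HandlerExecutorPolicy {
    enum Kind { VIRTUAL_THREADS, FIXED_POOL, FRAMEWORK_DEFAULT }

    // jdkFeature is the JDK feature release, e.g. Runtime.version().feature().
    static Kind choose(int jdkFeature, int threadPoolSize) {
        if (jdkFeature >= 25) {
            return Kind.VIRTUAL_THREADS;   // threadPoolSize is ignored
        }
        if (threadPoolSize > 0) {
            return Kind.FIXED_POOL;        // legacy per-server override
        }
        // Assumption: any non-positive size falls back to the framework default.
        return Kind.FRAMEWORK_DEFAULT;     // gRPC's default cached pool
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        System.out.println("JDK " + feature + ", threadPoolSize=0 -> " + choose(feature, 0));
        System.out.println("JDK " + feature + ", threadPoolSize=8 -> " + choose(feature, 8));
    }
}
```

Keeping the decision separate from executor construction makes the JDK-version branch easy to test without starting a server.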

oap-server/server-library/library-server/src/main/java/org/apache/skywalking/oap/server/library/server/http/HTTPServer.java

Lines changed: 77 additions & 5 deletions
@@ -46,18 +46,77 @@
 
 import static java.util.Objects.requireNonNull;
 
+/**
+ * Armeria-based HTTP server shared by all OAP HTTP endpoints (core-http, receiver-http,
+ * promql-http, logql-http, zipkin-query-http, zipkin-http, firehose-http — up to 7 servers).
+ *
+ * <h3>Thread model</h3>
+ * Armeria uses a two-tier thread model:
+ * <ul>
+ *   <li><b>Event loop threads</b> — non-blocking I/O multiplexers (epoll/kqueue). Handle
+ *       connection accept, read/write, and protocol parsing. A few threads can serve
+ *       thousands of connections because they never block.</li>
+ *   <li><b>Blocking task executor threads</b> — where request handlers actually run when
+ *       annotated with {@code @Blocking}. These threads block on storage queries,
+ *       downstream calls, and computation. Each concurrent blocking request occupies
+ *       one thread for its full duration.</li>
+ * </ul>
+ *
+ * The blocking executor needs more threads than the event loop because it's where
+ * requests spend most of their time (waiting on I/O), while event loop threads just
+ * shuttle bytes and are immediately available for the next connection.
+ *
+ * <h3>Thread policies</h3>
+ * <pre>
+ *                 Armeria default          SkyWalking
+ * Event loop:     cores * 2 per server     min(5, cores) shared across all servers
+ * Blocking exec:  cached, up to 200        JDK 25+: virtual threads
+ *                                          JDK &lt;25: Armeria default (unchanged)
+ * </pre>
+ *
+ * <h4>Event loop: {@code min(5, cores)}, shared</h4>
+ * <pre>
+ * cores:    2   4   8  10  24
+ * threads:  2   4   5   5   5
+ * </pre>
+ * Armeria's default creates cores*2 event loop threads <em>per server</em>, which for 7
+ * HTTP servers means 7 * cores * 2 = 140 threads on 10-core — far more than needed for
+ * HTTP traffic. All servers share one {@link EventLoopGroup} with min(5, cores) threads.
+ *
+ * <h4>Blocking executor: Armeria default on JDK &lt;25, virtual threads on JDK 25+</h4>
+ * On JDK &lt;25, Armeria's default cached pool (up to 200 on-demand threads) is kept
+ * unchanged. HTTP handlers block on storage/DB queries (BanyanDB, Elasticsearch) which
+ * can take 10ms–seconds. A bounded pool would cause request queuing and UI timeouts
+ * when many concurrent queries block simultaneously. The cached pool handles this
+ * correctly — threads are created on demand and released after idle timeout.
+ * On JDK 25+, virtual threads replace this pool entirely — each blocking request
+ * gets its own virtual thread backed by ~cores shared carrier threads.
+ *
+ * <h3>Comparison with gRPC</h3>
+ * gRPC is the primary telemetry ingestion path. HTTP is secondary (UI queries, PromQL,
+ * LogQL, and optionally telemetry), so it uses fewer event loop threads.
+ * <pre>
+ *                    gRPC                                    HTTP (Armeria)
+ * Event loop:        cores, shared (gRPC default)            min(5, cores), shared
+ * Handler/blocking:  JDK 25+: virtual threads                JDK 25+: virtual threads
+ *                    JDK &lt;25: CachedThreadPool (default)   JDK &lt;25: Armeria default cached pool
+ * </pre>
+ * Both gRPC and HTTP keep their framework's default unbounded pool on JDK &lt;25 because
+ * handlers may block on long I/O (storage queries, extension callbacks). On JDK 25+,
+ * virtual threads replace both pools.
+ */
 @Slf4j
 public class HTTPServer implements Server {
     /**
-     * Shared event loop group for all HTTP servers. HTTP traffic (UI queries,
-     * PromQL, LogQL) is much lighter than gRPC, so we use a smaller pool
-     * instead of Armeria's default (availableProcessors * 2).
+     * Shared event loop group for all HTTP servers.
+     * Non-blocking I/O multiplexing — min(5, cores) threads can handle thousands
+     * of connections. Replaces Armeria's default of cores*2 per server.
      */
     private static final EventLoopGroup SHARED_WORKER_GROUP;
 
     static {
-        final int threads = Math.max(5, Runtime.getRuntime().availableProcessors() / 4);
-        SHARED_WORKER_GROUP = EventLoopGroups.newEventLoopGroup(threads);
+        final int cores = Runtime.getRuntime().availableProcessors();
+        SHARED_WORKER_GROUP = EventLoopGroups.newEventLoopGroup(Math.min(5, cores));
     }
 
     private final HTTPServerConfig config;
@@ -74,6 +133,16 @@ public void setBlockingTaskName(final String blockingTaskName) {
         this.blockingTaskName = blockingTaskName;
     }
 
+    /**
+     * Build the Armeria server with shared event loop, TLS, and blocking executor.
+     *
+     * <p>Thread pool assignment:
+     * <ul>
+     *   <li>{@code workerGroup} — shared event loop for I/O (min(5, cores) threads)</li>
+     *   <li>{@code blockingTaskExecutor} — JDK 25+: virtual threads per request;
+     *       JDK &lt;25: Armeria's default cached pool (handlers block on storage queries)</li>
+     * </ul>
+     */
     @Override
     public void initialize() {
         sb = com.linecorp.armeria.server.Server
@@ -115,6 +184,9 @@ public void initialize() {
             sb.absoluteUriTransformer(this::transformAbsoluteURI);
         }
 
+        // JDK 25+: virtual-thread-per-task executor (unbounded, ~cores carrier threads)
+        // JDK <25: Armeria's default cached pool (up to 200 threads) — kept unchanged
+        // because HTTP handlers block on long storage queries (10ms-seconds)
         if (VirtualThreads.isSupported()) {
             final ScheduledExecutorService blockingExecutor = VirtualThreads.createScheduledExecutor(
                 blockingTaskName, () -> null);
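
The thread-count argument in the `HTTPServer` Javadoc (7 servers, `cores * 2` each, versus one shared group) is simple arithmetic that can be verified with a short sketch. `HttpEventLoopTotals` and its methods are hypothetical names introduced only for illustration; the figures of 7 servers and 10 cores come from the Javadoc.

```java
// Illustrative only: HttpEventLoopTotals is a hypothetical name, not part of the commit.
final class HttpEventLoopTotals {
    // Per the Javadoc: Armeria's default is cores * 2 event loop threads per server.
    static int perServerDefaultTotal(int servers, int cores) {
        return servers * cores * 2;
    }

    // Per the diff: one shared group of min(5, cores) threads for all servers.
    static int sharedTotal(int cores) {
        return Math.min(5, cores);
    }

    public static void main(String[] args) {
        int servers = 7;  // up to 7 HTTP servers, per the Javadoc
        int cores = 10;   // the 10-core example from the Javadoc
        System.out.println("per-server default total: " + perServerDefaultTotal(servers, cores));
        System.out.println("shared group total: " + sharedTotal(cores));
    }
}
```

The shared total does not depend on the number of servers, which is the point of using a single static `EventLoopGroup`.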
