feat: Threadpool priority by albertoperdomo2 · Pull Request #337 · llm-d/llm-d-kv-cache

albertoperdomo2 · 2026-02-17T19:07:12Z

Summary

The FS backend's storage connector uses a single thread pool for both read (GET: Storage->GPU) and write (PUT: GPU->Storage) operations. When write-heavy workloads fill the task queue, read operations (which are latency critical for serving inference requests)get stuck behind writes in a FIFO queue.

This PR introduces a dual-queue with shared workers, so we keep a single pool of workers but maintain two internal queues: a high-priority read queue and a normal-priority write queue. Workers check the read queue first; if empty, they fall through to the write queue.

Other approaches such as separated ThreadPool instances are simpler but double the thread-local resource usage. With this approach, the thread-local resources (staging buffers, CUDA streams...) are not affected since we are only changing how tasks are queued and dequeued, not how they execute.

Test plan

(Edit)

In order to have a sense of how this affects performance, three new test cases were added.

Test that reads submitted after writes complete before them, by submitting 20 writes, wait for the queue to fill up, then submit 5 read operations and verify that reads complete early.
Measure latency distribution under write pressure, by submitting writes continuously (cache evictions) and submitting intersperse reads (cache hits), then measuring the latency distribution.
Test that checks starvation of writes, when a lot of reads are submitted, write should still be processed.

The tests pass showing that prioritization is working as expected.

Related issues

FS Backend: Create priority in thread pools for reads #286

albertoperdomo2 · 2026-02-17T19:08:25Z

If resource starvation is a concern (too many reads that writes never run), a simple idea would be to process N writes for every batch of reads.

dannyharnik · 2026-02-17T19:42:18Z

Thanks for this contribution! We saw this is a real pain point.
Could you expand this to also avoid write starvation?
Rather than your suggestion (periodically process writes) how about this alternative:
Each of the workers is assigned a preference (read or write). It goes to its preferred queue and only if it is empty it will try the second queue.
We can then assign a configurable number of workers to each preference, likely give a majority to readers first and minority to writers first.

albertoperdomo2 · 2026-02-18T09:40:05Z

I need to test it, I'll do it ASAP.

kfirtoledo · 2026-02-18T10:05:20Z

@dannyharnik this will be a good idea

albertoperdomo2 · 2026-02-18T15:25:33Z

After implementing a WorkerPreference with a default of 75% of workers prioritizing reads and the remaining 25% prioritizing writes, the test "failed":

E       AssertionError: Read time (0.081s) exceeded 30% of write time (0.241s). Ratio: 33.4%. Priority queue may not be working.

but it is still way better than what I saw with main ((0.262s) exceeded 30% of write time (0.312s)). Should I remove the assert threshold and just print out the times? Otherwise it might be a bit difficult in the future to make sure it always passes.

dannyharnik · 2026-02-18T17:56:32Z

So if all workers dedicated to reads we get 0.05s and with 75% of the workers it takes 0.081s. This seems reasonable. Can you try with a couple more test points (say 90% readers and 50% readers). If the results make sense then I guess you can remove the assert (or relax it..)

albertoperdomo2 · 2026-02-19T11:26:04Z

Results reported from the test:

Read preferring ratio: 50% (4 read-first, 4 write-first workers)
Workload: 5 reads (1 file each) competing with 27 write files
Avg read latency: 0.087s (min: 0.040s, max: 0.111s)
Total write time:  0.332s
PASSED

[...]

Read preferring ratio: 75% (6 read-first, 2 write-first workers)
Workload: 5 reads (1 file each) competing with 27 write files
Avg read latency: 0.068s (min: 0.030s, max: 0.091s)
Total write time:  0.312s
PASSED

[...]

Read preferring ratio: 90% (7 read-first, 1 write-first workers)
Workload: 5 reads (1 file each) competing with 27 write files
Avg read latency: 0.068s (min: 0.030s, max: 0.091s)
Total write time:  0.312s
PASSED

It looks like 75% and 90% perform the same, probably due to the low number of threads and limitations of the current test.

Note

The test case changed completely so it's not possible to compare to the other results discussed up until now.

kfirtoledo

overall looks good, some minor comments

kv_connectors/llmd_fs_backend/csrc/storage/thread_pool.cpp

kv_connectors/llmd_fs_backend/csrc/storage/thread_pool.hpp

kv_connectors/llmd_fs_backend/tests/test_fs_backend.py

kv_connectors/llmd_fs_backend/llmd_fs_backend/worker.py

Add ThreadPool priority using a dual queue approach with shared workers. Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

kv_connectors/llmd_fs_backend/tests/test_priority_queue.py

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

kv_connectors/llmd_fs_backend/tests/test_priority_queue.py

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

albertoperdomo2 · 2026-03-02T23:35:12Z

I found some missing changes that got lost in the middle of the rebase probably, and added a new test for write starvation as suggested:

======================================================================
Write Starvation Prevention Test Results
======================================================================
Configuration:
  Threads: 4
  Writes submitted: 10
  Reads submitted: 34 (continuous flood)
  Total duration: 0.69s

Write Latencies:
  Min: 0.021s
  Avg: 0.313s
  Max: 0.665s
  Target: <3.0s

Throughput:
  Writes: 14.59 ops/s
  Reads: 49.61 ops/s
  Write percentage: 22.7% (target >15%)
======================================================================

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

albertoperdomo2 requested review from dannyharnik, kfirtoledo and vMaroon as code owners February 17, 2026 19:07

vMaroon requested review from hyeongyun0916, liu-cong, sagearc and yankay February 17, 2026 19:07

kfirtoledo reviewed Feb 23, 2026

View reviewed changes

kfirtoledo reviewed Feb 24, 2026

View reviewed changes

kv_connectors/llmd_fs_backend/llmd_fs_backend/worker.py Show resolved Hide resolved

albertoperdomo2 added 7 commits February 24, 2026 14:56

feat: Add ThreadPool priority

bed78fd

Add ThreadPool priority using a dual queue approach with shared workers. Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

feat: Add priority queue test case

7d1c083

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

feat: Implement worker preference

502bf6b

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

fix: Typos

e1adefa

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

test: Improve test coverage

e727b8f

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

fix: Minor fixes

8eb1902

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

feat: Make ratio configurable

e2654fa

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

albertoperdomo2 force-pushed the feat/threadpool-priority branch from b506db9 to e2654fa Compare February 25, 2026 07:49

albertoperdomo2 added 3 commits February 25, 2026 08:18

test: Add prioritization test cases (+linter)

bfdb9aa

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

chore: Run linter

980132f

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

test: Revert test file

b045753

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

kfirtoledo reviewed Mar 1, 2026

View reviewed changes

kv_connectors/llmd_fs_backend/tests/test_priority_queue.py Show resolved Hide resolved

kfirtoledo previously approved these changes Mar 2, 2026

View reviewed changes

test: Unify latency test cases and fix pytest dependency

7737041

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

albertoperdomo2 dismissed kfirtoledo’s stale review via 7737041 March 2, 2026 08:08

kfirtoledo previously approved these changes Mar 2, 2026

View reviewed changes

kfirtoledo reviewed Mar 2, 2026

View reviewed changes

kv_connectors/llmd_fs_backend/tests/test_priority_queue.py Show resolved Hide resolved

test: Fix tests and add write starvation test case

6d33a6a

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

albertoperdomo2 dismissed kfirtoledo’s stale review via 6d33a6a March 2, 2026 23:33

test: Adjust write starvation threshold

14cd3da

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Threadpool priority#337

feat: Threadpool priority#337
albertoperdomo2 wants to merge 13 commits intollm-d:mainfrom
albertoperdomo2:feat/threadpool-priority

albertoperdomo2 commented Feb 17, 2026 •

edited

Loading

Uh oh!

albertoperdomo2 commented Feb 17, 2026

Uh oh!

dannyharnik commented Feb 17, 2026

Uh oh!

albertoperdomo2 commented Feb 18, 2026

Uh oh!

kfirtoledo commented Feb 18, 2026

Uh oh!

albertoperdomo2 commented Feb 18, 2026

Uh oh!

dannyharnik commented Feb 18, 2026

Uh oh!

albertoperdomo2 commented Feb 19, 2026 •

edited

Loading

Uh oh!

kfirtoledo left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

albertoperdomo2 commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

albertoperdomo2 commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Related issues

Uh oh!

albertoperdomo2 commented Feb 17, 2026

Uh oh!

dannyharnik commented Feb 17, 2026

Uh oh!

albertoperdomo2 commented Feb 18, 2026

Uh oh!

kfirtoledo commented Feb 18, 2026

Uh oh!

albertoperdomo2 commented Feb 18, 2026

Uh oh!

dannyharnik commented Feb 18, 2026

Uh oh!

albertoperdomo2 commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kfirtoledo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

albertoperdomo2 commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

albertoperdomo2 commented Feb 17, 2026 •

edited

Loading

albertoperdomo2 commented Feb 19, 2026 •

edited

Loading