
feat: Threadpool priority#337

Open
albertoperdomo2 wants to merge 13 commits into llm-d:main from albertoperdomo2:feat/threadpool-priority

Conversation

@albertoperdomo2 (Contributor) commented Feb 17, 2026

Summary

The FS backend's storage connector uses a single thread pool for both read (GET: Storage->GPU) and write (PUT: GPU->Storage) operations. When write-heavy workloads fill the task queue, read operations, which are latency-critical for serving inference requests, get stuck behind writes in a FIFO queue.

This PR introduces a dual-queue with shared workers, so we keep a single pool of workers but maintain two internal queues: a high-priority read queue and a normal-priority write queue. Workers check the read queue first; if empty, they fall through to the write queue.

Other approaches, such as separate ThreadPool instances, are simpler but double the thread-local resource usage. With this approach, the thread-local resources (staging buffers, CUDA streams, ...) are not affected, since we only change how tasks are queued and dequeued, not how they execute.

Test plan


To get a sense of how this affects performance, three new test cases were added.

  1. Test that reads submitted after writes complete before them: submit 20 writes, wait for the queue to fill up, then submit 5 read operations, and verify that the reads complete early.
  2. Measure the latency distribution under write pressure: submit writes continuously (cache evictions) with interspersed reads (cache hits), then measure the read latency distribution.
  3. Check that writes are not starved: even when many reads are submitted, writes should still be processed.
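Scaled down, the first test above can be sketched like this; the inline worker loop is a stand-in for the actual connector pool, and all names and counts are illustrative:

```python
# Scaled-down sketch of test 1: block a single worker on the first
# write, queue more writes, submit reads, then check that the reads
# drain before the remaining writes. Illustrative only.
import queue
import threading
import time

read_q, write_q = queue.Queue(), queue.Queue()
order = []                       # completion order of all tasks
gate = threading.Event()         # holds the worker inside the first write
stop = threading.Event()

def worker():
    while not stop.is_set():
        try:
            fn = read_q.get_nowait()     # reads first
        except queue.Empty:
            try:
                fn = write_q.get_nowait()
            except queue.Empty:
                time.sleep(0.001)
                continue
        fn()

def blocked_write():
    gate.wait()                  # occupies the worker until released
    order.append("w0")

write_q.put(blocked_write)
for i in range(1, 5):
    write_q.put(lambda i=i: order.append(f"w{i}"))

threading.Thread(target=worker, daemon=True).start()
while write_q.qsize() != 4:      # wait until the worker holds w0
    time.sleep(0.001)

for i in range(3):               # reads arrive while writes are queued
    read_q.put(lambda i=i: order.append(f"r{i}"))
gate.set()                       # release the worker

while len(order) < 8:
    time.sleep(0.001)
stop.set()
# order == ["w0", "r0", "r1", "r2", "w1", "w2", "w3", "w4"]
```

The reads overtake the four writes that were queued first, which is exactly the behavior the assertion in the real test checks for.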

The tests pass, showing that prioritization works as expected.

Related issues

@albertoperdomo2 (Contributor, Author)

If resource starvation is a concern (too many reads that writes never run), a simple idea would be to process N writes for every batch of reads.
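That anti-starvation idea could look roughly like this; `READS_PER_WRITE` and `next_task` are hypothetical names for the sketch:

```python
# Illustrative sketch of the "N writes per batch of reads" fairness
# idea: after READS_PER_WRITE consecutive reads, force one queued write
# through so writes keep making progress under a read flood.
import queue

READS_PER_WRITE = 8

def next_task(read_q, write_q, state):
    # state["reads_since_write"] counts consecutive reads dequeued
    if state["reads_since_write"] >= READS_PER_WRITE and not write_q.empty():
        state["reads_since_write"] = 0
        return write_q.get_nowait()
    try:
        task = read_q.get_nowait()
        state["reads_since_write"] += 1
        return task
    except queue.Empty:
        pass
    try:
        state["reads_since_write"] = 0
        return write_q.get_nowait()
    except queue.Empty:
        return None
```

With 10 reads and 2 writes queued, this dequeues eight reads, one write, the remaining two reads, then the last write, so a sustained read flood cannot starve writes indefinitely.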

@dannyharnik (Collaborator)

Thanks for this contribution! We saw this is a real pain point.
Could you expand this to also avoid write starvation?
Rather than your suggestion (periodically processing writes), how about this alternative:
each worker is assigned a preference (read or write); it goes to its preferred queue first, and only if that queue is empty will it try the other.
We can then assign a configurable number of workers to each preference, likely giving a majority to read-first workers and a minority to write-first workers.
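A rough sketch of this preference scheme, with hypothetical names (`next_task`, `build_preferences`); the real implementation in the PR may differ:

```python
# Sketch of per-worker queue preference: each worker drains its
# preferred queue first and falls back to the other only when the
# preferred one is empty. Names are illustrative.
import queue

def next_task(preference, read_q, write_q):
    primary, secondary = ((read_q, write_q) if preference == "read"
                          else (write_q, read_q))
    for q in (primary, secondary):
        try:
            return q.get_nowait()
        except queue.Empty:
            continue
    return None

def build_preferences(num_workers, read_ratio=0.75):
    # e.g. 8 workers at 0.75 -> 6 read-first, 2 write-first
    n_read = round(num_workers * read_ratio)
    return ["read"] * n_read + ["write"] * (num_workers - n_read)
```

As long as at least one worker is write-first, queued writes are picked up even while reads keep arriving, which is what prevents the starvation in either direction.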

@albertoperdomo2 (Contributor, Author)

I need to test it, I'll do it ASAP.

@kfirtoledo (Collaborator)

@dannyharnik this would be a good idea

@albertoperdomo2 (Contributor, Author)

After implementing a WorkerPreference with a default of 75% of the workers prioritizing reads and the remaining 25% prioritizing writes, the test "failed":

E       AssertionError: Read time (0.081s) exceeded 30% of write time (0.241s). Ratio: 33.4%. Priority queue may not be working.

but it is still way better than what I saw on main (read time 0.262s exceeded 30% of write time 0.312s). Should I remove the assert threshold and just print out the times? Otherwise it might be a bit difficult to make sure it always passes in the future.

@dannyharnik (Collaborator)

So if all workers are dedicated to reads we get 0.05s, and with 75% of the workers it takes 0.081s. This seems reasonable. Can you try a couple more test points (say 90% readers and 50% readers)? If the results make sense, then I guess you can remove the assert (or relax it).

@albertoperdomo2 (Contributor, Author) commented Feb 19, 2026

Results reported from the test:

Read preferring ratio: 50% (4 read-first, 4 write-first workers)
Workload: 5 reads (1 file each) competing with 27 write files
Avg read latency: 0.087s (min: 0.040s, max: 0.111s)
Total write time:  0.332s
PASSED

[...]

Read preferring ratio: 75% (6 read-first, 2 write-first workers)
Workload: 5 reads (1 file each) competing with 27 write files
Avg read latency: 0.068s (min: 0.030s, max: 0.091s)
Total write time:  0.312s
PASSED

[...]

Read preferring ratio: 90% (7 read-first, 1 write-first workers)
Workload: 5 reads (1 file each) competing with 27 write files
Avg read latency: 0.068s (min: 0.030s, max: 0.091s)
Total write time:  0.312s
PASSED

It looks like 75% and 90% perform the same, probably due to the low number of threads and limitations of the current test.

Note

The test case changed completely so it's not possible to compare to the other results discussed up until now.

@kfirtoledo (Collaborator) left a comment:

overall looks good, some minor comments

Add ThreadPool priority using a dual queue approach with shared workers.

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
@albertoperdomo2 force-pushed the feat/threadpool-priority branch from b506db9 to e2654fa on February 25, 2026 07:49
@kfirtoledo previously approved these changes Mar 2, 2026
@albertoperdomo2
Copy link
Contributor Author

I found some missing changes that probably got lost during the rebase, and added a new test for write starvation as suggested:

======================================================================
Write Starvation Prevention Test Results
======================================================================
Configuration:
  Threads: 4
  Writes submitted: 10
  Reads submitted: 34 (continuous flood)
  Total duration: 0.69s

Write Latencies:
  Min: 0.021s
  Avg: 0.313s
  Max: 0.665s
  Target: <3.0s

Throughput:
  Writes: 14.59 ops/s
  Reads: 49.61 ops/s
  Write percentage: 22.7% (target >15%)
======================================================================
