Conversation
I think we should not support this in the engine here, especially not at a logical level - if anything, this needs to be pushed down into storage.
Thanks for the feedback! Updated to track only at the storage layer (vectorSelector, matrixSelector), similar to
if o.opts.SampleTracker != nil {
	if totalSamplesInBatch > o.lastTrackedSamples {
		o.opts.SampleTracker.Add(totalSamplesInBatch - o.lastTrackedSamples)
Could you help me understand why we need to keep track of the last tracked samples? Don't the sample counts get reset every time we call Next() on the vector selector?
For vector_selector and subquery, we're tracking a running total of samples, so we need lastTrackedSamples to quickly compute the delta (just subtract the old value). Otherwise we'd have to loop through everything to recalculate the total each time we check the limit, making it an O(n) operation instead of O(1).
matrix_selector works differently: we calculate the delta for each series as we go (sampleCountAfter - sampleCountBefore) and add it to batchSamplesDelta, so the delta is already available when we need to check.
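The running-total-plus-delta idea described above can be sketched as follows. This is an illustrative toy, not the actual operator code: sampleTracker, vectorSelector, and trackBatch are hypothetical names standing in for the PR's SampleTracker and selector internals.

```go
package main

import "fmt"

// sampleTracker is a hypothetical stand-in for the engine's SampleTracker:
// it keeps a running count of samples held in memory across operators.
type sampleTracker struct{ total int }

func (t *sampleTracker) Add(n int) { t.total += n }

// vectorSelector mirrors the delta-based scheme: instead of recounting every
// buffered sample on each check (O(n)), remember the last reported total and
// report only the difference (O(1)).
type vectorSelector struct {
	tracker            *sampleTracker
	lastTrackedSamples int
}

func (v *vectorSelector) trackBatch(totalSamplesInBatch int) {
	if totalSamplesInBatch > v.lastTrackedSamples {
		v.tracker.Add(totalSamplesInBatch - v.lastTrackedSamples)
		v.lastTrackedSamples = totalSamplesInBatch
	}
}

func main() {
	tr := &sampleTracker{}
	vs := &vectorSelector{tracker: tr}
	vs.trackBatch(100) // first batch: delta is 100
	vs.trackBatch(250) // running total grew by 150, only 150 is added
	fmt.Println(tr.total)
}
```

The key property is that each call to trackBatch does a single subtraction regardless of how many samples are buffered.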
buf[currStep].AppendSampleWithSizeHint(series.signature, v, expectedSamples)
	currStepSamples++
}
totalSamplesInBatch += currStepSamples
Should we add the check here? Imagine we have 500k series and a step batch of 10: we'll only increment and check the sample count after 500k * 10 = 5M samples are loaded.
Yes makes sense. I have updated now.
Thanks.
Could you add unit tests to the PR?
Signed-off-by: Paurush Garg <paurushg@amazon.com>
Benchmark test results:
}

if o.opts.SampleTracker != nil && (o.currentSeries+1-fromSeries)%maxSamplesCheckIntervalSeries == 0 {
	if err := o.opts.SampleTracker.CheckLimit(); err != nil {
Maybe I missed something here, but where do we add samples to the tracker? I only see us adding the samples after looping over all series.
Yes makes sense.
I updated it now.
Thanks.
if o.opts.SampleTracker != nil && (o.currentSeries+1-firstSeries)%maxSamplesCheckIntervalSeries == 0 {
	totalSamplesInBatch := 0
	for i := range o.scanners {
		totalSamplesInBatch += o.scanners[i].buffer.SampleCount()
	}
It seems we are counting samples from all series, not just the current series. We could maintain a running sum over processed series instead of recounting all of them every time.
Yes that would make more sense.
Updated now to count samples for only the current series.
Thanks.
execution/scan/subquery.go
	lastTrackedSamples int
}

const maxSamplesCheckIntervalSteps = 100
I wonder if we have a way to infer a good sample check interval dynamically. Cardinality and number of total steps are known when we call Next(). So ideally we can check more frequently if the query has high cardinality?
I have moved the checkSampleLimit call into the inner loop: as noted from the run on the beta cell, the instant subqueries were getting checked too late. Also added a dynamic interval check.
Thanks.
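A dynamic check interval along the lines discussed above could be computed like this. This is only a sketch of the idea, not the merged code: the function name, the sample budget of maxInterval*1000, and the clamping rule are all assumptions.

```go
package main

import "fmt"

// checkInterval picks how often (in steps) to check the sample limit.
// High-cardinality queries get a smaller interval so the limit trips
// sooner; low-cardinality queries avoid the overhead of frequent checks.
func checkInterval(numSeries, totalSteps, maxInterval int) int {
	if numSeries <= 0 || totalSteps <= 0 {
		return maxInterval
	}
	// Aim to check after roughly a fixed sample budget (here
	// maxInterval*1000), clamped to [1, min(totalSteps, maxInterval)].
	interval := maxInterval * 1000 / numSeries
	if interval > totalSteps {
		interval = totalSteps
	}
	if interval > maxInterval {
		interval = maxInterval
	}
	if interval < 1 {
		interval = 1
	}
	return interval
}

func main() {
	fmt.Println(checkInterval(10, 1000, 100))     // low cardinality: check rarely
	fmt.Println(checkInterval(500000, 1000, 100)) // high cardinality: check every step
}
```

Since cardinality and total steps are both known when Next() is called, the interval can be recomputed once per query rather than per step.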
Thanks.
…_selector maxSamples logic Signed-off-by: Paurush Garg <paurushg@amazon.com>
harry671003 left a comment
Thanks! This is great!
@MichaHoffmann requesting review.
execution/scan/subquery.go
	o.collect(vector, mint)
}

if o.opts.SampleTracker != nil {
I find it strange that we apply this limit in subqueries. They operate on existing samples and do not produce new ones.
Selectors reuse the same buffer on each Next(), while subqueries accumulate samples in ring buffers that persist across the evaluation window.
For rate(http_requests[10m])[2d:1m], the inner selector loads samples and releases them, but the subquery keeps accumulating them in its buffers for the full 2d window.
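The distinction can be illustrated with a toy sliding window. This is hypothetical code, not the PR's implementation: samples entering the window are Add()ed and samples falling out are Remove()d, so the tracker reflects what the subquery actually holds in memory at any moment.

```go
package main

import "fmt"

// sampleTracker is a hypothetical stand-in for the engine's SampleTracker.
type sampleTracker struct{ total int }

func (t *sampleTracker) Add(n int)    { t.total += n }
func (t *sampleTracker) Remove(n int) { t.total -= n }

// slidingWindow mimics a subquery's per-series buffer: it retains all
// samples with timestamp >= mint, unlike a selector which reuses one buffer.
type slidingWindow struct {
	tracker *sampleTracker
	ts      []int64
}

func (w *slidingWindow) push(t int64, mint int64) {
	w.ts = append(w.ts, t)
	w.tracker.Add(1)
	// Drop samples that left the window and tell the tracker.
	dropped := 0
	for dropped < len(w.ts) && w.ts[dropped] < mint {
		dropped++
	}
	w.ts = w.ts[dropped:]
	w.tracker.Remove(dropped)
}

func main() {
	tr := &sampleTracker{}
	w := &slidingWindow{tracker: tr}
	for t := int64(0); t < 10; t++ {
		w.push(t, t-4) // 5-sample window: keep timestamps in [t-4, t]
	}
	fmt.Println(tr.total)
}
```

The tracked count stays bounded by the window size no matter how many samples stream through, which is exactly why the limit matters here: the window itself can be very large (e.g. 2d of samples).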
I reran the fuzz failure and it failed again; it seems to be real. Can you dig into it, please?
The same NH fuzz test seems to be failing for me in my PR as well. I wonder if it is just a flake. Someone should take a look and disable it if it is flaky.
@MichaHoffmann @yeya24 This matches the known problematic pattern documented here in enginefuzz_test.go: "
var ErrNativeHistogramsNotSupported = errors.New("native histograms are not supported in extended range functions")

const sampleLimitCheckInterval = 500
Why do we have sampleLimitCheckInterval here and sampleLimitCheckPercentage in subquery.go? That's weird.
This PR implements max_samples limit enforcement to prevent OOM by tracking samples at the operator level.

Changes:
- SampleTracker in the query/ package with Add(), Remove(), and CheckLimit() methods
- MaxSamples field on Engine.Opts (default 0 = unlimited)
- ErrMaxSamplesExceeded error for limit violations

Tracking Implementation:
Uses delta-based tracking (similar to the peak_samples pattern) in three operators. Each operator queries buffer.SampleCount(), compares it to the previous count, and tracks the delta using Add()/Remove().

Benchmark test results: #663 (comment)