fix(binary): Detect duplicate output series in group_left/group_right#693
fix(binary): Detect duplicate output series in group_left/group_right#693harry671003 wants to merge 1 commit intothanos-io:mainfrom
Conversation
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
There was a problem hiding this comment.
Pull request overview
Adjusts binary vector matching to error (Prometheus-compatible) when group_left/group_right produces duplicate output label sets, instead of silently deduplicating/merging series.
Changes:
- Make
initJoinTablesreturn an error and propagate it during operator initialization. - Extend
joinHelper.append()to report whether a label set was already seen, and use it to detect duplicates for non-one-to-onematching. - Add a new engine test case and ignore a new local folder in
.gitignore.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| execution/binary/vector.go | Detect duplicate output series during join table initialization; propagate errors; extend join helper API. |
| engine/engine_test.go | Adds a scenario around group_right with missing matching labels. |
| .gitignore | Ignores .kiro directory. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // append adds a label set to the join helper and returns its index. | ||
| // If the label set was already seen (by hash), it returns the existing index | ||
| // and true. Otherwise it appends a new entry and returns false. | ||
| func (h *joinHelper) append(ls labels.Labels) (int, bool) { | ||
| hash := ls.Hash() | ||
| if n, ok := h.seen[hash]; ok { | ||
| return n | ||
| return n, true | ||
| } |
There was a problem hiding this comment.
joinHelper.append() treats a matching labels.Hash() as an identical label set. Hash collisions (even if rare) would incorrectly mark distinct series as duplicates (triggering a false Prometheus error) or incorrectly deduplicate to the wrong index. Consider storing and comparing actual label sets on hash match (e.g., map hash -> list of indices, then labels.Equal(h.ls[idx], ls)), and only returning duplicate=true when labels are truly identical. If you keep hash-only semantics, the doc comment should also make it explicit that this is hash-based and collision-prone.
| num_cpus{pod="nginx-1"} 4+0x40 | ||
| memory_usage_bytes{pod="nginx-1", type="buffers"} 100+10x40 | ||
| memory_usage_bytes{pod="nginx-1", type="cached"} 200+20x40 | ||
| memory_usage_bytes{pod="nginx-1", type="free"} 300+30x40 | ||
| memory_usage_bytes{pod="nginx-1", type="used"} 400+40x40`, | ||
| query: `num_cpus * on(instance, job) group_right(type) memory_usage_bytes`, |
There was a problem hiding this comment.
This new test case doesn't appear to exercise the new behavior described in the PR (erroring when group_left/group_right yields multiple output series with identical labels). Consider adding a dedicated case that produces duplicate output labels and asserts the engine returns multiple matches for labels: grouping labels must ensure unique matches (e.g., use group_right without including a distinguishing label like type, or craft inputs where the grouping labels still collide).
| num_cpus{pod="nginx-1"} 4+0x40 | |
| memory_usage_bytes{pod="nginx-1", type="buffers"} 100+10x40 | |
| memory_usage_bytes{pod="nginx-1", type="cached"} 200+20x40 | |
| memory_usage_bytes{pod="nginx-1", type="free"} 300+30x40 | |
| memory_usage_bytes{pod="nginx-1", type="used"} 400+40x40`, | |
| query: `num_cpus * on(instance, job) group_right(type) memory_usage_bytes`, | |
| num_cpus{pod="nginx-1",instance="localhost:9100",job="node"} 4+0x40 | |
| memory_usage_bytes{pod="nginx-1",instance="localhost:9100",job="node",type="buffers"} 100+10x40 | |
| memory_usage_bytes{pod="nginx-1",instance="localhost:9100",job="node",type="cached"} 200+20x40 | |
| memory_usage_bytes{pod="nginx-1",instance="localhost:9100",job="node",type="free"} 300+30x40 | |
| memory_usage_bytes{pod="nginx-1",instance="localhost:9100",job="node",type="used"} 400+40x40`, | |
| // Multiple right-hand series share the same (instance, job) labels; omitting a | |
| // distinguishing label like "type" in group_right causes multiple matches and | |
| // should trigger: "multiple matches for labels: grouping labels must ensure unique matches". | |
| query: `num_cpus * on(instance, job) group_right memory_usage_bytes`, |
Notes
When group_left or group_right produces multiple result series with identical labels, the engine should error with "grouping labels must ensure unique matches" to match Prometheus behavior. Previously,
joinHelper.append() silently deduplicated output series by hash, causing multiple matched pairs to merge into a single series with duplicate timestamps instead of returning an error.
Make joinHelper.append() report whether a label set was already seen, and check for duplicates in initJoinTables for many-to-one/one-to-many operations.
Same behavior in Prometheus engine: https://github.com/prometheus/prometheus/blob/1906c9877fbe7d33d4bd220e9851c719e1b43156/promql/engine.go#L3049