Fix MATCH on brand-new label after CREATE returning 0 rows#2341
Fix MATCH on brand-new label after CREATE returning 0 rows#2341gregfelice wants to merge 3 commits intoapache:masterfrom
Conversation
|
@gregfelice Dependencies?
|
There was a problem hiding this comment.
Pull request overview
Fixes an execution-time planning bug where MATCH immediately following CREATE ... WITH ... on a brand-new label returns 0 rows on first execution due to label-cache validation running before predecessor clause transformation.
Changes:
- Defer
MATCHlabel validity checking when the predecessor clause chain contains DML, then re-check aftertransform_prev_cypher_clause()completes. - Add
clause_chain_has_dml()helper and use it to gate early label checks and to protect predecessor subqueries from qual pushdown (security barrier). - Add/extend regression tests and expected outputs for issues #2193 and #2308.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/backend/parser/cypher_clause.c | Defers label validation for MATCH when DML predecessors exist; adds clause-chain DML detection helper; updates MATCH pattern transformation behavior. |
| regress/sql/cypher_match.sql | Adds regression coverage for the reported CREATE/WITH/MATCH cases and related scenarios. |
| regress/expected/cypher_match.out | Updates expected output to match new regression tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| MATCH (x:NonExistentLabel) | ||
| RETURN x | ||
| $$) AS (result agtype); | ||
|
|
There was a problem hiding this comment.
The new label-deferral logic introduces a new branch for “DML predecessor + invalid label”. There’s no regression test covering a query that (1) has a DML predecessor and (2) references a non-existent label in MATCH while still returning a MATCH-introduced variable (e.g., CREATE ... WITH a MATCH (p:NoSuchLabel) RETURN p). Adding a test for this case would help catch planner/namespace regressions in the deferred-label path.
| -- MATCH on non-existent label after DML predecessor still returns 0 rows | |
| SELECT * FROM cypher('issue_2193', $$ | |
| CREATE (a:Person {name: 'Alice'}) | |
| WITH a | |
| MATCH (p:NonExistentLabel) | |
| RETURN p | |
| $$) AS (result agtype); |
|
@gregfelice Please see Copilot's comments. Additionally, this PR likely needs to be rebased after #2340 has been merged. |
…che#2193) When CREATE introduces a new label and a subsequent MATCH references it (e.g., CREATE (:Person) WITH ... MATCH (p:Person)), the query returns 0 rows on first execution but works on the second. Root cause: match_check_valid_label() in transform_cypher_match() runs before transform_prev_cypher_clause() processes the predecessor chain. Since CREATE has not yet executed its transform (which creates the label table as a side effect), the label is not in the cache and the check generates a One-Time Filter: false plan that returns no rows. Fix: Skip the early label validity check when the predecessor clause chain contains a data-modifying operation (CREATE, SET, DELETE, MERGE). After transform_prev_cypher_clause() completes and any new labels exist in the cache, run a deferred label check. If the labels are still invalid at that point, generate an empty result via makeBoolConst(false). This preserves the existing behavior for MATCH without DML predecessors (e.g., MATCH-MATCH chains still get the early check and proper error messages for invalid labels). Depends on: PR apache#2340 (clause_chain_has_dml helper) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… check When the deferred label validity check (DML predecessor + non-existent label) found an invalid label, the code skipped transform_match_pattern() entirely, which meant MATCH-introduced variables were never registered in the namespace. This would cause errors if a later clause referenced those variables (e.g., RETURN p). Fix: mirror the early-check strategy by injecting a paradoxical WHERE (true = false) and always calling transform_match_pattern(). Variables get registered normally; zero rows are returned via the impossible qual. Also add ORDER BY to multi-row regression tests for deterministic output, and add a test case for DML predecessor + non-existent label + returning a MATCH-introduced variable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bf623dd to
bcc3e74
Compare
|
@jrgemignani Just to close the loop — #2340 has been merged and this branch already includes that commit (217467a) as its base. All regression tests pass. Ready for re-review whenever you get a chance. |
|
Branch is already rebased on top of master (includes #2340). Ready for re-review. No conflicts. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
src/backend/parser/cypher_clause.c:2983
- The deferred invalid-label handling duplicates the same “paradoxical WHERE true=false” construction that already exists in transform_cypher_match(). Consider factoring this into a small helper (or reusing the same code path) so the behavior can’t diverge in the future (e.g., different node types/locations, or one side updating self->where while the other only updates the local
where).
if (clause_chain_has_dml(clause->prev) &&
!match_check_valid_label(self, cpstate))
{
cypher_bool_const *l = make_ag_node(cypher_bool_const);
cypher_bool_const *r = make_ag_node(cypher_bool_const);
l->boolean = true;
l->location = -1;
r->boolean = false;
r->location = -1;
where = (Node *)makeSimpleA_Expr(AEXPR_OP, "=",
(Node *)l,
(Node *)r, -1);
}
src/backend/parser/cypher_clause.c:2971
clause_chain_has_dml(clause->prev)walks the clause chain; in this function it’s called once to setsecurity_barrierand then again for the deferred label check. Consider computing it once (e.g.,bool has_dml = clause_chain_has_dml(clause->prev);) and reusing the value to avoid repeated traversal and keep the control flow simpler.
if (clause_chain_has_dml(clause->prev))
{
rte->security_barrier = true;
}
rtindex = list_length(pstate->p_rtable);
/* rte is the first RangeTblEntry in pstate */
if (rtindex != 1)
{
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("invalid value for rtindex")));
}
/*
* add all the target entries in rte to the current target list to pass
* all the variables that are introduced in the previous clause to the
* next clause
*/
pnsi = get_namespace_item(pstate, rte);
query->targetList = expandNSItemAttrs(pstate, pnsi, 0, true, -1);
/*
* Now that the predecessor chain is fully transformed and
* any CREATE-generated labels exist in the cache, check
* whether the MATCH pattern references valid labels. This
* deferred check is only needed when the chain has DML,
* since labels created by CREATE are not in the cache at
* the time of the early check in transform_cypher_match().
*/
if (clause_chain_has_dml(clause->prev) &&
!match_check_valid_label(self, cpstate))
{
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
@gregfelice Please review the remaining Copilot regression suggestion.
…BY in tests - Factor duplicated WHERE true=false construction into make_false_where_clause() helper (used in both early and deferred label validation paths) - Compute clause_chain_has_dml() once and reuse, avoiding repeated clause chain traversal - Add ORDER BY to the single-CREATE City regression test for deterministic result ordering
|
Addressed the Copilot suggestions:
All regression tests pass ( |
Issue
Fixes #2193
Problem
When
CREATEintroduces a new label and a subsequentMATCHreferences it viaWITH, the query returns 0 rows on first execution but works correctly on the second:First execution: 0 rows,
EXPLAINshowsOne-Time Filter: falseSecond execution: 2 rows,
EXPLAINshowsSeq Scan on "Person"Root Cause
match_check_valid_label()intransform_cypher_match()runs beforetransform_prev_cypher_clause()processes the predecessor chain. Since CREATE's transform has not yet executed (which creates the label table as a side effect), thePersonlabel is not in the label cache. The check concludes the label is invalid and generates a paradoxicalWHERE true = falseclause, producing aOne-Time Filter: falseplan that returns no rows.On the second execution the label table already exists from the first run, so the check passes.
Fix
Defer the label check when the predecessor chain contains DML:
In
transform_cypher_match(): skip the earlymatch_check_valid_label()whenclause_chain_has_dml(clause->prev)is true (CREATE, SET, DELETE, or MERGE in the predecessor chain).In
transform_cypher_match_pattern(): aftertransform_prev_cypher_clause()completes and any new labels exist in the cache, run a deferred label check. If labels are still invalid, generate an empty result viamakeBoolConst(false, false).This preserves existing behavior for MATCH without DML predecessors — e.g.,
MATCH (a) MATCH (a:invalid_label)still produces the proper "multiple labels" error.EXPLAIN after fix (first execution)
Dependencies
This PR is based on PR #2340 (
clause_chain_has_dmlhelper andsecurity_barrierfix for issue #2308).Regression tests added
All 31 regression tests pass.
AI Disclosure
AI tools (Claude by Anthropic) were used to assist in developing this fix, including root cause analysis, code changes, and regression tests.