Post-PR #196, eql_v2.hash_encrypted is hmac-only and eql_v2."=" is eql_v2.hmac_256(a) = eql_v2.hmac_256(b). This is correct for root-level operations — root payloads carry hm, not b3. But eql_v2."->" and eql_v2.jsonb_path_query_first extract an sv element and re-present it as a root-level eql_v2_encrypted value. sv elements today carry b3 (Blake3 selector-scoped equality) but no hm, so calling =, GROUP BY, DISTINCT, or any hash-strategy operation on the extracted value raises:
```
ERROR: Cannot hash eql_v2_encrypted value: no hmac_256 index term found.
Configure a `unique` index on the column for hash operations.
```
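A minimal reproduction sketch. The table and column names (`users`, `e`) are hypothetical, and `'<sel>'` stands in for a real encrypted selector, as elsewhere in this issue:

```sql
-- Root-level equality works: the root payload carries hm.
SELECT e, count(*) FROM users GROUP BY e;

-- Field-level extraction re-roots an sv element that carries only b3,
-- so the hmac-only hash path raises the error above.
SELECT eql_v2.jsonb_path_query_first(e, '<sel>') AS field, count(*)
FROM users
GROUP BY field;  -- ERROR: no hmac_256 index term found
```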
The docs/reference/sql-support.md matrix at the time of writing claims that object / array / boolean / null paths support equality (including GROUP BY) via b3 — that documentation no longer matches reality, and the gap is exactly this issue.
Why patching = / hash_encrypted to fall back on b3 is not the answer
I tested four shapes that add a b3 fallback inside eql_v2."=" and/or eql_v2.hash_encrypted — CASE WHEN has_*(), coalesce(hmac eq, b3 eq), a NULL-safe inlinable hmac_256 + coalesce, and a plpgsql NULL-safe hmac_256 + coalesce. All four fix correctness, but each breaks at least one of the two PostgreSQL planner optimisations that PR #196's body shape unlocks:
- Functional index match on eql_v2.hmac_256(col) — the structural pattern that bench_text_hmac_idx and equivalent indexes rely on.
- Merge Join sort-key hoist — the planner extracts eql_v2.hmac_256(e)::text as the Sort key, so it sorts on 32-byte hmac strings rather than the full 1.7 KB encrypted JSONB.
Concretely, on a 10K-row JSON column with a GIN index, the patched shape v4 (NULL-safe inlinable hmac_256 + coalesce(...) in =) takes the self-JOIN from 344 ms to 85,421 ms (a ~250× regression). The other shapes are 30–130× regressions. There is no way to add a b3 fallback to = without inserting a wrapper that defeats the planner's structural pattern matching.
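To illustrate the structural-match problem, a sketch (the index and table names follow this issue's bench naming; the wrapper shape is representative of the tested fallbacks, not the exact patched source):

```sql
-- The functional index the planner matches by expression shape:
CREATE INDEX bench_text_hmac_idx ON bench_text (eql_v2.hmac_256(e));

-- PR #196's "=" inlines to this shape, which matches the index
-- expression and hoists as a Merge Join sort key:
--   ... WHERE eql_v2.hmac_256(a.e) = eql_v2.hmac_256(b.e)

-- Any b3 fallback wraps the call, producing a different expression tree
-- (the b3 equality call below is a stand-in, not a real EQL function):
--   ... WHERE coalesce(eql_v2.hmac_256(a.e) = eql_v2.hmac_256(b.e),
--                      b3_equality(a.e, b.e))
-- The planner no longer recognises eql_v2.hmac_256(e) as the indexed
-- expression or as a hoistable sort key, so it sorts/scans the full
-- encrypted JSONB instead.
```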
Proposal: emit hm (HMAC-256) instead of b3 (Blake3) at the sv element level
Treat this as a crypto-layer change in @cipherstash/protect / proxy: stop emitting b3 for ste_vec elements and emit hm in its place (or both, during transition).
Why this works
- Every sv element carries hm after the change. After eql_v2."->" or eql_v2.jsonb_path_query_first extracts an sv element, the resulting eql_v2_encrypted has hm at root. PR #196's hmac-only = / hash_encrypted then work for field-level GROUP BY, DISTINCT, joins, etc. — no EQL code change needed.
- The planner's index-match and Merge Join sort-key hoist optimisations stay intact for the hot path.
- It makes the documentation matrix true again — all JSON node types support equality once a single deterministic index term (hm) is present.
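A sketch of the payload change. The term names hm, b3, and sv come from this issue; the selector key `"s"` and the overall envelope are simplified assumptions, not the exact wire format:

```sql
-- Today: the sv element carries only the Blake3 term, so the value
-- re-rooted by "->" / jsonb_path_query_first has no hm for "=".
SELECT '{"sv": [{"s": "<sel>", "b3": "<blake3 term>"}]}'::jsonb;

-- Proposed: emit hm at the element level (or both, during transition),
-- so the extracted value carries hm at root and "=" works unchanged.
SELECT '{"sv": [{"s": "<sel>", "hm": "<hmac-256 term>"}]}'::jsonb;
```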
Why the HMAC-vs-Blake3 perf gap is acceptable
On ARM/NEON CPUs HMAC-SHA256 throughput is comparable to Blake3 — see crypto-benches MACs results. On x86 with hardware SHA acceleration the gap is also narrow. The encryption-time cost of switching is small for typical workloads.
Concrete bench result
Built a 10K-row table bench_hm_in_sv with the option-1 shape (sv element carries hm), against a clean PR #196 baseline (no EQL patches):
| Operation | bench_b3_only (today, b3 in sv) | bench_hm_in_sv (option 1) |
|---|---|---|
| `@>` via GIN | n/a (works at root) | 0.24 ms |
| `WHERE e = $1` single-row | works at root | 13.57 ms |
| `GROUP BY e` root | works | 124.97 ms |
| `GROUP BY jsonb_path_query_first(e, '<sel>')` field | RAISES | 391 ms |
| `DISTINCT e` | works | 122.42 ms |
| self-JOIN on root `e` | 344 ms | 470 ms |
The field-level GROUP BY that raises today runs at 391 ms — same ballpark as field-hm GROUP BY on bench_json. Self-JOIN stays in the 300–500 ms range (Merge Join + hmac sort-key hoist preserved); no catastrophic plan flip.
EQL-side changes (small)
If the protect / proxy side starts emitting hm at sv element level, EQL needs:
- eql_v2.ste_vec_contains element comparison: the fix in PR #196's review added the has_blake3 → compare_blake3 guard. Replace it with has_hmac_256 → compare_hmac_256 (or keep both, preferring hm and falling through to b3 for legacy data). Either way the contract — "selector-scoped equality on encrypted plaintext" — is unchanged.
- Backward-compat reads: during the transition window (old data has b3, new data has hm), ste_vec_contains and any other consumer of sv-element equality should accept either term. A coalesce(hm-compare, b3-compare) shape works here (it's an internal call, not the hot-path = operator — no planner regression).
- Documentation refresh: update the docs/reference/sql-support.md matrix to describe hm as the equality term for all JSON node types (Object, Array, Boolean, Null, String, Number — the latter two already had it via the unique-index pathway). Remove the "b3 supports GROUP BY only for object/array/etc" caveat. Add an upgrade note explaining the field-level GROUP BY contract change.
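The transition-window element comparison could take roughly this shape inside ste_vec_contains. This is illustrative only — the has_* / compare_* helper names come from this issue, but their exact signatures and return types here are assumptions:

```sql
-- Prefer hm; fall through to b3 for legacy elements. Assuming
-- compare_hmac_256 yields NULL when either element lacks hm,
-- coalesce then consults the Blake3 comparison instead.
coalesce(
  eql_v2.compare_hmac_256(a_elem, b_elem),
  eql_v2.compare_blake3(a_elem, b_elem)
)
-- Internal call only, not the hot-path "=" operator, so no planner
-- pattern matching is at stake here.
```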
Migration plan (concrete)
1. Protect / proxy: start dual-emitting (b3 + hm) at sv elements. Existing readers (ste_vec_contains via b3) keep working; new readers (anything going through = / hash_encrypted post-extraction) prefer hm.
2. EQL: land the ste_vec_contains change above with the coalesce(hm, b3) shape so it reads either.
3. After a transition window, protect/proxy stops emitting b3 for new data. Existing data still has b3; that's fine — the EQL coalesce reads it.
4. (Optional, longer term) drop b3 entirely once all customer data is migrated. Not required.
Verification
In-repo bench coverage for this scenario is already drafted in #203 (companion PR adding GROUP BY / JOIN / DISTINCT bench tests). The bench_json_data.sql fixture overlays hm at the $.hello selector — exactly the option-1 shape. The field-level GROUP BY plan + regression tests pass with PR #196's untouched =, no patches needed.
When this lands, the regression tests' #[ignore = "#202: ..."] markers on the hash-strategy timing assertions stay relevant for the root-level hash_encrypted fast-path (a separate concern), but the field-level test (group_by_jsonb_field_under_threshold) becomes about end-to-end JSON field aggregation perf — which is what we actually want it to measure.
Related
- like / ilike IMMUTABLE flip (the original Phase 1 of this theme).
- Two-level inlining for bare LIKE.
- hash_encrypted fast-path on root hm (separate perf concern, root-level).
- -> / jsonb_path_query_first per-row cost.