perf/correctness: emit hm (HMAC-256) at sv element level, not b3 (option 1) #205

@coderdan

Summary

Post-PR #196, eql_v2.hash_encrypted is hmac-only and eql_v2."=" is eql_v2.hmac_256(a) = eql_v2.hmac_256(b). This is correct for root-level operations — root payloads carry hm, not b3. But eql_v2."->" and eql_v2.jsonb_path_query_first extract an sv element and re-present it as a root-level eql_v2_encrypted value. sv elements today carry b3 (Blake3 selector-scoped equality) but no hm, so calling =, GROUP BY, DISTINCT, or any hash-strategy operation on the extracted value raises:

ERROR: Cannot hash eql_v2_encrypted value: no hmac_256 index term found.
       Configure a `unique` index on the column for hash operations.

The docs/reference/sql-support.md matrix at the time of writing claims object / array / boolean / null paths support equality (including GROUP BY) via b3 — that documentation no longer matches reality and the gap is exactly this issue.
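
The failure mode can be reproduced in a toy model. The sketch below is plain Python, not EQL: the dict layout, the `selector` key, and the `extract` / `hash_encrypted` helpers are assumptions made for illustration. Only the term names (`hm`, `b3`, `sv`) and the error text come from this issue.

```python
class NoHmacTerm(Exception):
    """Stands in for the ERROR that eql_v2.hash_encrypted raises."""


def hash_encrypted(value: dict) -> str:
    # hmac-only, as described for post-#196 EQL: there is no b3 fallback.
    if "hm" not in value:
        raise NoHmacTerm(
            "Cannot hash eql_v2_encrypted value: no hmac_256 index term found."
        )
    return value["hm"]


def extract(value: dict, selector: str) -> dict:
    # Toy stand-in for eql_v2."->": pick the sv element for a selector and
    # re-present it as a root-level value.
    for element in value["sv"]:
        if element["selector"] == selector:
            return dict(element)
    raise KeyError(selector)


# Root payload carries hm; its sv element carries only b3 (today's shape).
root = {
    "hm": "root-hmac",
    "sv": [{"selector": "$.hello", "b3": "element-blake3"}],
}

print(hash_encrypted(root))  # root-level hashing works

field = extract(root, "$.hello")
try:
    hash_encrypted(field)
except NoHmacTerm as err:
    print("raised:", err)
```

The extracted value looks like a root-level value to the hash function but has no `hm`, which is the gap this issue describes.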

Why patching = / hash_encrypted to fall back on b3 is not the answer

Tested four shapes that add a b3 fallback inside eql_v2."=" and/or eql_v2.hash_encrypted: CASE WHEN has_*(), coalesce(hmac eq, b3 eq), NULL-safe inlinable hmac_256 + coalesce, and plpgsql NULL-safe hmac_256 + coalesce. All four fix correctness, but every one breaks at least one of the two PostgreSQL planner optimisations that PR #196's body shape unlocks:

  • Functional index match on eql_v2.hmac_256(col) — the structural pattern that bench_text_hmac_idx and equivalents rely on.
  • Merge Join sort-key hoist — the planner extracts eql_v2.hmac_256(e)::text as the Sort key so it sorts on 32-byte hmac strings rather than the full 1.7 KB encrypted JSONB.

Concretely, on a 10K-row JSON column with a GIN index, the patched shape v4 (NULL-safe inlinable hmac_256 + coalesce(...) in =) takes the self-JOIN from 344 ms to 85,421 ms (roughly a 250× regression). The other shapes are 30–130× regressions. There's no way to add a b3 fallback to = without inserting a wrapper that defeats the planner's structural pattern matching.

Proposal: emit hm (HMAC-256) instead of b3 (Blake3) at the sv element level

Treat this as a crypto-layer change in @cipherstash/protect / proxy: stop emitting b3 for ste_vec elements and emit hm in its place (or both, during transition).

Why this works

  • Every sv element carries hm after the change. After eql_v2."->" or eql_v2.jsonb_path_query_first extracts an sv element, the resulting eql_v2_encrypted has hm at root. PR #196's hmac-only = / hash_encrypted then work for field-level GROUP BY, DISTINCT, joins, etc., with no EQL code change needed.
  • The planner's index-match and Merge Join sort-key hoist optimisations stay intact for the hot path.
  • It makes the documentation matrix true again — all JSON node types support equality once a single deterministic index term (hm) is present.
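
As a rough illustration of the points above, here is a toy Python model of the option-1 shape. Everything in it is an assumption for the sketch: the demo key, the way the HMAC input is built from selector and plaintext, the dict layout, and the helper names. It is not how @cipherstash/protect or the proxy derive terms. It only shows that once each sv element carries a deterministic `hm`, a field-level GROUP BY reduces to bucketing by that term.

```python
import hashlib
import hmac
from collections import Counter

DEMO_KEY = b"demo-key-not-a-real-secret"  # assumption: any fixed key works here


def hm_term(selector: str, plaintext: str) -> str:
    # Deterministic: same key + same input always yields the same term.
    message = f"{selector}:{plaintext}".encode()
    return hmac.new(DEMO_KEY, message, hashlib.sha256).hexdigest()


def make_row(hello: str) -> dict:
    # Option-1 shape: the sv element itself carries hm.
    return {"sv": [{"selector": "$.hello", "hm": hm_term("$.hello", hello)}]}


def extract(row: dict, selector: str) -> dict:
    # Toy stand-in for jsonb_path_query_first: promote the element to root.
    return next(dict(e) for e in row["sv"] if e["selector"] == selector)


def hash_encrypted(value: dict) -> str:
    # hmac-only hashing; raises KeyError if the extracted value lacks hm.
    return value["hm"]


rows = [make_row(v) for v in ("world", "world", "there")]

# Field-level GROUP BY: bucket rows by the hm of the extracted element.
groups = Counter(hash_encrypted(extract(r, "$.hello")) for r in rows)
print(sorted(groups.values()))  # [1, 2]
```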

Why the HMAC-vs-Blake3 perf gap is acceptable

On ARM/NEON CPUs HMAC-SHA256 throughput is comparable to Blake3 — see crypto-benches MACs results. On x86 with hardware SHA acceleration the gap is also narrow. The encryption-time cost of switching is small for typical workloads.

Concrete bench result

Built a 10K-row table bench_hm_in_sv with the option-1 shape (sv element carries hm), against a clean PR #196 baseline (no EQL patches):

| Operation | bench_b3_only (today, b3 in sv) | bench_hm_in_sv (option 1) |
| --- | --- | --- |
| @> via GIN | n/a (works at root) | 0.24 ms |
| WHERE e = $1 single-row | works at root | 13.57 ms |
| GROUP BY e root | works | 124.97 ms |
| GROUP BY jsonb_path_query_first(e, '<sel>') field | RAISES | 391 ms |
| DISTINCT e | works | 122.42 ms |
| self-JOIN on root e | 344 ms | 470 ms |

The field-level GROUP BY that raises today runs at 391 ms — same ballpark as field-hm GROUP BY on bench_json. Self-JOIN stays in the 300–500 ms range (Merge Join + hmac sort-key hoist preserved); no catastrophic plan flip.

EQL-side changes (small)

If the protect / proxy side starts emitting hm at sv element level, EQL needs:

  1. eql_v2.ste_vec_contains element comparison: the fix from PR #196's review added the has_blake3 → compare_blake3 guard. Replace it with has_hmac_256 → compare_hmac_256 (or keep both, preferring hm and falling through to b3 for legacy data). Either way the contract, "selector-scoped equality on encrypted plaintext", is unchanged.
  2. Backward-compat reads: during the transition window (old data has b3, new data has hm), ste_vec_contains and any other consumer of sv-element equality should accept either term. A coalesce(hm-compare, b3-compare) shape works here (it's an internal call, not the hot-path = operator — no planner regression).
  3. Documentation refresh: update docs/reference/sql-support.md matrix to describe hm as the equality term for all JSON node types (Object, Array, Boolean, Null, String, Number — the latter two already had it via the unique-index pathway). Remove the "b3 supports GROUP BY only for object/array/etc" caveat. Add an upgrade note explaining the field-level GROUP BY contract change.
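
The coalesce(hm-compare, b3-compare) shape from items 1 and 2 can be sketched as follows. This is a Python toy, not the EQL implementation: `compare_term` and `element_equal` are hypothetical names, elements are plain dicts, and `None` plays the role of SQL NULL.

```python
from typing import Optional


def compare_term(a: dict, b: dict, term: str) -> Optional[bool]:
    # None plays the role of SQL NULL: the term is missing on one side.
    if term not in a or term not in b:
        return None
    return a[term] == b[term]


def element_equal(a: dict, b: dict) -> Optional[bool]:
    # coalesce(hm-compare, b3-compare): prefer hm, fall through to b3.
    for term in ("hm", "b3"):
        result = compare_term(a, b, term)
        if result is not None:
            return result
    return None  # no shared term, so the elements are not comparable


legacy = {"b3": "b3-x"}
dual = {"hm": "hm-x", "b3": "b3-x"}
hm_only = {"hm": "hm-x"}

print(element_equal(legacy, dual))     # True, via b3
print(element_equal(dual, hm_only))    # True, via hm
print(element_equal(legacy, hm_only))  # None, no shared term
```

In this model a b3-only element and an hm-only element share no term and compare as None. Whether that combination can occur in practice depends on how long both terms are emitted side by side.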

Migration plan (concrete)

  1. Protect / proxy: start dual-emitting (b3 + hm) at sv elements. Existing readers (ste_vec_contains via b3) keep working; new readers (anything going through = / hash_encrypted post-extraction) prefer hm.
  2. EQL: land the ste_vec_contains change above with the coalesce(hm, b3) shape so it reads either.
  3. After a transition window, protect/proxy stops emitting b3 for new data. Existing data still has b3; that's fine — the EQL coalesce reads it.
  4. (Optional, longer term) drop b3 entirely once all customer data is migrated. Not required.
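
One way to sanity-check the plan is to tabulate which reader can handle an element written in each phase. The sketch below is a Python toy under stated assumptions: the phase names (`legacy`, `dual`, `hm_only`), the `emit_element` writer, and the three reader predicates are hypothetical stand-ins for the protect / proxy emitter and the EQL consumers, not real APIs.

```python
def emit_element(selector: str, hm: str, b3: str, phase: str) -> dict:
    """Toy writer: which equality terms an sv element carries in each phase."""
    element = {"selector": selector}
    if phase in ("legacy", "dual"):
        element["b3"] = b3
    if phase in ("dual", "hm_only"):
        element["hm"] = hm
    return element


def b3_reader_ok(element: dict) -> bool:
    # Existing reader that only understands b3.
    return "b3" in element


def hash_reader_ok(element: dict) -> bool:
    # hmac-only = / hash_encrypted applied after extraction.
    return "hm" in element


def coalesce_reader_ok(element: dict) -> bool:
    # Reader using the coalesce(hm, b3) shape: either term is enough.
    return "hm" in element or "b3" in element


matrix = {}
for phase in ("legacy", "dual", "hm_only"):
    e = emit_element("$.hello", hm="hm-x", b3="b3-x", phase=phase)
    matrix[phase] = (b3_reader_ok(e), hash_reader_ok(e), coalesce_reader_ok(e))

for phase, support in matrix.items():
    print(phase, support)
```

In this model the dual-emit phase is the only one in which all three readers succeed, and the coalesce reader is the only one that succeeds in every phase.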

Verification

In-repo bench coverage for this scenario is already drafted in #203 (companion PR adding GROUP BY / JOIN / DISTINCT bench tests). The bench_json_data.sql fixture overlays hm at the $.hello selector — exactly the option-1 shape. The field-level GROUP BY plan + regression tests pass with PR #196's untouched =, no patches needed.

When this lands, the regression tests' #[ignore = "#202: ..."] markers on the hash-strategy timing assertions stay relevant for the root-level hash_encrypted fast-path (a separate concern), but the field-level test (group_by_jsonb_field_under_threshold) becomes about end-to-end JSON field aggregation perf — which is what we actually want it to measure.
