Skip to content

perf: skip reflect.ValueOf in Marshal() for common types#871

Draft
mykaul wants to merge 1 commit intoscylladb:masterfrom
mykaul:mykaul/perf/marshal-reflect-fast-path
Draft

perf: skip reflect.ValueOf in Marshal() for common types#871
mykaul wants to merge 1 commit intoscylladb:masterfrom
mykaul:mykaul/perf/marshal-reflect-fast-path

Conversation

@mykaul
Copy link
Copy Markdown

@mykaul mykaul commented May 2, 2026

Summary

  • Adds a type-switch fast path at the top of Marshal() that skips reflect.ValueOf for common primitive types and their pointer variants
  • The sub-marshalers already handle both pointer and non-pointer types natively (including nil pointer checks), making the reflect-based dereference+recurse path unnecessary for these types
  • For pointer types the improvement is dramatic since the old path allocated an extra interface box via Elem().Interface()

Benchmarks

Type Old (ns/op) New (ns/op) Change Old allocs New allocs
*int64 36.95 15.65 -57.65% (p=0.008) 2 / 16B 1 / 8B
*string 51.78 21.03 -59.39% (p=0.008) 2 / 32B 1 / 16B
bool 13.77 12.96 -5.88% (p=0.008) 1 / 1B 1 / 1B
time.Time 22.22 20.76 -6.57% (p=0.008) 1 / 8B 1 / 8B
geomean 19.95 15.45 -22.55%

For a query with 10 pointer parameters, this saves ~300ns and 10 allocations per execution.

Risk

Low — falls through to existing reflect path for any type not in the fast-path list. All unit tests pass with -race.

Add a type-switch fast path at the top of Marshal() that skips the
reflect.ValueOf pointer-check for common primitive types (and their
pointer variants). The sub-marshalers already handle both pointer and
non-pointer types via their own type-switches, including nil pointer
checks, so the reflect-based dereference+recurse path is unnecessary
for these types.

Benchmark results (pointer types show the largest improvement since
the old path did reflect.ValueOf → Elem().Interface() → recursive
Marshal, allocating an extra interface box):

  *int64:   37ns → 16ns  (-57.6%), 2→1 allocs, 16→8 B/op
  *string:  52ns → 21ns  (-59.4%), 2→1 allocs, 32→16 B/op
  bool:     14ns → 13ns  (-5.9%)
  time.Time: 22ns → 21ns (-6.6%)
  geomean:  -22.6%

For a query with 10 pointer parameters, this saves ~300ns and 10
allocations per query execution.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
@mykaul mykaul marked this pull request as draft May 2, 2026 19:35
@mykaul
Copy link
Copy Markdown
Author

mykaul commented May 2, 2026

CI Failures — unrelated to this PR

Both failures are known pre-existing flakes:

  1. Cassandra 5-LATESTTestConsistencySerial timed out after 120s. Known flaky Cassandra integration test tracked in Fix integration tests that are failing on cassandra #456.

  2. Scylla LTS-PRIORTestSliceMapMapScanTupleTypes failed with "Another global topology request is ongoing, please retry." during TRUNCATE. Scylla-side topology contention flake (no open issue found in this repo).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant