Constant-propagate fields in hash_fields #87

Open
jishnub wants to merge 2 commits into beacon-biosignals:main from jishnub:jishnub/hash_fields

@jishnub jishnub commented Nov 12, 2025

Description

This PR improves type stability of the field accesses within hash_fields.

julia> struct A
       x::Int64
       b::Float64
       c::String
       end

Descending into hash_fields using Cthulhu

julia> a = A(1,2.0,"c");

julia> @descend stable_hash(a, HashVersion{4}())
[...]
175 function hash_fields(x::A, fields::Core.Const((:b, :c, :x)), hash_state::StableHashTraits.BufferedHashState{SHA.SHA2_256_CTX}, context::Core.Const(HashVersion{4}()))::StableHashTraits.BufferedHashState{SHA.SHA2_256_CTX}
176     vals::Tuple{Float64, String, Int64} = map(field -> getfield(x, field), fields::Core.Const((:b, :c, :x)))::Tuple{Float64, String, Int64}
177     map(fields::Core.Const((:b, :c, :x)), vals::Tuple{Float64, String, Int64}) do field, val
178         # can we optimize away the field's type_hash?
179         transform = transformer(typeof(val), context)
180         if isconcretetype(fieldtype(typeof(x), field)) && transform.hoist_type
181             # the fieldtype has been hashed as part of the type of the container
182             hash_value(val, hash_state, context, transform)
183         else
184             hash_type_and_value(val, hash_state, context)
185         end
186     end
187     return hash_state::StableHashTraits.BufferedHashState{SHA.SHA2_256_CTX}
188 end

Because fields is a compile-time constant, constant propagation lets the compiler infer the concrete type of each field value. Since map over a tuple applies the function recursively, the value of field is also constant-propagated into the inner function, so the inner fieldtype call is type-inferred as well.
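As a rough illustration of the pattern described above, here is a minimal standalone sketch. It is not the code from this PR: the struct `Point3`, the constant `SORTED_FIELDS`, and the helpers `field_values` and `concrete_flags` are hypothetical names introduced only for this example, and the field names are hard-coded rather than computed the way StableHashTraits does it.

```julia
# Hypothetical struct with the same field layout as `A` in the description.
struct Point3
    x::Int64
    b::Float64
    c::String
end

# Assumption: the field names are available as a constant tuple.
# Here they are simply hard-coded in sorted order.
const SORTED_FIELDS = (:b, :c, :x)

# Mapping over a constant tuple of Symbols: each call of the closure sees a
# constant `field`, which gives inference the chance to resolve `getfield`
# to the concrete field type.
field_values(p::Point3) = map(field -> getfield(p, field), SORTED_FIELDS)

# The same pattern applied to `fieldtype`, as in the inner function of
# `hash_fields`: one Bool per field, true when the declared type is concrete.
concrete_flags(p::Point3) =
    map(field -> isconcretetype(fieldtype(typeof(p), field)), SORTED_FIELDS)

p = Point3(1, 2.0, "c")
vals = field_values(p)
flags = concrete_flags(p)
```

To see what the compiler infers for these helpers on a given Julia version, `Base.return_types(field_values, (Point3,))` or `@code_warntype field_values(p)` can be run at the REPL.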

Benchmarks

Before

16×6 DataFrame
 Row │ benchmark   hash       version    base        trait       ratio
     │ SubStrin…   SubStrin…  SubStrin…  String      String      Float64
─────┼─────────────────────────────────────────────────────────────────────
   1 │ dicts       crc        4          1.627 ms    119.251 ms  73.3162
   2 │ structs     crc        4          16.639 μs   790.094 μs  47.4845
   3 │ tuples      crc        4          16.957 μs   506.097 μs  29.8459
   4 │ numbers     crc        4          7.133 μs    210.804 μs  29.5533
   5 │ dataframes  crc        4          27.112 μs   444.912 μs  16.4102
   6 │ symbols     crc        4          1.151 ms    1.670 ms     1.45029
   7 │ missings    crc        4          307.279 μs  338.958 μs   1.1031
   8 │ strings     crc        4          1.254 ms    331.755 μs   0.264603
   9 │ dicts       sha256     4          2.302 ms    161.438 ms  70.1321
  10 │ structs     sha256     4          643.676 μs  2.020 ms     3.1386
  11 │ tuples      sha256     4          643.640 μs  1.763 ms     2.73959
  12 │ dataframes  sha256     4          653.198 μs  1.106 ms     1.69289
  13 │ numbers     sha256     4          321.244 μs  513.798 μs   1.5994
  14 │ symbols     sha256     4          2.443 ms    3.603 ms     1.47466
  15 │ missings    sha256     4          666.797 μs  748.100 μs   1.12193
  16 │ strings     sha256     4          2.584 ms    2.129 ms     0.823794

After

16×6 DataFrame
 Row │ benchmark   hash       version    base        trait       ratio
     │ SubStrin…   SubStrin…  SubStrin…  String      String      Float64
─────┼─────────────────────────────────────────────────────────────────────
   1 │ dicts       crc        4          1.594 ms    110.488 ms  69.3065
   2 │ tuples      crc        4          16.445 μs   513.070 μs  31.1991
   3 │ structs     crc        4          17.372 μs   515.770 μs  29.6897
   4 │ numbers     crc        4          7.002 μs    190.778 μs  27.2462
   5 │ dataframes  crc        4          25.515 μs   394.936 μs  15.4786
   6 │ symbols     crc        4          1.125 ms    1.736 ms     1.54343
   7 │ missings    crc        4          307.998 μs  325.848 μs   1.05795
   8 │ strings     crc        4          1.224 ms    364.596 μs   0.297855
   9 │ dicts       sha256     4          2.310 ms    144.709 ms  62.6373
  10 │ tuples      sha256     4          615.085 μs  1.818 ms     2.95518
  11 │ structs     sha256     4          614.753 μs  1.777 ms     2.8914
  12 │ numbers     sha256     4          306.402 μs  507.110 μs   1.65505
  13 │ dataframes  sha256     4          654.525 μs  1.082 ms     1.65288
  14 │ symbols     sha256     4          2.394 ms    3.510 ms     1.46617
  15 │ missings    sha256     4          680.580 μs  691.508 μs   1.01606
  16 │ strings     sha256     4          2.528 ms    2.163 ms     0.855819

@haberdashPI
Member

First of all, SO sorry for the delayed reply. I am crawling out from under a huge pile of GitHub notifications and recovering from a very long winter with many deadlines and sick children, and I JUST saw your post from November 😬

Thank you for all of your PR contributions to this repository. It is going to take me some time to look over and understand them, and I'm not exactly out of the woods with some pending deadlines, but I wanted to let you know that I have finally noticed that you posted these PRs ages ago, and that I am now looking into them.
