Use independent seeds in makeSizedByteStrings#7735
Conversation
`makeSizedByteStrings` used the same `H.Seed` for every element, so each generated ByteString was a prefix of the same deterministic byte sequence. Use `unfoldr (Just . Seed.split)` to produce a stream of independent SplitMix seeds instead, giving uncorrelated content across sizes.
|
/benchmark nofib |
|
Click here to check the status of your benchmark. |
zliu41
left a comment
There was a problem hiding this comment.
Use independent seeds
Looks reasonable, but are you sure this is the reason for the "FIXME: this is terrible"? Who added the FIXME?
This has nothing to do with |
|
Comparing benchmark results of 'nofib' on '0468c1c57d' (base) and '5a6c9917d0' (PR) Results table
|
Unisay
left a comment
There was a problem hiding this comment.
Two suggestions, neither blocking:
-
Could
seedsbe a top-level binding?unfoldr (Just . Seed.split)is enough of a head-scratcher that I'd give it a name, and the same stream is useful for the next sized-list helper:-- | Infinite stream of independent seeds derived from a root seed. splitSeeds :: H.Seed -> [H.Seed] splitSeeds = unfoldr (Just . Seed.split)
-
The same bug is in
makeSizedTextStrings(line 108) andmakeSizedUtf8ByteStrings(line 118). Bothfmapthe same seed across all sizes, so string anddecodeUtf8benchmarks get correlated inputs too. WithsplitSeedsextracted, each fix is one line:makeSizedTextStrings :: H.Seed -> [Integer] -> [Text] makeSizedTextStrings seed sizes = zipWith makeSizedTextString (splitSeeds seed) (fmap fromInteger sizes) makeSizedUtf8ByteStrings :: H.Seed -> [Integer] -> [ByteString] makeSizedUtf8ByteStrings seed sizes = zipWith makeSizedUtf8ByteString (splitSeeds seed) (fmap fromInteger sizes)
If you'd rather keep this PR scoped, I'm happy to do these in a follow-up.
`makeSizedTextStrings` and `makeSizedUtf8ByteStrings` had the same bug as `makeSizedByteStrings`: they fmap'd a single seed across all sizes, producing correlated string and decodeUtf8 benchmark inputs. Extract the seed stream into a named top-level binding `splitSeeds` and reuse it across all three sized-list helpers.
Summary
Fixes the
-- FIXME: this is terribleinplutus-core/cost-model/budgeting-bench/Generators.hs.makeSizedByteStringspassed the sameH.Seedto every call ofmakeSizedByteString, so each generated ByteString was a prefix of the same deterministic byte sequence — smaller ones were literal prefixes of larger ones. This produced correlated inputs for benchmarks that depend on byte content (equality, hashing, string ops).