Skip to content

Use independent seeds in makeSizedByteStrings#7735

Open
zeme-wana wants to merge 2 commits intomasterfrom
affectionate-kowalevski-0a435e
Open

Use independent seeds in makeSizedByteStrings#7735
zeme-wana wants to merge 2 commits intomasterfrom
affectionate-kowalevski-0a435e

Conversation

@zeme-wana
Copy link
Copy Markdown
Collaborator

@zeme-wana zeme-wana commented Apr 23, 2026

Summary

Fixes the -- FIXME: this is terrible in plutus-core/cost-model/budgeting-bench/Generators.hs.

makeSizedByteStrings passed the same H.Seed to every call of makeSizedByteString, so each generated ByteString was a prefix of the same deterministic byte sequence — smaller ones were literal prefixes of larger ones. This produced correlated inputs for benchmarks that depend on byte content (equality, hashing, string ops).

`makeSizedByteStrings` used the same `H.Seed` for every element, so each
generated ByteString was a prefix of the same deterministic byte sequence.
Use `unfoldr (Just . Seed.split)` to produce a stream of independent
SplitMix seeds instead, giving uncorrelated content across sizes.
@zeme-wana zeme-wana added the No Changelog Required Add this to skip the Changelog Check label Apr 23, 2026
@zeme-wana
Copy link
Copy Markdown
Collaborator Author

/benchmark nofib

@github-actions
Copy link
Copy Markdown
Contributor

Click here to check the status of your benchmark.

Copy link
Copy Markdown
Member

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use independent seeds

Looks reasonable, but are you sure this is the reason for the "FIXME: this is terrible"? Who added the FIXME?

@zliu41
Copy link
Copy Markdown
Member

zliu41 commented Apr 23, 2026

/benchmark nofib

This has nothing to do with nofib. It updates the costing benchmark.

@github-actions
Copy link
Copy Markdown
Contributor

Comparing benchmark results of 'nofib' on '0468c1c57d' (base) and '5a6c9917d0' (PR)

Results table
Script 0468c1c 5a6c991 Change
clausify/formula1 2.029 ms 2.042 ms +0.6%
clausify/formula2 2.738 ms 2.759 ms +0.8%
clausify/formula3 7.494 ms 7.544 ms +0.7%
clausify/formula4 16.46 ms 16.56 ms +0.6%
clausify/formula5 36.38 ms 36.55 ms +0.5%
knights/4x4 12.06 ms 12.14 ms +0.7%
knights/6x6 30.13 ms 30.38 ms +0.8%
knights/8x8 52.74 ms 53.15 ms +0.8%
primetest/05digits 5.752 ms 5.793 ms +0.7%
primetest/10digits 11.39 ms 11.47 ms +0.7%
primetest/30digits 33.03 ms 33.43 ms +1.2%
primetest/50digits 53.91 ms 54.76 ms +1.6%
queens4x4/bt 3.569 ms 3.590 ms +0.6%
queens4x4/bm 4.668 ms 4.700 ms +0.7%
queens4x4/bjbt1 4.340 ms 4.372 ms +0.7%
queens4x4/bjbt2 4.067 ms 4.092 ms +0.6%
queens4x4/fc 9.196 ms 9.239 ms +0.5%
queens5x5/bt 48.81 ms 48.96 ms +0.3%
queens5x5/bm 54.42 ms 54.49 ms +0.1%
queens5x5/bjbt1 57.86 ms 58.05 ms +0.3%
queens5x5/bjbt2 55.90 ms 56.07 ms +0.3%
queens5x5/fc 115.8 ms 116.5 ms +0.6%
0468c1c 5a6c991 Change
TOTAL 622.7 ms 626.6 ms +0.6%

Copy link
Copy Markdown
Contributor

@Unisay Unisay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions, neither blocking:

  1. Could seeds be a top-level binding? unfoldr (Just . Seed.split) is enough of a head-scratcher that I'd give it a name, and the same stream is useful for the next sized-list helper:

    -- | Infinite stream of independent seeds derived from a root seed.
    splitSeeds :: H.Seed -> [H.Seed]
    splitSeeds = unfoldr (Just . Seed.split)
  2. The same bug is in makeSizedTextStrings (line 108) and makeSizedUtf8ByteStrings (line 118). Both fmap the same seed across all sizes, so string and decodeUtf8 benchmarks get correlated inputs too. With splitSeeds extracted, each fix is one line:

    makeSizedTextStrings :: H.Seed -> [Integer] -> [Text]
    makeSizedTextStrings seed sizes =
      zipWith makeSizedTextString (splitSeeds seed) (fmap fromInteger sizes)
    
    makeSizedUtf8ByteStrings :: H.Seed -> [Integer] -> [ByteString]
    makeSizedUtf8ByteStrings seed sizes =
      zipWith makeSizedUtf8ByteString (splitSeeds seed) (fmap fromInteger sizes)

If you'd rather keep this PR scoped, I'm happy to do these in a follow-up.

`makeSizedTextStrings` and `makeSizedUtf8ByteStrings` had the same bug as
`makeSizedByteStrings`: they fmap'd a single seed across all sizes,
producing correlated string and decodeUtf8 benchmark inputs.

Extract the seed stream into a named top-level binding `splitSeeds` and
reuse it across all three sized-list helpers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

No Changelog Required Add this to skip the Changelog Check

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants