Conversation

@Raneet10 (Member) commented Nov 6, 2025

Description

This PR enables prefetching of trie nodes before block execution. It adds a waitforwarm config option which, if enabled, blocks execution of a block until the warming finishes.
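A minimal sketch of how such a gate could look, using hypothetical names (chain, prefetch, waitForWarm) rather than the PR's actual code:

package main

import "fmt"

// chain is a hypothetical stand-in for the blockchain object; the real PR
// wires the flag through the node's config.
type chain struct {
	waitForWarm bool
}

func (c *chain) prefetch(block string, done chan<- struct{}) {
	defer close(done)
	fmt.Println("warming caches for", block) // stand-in for trie-node prefetching
}

func (c *chain) process(block string) {
	done := make(chan struct{})
	go c.prefetch(block, done) // warm-up always runs concurrently
	if c.waitForWarm {
		<-done // with waitforwarm enabled, gate execution on warm-up completion
	}
	fmt.Println("executing", block)
}

func main() {
	(&chain{waitForWarm: true}).process("block #1")
}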

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC method changes, and the Notion documentation has been updated

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

@Raneet10 Raneet10 requested a review from a team November 6, 2025 10:03
header := pool.currentHead.Load()
pool.mu.RUnlock()

if header == nil || len(txs) == 0 {
Collaborator:

Shall we move the txs length check to the top of the function? Just an optimization to reduce lock contention: since txs is an input and doesn’t depend on pool state, you can fast-return before touching any shared data.
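A minimal sketch of the suggested reordering, with hypothetical stand-ins for the pool fields; not the PR's actual code:

package main

import (
	"sync"
	"sync/atomic"
)

// Hypothetical stand-ins for the pool fields referenced in the diff.
type header struct{}

type pool struct {
	mu          sync.RWMutex
	currentHead atomic.Pointer[header]
}

// prefetch validates the pure input before acquiring the lock, so
// empty calls never contend on pool.mu.
func (p *pool) prefetch(txs []string) {
	if len(txs) == 0 {
		return // fast path: txs is an input, independent of pool state
	}
	p.mu.RLock()
	head := p.currentHead.Load()
	p.mu.RUnlock()
	if head == nil {
		return
	}
	_ = head // ... warm caches for txs against head ...
}

func main() { (&pool{}).prefetch(nil) }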

storageCacheHitMeter = metrics.NewRegisteredMeter("chain/storage/reads/cache/process/hit", nil)
storageCacheMissMeter = metrics.NewRegisteredMeter("chain/storage/reads/cache/process/miss", nil)

accountCacheHitPrefetchMeter = metrics.NewRegisteredMeter("chain/account/reads/cache/prefetch/hit", nil)
Collaborator:

Shall we remove all the unused metrics here? Lint's complaining

defer cancel()

if followupInterrupt == nil {
followupInterrupt = &atomic.Bool{}
Collaborator:

is linter right here?

Member Author:

Not sure. I didn't make this change.

vmCfg := bc.cfg.VmConfig
vmCfg.Tracer = nil
bc.prefetcher.Prefetch(block, throwaway, vmCfg, followupInterrupt)
// go func(start time.Time, throwaway *state.StateDB, block *types.Block) {
Collaborator:

Is this worth deleting? Or, alternatively, add a comment explaining why it's disabled.

Member Author:

I commented it out so that we don't run multiple prefetches together. I'll probably remove this once we finalize all the optimizations.

return nil, 0
}
vmCfg := bc.cfg.VmConfig
vmCfg.Tracer = nil
Collaborator:

Why is this required? Same question for PrefetchFromTxpool.

// Append cache hit/miss percentages
if bc != nil {
if total := bc.lastProcAccHit + bc.lastProcAccMiss; total > 0 {
pct := int64(100 * bc.lastProcAccHit / total)
Collaborator:

Seems like this conversion (and others in the same func) is pleonastic.

Member Author:

Yeah right. I don't think these will make it to develop anyway.

go.mod Outdated

// Note: Change the go image version in Dockerfile if you change this.
-go 1.24.6
+go 1.24.8
Collaborator:

Worth updating in the CI yml files as well.
Also, since we're bumping, maybe we can use 1.24.9 to solve the govulncheck issues (I have that on my upstream PR as well).

Member Author:

Could do, yeah, but this isn't getting merged to develop just yet.

core/analyzer.sh Outdated
@@ -0,0 +1,102 @@
# ./log_analyzer.sh "3 days ago" # Scans the last 3 days
Contributor:

This should be excluded from this PR I guess?

Comment on lines 114 to 123
// Aggregate touched pre-state from stateCpy and populate the process reader cache directly
acc, stor := stateCpy.PrefetchTouched()
for a := range acc {
reader.Account(a)
}
for a, bucket := range stor {
for k := range bucket {
reader.Storage(a, k)
}
}
Contributor:

What's the rationale for this change? IntermediateRoot also fetches touched accounts/storages.

Member Author:

Yes, but it also calculates the state root, which we don't strictly need. Pre-execution should fill up the accounts/storages. Do you think it should still be there?

Contributor:

It should be there. Simply loading account/storage isn't sufficient, because in order to calculate the final state root hash, some sibling nodes will need to be retrieved, even though they aren't related to touched accounts.
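For context, a hedged sketch of keeping the hash step in the prefetch path (go-ethereum's StateDB exposes IntermediateRoot(deleteEmptyObjects bool)):

// After seeding the reader cache from the touched set, still hash the
// trie: walking the modified paths resolves sibling branch nodes that
// plain account/storage loads never touch.
_ = stateCpy.IntermediateRoot(true)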

vmCfg.Tracer = nil
synthetic := types.NewBlockWithHeader(header).WithBody(types.Body{Transactions: txs})
var noop atomic.Bool
bc.prefetcher.Prefetch(synthetic, warmdb, vmCfg, &noop)
Contributor:

This call executes transactions without taking fees, which introduces a DoS vector: malicious users can submit resource-expensive transactions and continuously replace them with slightly higher gas prices without ever paying the fee.

For example, if the current base gas price is 50 Gwei, I can submit a transaction with 32M gas of SLOAD at an initial gas price of 1 Gwei, then keep replacing the existing tx by bumping the gas price 10% at a time, 41 times, before reaching the 50 Gwei base price (1 Gwei * 1.1^41 ≈ 49 Gwei). This makes the node fully execute the transaction 41 times before it is actually included in a block.
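A quick sanity check of that arithmetic (illustrative only):

package main

import "fmt"

func main() {
	// Replace-and-bump attack math: starting at 1 Gwei and bumping the
	// gas price 10% per replacement, count how many replacements stay
	// below a 50 Gwei base price.
	price, bumps := 1.0, 0
	for price*1.1 < 50 {
		price *= 1.1
		bumps++
	}
	fmt.Printf("%d bumps, final price ≈ %.1f Gwei\n", bumps, price)
	// Prints: 41 bumps, final price ≈ 49.8 Gwei. With fee-free
	// prefetching, each of those 41 executions costs the attacker nothing.
}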

Member Author:

Good catch. Though I am planning to remove PrefetchFromTxpool altogether, as I didn't see any impact from prefetching from the txpool. Perhaps running it in the worker (as is currently done) is more deterministic?

codecov bot commented Nov 7, 2025

Codecov Report

❌ Patch coverage is 49.66667% with 151 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (src-optimizations@a63c799). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
core/blockchain.go | 49.45% | 39 Missing, 7 partials ⚠️
core/state/reader.go | 2.70% | 36 Missing ⚠️
core/state/statedb.go | 0.00% | 22 Missing ⚠️
core/blockchain_insert.go | 57.50% | 13 Missing, 4 partials ⚠️
miner/worker.go | 63.41% | 10 Missing, 5 partials ⚠️
core/state_prefetcher.go | 86.53% | 7 Missing ⚠️
eth/ethconfig/gen_config.go | 0.00% | 6 Missing ⚠️
eth/backend.go | 0.00% | 1 Missing ⚠️
miner/fake_miner.go | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@                 Coverage Diff                  @@
##             src-optimizations    #1871   +/-   ##
====================================================
  Coverage                     ?   47.92%           
====================================================
  Files                        ?      844           
  Lines                        ?   143880           
  Branches                     ?        0           
====================================================
  Hits                         ?    68959           
  Misses                       ?    70402           
  Partials                     ?     4519           

☔ View full report in Codecov by Sentry.


// SeedCaches inserts the supplied accounts and storage slots directly into the
// local caches.
func (r *ReaderWithCacheStats) SeedCaches(accounts map[common.Address]*types.StateAccount, storage map[common.Address]map[common.Hash]common.Hash) {
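A minimal sketch of what the body could look like, assuming hypothetical accCache/slotCache fields (and a slotKey pair type) guarded by r.mu; not necessarily the PR's actual implementation:

	r.mu.Lock()
	defer r.mu.Unlock()
	for addr, acc := range accounts {
		r.accCache[addr] = acc // warm account reads for upcoming execution
	}
	for addr, slots := range storage {
		for key, val := range slots {
			r.slotCache[slotKey{addr, key}] = val // warm storage-slot reads
		}
	}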
Contributor:

How do we know/ensure the data provided by the seeded cache is always correct?

Member Author:

Well, the aim is to warm the cache with unmodified account/storage slots that are touched during the tx. I presume the data should be correct in most cases, except maybe if there's a tx dependency.
During testing on a mainnet node, however, I never saw a failure due to this.

Contributor:

Could you help me understand the lifecycle of the cache? For example, when is a value written, read, or deleted? Do they all happen within the same block? What if a new block reads an old value that should've been updated?

Member Author:

Sure, sorry for not clarifying earlier. So basically:

Contributor:

Thanks for the explanation!

miner/worker.go Outdated
return err
// start warm reader cache
tasks, _ := core.NewWarmExecTasks(w.chain, w.chainConfig, env.header, env.state, env.coinbase, txs, vm.Config{})
numProcs := max(1, 4*runtime.NumCPU()/5)
Contributor:

How is this calculation derived?

Member Author:

I think this is the same value used in the state prefetcher from upstream.
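For reference, an illustrative sketch (not from the PR): the heuristic leaves roughly a fifth of the cores free for the main execution goroutine.

package main

import "fmt"

func main() {
	// numProcs := max(1, 4*runtime.NumCPU()/5) from the diff above;
	// max on ints is a Go 1.21+ builtin.
	for _, cpus := range []int{1, 4, 8, 16} {
		fmt.Printf("NumCPU=%2d -> workers=%d\n", cpus, max(1, 4*cpus/5))
	}
	// NumCPU= 1 -> workers=1 (4*1/5 floors to 0, clamped to 1)
	// NumCPU= 4 -> workers=3
	// NumCPU= 8 -> workers=6
	// NumCPU=16 -> workers=12
}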

blockCtx := NewEVMBlockContext(header, bc, &coinbase)
shouldDelayFeeCal := false

for i, tx := range txs {
Contributor:

There could be a large number of transactions in txs, which may cause unnecessary warm-up. For example, if there are 10k pending txs but only 20% of them will be included in the next block, 80% of the compute would be wasted. One way to limit this is to include transactions only until the sum of their gas limits reaches the block gas limit, as in the sketch below.
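A minimal sketch of that cap, assuming go-ethereum's types.Transaction (tx.Gas() returns the tx gas limit); capForWarmup is a hypothetical helper, not the PR's code:

import "github.com/ethereum/go-ethereum/core/types"

// capForWarmup trims the warm-up candidates once their cumulative gas
// limit would exceed the block gas limit.
func capForWarmup(txs []*types.Transaction, blockGasLimit uint64) []*types.Transaction {
	var cumulative uint64
	var capped []*types.Transaction
	for _, tx := range txs {
		if cumulative+tx.Gas() > blockGasLimit {
			break
		}
		cumulative += tx.Gas()
		capped = append(capped, tx)
	}
	return capped
}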

Member Author:

True, but we filter the txs before the call site.

Contributor:

I see. On the other hand, with this filter we might not be able to fill the block. For example, if a malicious user submits 2 transactions with high gas prices and each has a gas limit of 30M, we will only mine these two transactions. But if each transaction actually consumes only 1M gas, the block will be filled with only 2M gas in total.

Member Author:

Correct, but I think the only way to accurately determine the gas used is to execute the txs, which is itself a time-consuming process.

lucca30 and others added 4 commits December 16, 2025 14:47
* bump version

* stable version

* V2.5.6 candidate (#1941)

* Revert recorded write operations when tx fails

* Revert selfdestruct on failure

* Added test for selfdestruct revert write

* Add HasSelfDestructed check when applying to final db

* p2p: added functionality to temporarily ban peers to prevent connections (#1922)

* p2p: added functionality to temporarily ban peers to prevent connections

* p2p: cleanup

* chore: bump kurtosis-pos (#1923)

* chore: bump kurtosis-pos

* chore: nits

* chore: install go

* fix(miner): delete pending task after successful block sealing to prevent node stall (#1929)

* fix(miner): delete pending task after successful block sealing to prevent node stall

* chore: nits

* chore: set staleThreshold to 0

* chore: update comment

* Remove Recommit and Fix Parent Actual Time track on Prepare (#1938)

* remove recommit and track parent actual time

* applying pr comments

* fix actual time cache store place

* chore: update deps (#1939)

chore: update deps

* bump version

* bump version

* stable version

---------

Co-authored-by: Angel Valkov <[email protected]>
Co-authored-by: Marcello Ardizzone <[email protected]>
Co-authored-by: Pratik Patil <[email protected]>
Co-authored-by: Krishang <[email protected]>
Co-authored-by: kamuikatsurgi <[email protected]>

* Unstuck

* Added block range check

* bump version 2.5.7

* bump to 2.5.6-beta3

---------

Co-authored-by: Lucca Martins <[email protected]>
Co-authored-by: Marcello Ardizzone <[email protected]>
Co-authored-by: Pratik Patil <[email protected]>
Co-authored-by: Krishang <[email protected]>
Co-authored-by: kamuikatsurgi <[email protected]>
…block building work (#1943)


* fix: add pendingWorkBlock to prevent duplicate work

* fix: if check

* nit: comment

* chore: bump version

* chore: add pending work check in new head event as well

* chore: read veblop timeout

---------

Co-authored-by: Lucca Martins <[email protected]>
Co-authored-by: Angel Valkov <[email protected]>
Co-authored-by: Marcello Ardizzone <[email protected]>
Co-authored-by: Pratik Patil <[email protected]>
github-actions bot:
This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 26, 2025
@Raneet10 Raneet10 removed the Stale label Dec 26, 2025
@Raneet10 Raneet10 changed the base branch from src-optimizations to develop December 30, 2025 06:56
sonarqubecloud bot commented Jan 5, 2026
