Parallel transaction execution with BALs#9182

Open
Marchhill wants to merge 762 commits into master from feature/parallel-txs

Contributor

@Marchhill Marchhill commented Aug 20, 2025

Depends on BALs #9114

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

@Marchhill Marchhill mentioned this pull request Aug 21, 2025
16 tasks
@Marchhill Marchhill mentioned this pull request Oct 7, 2025
16 tasks
@Marchhill Marchhill force-pushed the feature/parallel-txs branch from a84593b to 692e22b Compare November 13, 2025 14:12
@Marchhill Marchhill changed the base branch from feature/block-level-access-lists to bal-devnet-1 December 16, 2025 16:33
@Marchhill Marchhill changed the base branch from bal-devnet-1 to bal-devnet-2-bals-only January 30, 2026 11:24
@flcl42 flcl42 force-pushed the bal-devnet-2-bals-only branch 6 times, most recently from 76a8434 to de48b8c Compare February 24, 2026 10:39
@Marchhill Marchhill changed the base branch from bal-devnet-2-bals-only to bal-devnet-2 March 2, 2026 19:35
Contributor

github-actions bot commented Mar 6, 2026

EVM Opcode Benchmark Diff

Aggregated runs: base=1, pr=1

No significant regressions or improvements detected.

Base automatically changed from bal-devnet-2 to master March 9, 2026 17:55
@LukaszRozmej LukaszRozmej mentioned this pull request Apr 17, 2026
12 tasks
Marchhill and others added 4 commits April 17, 2026 10:12
* Move cumulative gas tracking from tracer to TransactionProcessor

Track cumulative regular and receipt gas as fields on
TransactionProcessorBase, removing the IBlockGasAccountingTracer cast
from ValidateGas. This decouples transaction validation from the
tracing infrastructure.

- Add _blockCumulativeRegularGas and _blockCumulativeReceiptGas fields
- Reset in SetBlockExecutionContext, update after each validated tx
- Remove ITxTracer parameter from ValidateGas and ValidateStatic
- Update SystemTransactionProcessor, Taiko, and Xdc overrides

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix

* cleanup

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>
@Marchhill Marchhill mentioned this pull request Apr 17, 2026
16 tasks
@github-actions github-actions bot removed the devops label Apr 17, 2026
Contributor Author

@claude review


claude bot commented Apr 17, 2026

Claude finished @Marchhill's task in 4m 16s


Code Review: Parallel Transaction Execution with BALs

  • Load repository rules
  • Read key changed files
  • Review open issues from previous reviews
  • Check for new issues
  • Post final review

Status Since Last Re-Review (April 16 01:37)

Good progress. Several items from the previous review are confirmed fixed:

  • HasNoChanges now uses c.HasSlotChanges bool field — no enumerator allocation ✓
  • GetAccountChanges in BlockAccessListBasedWorldState uses explicit foreach ✓
  • DecrementNonce is a no-op in BlockAccessListBasedWorldState (asymmetry resolved) ✓
  • SlotChanges.Get now uses stackalloc for the temp 32-byte buffer (one less allocation) ✓
  • HasSlotChangesAtIndex(int) bool method added to AccountChanges ✓
  • TracedAccessWorldState balance/nonce/code/storage getters use direct indexed access ✓
  • ApplyStateChanges uses direct indexed access for balance/code/nonce ✓

Still Open

High — Hot-path allocations

1. Enumerable.SequenceEqual on ChangeAtIndex.SlotChanges — still not fixed

BlockAccessListManager.cs:308

!Enumerable.SequenceEqual(generatedHead.Value.SlotChanges, suggestedHead.Value.SlotChanges)

Both sides are IEnumerable<SlotChanges> from ChangeAtIndex.SlotChanges, which holds the result of AccountChanges.SlotChangesAtIndex(index), a yield return iterator method. Each operand is therefore a heap-allocated iterator state machine that SequenceEqual then walks through interface calls. This runs per matched account per ValidateBlockAccessList call (called once per tx).

2. GetChangesAtIndex allocates a slot-change iterator per account unconditionally — still not fixed

BlockAccessList.cs:373

accountChanges.SlotChangesAtIndex(index),

SlotChangesAtIndex is a yield return method, so calling it allocates a state machine object. The result is stored in ChangeAtIndex.SlotChanges, and this happens for every account in every GetChangesAtIndex call, even for accounts where HasNoChanges is true (so the iterator is never iterated, just allocated and discarded). GetChangesAtIndex is called twice per tx, once for the generated BAL and once for the suggested BAL. For a block with 100 accounts and 300 txs, that's 100 × 300 × 2 = ~60,000 heap objects just from this path.

Issues #1 and #2 are coupled. The root fix: remove IEnumerable<SlotChanges> SlotChanges from ChangeAtIndex entirely and replace Enumerable.SequenceEqual with a direct comparison method that takes two AccountChanges objects and an index, comparing slot changes without any yield-return iterator.
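
The direct comparison suggested above could look roughly like the following. This is a minimal sketch, not the PR's code: StorageChange, SlotChangesSketch, AccountChangesSketch and SlotChangesEqualAtIndex are hypothetical stand-ins, slot keys and values are simplified to ulong, and both accounts are assumed to list their slots in the same order (the same assumption SequenceEqual makes).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the PR's types; only the shape matters here.
public readonly record struct StorageChange(int BlockAccessIndex, ulong Value);

public sealed class SlotChangesSketch
{
    public ulong Slot { get; }
    // Changes keyed by block access index (assumed layout).
    public SortedList<int, StorageChange> Changes { get; } = new();
    public SlotChangesSketch(ulong slot) => Slot = slot;
}

public sealed class AccountChangesSketch
{
    public List<SlotChangesSketch> Slots { get; } = new();

    // Compares the slot changes recorded at `index` in two accounts using
    // indexed loops only: no yield return, no LINQ, no enumerator objects.
    public static bool SlotChangesEqualAtIndex(AccountChangesSketch a, AccountChangesSketch b, int index)
    {
        int i = 0, j = 0;
        while (true)
        {
            // Skip slots that have no change at this index.
            while (i < a.Slots.Count && !a.Slots[i].Changes.ContainsKey(index)) i++;
            while (j < b.Slots.Count && !b.Slots[j].Changes.ContainsKey(index)) j++;

            bool aDone = i == a.Slots.Count, bDone = j == b.Slots.Count;
            if (aDone || bDone) return aDone && bDone;

            SlotChangesSketch x = a.Slots[i], y = b.Slots[j];
            if (x.Slot != y.Slot || x.Changes[index] != y.Changes[index]) return false;
            i++; j++;
        }
    }
}
```

With a method of this shape, ChangeAtIndex would not need to carry an IEnumerable<SlotChanges> at all; the validator would pass the two account objects and the index instead.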


Medium

3. _lastLoadedBal still present — was reported as fixed in previous review but is not

BlockAccessListManager.cs:57, 80–85

private Hash256 _lastLoadedBal = Hash256.Zero;
...
if (ParallelExecutionEnabled && suggestedBlock.Hash != _lastLoadedBal)
{
    _lastLoadedBal = suggestedBlock.Hash;
    LoadPreStateToSuggestedBlockAccessList(suggestedBlock.BlockAccessList);
}

BlockAccessListManager is registered AddScoped (BlockProcessingModule.cs:67). A new instance is created per DI scope. If each scope corresponds to one block, _lastLoadedBal starts as Hash256.Zero on every call and the guard always passes — the cache never hits. Either:

  • Remove the field and guard, and always call LoadPreStateToSuggestedBlockAccessList; or
  • Document the intended lifetime if PrepareForProcessing can be called multiple times within the same scope (e.g. for the same block under retry), and add a comment explaining why the cache is valid.

4. N VirtualMachine + TxProcessorWithWorldState allocations per block — still present

BlockAccessListManager.cs:429–437: the // todo: could be a lot of allocations here comment is still there. ParallelTxProcessorWithWorldStateManager.Setup allocates len + 2 TxProcessorWithWorldState objects (each owning a new VirtualMachine) per block.

5. IBlockAccessListManager God object — still not split

The // todo: maybe split into smaller classes comment is still in the interface. It has 12 methods across tx processing, BAL tracking, system contracts, gas validation, and AuRa preprocessing. This should be resolved before landing in master.

6. GetNonce/GetBalance O(N) scan — // todo: binary search still present

AccountChanges.cs:272–298 — Both GetNonce and GetBalance iterate _nonceChanges/_balanceChanges linearly. For accounts touched many times in a block, each lookup scans all prior changes. Binary search on SortedList keys is straightforward:

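// Find largest key < blockAccessIndex using binary search.
// Note: SortedList<TKey, TValue>.Keys is an IList<TKey>, which has no built-in
// BinarySearch; this assumes a helper with Array.BinarySearch-style return values.
int pos = _balanceChanges.Keys.BinarySearch(blockAccessIndex);
int idx = (pos < 0 ? ~pos : pos) - 1;
return idx >= 0 ? _balanceChanges.Values[idx].PostBalance : null;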
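
Since the BCL does not provide a BinarySearch over SortedList keys, one option is a small hand-written lower-bound search. The following is a sketch under stated assumptions: SortedListLookup and IndexOfLastKeyBelow are hypothetical names, and the key type is assumed to be int.

```csharp
using System;
using System.Collections.Generic;

public static class SortedListLookup
{
    // Returns the position of the largest key strictly less than `target`,
    // or -1 if every key is >= target. Runs in O(log N) with no allocation.
    public static int IndexOfLastKeyBelow<TValue>(SortedList<int, TValue> list, int target)
    {
        IList<int> keys = list.Keys;
        int lo = 0, hi = keys.Count;
        while (lo < hi)
        {
            int mid = lo + ((hi - lo) >> 1);
            if (keys[mid] < target) lo = mid + 1; // keys[mid] qualifies, look right
            else hi = mid;                        // keys[mid] is too large, look left
        }
        // lo is now the count of keys < target, so the last such key sits at lo - 1.
        return lo - 1;
    }
}
```

A caller would then read list.Values[idx] when idx >= 0 and return null otherwise, matching the shape of the snippet above.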


Minor

7. SlotChanges.Get still allocates one byte[] per storage read

SlotChanges.cs:68–75

lastValue.ToBigEndian(tmp);
return [.. tmp.WithoutLeadingZeros()]; // still allocates

The stackalloc is good — one allocation down from two. WithoutLeadingZeros() on a Span returns a Span (no allocation), but [.. span] creates a new byte[]. Every SLOAD in a parallel tx goes through this. Callers wanting ReadOnlySpan<byte> could use the stack buffer directly without materialising to an array.
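
A span-returning variant could let the caller own the buffer, so nothing is materialised on the heap. The sketch below is illustrative only: SlotValueEncoding and WriteWithoutLeadingZeros are hypothetical names, the value is narrowed to a ulong (8 bytes) instead of the 32-byte UInt256 used in the PR, and returning a single zero byte for the value 0 is an assumption of this sketch.

```csharp
using System;
using System.Buffers.Binary;

public static class SlotValueEncoding
{
    // Writes `value` big-endian into `destination` (at least 8 bytes) and returns
    // the slice without leading zero bytes. The result aliases `destination`,
    // so no byte[] is allocated. A value of 0 yields one zero byte (sketch choice).
    public static ReadOnlySpan<byte> WriteWithoutLeadingZeros(ulong value, Span<byte> destination)
    {
        BinaryPrimitives.WriteUInt64BigEndian(destination, value);
        int start = 0;
        while (start < 7 && destination[start] == 0) start++;
        return destination.Slice(start, 8 - start);
    }
}
```

A caller on the hot path would pass a stackalloc buffer, for example `Span<byte> buffer = stackalloc byte[8];`, and consume the returned span before the buffer goes out of scope.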

8. Restore method inconsistency — change.BlockAccessIndex vs Index

BlockAccessList.cs:317, 326, 336, 345: The balance restore uses change.BlockAccessIndex, while the code/nonce/storage restores use Index (the block-scoped BlockAccessList.Index field). These produce the same result since restores only happen within the current tx, but the inconsistency is a latent trap. Standardise on change.BlockAccessIndex throughout.

9. AccountChanges.Equals and SlotChanges.Equals use LINQ SequenceEqual

AccountChanges.cs:88–92, SlotChanges.cs:22 — Not in hot paths, but these violate the no-LINQ rule and could be replaced with explicit loops or array comparison helpers.

10. Both processor managers allocated unconditionally

BlockAccessListManager.cs:51–52: _parallelTxProcessorWithWorldStateManager and _sequentialTxProcessorWithWorldStateManager are both allocated on every BlockAccessListManager construction. For sequential-only workloads (block production, non-BAL blocks), the parallel manager is wasted.
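
One way to defer the parallel manager until it is actually needed is System.Lazy<T>. This is a sketch only: ProcessorManagerHolder and its members are hypothetical names, and whether lazy construction fits the PR's DI setup is an open assumption.

```csharp
using System;

// Hypothetical holder: the names below are illustrative, not the PR's types.
public sealed class ProcessorManagerHolder<TParallel, TSequential>
{
    private readonly Lazy<TParallel> _parallelManager;

    public ProcessorManagerHolder(Func<TParallel> parallelFactory, TSequential sequentialManager)
    {
        // The factory runs only on the first access to ParallelManager.
        _parallelManager = new Lazy<TParallel>(parallelFactory);
        SequentialManager = sequentialManager;
    }

    public TSequential SequentialManager { get; }
    public TParallel ParallelManager => _parallelManager.Value;
    public bool ParallelManagerCreated => _parallelManager.IsValueCreated;
}
```

Sequential-only workloads that never touch ParallelManager would then never pay for its construction.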


Overall Assessment

The PR continues to improve. The DecrementNonce asymmetry fix, GetAccountChanges LINQ removal, and HasNoChanges bool-field fix from the previous round are all in. The two remaining hot-path allocation issues (#1 and #2 above — the ChangeAtIndex.SlotChanges iterator chain) are the most actionable items before merge. They're also the most structurally impactful, since fixing them removes the only remaining yield return allocation in ValidateBlockAccessList. IBlockAccessListManager split and the _lastLoadedBal dead-code cleanup are the other pre-merge blockers.

Contributor Author

@claude since all open comments will be resolved in future todos the PR is ready for human review


claude bot commented Apr 17, 2026

Claude finished @Marchhill's task in 2m 4s


PR Readiness Summary for Human Review

  • Load repository rules
  • Read current state of key files
  • Assess remaining open issues vs. acknowledged todos
  • Post final summary

Marc has confirmed that outstanding items will be tracked as future todos. Here is the current state of what remains open, to help human reviewers calibrate:


Still Open (Tracked as Future Todos)

Hot-path allocations in ValidateBlockAccessList (called once per tx during parallel execution):

  • Enumerable.SequenceEqual on ChangeAtIndex.SlotChanges (BlockAccessListManager.cs:309) — both sides come from yield return iterator methods; allocates two enumerator state machines per matched account per tx. Not yet fixed.
  • ChangeAtIndex.SlotChanges stores IEnumerable<SlotChanges> (BlockAccessList.cs:373, 555): SlotChangesAtIndex(index) is still called per account in GetChangesAtIndex, allocating a yield-return state machine per account even when HasNoChanges would short-circuit. The fix requires removing IEnumerable<SlotChanges> SlotChanges from ChangeAtIndex and replacing SequenceEqual with a direct AccountChanges-level comparison.

GetNonce/GetBalance O(N) linear scan (AccountChanges.cs:272, 287): the // todo: binary search comment is still present; both methods iterate a SortedList linearly. For accounts touched many times in a block, this is O(N) per lookup.

N VirtualMachine allocations per block (BlockAccessListManager.cs:~435): the // todo: could be a lot of allocations here comment is still present; len + 2 TxProcessorWithWorldState objects (each owning a fresh VirtualMachine) are created per block.

IBlockAccessListManager God object (IBlockAccessListManager.cs): the // todo: maybe split into smaller classes comment is still in the interface; 12 methods spanning tx processing, BAL tracking, system contracts, gas validation, and AuRa preprocessing.

_lastLoadedBal caching guard (BlockAccessListManager.cs:57, 81–85) — The caching logic (suggestedBlock.Hash != _lastLoadedBal) could be dead code if each DI scope maps to exactly one block (which is the AddScoped expectation at BlockProcessingModule.cs:67), but may be intentional if PrepareForProcessing can be called multiple times within the same scope. Worth clarifying or documenting.


What Has Been Fixed Through Multiple Review Rounds

  • Console.WriteLine debug statement removed ✓
  • BlockGasUsedTestBlockGasUsed with proper docs ✓
  • ParallelBlockValidationTransactionsExecutor separated into its own class ✓
  • Race condition on ApplyStateChanges architecturally resolved (parallel txs read from BAL) ✓
  • ParallelExecution / ParallelExecutionBatchRead default to false ✓
  • CancellationToken checked in IncrementalValidation ✓
  • HasNoChanges uses c.HasSlotChanges bool field (no enumerator allocation) ✓
  • LINQ removed from AccountChanges.PopChange<T>, SlotChanges.TryPopStorageChange, TracedAccessWorldState getters ✓
  • UInt256.MaxValue sentinel replaced with UInt256? nullable returns ✓
  • GetCodeHash no longer re-hashes code bytes ✓
  • DecrementNonce is a no-op in BlockAccessListBasedWorldState (symmetric with IncrementNonce) ✓
  • GetAccountChanges uses explicit foreach instead of LINQ ✓
  • ApplyStateChanges uses direct indexed access for balance/code/nonce ✓
  • _lastLoadedBal field has a plausible caching justification (though worth clarifying) ✓
  • BlockAccessListBasedWorldStateTests (196 lines) + TracedAccessWorldStateTests (552 lines) added ✓
  • ParallelEngineModuleTests + ParallelAuRaMergeEngineModuleTests added (re-run engine module suite with parallel=true) ✓

Overall Assessment

The PR is in materially better shape than at the first review — the correctness blockers are resolved, parallel execution defaults to off, and there is meaningful test coverage through EngineModuleTestsV6 and the new parallel-mode fixtures. The remaining items are primarily performance optimisations (the two hot-path allocation issues in ValidateBlockAccessList being the most impactful) and the IBlockAccessListManager split. These are appropriate to track as todos given the scope of changes already made.

6 participants