Skip to content

Fix eth/69 receipt encoding for Frontier tx#9900

Open
macfarla wants to merge 20 commits intobesu-eth:mainfrom
macfarla:test/snap-besu-to-besu
Open

Fix eth/69 receipt encoding for Frontier tx#9900
macfarla wants to merge 20 commits intobesu-eth:mainfrom
macfarla:test/snap-besu-to-besu

Conversation

@macfarla
Copy link
Contributor

@macfarla macfarla commented Feb 26, 2026

PR description

The Problem

Besu-to-Besu snap sync was failing for small networks. The CliqueToPoSTest acceptance test (snap sync section) would hang and never complete because receipt downloads kept failing with RECEIPTS_ROOT_MISMATCH.

Root Cause

A bug in SyncTransactionReceiptEncoder.encodeForRootCalculation() when using eth/69 protocol.

eth/69 sends receipts without bloom filters. The receiving node must re-encode them with bloom for receipt root calculation. The encoder checked if a receipt was Frontier type using:
TransactionType.FRONTIER.getSerializedType() // returns 0xf8
But eth/69's decoder stores Frontier type code as 0x00 (getEthSerializedType()). So Frontier receipts were never detected as Frontier, causing them to be incorrectly prefixed with 0x00 during encoding, producing a wrong receipt
trie root.

The Fix (SyncTransactionReceiptEncoder.java)

  Changed the isFrontier check to handle both representations:
  final boolean isFrontier =
      receipt.getTransactionTypeCode().isEmpty()
          || receipt.getTransactionTypeCode().get(0)
              == TransactionType.FRONTIER.getEthSerializedType()   // 0x00
          || receipt.getTransactionTypeCode().get(0)
              == TransactionType.FRONTIER.getSerializedType();     // 0xf8

Additional Fix (RequestManager.java)

Found and fixed an outstanding request capacity leak - both dispatchResponse() and close() could decrement the counter for the same request. Fixed with a CAS-based AtomicBoolean to ensure exactly-once release semantics.

Other Changes

  • ProcessBesuNodeRunner.java - Added --Xsynchronizer-pivot-distance CLI passthrough so acceptance tests can configure pivot distance
  • CliqueToPoSTest.java - The acceptance test with snap sync validation
  • RequestManagerTest.java - Unit tests for the CAS fix

Why It Was Hard to Find

  • The symptom (peer disconnections after ~5 second gaps) was misleading - it was a consequence of receipt validation failures exhausting retry limits, not a networking issue
  • Subprocess logs don't appear in gradle console output; they're captured in JUnit XML test result files
  • The TransactionType.FRONTIER has confusingly named methods: getSerializedType() returns 0xf8 (the typeValue), while getEthSerializedType() returns 0x00 (the serializedType)

Fixed Issue(s)

fixes #10027
note this PR also includes the rlp logging change from #10038 which was needed to diagnose #10044

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests
  • hive tests: Engine or other RPCs modified?

macfarla and others added 6 commits February 26, 2026 09:03
Cherry-pick all snap fixes (besu-eth#9876, besu-eth#9877, besu-eth#9855) and Fabio's
CliqueToPoSTest from besu-eth#9852 with the snap sync section uncommented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Copilot AI review requested due to automatic review settings February 26, 2026 04:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in eth/69 snap sync that caused receipt root mismatches and peer disconnections. The issue stemmed from incorrect Frontier transaction type detection during receipt encoding, which prevented small networks from completing snap sync.

Changes:

  • Fixed Frontier receipt detection logic in SyncTransactionReceiptEncoder to handle both 0x00 and 0xf8 type codes
  • Fixed request capacity leak in RequestManager using CAS-based exactly-once release semantics
  • Added acceptance test validating Clique-to-PoS transition with snap sync

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
SyncTransactionReceiptEncoder.java Fixed isFrontier check to handle both eth/69 (0x00) and standard (0xf8) Frontier type codes
RequestManager.java Added CAS-based logic to prevent double-decrement of outstanding request counter
RequestManagerTest.java Added unit tests for request capacity release scenarios
AccountRangeMessage.java Added slim encoding to wire format for account messages
AccountRangeMessageTest.java Added tests for slim encoding behavior
SnapServerTest.java Updated assertions after removing request fudge factor
CliqueToPoSTest.java New acceptance test validating snap sync in Clique-to-PoS transition
ProcessBesuNodeRunner.java Added CLI passthrough for pivot distance configuration
BesuNodeFactory.java Added ADMIN RPC API to default node configuration
GenesisConfigurationFactory.java Added utility method to create genesis from resource
clique_to_pos.json Genesis configuration for acceptance test
key Private key file for acceptance test
CHANGELOG.md Documented bug fixes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 35 to 40
final boolean isFrontier =
!receipt.getTransactionTypeCode().isEmpty()
&& receipt.getTransactionTypeCode().get(0)
receipt.getTransactionTypeCode().isEmpty()
|| receipt.getTransactionTypeCode().get(0)
== TransactionType.FRONTIER.getEthSerializedType()
|| receipt.getTransactionTypeCode().get(0)
== TransactionType.FRONTIER.getSerializedType();
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition checks the same receipt.getTransactionTypeCode().get(0) twice without null safety. If the collection is empty, get(0) will throw an exception in the second and third conditions. Consider restructuring: check if empty first (line 36), then if not empty, evaluate both type comparisons with a single get(0) call stored in a variable.

Copilot uses AI. Check for mistakes.
Comment on lines +120 to +126
final String[] enableRpcApis = new String[] {ADMIN.name()};

final BesuNodeConfigurationBuilder builder =
new BesuNodeConfigurationBuilder()
.name(name)
.jsonRpcConfiguration(node.createJsonRpcWithRpcApiEnabledConfig(enableRpcApis));

Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change always enables ADMIN RPC API for all nodes created via createNode(), which differs from the previous behavior where the caller had full control. This could expose admin operations on nodes that shouldn't have them. Consider making this configurable or documenting why ADMIN is now required.

Suggested change
final String[] enableRpcApis = new String[] {ADMIN.name()};
final BesuNodeConfigurationBuilder builder =
new BesuNodeConfigurationBuilder()
.name(name)
.jsonRpcConfiguration(node.createJsonRpcWithRpcApiEnabledConfig(enableRpcApis));
final BesuNodeConfigurationBuilder builder = new BesuNodeConfigurationBuilder().name(name);

Copilot uses AI. Check for mistakes.
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
@macfarla
Copy link
Contributor Author

SyncTransactionReceiptEncoder was introduced in #9621

macfarla and others added 5 commits February 27, 2026 12:00
This reverts commit cce004a.

Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
…ontier receipts

The decoder was storing Frontier type code as getSerializedType() (0xf8)
but the encoder checks getEthSerializedType() (0x00). When a peer sends
Frontier receipts with empty type bytes (e.g. reth), the decoder stored
0xf8, the encoder didn't recognize it as Frontier, and encoded it as a
typed receipt - producing wrong receipt roots and disconnecting the peer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
@macfarla
Copy link
Contributor Author

Tested on Hoodi testnet with a two-node setup (client + server), both running v26.3-develop.

Before (client without #9900, server with or without #9900)

The client repeatedly rejects receipts from the server as RESULTS_DO_NOT_MATCH_QUERY — the receipt trie root computed locally doesn't match the block header, due to incorrect handling of Frontier-type receipts in encodeForRootCalculation. After 5 failures the peer is disconnected as "useless", and the cycle repeats on reconnect.

Observed against both a 26.2.0 server and a 26.3-develop server with #9900:

  PeerTaskExecutor: Invalid response RESULTS_DO_NOT_MATCH_QUERY found for GetSyncReceiptsFromPeerTask from peer 0x13812e3c...
  PeerTaskExecutor: Executed peer task GetSyncReceiptsFromPeerTask against 0x13812e3c..., response code INVALID_RESPONSE, retries remaining 4
  PeerTaskExecutor: Invalid response RESULTS_DO_NOT_MATCH_QUERY found for GetSyncReceiptsFromPeerTask from peer 0x13812e3c...
  PeerReputation: Disconnection triggered by exceeding useless response threshold for peer 0x13812e3c...
  AbstractPeerConnection: Disconnecting connection -481815964, peer 0x13812e3c... reason 0x03 USELESS_PEER_USELESS_RESPONSES Useless responses: exceeded threshold
  EthPeers: Disconnected EthPeer 0x13812e3c...
  EthProtocolManager: Disconnect - active Connection? true - Outbound - 0x03 USELESS_PEER_USELESS_RESPONSES - besu/v26.2.0/linux-aarch_64/openjdk-java-21 - 0 peers left

The node cannot make sync progress — it disconnects every peer serving blocks with legacy transactions.

After (client with #9900)

The client correctly validates receipts, stays connected, and snap sync progresses steadily:

  PeerTaskExecutor: Executed peer task GetSyncReceiptsFromPeerTask against 0xbfae3dad..., response code SUCCESS, retries remaining 5
  SnapSyncMetricsManager: Worldstate download progress: 18.09%, Peer count: 1
  SnapSyncMetricsManager: Worldstate download progress: 19.83%, Peer count: 1
  SnapSyncMetricsManager: Worldstate download progress: 21.57%, Peer count: 1

No RESULTS_DO_NOT_MATCH_QUERY, no peer disconnections. Worldstate advancing ~1.7%/min with a single peer.

SyncTransactionReceiptEncoder.encodeForRootCalculation checked for
TransactionType.FRONTIER.getSerializedType() (0xf8) as the Frontier
marker, but eth/69 Frontier receipts decoded by SyncTransactionReceiptDecoder
store transactionTypeCode=0x00 (getEthSerializedType). The encoder
treated 0x00 as a non-Frontier typed transaction, prepending the type
byte and producing a wrong receipt trie root → RESULTS_DO_NOT_MATCH_QUERY
when syncing from a peer that sends eth/69 receipts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

snap - client disconnecting server for Invalid response RESULTS_DO_NOT_MATCH_QUERY

3 participants