Skip to content

Journal snapshot ID overflow can cause incorrect revert behavior #841

@roninjin10

Description

@roninjin10

Summary

The journal's snapshot ID generation uses saturating addition, which means once the ID reaches the maximum value, all new snapshots will receive the same ID. This can cause incorrect revert behavior in long-running sessions or under certain attack scenarios.

Priority: MEDIUM

This is a correctness issue that could manifest in edge cases or under adversarial conditions.

Location

File: src/storage/journal.zig
Lines: 59-65

Current Code

/// Create a new snapshot and return its ID
pub fn create_snapshot(self: *Self) SnapshotIdType {
    const id = self.next_snapshot_id;
    // Use saturating addition to prevent overflow
    // If we hit max snapshots, reuse the max ID (will still work for revert)
    self.next_snapshot_id = @min(self.next_snapshot_id +| 1, std.math.maxInt(SnapshotIdType));
    return id;
}

Problem

When next_snapshot_id reaches maxInt(SnapshotIdType) (which is maxInt(u32) = 4,294,967,295 by default):

  1. All subsequent calls to create_snapshot() return the same ID
  2. Multiple nested calls will have indistinguishable snapshots
  3. revert_to_snapshot(snapshot_id) will revert ALL entries with that ID, not just the intended ones
  4. This breaks the isolation between nested call frames

Impact

  1. Consensus Failure Risk: In extreme cases, incorrect state reverts could cause consensus failures
  2. Long-Running Sessions: Development environments or persistent nodes could hit this limit
  3. Potential DoS Vector: An attacker could potentially craft transactions that rapidly create snapshots to reach the limit

Example Scenario

Transaction 1: Creates snapshots 4294967293, 4294967294, 4294967295
Transaction 2: Creates snapshots 4294967295, 4294967295, 4294967295 (all same!)

If nested call fails and reverts to "snapshot 4294967295", it will revert 
ALL entries from ALL three nested calls, not just the innermost one.

Required Fix

Option A: Detect and error on overflow

pub fn create_snapshot(self: *Self) !SnapshotIdType {
    if (self.next_snapshot_id == std.math.maxInt(SnapshotIdType)) {
        return error.SnapshotIdExhausted;
    }
    const id = self.next_snapshot_id;
    self.next_snapshot_id += 1;
    return id;
}

Option B: Use wrapping with epoch tracking

pub fn create_snapshot(self: *Self) SnapshotIdType {
    const id = self.next_snapshot_id;
    self.next_snapshot_id +%= 1;  // Wrapping addition
    // Also track an epoch counter to disambiguate wrapped IDs
    if (self.next_snapshot_id == 0) {
        self.epoch += 1;
    }
    return id;
}

Option C: Reset counter per transaction (Recommended)

Add a clear() method that resets the counter, called at the start of each transaction:

/// Clear all entries and reset snapshot counter
pub fn clear(self: *Self) void {
    self.entries.clearRetainingCapacity();
    self.next_snapshot_id = 0;  // Reset for new transaction
}

This is already implemented - ensure it's called at transaction boundaries.

Steps to Fix

  1. Open src/storage/journal.zig
  2. Analyze how clear() is called - verify it resets the counter per transaction
  3. If Option A: Change return type to error union and handle at call sites
  4. If Option C: Verify clear() is called at every transaction boundary in src/evm.zig
  5. Add a test that verifies behavior at snapshot ID boundaries
  6. Run zig build && zig build test to verify

Verification

Add this test:

test "Journal - snapshot ID near max value" {
    const testing = std.testing;
    
    var journal = try DefaultJournal.init(testing.allocator);
    defer journal.deinit();
    
    // Set next_snapshot_id near max
    journal.next_snapshot_id = std.math.maxInt(u32) - 2;
    
    const snap1 = journal.create_snapshot();
    const snap2 = journal.create_snapshot();
    const snap3 = journal.create_snapshot();
    const snap4 = journal.create_snapshot();
    
    // Verify IDs are still unique (or error is returned)
    try testing.expect(snap1 != snap2);
    try testing.expect(snap2 != snap3);
    // snap3 and snap4 will be same with current impl - this should fail or change
}

Context

The journal system tracks state changes for transaction rollback. Each nested CALL creates a new snapshot. While 4 billion snapshots seems like a lot, the saturating behavior means the system silently degrades rather than failing explicitly.

Related Code

  • src/evm.zig - Creates snapshots for each call
  • src/storage/database.zig - Also has snapshot IDs (verify consistency)

Note: This issue was created by Claude AI assistant during a code review, not by @roninjin10 or @fucory

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions