
ShuffleComputeState::copy_over_next_ss does not copy over global state #2648

Description

@phdavis1027

Draft PR #2554 is currently failing. It computes `total_personalization` once in `init_steps`, and in subsequent steps it reads that sum with `global_read`. However, when `ss` (the superstep) is odd, and only then, the `expect` on that read panics. To illustrate, right before the failing read I observed the following state in LLDB for `total_personalization`'s index in the global state:

```
VecArray<double> {
  odd = size=0
  even = size=1 {
    [0] = 1
  }
  zero = 0
}
```

So the recorded total is still present in the `even` buffer, but at `ss == 1` we try to read from `odd`, which is empty.
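To make the failure mode concrete, here is a toy model of the double-buffered storage shown in the LLDB dump. It is a minimal sketch, not Raphtory's actual `VecArray`: the struct layout, the `current`/`current_mut`/`read` methods, and the rule that even supersteps use `even` while odd ones use `odd` are assumptions inferred from the dump above.

```rust
/// Hypothetical stand-in for the storage seen in LLDB: one buffer for even
/// supersteps, one for odd supersteps, plus a zero value for the accumulator.
#[derive(Debug, Clone)]
struct VecArray<T> {
    odd: Vec<T>,
    even: Vec<T>,
    zero: T,
}

impl<T> VecArray<T> {
    fn new(zero: T) -> Self {
        VecArray { odd: Vec::new(), even: Vec::new(), zero }
    }

    /// Buffer used by superstep `ss`, selected by parity (assumed rule).
    fn current(&self, ss: usize) -> &Vec<T> {
        if ss % 2 == 0 { &self.even } else { &self.odd }
    }

    fn current_mut(&mut self, ss: usize) -> &mut Vec<T> {
        if ss % 2 == 0 { &mut self.even } else { &mut self.odd }
    }

    /// Read entry `i` as seen from superstep `ss`.
    fn read(&self, ss: usize, i: usize) -> Option<&T> {
        self.current(ss).get(i)
    }
}

fn main() {
    let mut total: VecArray<f64> = VecArray::new(0.0);
    // Superstep 0 (init): the sum is written into the even buffer.
    total.current_mut(0).push(1.0);
    assert_eq!(total.read(0, 0), Some(&1.0));
    // Superstep 1: nothing copied the value into `odd`, so the read finds
    // nothing; an `.expect(...)` on this result would panic here.
    assert_eq!(total.read(1, 0), None);
    println!("{:?}", total);
}
```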
The "culprit" seems to be this line of code in `copy_over_next_ss`, which copies node-local states into the next superstep but does not copy the global state. I put "culprit" in quotes because this is fairly complex machinery, and I want to leave room for the possibility that this is intended behavior I just don't understand.
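A minimal sketch of what copying the global state as well might look like, built on a hypothetical parity-indexed buffer. `ShuffleState`, its `local`/`global` fields, and both `copy_over_next_ss*` methods are illustrative stand-ins, not Raphtory's `ShuffleComputeState` API; the point is only that the global accumulators would need the same copy into the next superstep's buffer that the node-local ones already get.

```rust
/// Hypothetical double buffer: index 0 for even supersteps, 1 for odd.
#[derive(Debug, Clone, Default)]
struct VecArray {
    buffers: [Vec<f64>; 2],
}

impl VecArray {
    fn cur(&self, ss: usize) -> &Vec<f64> {
        &self.buffers[ss % 2]
    }
    fn cur_mut(&mut self, ss: usize) -> &mut Vec<f64> {
        &mut self.buffers[ss % 2]
    }
    /// Make the buffer for superstep `ss + 1` a copy of the one for `ss`.
    fn copy_over_next_ss(&mut self, ss: usize) {
        let snapshot = self.buffers[ss % 2].clone();
        self.buffers[(ss + 1) % 2] = snapshot;
    }
}

/// Hypothetical container for node-local and global accumulators.
#[derive(Debug, Default)]
struct ShuffleState {
    local: Vec<VecArray>,
    global: Vec<VecArray>,
}

impl ShuffleState {
    /// Behaviour described in the issue: only node-local state is carried over.
    fn copy_over_next_ss_locals_only(&mut self, ss: usize) {
        for acc in &mut self.local {
            acc.copy_over_next_ss(ss);
        }
    }
    /// Possible fix: carry the global state over the same way.
    fn copy_over_next_ss(&mut self, ss: usize) {
        self.copy_over_next_ss_locals_only(ss);
        for acc in &mut self.global {
            acc.copy_over_next_ss(ss);
        }
    }
}

fn main() {
    let mut buggy = ShuffleState { local: vec![VecArray::default()], global: vec![VecArray::default()] };
    buggy.global[0].cur_mut(0).push(1.0); // total computed in the init step (ss = 0)
    buggy.copy_over_next_ss_locals_only(0);
    assert!(buggy.global[0].cur(1).is_empty()); // the read at ss = 1 finds nothing

    let mut fixed = ShuffleState { local: vec![VecArray::default()], global: vec![VecArray::default()] };
    fixed.global[0].cur_mut(0).push(1.0);
    fixed.copy_over_next_ss(0);
    assert_eq!(fixed.global[0].cur(1).first(), Some(&1.0)); // value visible at ss = 1
}
```

Whether every global should be carried forward, or only those that are not reset each superstep, is left open here.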

Assuming this is a bug, I tried to figure out why it hadn't surfaced before, and I'm still not entirely sure. The exact pattern of "read this `global_agg` every step" doesn't seem to occur anywhere except HITS, where the value is a counter registered with `global_agg_reset`, which is not affected by this behavior. `finalize` can also call `global_read` (e.g., in `temporal_three_node_motifs`), but as far as I can tell the number of supersteps in that case is always the same.
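One plausible reason the bug stayed hidden can be shown with a simplified model: a global that is re-aggregated in every superstep always lands in the buffer the current step reads, whereas one written only in the init step exists in just one parity. The sketch below assumes, for simplicity, that a superstep reads the same buffer it writes and that nothing copies globals forward; `Global` and `missing_reads` are hypothetical names, not Raphtory code.

```rust
/// Hypothetical parity-indexed storage for one global value, with no copy
/// between supersteps (the behaviour reported in the issue).
struct Global {
    buffers: [Option<f64>; 2], // [even, odd]
}

impl Global {
    fn new() -> Self {
        Global { buffers: [None, None] }
    }
    fn write(&mut self, ss: usize, v: f64) {
        self.buffers[ss % 2] = Some(v);
    }
    fn read(&self, ss: usize) -> Option<f64> {
        self.buffers[ss % 2]
    }
}

/// Runs `steps` supersteps, writing at ss 0 and, if `rewrite_every_step`,
/// at every later superstep as well. Returns the supersteps whose read
/// found no value.
fn missing_reads(steps: usize, rewrite_every_step: bool) -> Vec<usize> {
    let mut g = Global::new();
    let mut missing = Vec::new();
    for ss in 0..steps {
        if ss == 0 || rewrite_every_step {
            g.write(ss, ss as f64);
        }
        if g.read(ss).is_none() {
            missing.push(ss);
        }
    }
    missing
}

fn main() {
    // Written once in the init step, read every step: odd supersteps miss it.
    println!("{:?}", missing_reads(5, false)); // [1, 3]
    // Re-aggregated every step (like a per-step counter): never missing.
    println!("{:?}", missing_reads(5, true)); // []
}
```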
