Task
Research and implement a compile-time code introspection tool that automatically generates a "fuzzer-esque" sync testing suite. This suite will automatically call engine functions in bulk with various auto-generated parameters to systematically catch cross-platform determinism and desync issues.
It is expected that coverage will not be 100% - this is intended to be another tool in the sync testing toolbox.
This may be very difficult or require very specialized knowledge. But worth researching to see if it's possible!
Context
Desyncs in the engine can originate from virtually anywhere in the codebase, making them notoriously difficult to predict and manually test for. Historically, desyncs have frequently been triggered in highly esoteric edge cases; in other words, we can't just test our float implementation and be done.
While we currently rely on seeded gameplay tests via fightertest (and maybe soon specific float precision tests #2908), manually writing comprehensive sync tests for every engine function is unscalable. Every time an engine function signature changed, we would have to remember to update the suite. By leveraging code introspection at compile time, we could automatically generate a test suite that fuzzes engine functions with a "kitchen sink" of parameters. This approach would allow us to test functions in bulk across different platforms (e.g., x86_64 vs ARM64) to ensure deterministic execution without requiring constant manual test maintenance.
The main challenge is handling state: if state can't be injected easily, or if important code is not written in a dependency-invertible way, we may struggle to test it well.
Acceptance Criteria
A mechanism for compile-time code introspection is implemented to automatically identify engine functions that need sync testing.
An automated test generator is created to produce test cases that call these identified functions with various auto-generated parameters.
The generated test suite can be plugged into the multi-platform CI environment (parent ticket Implement multi-platform CI sync testing #2906) and accurately compares outputs/state to detect desyncs (e.g., floating-point precision differences).
The system is documented, detailing how the introspection picks up functions, how parameters are generated, and how developers should interpret and debug the resulting desync reports.