pkg/aflow: expose reproducer test coverage to the LLM #7008
ramosian-glider wants to merge 2 commits into google:master from
Force-pushed 4d4ca67 to c2a901c.
@gemini-cli /review
🤖 Hi @tarasmadan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This Pull Request introduces a powerful mechanism to expose code coverage from crash reproducer runs to the LLM agent. This is achieved by caching coverage artifacts and providing specialized tools for the agent to query and inspect covered files and source code snippets. The overall quality and architecture are consistent with the aflow package, but there are a few critical issues that need to be addressed before merging.
🔍 General Feedback
- Positive Highlights: The addition of coverage-aware tools is a significant improvement for the agent's ability to debug and refine reproducers. The caching strategy is well integrated with the existing `aflow` infrastructure.
- Security: The `get-file-coverage` tool currently lacks validation for the `Filename` argument, which could lead to path traversal if a model provides an absolute or relative path that escapes the kernel source root.
- Regression: The `reproduce` tool now always enables coverage collection, which will cause it to fail for C-only reproducers that don't have a corresponding syzkaller program, as the underlying `RunTest` requires a syzkaller program for coverage collection.
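The regression bullet implies that coverage should only be requested when a syzkaller program is present. A minimal sketch of such a gate, assuming hypothetical `reproInput` and `wantCoverage` names that are not the PR's actual types:

```go
package main

import "fmt"

// reproInput is a hypothetical stand-in for what the reproduce tool
// receives; field names are illustrative, not the real aflow types.
type reproInput struct {
	SyzRepro []byte // syzkaller program; empty for C-only reproducers
	CRepro   []byte // C reproducer source
}

// wantCoverage reports whether coverage should be requested for this run.
// Per the review, coverage collection needs a syzkaller program, so a
// C-only reproducer runs without coverage instead of failing outright.
func wantCoverage(in reproInput) bool {
	return len(in.SyzRepro) != 0
}

func main() {
	cOnly := reproInput{CRepro: []byte("int main() { return 0; }")}
	withSyz := reproInput{SyzRepro: []byte("getpid()"), CRepro: cOnly.CRepro}
	fmt.Println(wantCoverage(cOnly), wantCoverage(withSyz))
}
```

The alternative of failing early with a clear error is also reasonable; gating keeps C-only reproducers usable at the cost of returning no `CoverageID` for them.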
Force-pushed 8da3bbb to 0acc6d9.
Force-pushed 0acc6d9 to 29db92c.
Currently, the AI reproducer workflow only reports whether a program crashed and which bug title was produced. This does not give the LLM enough localized context to refine and debug a reproducer when it fails.

This patch adds mechanisms to extract and cache the coverage generated during the `crash.Reproduce` test cycle. The cached coverage arrays are kept out of the immediate LLM context and are instead serialized to persistent directories. The unique hash ID of this cache entry (`CoverageID`) is passed back to the LLM agent in `ReproduceResult`.

It also introduces the `get-coverage-files` and `get-file-coverage` LLM tools in `pkg/aflow/tool/syzlang`, enabling the `crash-repro-finder` agent to query and securely inspect these coverage traces file by file.

To support sandboxed cache resolution, a new `CacheDir` helper is added to `aflow.Context`.

Updates google#6878
The AI coverage tools rely heavily on the internal `aflow.Cache` to read and parse reproducer test run footprints. To test them without real LLM workflows or live syz-manager VMs, we need a way to mock the cache locally.

This patch adds a `NewTestContext` helper in `pkg/aflow/test_util.go` that initializes an `aflow.Context` backed by an explicit local dummy disk cache. It then introduces `pkg/aflow/tool/syzlang/coverage_test.go` to mock and test `getCoverageFiles` and `getFileCoverage`. The tests build a fake source tree along with `symbolizer.Frame` coverage payloads and assert that the coverage extraction logic, including snippet padding, right-aligned line spacing, and binary-search deduplication, formats its output correctly for the agent context.
Force-pushed 29db92c to 7c510bf.
@gemini-cli /review
🤖 Hi @ramosian-glider, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This PR successfully integrates coverage extraction and caching into the crash reproduction workflow, providing essential context for LLM agents to debug and refine reproducers. The implementation is clean, follows established aflow patterns, and includes robust security checks for file access.
🔍 General Feedback
- Caching: The use of `aflow.Cache` with `desc`-based hashing correctly ensures that coverage data is logically linked to the specific test configuration.
- Security: `filepath.IsLocal` is correctly applied to `CoverageID` and `Filename` to prevent directory traversal.
- Testing: New tools are well tested with appropriate mock data.
A minor improvement is suggested to ensure that the CoverageID is consistently returned even when a crash isn't reproduced, which is vital for the agent's debugging capabilities.

        }
        res.CoverageID = filepath.Base(dir)
    }
    return res, err
Suggested change:

    -return res, err
    +} else if cached.Report == "" {
    +    return reproduceResult{}, cached.CoverageID, aflow.FlowError(ErrDidNotCrash)
    +}

    if err != nil {
        if errors.Is(err, crash.ErrDidNotCrash) {
            return ReproduceResult{}, nil
Suggested change:

    -if err != nil {
    -    if errors.Is(err, crash.ErrDidNotCrash) {
    -        return ReproduceResult{}, nil
    +if err != nil {
    +    if errors.Is(err, crash.ErrDidNotCrash) {
    +        return ReproduceResult{CoverageID: coverageID}, nil
    +    }
    +    return ReproduceResult{}, err
    +}
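The second suggestion turns `crash.ErrDidNotCrash` into a successful result that still carries the coverage ID, while other errors propagate unchanged. A self-contained sketch of that error-handling pattern follows; `errDidNotCrash`, `runRepro`, `reproduce`, and the `reproduceResult` fields are hypothetical stand-ins for the real `crash` and `aflow` types, and the returned strings are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// errDidNotCrash stands in for crash.ErrDidNotCrash.
var errDidNotCrash = errors.New("reproducer did not crash")

// reproduceResult is a simplified, hypothetical tool result.
type reproduceResult struct {
	BugTitle   string
	CoverageID string
}

// runRepro is a fake runner: it returns the coverage ID it produced along
// with either a crash title, a wrapped errDidNotCrash, or another error.
func runRepro(outcome string) (title, coverageID string, err error) {
	switch outcome {
	case "crash":
		return "example crash title", "cov-crash", nil
	case "no-crash":
		return "", "cov-nocrash", fmt.Errorf("run: %w", errDidNotCrash)
	default:
		return "", "", errors.New("infrastructure failure")
	}
}

// reproduce keeps the coverage ID when the program simply did not crash,
// so the agent can still inspect what the run reached.
func reproduce(outcome string) (reproduceResult, error) {
	title, coverageID, err := runRepro(outcome)
	if err != nil {
		if errors.Is(err, errDidNotCrash) {
			return reproduceResult{CoverageID: coverageID}, nil
		}
		return reproduceResult{}, err
	}
	return reproduceResult{BugTitle: title, CoverageID: coverageID}, nil
}

func main() {
	for _, o := range []string{"crash", "no-crash", "boom"} {
		res, err := reproduce(o)
		fmt.Printf("%s: %+v %v\n", o, res, err)
	}
}
```

`errors.Is` matters here because the runner may wrap the sentinel error; a plain `==` comparison would miss the wrapped case.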
Before sending a pull request, please review Contribution Guidelines:
https://github.com/google/syzkaller/blob/master/docs/contributing.md