pkg/aflow: expose reproducer test coverage to the LLM #7008

Open
ramosian-glider wants to merge 2 commits into google:master from ramosian-glider:hackathon-250326

@ramosian-glider
Member

Currently, the AI reproducer workflow simply reports whether a program crashed and what bug title was produced. This doesn't provide enough localized context for the LLM to refine and correctly debug a reproducer when it fails.

This patch adds mechanisms to extract and cache coverage generated during the crash.Reproduce test cycle. The cached coverage arrays are decoupled from the immediate LLM context and are instead serialized to persistent directories. The unique hash ID (CoverageID) of this cache is then mapped and passed back to the LLM agent within ReproduceResult.

The patch also adds the get-coverage-files and get-file-coverage LLM tools to pkg/aflow/tool/syzlang, so that the crash-repro-finder agent can query and safely inspect these coverage traces file by file.
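
The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `coveredPC` type, the `coverageID`, `storeCoverage` and `loadCoverage` helpers, the SHA-256-based ID and the `coverage.json` file name are all assumptions made for the example, and the real implementation goes through `aflow.Cache`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// coveredPC is a hypothetical stand-in for one symbolized coverage entry.
type coveredPC struct {
	File string `json:"file"`
	Line int    `json:"line"`
}

// coverageID derives a stable, path-safe ID from a description of the test run.
// Assumption: the real cache key is computed differently inside aflow.Cache.
func coverageID(desc string) string {
	sum := sha256.Sum256([]byte(desc))
	return hex.EncodeToString(sum[:8])
}

// storeCoverage serializes cov to cacheRoot/<id>/coverage.json and returns the ID,
// so only the short ID (not the coverage itself) has to enter the LLM context.
func storeCoverage(cacheRoot, desc string, cov []coveredPC) (string, error) {
	id := coverageID(desc)
	dir := filepath.Join(cacheRoot, id)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	data, err := json.Marshal(cov)
	if err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(dir, "coverage.json"), data, 0o644); err != nil {
		return "", err
	}
	return id, nil
}

// loadCoverage reads back the coverage previously stored under id.
func loadCoverage(cacheRoot, id string) ([]coveredPC, error) {
	data, err := os.ReadFile(filepath.Join(cacheRoot, id, "coverage.json"))
	if err != nil {
		return nil, err
	}
	var cov []coveredPC
	if err := json.Unmarshal(data, &cov); err != nil {
		return nil, err
	}
	return cov, nil
}

func main() {
	root, err := os.MkdirTemp("", "covcache")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	id, err := storeCoverage(root, "kernel=abc prog=xyz", []coveredPC{{File: "net/core/dev.c", Line: 42}})
	if err != nil {
		panic(err)
	}
	cov, err := loadCoverage(root, id)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, len(cov))
}
```

Keeping the payload on disk and handing the agent only an ID keeps the reproduce result small; the agent asks for per-file details only when it needs them.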


Before sending a pull request, please review Contribution Guidelines:
https://github.com/google/syzkaller/blob/master/docs/contributing.md


@ramosian-glider force-pushed the hackathon-250326 branch 7 times, most recently from 4d4ca67 to c2a901c on March 27, 2026 10:37
@ramosian-glider requested a review from dvyukov on March 27, 2026 10:53
@ramosian-glider marked this pull request as ready for review on March 27, 2026 10:53
@tarasmadan self-requested a review on March 27, 2026 11:06

@tarasmadan
Collaborator

@gemini-cli /review

@github-actions

🤖 Hi @tarasmadan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions bot left a comment

## 📋 Review Summary

This Pull Request introduces a powerful mechanism to expose code coverage from crash reproducer runs to the LLM agent. This is achieved by caching coverage artifacts and providing specialized tools for the agent to query and inspect covered files and source code snippets. The overall quality and architecture are consistent with the aflow package, but there are a few critical issues that need to be addressed before merging.

## 🔍 General Feedback

  • Positive Highlights: The addition of coverage-aware tools is a significant improvement for the agent's ability to debug and refine reproducers. The caching strategy is well-integrated with the existing aflow infrastructure.
  • Security: The get-file-coverage tool currently lacks validation for the Filename argument, which could lead to path traversal if a model provides an absolute or relative path that escapes the kernel source root.
  • Regression: The reproduce tool now always enables coverage collection, which will cause it to fail for C-only reproducers that don't have a corresponding syzkaller program, as the underlying RunTest requires a syzkaller program for coverage collection.
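
One way to avoid the regression described in the last bullet is to request coverage only when a syzkaller program is available. The sketch below is hypothetical: the `reproRequest` type, its field names and the `collectCoverage` helper are invented for illustration and do not reflect the actual tool arguments in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// reproRequest is a hypothetical, simplified view of the reproduce tool input.
type reproRequest struct {
	SyzProg string // syzkaller program text; empty for C-only reproducers
	CProg   string // C reproducer source
}

// collectCoverage reports whether coverage collection should be requested.
// Assumption (taken from the review comment): coverage collection needs a
// syzkaller program, so C-only reproducers must run without it.
func collectCoverage(req reproRequest) bool {
	return strings.TrimSpace(req.SyzProg) != ""
}

func main() {
	fmt.Println(collectCoverage(reproRequest{SyzProg: "r0 = openat(...)"}))
	fmt.Println(collectCoverage(reproRequest{CProg: "int main(void) { return 0; }"}))
}
```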

@ramosian-glider force-pushed the hackathon-250326 branch 2 times, most recently from 8da3bbb to 0acc6d9 on March 27, 2026 11:36
Collaborator

@dvyukov left a comment

still reading the rest

Currently, the AI reproducer workflow simply reports whether a program
crashed and what bug title was produced. This doesn't provide enough
localized context for the LLM to refine and correctly debug a reproducer
when it fails.

This patch adds mechanisms to extract and cache coverage generated during
the `crash.Reproduce` test cycle. The cached coverage arrays are decoupled
from the immediate LLM context and are instead serialized to persistent
directories. The unique hash ID (`CoverageID`) of this cache is then
mapped and passed back to the LLM agent within `ReproduceResult`.

Also introduces `get-coverage-files` and `get-file-coverage` LLM tools to
`pkg/aflow/tool/syzlang`, enabling the `crash-repro-finder` agent to query
and securely inspect these precise coverage traces file-by-file. To
support sandboxed cache resolution, a new `CacheDir` helper is added to
`aflow.Context`.

Updates google#6878

The AI coverage tools rely heavily on the internal aflow.Cache to
read and parse reproducer test run footprints. In order to test them
without requiring real LLM workflows or live syz-manager VMs, we need
a way to mock the cache locally.

This patch adds a `NewTestContext` helper in `pkg/aflow/test_util.go`
to easily initialize an `aflow.Context` with an explicit local dummy
disk cache.

It then introduces `pkg/aflow/tool/syzlang/coverage_test.go`
to thoroughly mock and test `getCoverageFiles` and `getFileCoverage`.
The tests manufacture a fake source tree along with `symbolizer.Frame`
coverage payloads to assert that the AI tool coverage extraction
algorithm, including its snippet padding, right-aligned line spacing,
and binary-search deduplication logic, all format correctly for the agent
context.
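
For readers unfamiliar with this kind of formatting, the snippet logic that the tests exercise might look roughly like the sketch below. It is an assumption-laden illustration rather than the code under test: `uniqueLines`, `isCovered` and `formatSnippet` are hypothetical names, the `*` marker and `...` gap line are invented, and the real tool works on `symbolizer.Frame` data rather than plain line numbers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// uniqueLines returns the covered line numbers sorted and without duplicates.
func uniqueLines(lines []int) []int {
	sorted := append([]int(nil), lines...)
	sort.Ints(sorted)
	var out []int
	for _, l := range sorted {
		if len(out) == 0 || out[len(out)-1] != l {
			out = append(out, l)
		}
	}
	return out
}

// isCovered uses binary search over the sorted, deduplicated line list.
func isCovered(sorted []int, line int) bool {
	i := sort.SearchInts(sorted, line)
	return i < len(sorted) && sorted[i] == line
}

// formatSnippet prints every covered line with pad lines of context on each
// side, right-aligns the line numbers and marks covered lines with '*'.
func formatSnippet(src []string, covered []int, pad int) string {
	covered = uniqueLines(covered)
	width := len(fmt.Sprint(len(src))) // width of the largest line number
	var sb strings.Builder
	prev := 0 // last line number already printed
	for _, c := range covered {
		if c < 1 || c > len(src) {
			continue // ignore lines outside the file
		}
		start, end := c-pad, c+pad
		if start <= prev {
			start = prev + 1 // do not repeat lines from an overlapping window
		}
		if end > len(src) {
			end = len(src)
		}
		if prev != 0 && start > prev+1 {
			sb.WriteString("...\n") // gap between two snippets
		}
		for n := start; n <= end; n++ {
			mark := " "
			if isCovered(covered, n) {
				mark = "*"
			}
			fmt.Fprintf(&sb, "%*d %s %s\n", width, n, mark, src[n-1])
		}
		if end > prev {
			prev = end
		}
	}
	return sb.String()
}

func main() {
	src := make([]string, 12)
	for i := range src {
		src[i] = fmt.Sprintf("line %d of the fake source", i+1)
	}
	fmt.Print(formatSnippet(src, []int{3, 3, 10}, 1))
}
```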

@ramosian-glider
Member Author

@gemini-cli /review

@github-actions

🤖 Hi @ramosian-glider, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions bot left a comment

## 📋 Review Summary

This PR successfully integrates coverage extraction and caching into the crash reproduction workflow, providing essential context for LLM agents to debug and refine reproducers. The implementation is clean, follows established aflow patterns, and includes robust security checks for file access.

## 🔍 General Feedback

  • Caching: The use of aflow.Cache with desc-based hashing correctly ensures that coverage data is logically linked to the specific test configuration.
  • Security: filepath.IsLocal is correctly applied to CoverageID and Filename to prevent directory traversal.
  • Testing: New tools are well-tested with appropriate mock data.
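
The `filepath.IsLocal` check mentioned in the Security bullet can be illustrated with a small sketch. `filepath.IsLocal` is a real standard-library function (Go 1.20+); the `resolveInRoot` helper and the example paths are hypothetical and only show the pattern, not the PR's actual code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveInRoot joins name onto root only if name is a local path: not
// absolute, not empty, and not escaping its parent via ".." elements.
// resolveInRoot is a hypothetical helper, not a function from this PR.
func resolveInRoot(root, name string) (string, error) {
	if !filepath.IsLocal(name) {
		return "", fmt.Errorf("path %q is not local to the root", name)
	}
	return filepath.Join(root, name), nil
}

func main() {
	for _, name := range []string{"net/core/dev.c", "../../etc/passwd", "/etc/passwd"} {
		path, err := resolveInRoot("/kernel/src", name)
		fmt.Printf("%q -> %q, err=%v\n", name, path, err)
	}
}
```

Note that `filepath.IsLocal` is a purely lexical check and does not follow symlinks, so a symlink inside the root that points elsewhere would still need separate handling.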

A minor improvement is suggested to ensure that the CoverageID is consistently returned even when a crash isn't reproduced, which is vital for the agent's debugging capabilities.

}
res.CoverageID = filepath.Base(dir)
}
return res, err

🟠 If the reproducer fails to crash the kernel, the `CoverageID` is currently being lost because it's not returned when `ErrDidNotCrash` is encountered. Returning the `CoverageID` even in this case is crucial for providing the LLM agent with the context it needs to debug why the reproducer failed.
Suggested change
- return res, err
+ } else if cached.Report == "" {
+ 	return reproduceResult{}, cached.CoverageID, aflow.FlowError(ErrDidNotCrash)
+ }

Comment on lines 76 to 78
if err != nil {
	if errors.Is(err, crash.ErrDidNotCrash) {
		return ReproduceResult{}, nil

🟠 To allow the LLM to inspect the execution trace when a reproduction attempt fails to crash, the `CoverageID` should be returned even when `ErrDidNotCrash` is handled.
Suggested change
- if err != nil {
- 	if errors.Is(err, crash.ErrDidNotCrash) {
- 		return ReproduceResult{}, nil
+ if err != nil {
+ 	if errors.Is(err, crash.ErrDidNotCrash) {
+ 		return ReproduceResult{CoverageID: coverageID}, nil
+ 	}
+ 	return ReproduceResult{}, err
+ }

3 participants