Commit 1af2182

feat: usage limits, tool timeout, structured output validation, and OTel instrumentation (#40)
* feat: add usage limits, tool timeout, structured output validation, and OTel instrumentation

  Four Pydantic AI-inspired features implemented as independent building blocks:

  - **UsageLimits** (neuron-types + neuron-loop): Token budget enforcement with request, tool call, input/output/total token limits. Checked at 3 points per loop iteration (pre-request, post-response, pre-tool-call) in both run() and run_stream() paths.
  - **TimeoutMiddleware** (neuron-tool): Per-tool execution timeout via tokio::time::timeout with default and per-tool overrides. Implements ToolMiddleware trait.
  - **StructuredOutputValidator + RetryLimitedValidator** (neuron-tool): JSON Schema validation returning ToolError::ModelRetry for self-correction, with atomic retry counting. Both implement ToolMiddleware trait.
  - **neuron-otel** (new crate): OtelHook implementing ObservabilityHook with gen_ai.* GenAI semantic convention tracing spans. Leaf node depending only on neuron-types.

  Also includes:

  - Fix: usage limit enforcement was missing from run_stream() path
  - 49 new tests (1042 total), full streaming parity
  - Comprehensive doc updates across all surfaces (mdBook, READMEs, llms.txt, CLAUDE.md, ROADMAP.md)
  - New test-audit skill for automated test coverage verification
  - All clippy warnings resolved workspace-wide including tests/examples

* chore: fix formatting and add license symlinks for neuron-otel

  Run cargo fmt --all and add LICENSE-MIT/LICENSE-APACHE symlinks to neuron-otel to fix CI format and link checker failures.

* ci: add cargo publish workflow and register neuron-otel in release-please

  Add publish.yml that triggers on GitHub release creation, parses the release tag to determine the crate, and publishes it to crates.io with retry logic for index propagation delays. Also register the new neuron-otel crate in release-please config and manifest.
1 parent 3191159 commit 1af2182
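
The three check points named in the commit message (pre-request, post-response, pre-tool-call) can be pictured with a small std-only sketch. Every type, field, and method name below is an assumption made for illustration; the real `UsageLimits` in neuron-types may be shaped differently.

```rust
/// Hypothetical budget; `None` means "unlimited". Field names are assumptions,
/// not the actual neuron-types definition.
#[derive(Debug, Clone, Default)]
pub struct UsageLimits {
    pub request_limit: Option<u64>,
    pub tool_calls_limit: Option<u64>,
    pub input_tokens_limit: Option<u64>,
    pub output_tokens_limit: Option<u64>,
    pub total_tokens_limit: Option<u64>,
}

/// Running totals accumulated across loop iterations (also hypothetical).
#[derive(Debug, Clone, Default)]
pub struct Usage {
    pub requests: u64,
    pub tool_calls: u64,
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Names the budget that was exceeded.
#[derive(Debug, PartialEq, Eq)]
pub struct UsageLimitExceeded(pub &'static str);

/// Fails when a limit is set and `value` is strictly above it.
fn check(limit: Option<u64>, value: u64, what: &'static str) -> Result<(), UsageLimitExceeded> {
    match limit {
        Some(max) if value > max => Err(UsageLimitExceeded(what)),
        _ => Ok(()),
    }
}

impl UsageLimits {
    /// Point 1, pre-request: would one more request break the budget?
    pub fn before_request(&self, used: &Usage) -> Result<(), UsageLimitExceeded> {
        check(self.request_limit, used.requests + 1, "requests")
    }

    /// Point 2, post-response: token counts are only known once the model replies.
    pub fn after_response(&self, used: &Usage) -> Result<(), UsageLimitExceeded> {
        check(self.input_tokens_limit, used.input_tokens, "input tokens")?;
        check(self.output_tokens_limit, used.output_tokens, "output tokens")?;
        check(self.total_tokens_limit, used.input_tokens + used.output_tokens, "total tokens")
    }

    /// Point 3, pre-tool-call: would running `pending` more tools break the budget?
    pub fn before_tool_calls(&self, used: &Usage, pending: u64) -> Result<(), UsageLimitExceeded> {
        check(self.tool_calls_limit, used.tool_calls + pending, "tool calls")
    }
}
```

In this sketch a loop would call `before_request` before each provider call, `after_response` once usage totals are updated, and `before_tool_calls` before dispatching tools, in both the streaming and non-streaming paths.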

65 files changed, +3695 -216 lines changed

.claude/skills/doc-audit/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ the same set appears in each of these files:
 | File | Section to check |
 |------|-----------------|
 | `CLAUDE.md` | Dependency graph ASCII art |
-| `llms.txt` | "Crates" section (~lines 43-55) |
+| `llms.txt` | "Crates" section (search for each crate name) |
 | `README.md` | "Crates" table |
 | `docs/book/src/introduction.md` | "What's Included" table |

.claude/skills/test-audit/SKILL.md

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
+---
+name: test-audit
+description: Verify test coverage consistency across the neuron workspace
+---
+
+# Test Audit
+
+Run this skill after adding features, changing public API, or adding new crates.
+It systematically checks that test coverage is consistent with the codebase.
+
+## When to run
+
+- After implementing a new feature or public API change
+- After adding or removing a crate from the workspace
+- Before any release
+- When asked to "audit tests" or "check test coverage"
+
+## Discovery
+
+All checks start by parsing the root `Cargo.toml` `[workspace].members` to get
+the canonical crate set. No crate names are hardcoded — if a new crate is added,
+these checks automatically cover it.
+
+## Checklist
+
+Work through each check in order. Report findings as you go.
+
+### 1. Every crate has tests
+
+For each workspace member, verify at least one of:
+
+- A `tests/` directory containing `.rs` files
+- A `#[cfg(test)]` module in any `src/*.rs` file
+
+Flag any crate with zero test files.
+
+### 2. Public API test coverage
+
+For each crate:
+
+1. Read `src/lib.rs` and extract all public names: `pub use`, `pub struct`,
+`pub enum`, `pub trait`, `pub fn`, `pub type`
+2. For glob re-exports (`pub use module::*`), read the source module to expand
+the names
+3. Grep the crate's `tests/` directory and inline `#[cfg(test)]` modules for
+each name
+4. Flag any public type, trait, or function that appears in zero test files
+
+This is a **heuristic** (name-based grep, not semantic analysis). False positives
+are acceptable — it's better to over-flag and let the auditor assess than to
+miss gaps.
+
+### 3. Error variant coverage
+
+For each crate that defines error enums (look for `#[derive(thiserror::Error)]`
+or files named `error.rs`):
+
+1. Extract each enum variant name
+2. Grep the crate's test files for that variant name
+3. Flag any variant that appears in zero tests
+
+Error paths are where bugs hide — every error variant should have at least one
+test that exercises it.
+
+### 4. Streaming parity
+
+This check is **scoped dynamically**: it only runs if the workspace contains a
+crate whose `src/` files define both `pub async fn run(` and
+`pub async fn run_stream(` methods (currently `neuron-loop`).
+
+For each such crate:
+
+1. Grep `tests/` for test function names containing `run(` or `.run(`
+(non-streaming tests)
+2. Extract the feature keyword from the test name (e.g., `usage_limits`,
+`cancellation`, `model_retry`, `compaction`, `hooks`)
+3. Check if a corresponding test exists with `run_stream` or `stream` in the
+name testing the same feature
+4. Flag features that are tested only via the non-streaming path
+
+The streaming path often has different control flow and error handling — features
+need coverage in both paths.
+
+### 5. Example compilation
+
+Run:
+
+```
+cargo build --workspace --examples
+```
+
+Report any compilation failures. This is a binary pass/fail.
+
+### 6. Test infrastructure consistency
+
+For each crate's test files (`tests/*.rs` and inline `#[cfg(test)]` modules):
+
+- **Missing async attribute**: Flag any `async fn test_*` or `async fn *_test`
+that lacks `#[tokio::test]` (likely a missing attribute — the test won't run)
+- **Brittle panic tests**: Flag any `#[should_panic]` without an
+`expected = "..."` message (these pass on ANY panic, hiding real bugs)
+
+Report as informational findings, not hard failures.
+
+### 7. Test count summary
+
+Run `cargo test --workspace` and parse the output to report per-crate test
+counts. This is **informational only** — no PASS/FAIL. It establishes a
+baseline so regressions are visible.
+
+Report as a table:
+
+```
+| Crate | Tests |
+|-------|-------|
+| neuron-types | 160 |
+| neuron-tool | 60 |
+| ... | ... |
+| **Total** | **N** |
+```
+
+Include both integration tests (from `tests/`) and inline tests (from
+`#[cfg(test)]` modules) in the count.
+
+## Fixing issues
+
+For each issue found:
+
+- **Missing test for public type**: Add at least one test that constructs or
+uses the type. For traits, add a test with a mock implementation.
+- **Missing error variant test**: Add a test that triggers the error condition
+and asserts the variant.
+- **Missing streaming test**: Clone the `run()` test, adapt it to use
+`run_stream()`, and verify the same behavior via stream events.
+- **Infrastructure issues**: Add missing `#[tokio::test]` attributes, add
+`expected` messages to `#[should_panic]`.
+
+## Output format
+
+Report results as:
+
+```
+## Test Audit Results
+
+### Check 1: Every crate has tests - PASS/FAIL
+[details if FAIL]
+
+### Check 2: Public API test coverage - PASS/FAIL
+[details if FAIL, listing untested types per crate]
+
+### Check 3: Error variant coverage - PASS/FAIL
+[details if FAIL, listing untested variants]
+
+### Check 4: Streaming parity - PASS/FAIL
+[details if FAIL, listing features missing streaming tests]
+
+### Check 5: Example compilation - PASS/FAIL
+[details if FAIL]
+
+### Check 6: Test infrastructure - INFO
+[any findings]
+
+### Check 7: Test count summary - INFO
+[table of per-crate counts]
+```
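
The streaming parity check (check 4 in the skill) is described as a name-based heuristic. A minimal sketch of that heuristic, assuming test names and feature keywords have already been collected into slices, might look like the following. The function name and the "contains `stream`" rule are illustrative choices, not part of the skill itself.

```rust
/// Returns the feature keywords that appear in at least one non-streaming
/// test name but in no streaming test name.
///
/// A test counts as "streaming" when its name contains `stream`. This is a
/// name-based heuristic, not semantic analysis, so false positives are expected.
pub fn missing_streaming_coverage(test_names: &[&str], features: &[&str]) -> Vec<String> {
    // Split the test names into streaming and non-streaming groups.
    let (streaming, plain): (Vec<&str>, Vec<&str>) = test_names
        .iter()
        .copied()
        .partition(|name| name.contains("stream"));

    features
        .iter()
        .filter(|feature| {
            let in_plain = plain.iter().any(|name| name.contains(**feature));
            let in_stream = streaming.iter().any(|name| name.contains(**feature));
            // Flag only features exercised via run() and never via run_stream().
            in_plain && !in_stream
        })
        .map(|feature| feature.to_string())
        .collect()
}
```

A feature that has only streaming tests is deliberately not flagged here, since the check as written targets features tested only via the non-streaming path.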

.github/workflows/publish.yml

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+name: publish
+
+on:
+  release:
+    types: [published]
+
+permissions:
+  contents: read
+
+env:
+  CARGO_TERM_COLOR: always
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+
+      - name: Determine crate from release tag
+        id: crate
+        run: |
+          TAG="${{ github.event.release.tag_name }}"
+          # Tag format: <crate-name>/v<version> (e.g. neuron-types/v0.3.0)
+          CRATE="${TAG%%/v*}"
+          echo "name=$CRATE" >> "$GITHUB_OUTPUT"
+          echo "Publishing crate: $CRATE"
+
+      - name: Publish to crates.io
+        run: |
+          # Retry up to 3 times with 30s delay to handle crates.io index
+          # propagation when a dependency was just published.
+          for attempt in 1 2 3; do
+            echo "Attempt $attempt: publishing ${{ steps.crate.outputs.name }}"
+            if cargo publish -p ${{ steps.crate.outputs.name }}; then
+              echo "Published successfully"
+              exit 0
+            fi
+            if [ "$attempt" -lt 3 ]; then
+              echo "Publish failed, waiting 30s for index propagation..."
+              sleep 30
+            fi
+          done
+          echo "Failed to publish after 3 attempts"
+          exit 1
+        env:
+          CARGO_REGISTRY_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
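
The two pieces of logic in this workflow, the `${TAG%%/v*}` expansion and the bounded retry loop, can be restated in Rust to make their behavior explicit. This is only an illustrative sketch: `crate_from_tag` and `retry` are hypothetical helpers, and the workflow itself runs the shell shown above.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Mirrors the shell expansion `${TAG%%/v*}`: drop the longest suffix that
/// starts with "/v", i.e. everything from the first "/v" onward.
/// A tag without "/v" comes back unchanged, as it does in the shell.
pub fn crate_from_tag(tag: &str) -> &str {
    match tag.find("/v") {
        Some(idx) => &tag[..idx],
        None => tag,
    }
}

/// Runs `op` up to `attempts` times (at least once), sleeping `delay` after
/// each failure that will be retried. Returns the first success, or the last
/// error once the attempts run out.
pub fn retry<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // No sleep after the final failure, matching the workflow's loop.
            Err(err) if attempt >= attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

One consequence worth noting: a release tag that lacks the `<crate-name>/v` prefix (for example a bare `v0.3.0`) passes through unchanged, so the whole tag would be used as the crate name.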

.release-please-manifest.json

Lines changed: 1 addition & 0 deletions
@@ -9,5 +9,6 @@
 "neuron-provider-ollama": "0.2.0",
 "neuron-mcp": "0.2.0",
 "neuron-runtime": "0.2.0",
+"neuron-otel": "0.2.0",
 "neuron": "0.2.0"
 }

CLAUDE.md

Lines changed: 7 additions & 1 deletion
@@ -81,6 +81,8 @@ building block.
 | Context compaction strategies | Parallel guardrail orchestration |
 | MCP integration | `StopAtTools` declarative loop termination |
 | Provider crates (one per provider) | `Agent.override()` testing DX |
+| Usage limits and budget enforcement | Custom billing/quota logic |
+| OTel instrumentation (GenAI semantic conventions) | Custom telemetry dashboards |
 | `from_env()`, `Message::user()` conveniences | Sub-agent orchestration registry |

 When evaluating a new feature, apply this test before adding it to any crate.
@@ -160,10 +162,12 @@ When making design decisions, apply these filters in order:

 ```
 neuron-types (zero deps, the foundation)
+neuron-tool-macros (zero deps, proc macro)
 ^
 |-- neuron-provider-* (each implements Provider trait)
+|-- neuron-otel (OTel instrumentation, GenAI semantic conventions)
 |-- neuron-context (compaction strategies, token counting)
-+-- neuron-tool (Tool trait, registry, middleware)
++-- neuron-tool (Tool trait, registry, middleware; optional dep on neuron-tool-macros)
 ^
 |-- neuron-mcp (wraps rmcp, bridges to Tool trait)
 |-- neuron-loop (provider loop with tool dispatch)
@@ -344,6 +348,8 @@ Before completing any PR or commit that touches public API or docs:
 8. mdBook guide pages reflect current API (spot-check changed features)
 9. Run the `doc-audit` skill (`.claude/skills/doc-audit/SKILL.md`) to verify
 documentation-code consistency
+10. Run the `test-audit` skill (`.claude/skills/test-audit/SKILL.md`) to verify
+test coverage consistency

 ---

Cargo.lock

Lines changed: 12 additions & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@ members = [
 "neuron-provider-ollama",
 "neuron-mcp",
 "neuron-runtime",
+"neuron-otel",
 "neuron",
 ]
 exclude = [
@@ -38,6 +39,7 @@ neuron-mcp = { version = "0.2.0", path = "neuron-mcp" }
 neuron-provider-anthropic = { version = "0.2.0", path = "neuron-provider-anthropic" }
 neuron-provider-openai = { version = "0.2.0", path = "neuron-provider-openai" }
 neuron-provider-ollama = { version = "0.2.0", path = "neuron-provider-ollama" }
+neuron-otel = { version = "0.2.0", path = "neuron-otel" }
 neuron = { version = "0.2.0", path = "neuron" }

 # External

README.md

Lines changed: 7 additions & 1 deletion
@@ -73,6 +73,7 @@ need to get productive:
 | `neuron-provider-ollama` | Ollama — local LLM inference with NDJSON streaming |
 | `neuron-mcp` | MCP client and server — stdio, HTTP, tool bridging |
 | `neuron-runtime` | Sessions, sub-agents, guardrails, durable execution |
+| `neuron-otel` | OTel instrumentation — GenAI semantic conventions with tracing spans |
 | `neuron` | Umbrella crate with feature flags |

 ## Feature Flags (neuron)
@@ -84,6 +85,7 @@ need to get productive:
 | `ollama` | Ollama local provider | no |
 | `mcp` | Model Context Protocol integration | no |
 | `runtime` | Sessions, sub-agents, guardrails | no |
+| `otel` | OpenTelemetry instrumentation | no |
 | `full` | All of the above | no |

 ## Architecture
@@ -92,6 +94,7 @@ need to get productive:
 neuron-types (zero deps, the foundation)
 ^
 |-- neuron-provider-* (each implements Provider trait)
+|-- neuron-otel (OTel instrumentation, GenAI semantic conventions)
 |-- neuron-context (compaction strategies, token counting)
 +-- neuron-tool (Tool trait, registry, middleware)
 ^
@@ -121,7 +124,10 @@ How neuron compares to the two most established Rust alternatives, based on
 | Sessions | `SessionStorage` trait + impls | None | None |
 | Vector stores / RAG | None | Many integrations | None |
 | Embeddings | None | `EmbeddingModel` trait | Yes |
-| OpenTelemetry | Trait only | Full integration | None |
+| Usage limits | `UsageLimits` token/request budget | None | None |
+| Tool timeouts | `TimeoutMiddleware` per-tool | None | None |
+| Structured output validation | `StructuredOutputValidator` with self-correction | None | None |
+| OpenTelemetry | GenAI semantic conventions (`neuron-otel`) | Full integration | None |

 **Where others lead today:** Rig has a larger provider and vector store
 ecosystem with an extensive example set. genai covers many providers in one

ROADMAP.md

Lines changed: 4 additions & 0 deletions
@@ -51,6 +51,10 @@ What ships today:
 - **Property-based tests** -- proptest for serde roundtrips, error classification, token monotonicity, middleware ordering
 - **Criterion benchmarks** -- serialization throughput, token counting, agent loop latency
 - **Fuzz targets** -- cargo-fuzz for all 3 provider response parsers
+- **`UsageLimits`** -- token/request budget enforcement in the agentic loop; `LoopConfig.usage_limits` field, `LoopError::UsageLimitExceeded` variant; inspired by Pydantic AI's usage limit pattern
+- **`TimeoutMiddleware`** -- per-tool execution timeouts via `tokio::time::timeout`; register as global or per-tool middleware to prevent runaway tool calls
+- **`StructuredOutputValidator` + `RetryLimitedValidator`** -- JSON Schema validation middleware returning `ToolError::ModelRetry` for self-correction; validates tool input against schemas and gives the model a chance to retry with a hint, with configurable retry limits
+- **`neuron-otel`** -- OpenTelemetry instrumentation crate implementing `ObservabilityHook` with `tracing` spans following GenAI semantic conventions (`gen_ai.loop.iteration`, `gen_ai.chat`, `gen_ai.execute_tool`, `gen_ai.context.compaction`); opt-in content capture via `OtelConfig`

 ## Next
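
The "retry with a hint, with configurable retry limits" behavior described in the `RetryLimitedValidator` entry of this hunk can be sketched with an atomic counter. The code below is a std-only illustration under stated assumptions: `RetryBudget` and the `InvalidInput` variant are hypothetical, and only the `ToolError::ModelRetry` name comes from the commit itself.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for the crate's error type, reduced to the two cases needed here.
/// `InvalidInput` is an assumed name for the "give up" case.
#[derive(Debug, PartialEq, Eq)]
pub enum ToolError {
    /// Ask the model to try again, passing a hint about what was wrong.
    ModelRetry(String),
    /// The retry budget is spent; surface a hard error instead.
    InvalidInput(String),
}

/// Hypothetical retry budget shared across calls; atomic so `&self` suffices.
pub struct RetryBudget {
    max_retries: usize,
    used: AtomicUsize,
}

impl RetryBudget {
    pub fn new(max_retries: usize) -> Self {
        Self { max_retries, used: AtomicUsize::new(0) }
    }

    /// Converts a validation failure into a retry request or a hard error.
    pub fn on_validation_failure(&self, hint: &str) -> ToolError {
        // fetch_add returns the previous count, so the first failure sees 0.
        let previous = self.used.fetch_add(1, Ordering::SeqCst);
        if previous < self.max_retries {
            ToolError::ModelRetry(hint.to_string())
        } else {
            ToolError::InvalidInput(format!(
                "retry limit ({}) reached: {}",
                self.max_retries, hint
            ))
        }
    }
}
```

Using an atomic counter rather than a `Mutex<usize>` keeps the check lock-free, which fits a middleware that may be invoked from concurrent tool calls.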

docs/book/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@
 - [Runtime](guides/runtime.md)
 - [Embeddings](guides/embeddings.md)
 - [Testing Agents](guides/testing.md)
+- [Observability](guides/observability.md)

 # Architecture
