SecBear
diff --git a/.claude/skills/doc-audit/SKILL.md b/.claude/skills/doc-audit/SKILL.md
Lines changed: 1 addition & 1 deletion
diff --git a/.claude/skills/test-audit/SKILL.md b/.claude/skills/test-audit/SKILL.md
Lines changed: 165 additions & 0 deletions
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
Lines changed: 50 additions & 0 deletions
diff --git a/.release-please-manifest.json b/.release-please-manifest.json
Lines changed: 1 addition & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 7 additions & 1 deletion
diff --git a/Cargo.lock b/Cargo.lock
Lines changed: 12 additions & 0 deletions
diff --git a/Cargo.toml b/Cargo.toml
Lines changed: 2 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 7 additions & 1 deletion
diff --git a/ROADMAP.md b/ROADMAP.md
Lines changed: 4 additions & 0 deletions
diff --git a/docs/book/src/SUMMARY.md b/docs/book/src/SUMMARY.md
Lines changed: 1 addition & 0 deletions
@@ -29,7 +29,7 @@ the same set appears in each of these files:
 | File | Section to check |
 |------|-----------------|
 | `CLAUDE.md` | Dependency graph ASCII art |
-| `llms.txt` | "Crates" section (~lines 43-55) |
+| `llms.txt` | "Crates" section (search for each crate name) |
 | `README.md` | "Crates" table |
 | `docs/book/src/introduction.md` | "What's Included" table |
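The revised `llms.txt` instruction replaces a line-number hint with a name search. One way to make that search mechanical is sketched below using only the Rust standard library. The helpers `workspace_members`, `mentions`, and `missing_from_doc` are hypothetical and are not part of the skill or of neuron. The `Cargo.toml` scan is a naive line-based reader, not a TOML parser: it assumes a `members = [` line followed by quoted names, with no `#` inside a name. Names are matched only at word boundaries, so the umbrella crate `neuron` does not count as present just because `neuron-types` is.

```rust
/// Reads the quoted names from the `members = [ ... ]` array of a root
/// Cargo.toml. Hypothetical, naive line-based scan (not a TOML parser):
/// `#` comments are dropped and every quoted item is taken as a member.
fn workspace_members(cargo_toml: &str) -> Vec<String> {
    let mut members = Vec::new();
    let mut in_array = false;
    for line in cargo_toml.lines() {
        // Strip a trailing `# comment`; `split` always yields a first piece.
        let mut text = line.split('#').next().unwrap_or("").trim();
        if !in_array {
            // A prefix match, so `default-members = [` is not picked up.
            match text.strip_prefix("members = [") {
                Some(rest) => {
                    in_array = true;
                    text = rest;
                }
                None => continue,
            }
        }
        // Anything before a `]` still belongs to the array.
        let (body, closed) = match text.find(']') {
            Some(i) => (&text[..i], true),
            None => (text, false),
        };
        for item in body.split(',') {
            let name = item.trim().trim_matches('"');
            if !name.is_empty() {
                members.push(name.to_string());
            }
        }
        if closed {
            break;
        }
    }
    members
}

fn is_name_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '-' || c == '_'
}

/// True if `name` occurs in `doc` with no crate-name character directly
/// before or after it, so `neuron` inside `neuron-types` does not count.
fn mentions(doc: &str, name: &str) -> bool {
    doc.match_indices(name).any(|(i, _)| {
        let before = doc[..i].chars().next_back();
        let after = doc[i + name.len()..].chars().next();
        !before.map_or(false, is_name_char) && !after.map_or(false, is_name_char)
    })
}

/// Workspace members that a documentation file never mentions.
fn missing_from_doc<'a>(members: &'a [String], doc: &str) -> Vec<&'a str> {
    members
        .iter()
        .map(String::as_str)
        .filter(|name| !mentions(doc, name))
        .collect()
}
```

Passing the contents of `Cargo.toml` and `llms.txt` (read with `std::fs::read_to_string`) to these helpers yields the crates that the "Crates" section does not yet list.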
 
 
@@ -0,0 +1,165 @@
+---
+name: test-audit
+description: Verify test coverage consistency across the neuron workspace
+---
+
+# Test Audit
+
+Run this skill after adding features, changing public API, or adding new crates.
+It systematically checks that test coverage is consistent with the codebase.
+
+## When to run
+
+- After implementing a new feature or public API change
+- After adding or removing a crate from the workspace
+- Before any release
+- When asked to "audit tests" or "check test coverage"
+
+## Discovery
+
+All checks start by parsing the root `Cargo.toml` `[workspace].members` to get
+the canonical crate set. No crate names are hardcoded — if a new crate is added,
+these checks automatically cover it.
+
+## Checklist
+
+Work through each check in order. Report findings as you go.
+
+### 1. Every crate has tests
+
+For each workspace member, verify at least one of:
+
+- A `tests/` directory containing `.rs` files
+- A `#[cfg(test)]` module in any `.rs` file under `src/`
+
+Flag any crate with zero test files.
+
+### 2. Public API test coverage
+
+For each crate:
+
+1. Read `src/lib.rs` and extract all public names: `pub use`, `pub struct`,
+   `pub enum`, `pub trait`, `pub fn`, `pub type`
+2. For glob re-exports (`pub use module::*`), read the source module to expand
+   the names
+3. Grep the crate's `tests/` directory and inline `#[cfg(test)]` modules for
+   each name
+4. Flag any public type, trait, or function that appears in zero test files
+
+This is a **heuristic** (name-based grep, not semantic analysis). False positives
+are acceptable — it's better to over-flag and let the auditor assess than to
+miss gaps.
+
+### 3. Error variant coverage
+
+For each crate that defines error enums (look for enums deriving `thiserror::Error`,
+often written `#[derive(Debug, Error)]`, or files named `error.rs`):
+
+1. Extract each enum variant name
+2. Grep the crate's test files for that variant name
+3. Flag any variant that appears in zero tests
+
+Error paths are where bugs hide — every error variant should have at least one
+test that exercises it.
+
+### 4. Streaming parity
+
+This check is **scoped dynamically**: it only runs if the workspace contains a
+crate whose `src/` files define both `pub async fn run(` and
+`pub async fn run_stream(` methods (currently `neuron-loop`).
+
+For each such crate:
+
+1. Grep `tests/` for test functions whose bodies call `.run(`
+   (non-streaming tests)
+2. Extract the feature keyword from the test name (e.g., `usage_limits`,
+   `cancellation`, `model_retry`, `compaction`, `hooks`)
+3. Check if a corresponding test exists with `run_stream` or `stream` in the
+   name testing the same feature
+4. Flag features that are tested only via the non-streaming path
+
+The streaming path often has different control flow and error handling — features
+need coverage in both paths.
+
+### 5. Example compilation
+
+Run:
+
+```
+cargo build --workspace --examples
+```
+
+Report any compilation failures. This is a binary pass/fail.
+
+### 6. Test infrastructure consistency
+
+For each crate's test files (`tests/*.rs` and inline `#[cfg(test)]` modules):
+
+- **Missing async attribute**: Flag any `async fn test_*` or `async fn *_test`
+  that lacks `#[tokio::test]` (likely a missing attribute — the test won't run)
+- **Brittle panic tests**: Flag any `#[should_panic]` without an
+  `expected = "..."` message (these pass on ANY panic, hiding real bugs)
+
+Report as informational findings, not hard failures.
+
+### 7. Test count summary
+
+Run `cargo test --workspace` and parse the output to report per-crate test
+counts. This is **informational only** — no PASS/FAIL. It establishes a
+baseline so regressions are visible.
+
+Report as a table:
+
+```
+| Crate | Tests |
+|-------|-------|
+| neuron-types | 160 |
+| neuron-tool | 60 |
+| ... | ... |
+| **Total** | **N** |
+```
+
+Include both integration tests (from `tests/`) and inline tests (from
+`#[cfg(test)]` modules) in the count.
+
+## Fixing issues
+
+For each issue found:
+
+- **Missing test for public type**: Add at least one test that constructs or
+  uses the type. For traits, add a test with a mock implementation.
+- **Missing error variant test**: Add a test that triggers the error condition
+  and asserts the variant.
+- **Missing streaming test**: Clone the `run()` test, adapt it to use
+  `run_stream()`, and verify the same behavior via stream events.
+- **Infrastructure issues**: Add missing `#[tokio::test]` attributes, add
+  `expected` messages to `#[should_panic]`.
+
+## Output format
+
+Report results as:
+
+```
+## Test Audit Results
+
+### Check 1: Every crate has tests - PASS/FAIL
+[details if FAIL]
+
+### Check 2: Public API test coverage - PASS/FAIL
+[details if FAIL, listing untested types per crate]
+
+### Check 3: Error variant coverage - PASS/FAIL
+[details if FAIL, listing untested variants]
+
+### Check 4: Streaming parity - PASS/FAIL
+[details if FAIL, listing features missing streaming tests]
+
+### Check 5: Example compilation - PASS/FAIL
+[details if FAIL]
+
+### Check 6: Test infrastructure - INFO
+[any findings]
+
+### Check 7: Test count summary - INFO
+[table of per-crate counts]
+```
@@ -0,0 +1,50 @@
+name: publish
+
+on:
+  release:
+    types: [published]
+
+permissions:
+  contents: read
+
+env:
+  CARGO_TERM_COLOR: always
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+
+      - name: Determine crate from release tag
+        id: crate
+        run: |
+          TAG="${{ github.event.release.tag_name }}"
+          # Tag format: <crate-name>/v<version> (e.g. neuron-types/v0.3.0)
+          CRATE="${TAG%%/v*}"
+          echo "name=$CRATE" >> "$GITHUB_OUTPUT"
+          echo "Publishing crate: $CRATE"
+
+      - name: Publish to crates.io
+        run: |
+          # Retry up to 3 times with 30s delay to handle crates.io index
+          # propagation when a dependency was just published.
+          for attempt in 1 2 3; do
+            echo "Attempt $attempt: publishing ${{ steps.crate.outputs.name }}"
+            if cargo publish -p ${{ steps.crate.outputs.name }}; then
+              echo "Published successfully"
+              exit 0
+            fi
+            if [ "$attempt" -lt 3 ]; then
+              echo "Publish failed, waiting 30s for index propagation..."
+              sleep 30
+            fi
+          done
+          echo "Failed to publish after 3 attempts"
+          exit 1
+        env:
+          CARGO_REGISTRY_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
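The "Determine crate from release tag" step relies on the shell expansion `${TAG%%/v*}`, which removes the longest suffix matching `/v*`. In practice this cuts the tag at its first `/v` and leaves a tag without `/v` unchanged. The publish step then retries up to three times with a fixed delay. Both behaviors are easy to pin down in a unit test, so here is a Rust mirror of them; `crate_from_tag` and `retry` are hypothetical, illustration-only names and are not part of the workflow or of neuron.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical mirror of `${TAG%%/v*}`: cut at the first `/v`,
/// or return the tag unchanged when it contains no `/v`.
fn crate_from_tag(tag: &str) -> &str {
    match tag.find("/v") {
        Some(i) => &tag[..i],
        None => tag,
    }
}

/// Hypothetical mirror of the `for attempt in 1 2 3` loop: call `op` until it
/// succeeds or `attempts` calls have failed, sleeping `delay` between
/// failures. `op` always runs at least once, even if `attempts` is 0.
fn retry<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

As in the shell loop, every failure is retried, including ones a delay cannot fix, such as publishing a version that already exists on crates.io.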
@@ -9,5 +9,6 @@
   "neuron-provider-ollama": "0.2.0",
   "neuron-mcp": "0.2.0",
   "neuron-runtime": "0.2.0",
+  "neuron-otel": "0.2.0",
   "neuron": "0.2.0"
 }
@@ -81,6 +81,8 @@ building block.
 | Context compaction strategies | Parallel guardrail orchestration |
 | MCP integration | `StopAtTools` declarative loop termination |
 | Provider crates (one per provider) | `Agent.override()` testing DX |
+| Usage limits and budget enforcement | Custom billing/quota logic |
+| OTel instrumentation (GenAI semantic conventions) | Custom telemetry dashboards |
 | `from_env()`, `Message::user()` conveniences | Sub-agent orchestration registry |
 
 When evaluating a new feature, apply this test before adding it to any crate.
@@ -160,10 +162,12 @@ When making design decisions, apply these filters in order:
 
 ```
 neuron-types                    (zero deps, the foundation)
+neuron-tool-macros              (zero deps, proc macro)
     ^
     |-- neuron-provider-*       (each implements Provider trait)
+    |-- neuron-otel             (OTel instrumentation, GenAI semantic conventions)
     |-- neuron-context          (compaction strategies, token counting)
-    +-- neuron-tool             (Tool trait, registry, middleware)
+    +-- neuron-tool             (Tool trait, registry, middleware; optional dep on neuron-tool-macros)
             ^
             |-- neuron-mcp      (wraps rmcp, bridges to Tool trait)
             |-- neuron-loop     (provider loop with tool dispatch)
@@ -344,6 +348,8 @@ Before completing any PR or commit that touches public API or docs:
 8. mdBook guide pages reflect current API (spot-check changed features)
 9. Run the `doc-audit` skill (`.claude/skills/doc-audit/SKILL.md`) to verify
    documentation-code consistency
+10. Run the `test-audit` skill (`.claude/skills/test-audit/SKILL.md`) to verify
+    test coverage consistency
 
 ---
 
 
@@ -11,6 +11,7 @@ members = [
   "neuron-provider-ollama",
   "neuron-mcp",
   "neuron-runtime",
+  "neuron-otel",
   "neuron",
 ]
 exclude = [
@@ -38,6 +39,7 @@ neuron-mcp = { version = "0.2.0", path = "neuron-mcp" }
 neuron-provider-anthropic = { version = "0.2.0", path = "neuron-provider-anthropic" }
 neuron-provider-openai = { version = "0.2.0", path = "neuron-provider-openai" }
 neuron-provider-ollama = { version = "0.2.0", path = "neuron-provider-ollama" }
+neuron-otel = { version = "0.2.0", path = "neuron-otel" }
 neuron = { version = "0.2.0", path = "neuron" }
 
 # External
 
@@ -73,6 +73,7 @@ need to get productive:
 | `neuron-provider-ollama`    | Ollama — local LLM inference with NDJSON streaming            |
 | `neuron-mcp`                | MCP client and server — stdio, HTTP, tool bridging            |
 | `neuron-runtime`            | Sessions, sub-agents, guardrails, durable execution           |
+| `neuron-otel`               | OTel instrumentation — GenAI semantic conventions with tracing spans |
 | `neuron`                    | Umbrella crate with feature flags                             |
 
 ## Feature Flags (neuron)
@@ -84,6 +85,7 @@ need to get productive:
 | `ollama`    | Ollama local provider              | no      |
 | `mcp`       | Model Context Protocol integration | no      |
 | `runtime`   | Sessions, sub-agents, guardrails   | no      |
+| `otel`      | OpenTelemetry instrumentation      | no      |
 | `full`      | All of the above                   | no      |
 
 ## Architecture
@@ -92,6 +94,7 @@ need to get productive:
 neuron-types                    (zero deps, the foundation)
     ^
     |-- neuron-provider-*       (each implements Provider trait)
+    |-- neuron-otel             (OTel instrumentation, GenAI semantic conventions)
     |-- neuron-context          (compaction strategies, token counting)
     +-- neuron-tool             (Tool trait, registry, middleware)
             ^
@@ -121,7 +124,10 @@ How neuron compares to the two most established Rust alternatives, based on
 | Sessions | `SessionStorage` trait + impls | None | None |
 | Vector stores / RAG | None | Many integrations | None |
 | Embeddings | None | `EmbeddingModel` trait | Yes |
-| OpenTelemetry | Trait only | Full integration | None |
+| Usage limits | `UsageLimits` token/request budget | None | None |
+| Tool timeouts | `TimeoutMiddleware` per-tool | None | None |
+| Structured output validation | `StructuredOutputValidator` with self-correction | None | None |
+| OpenTelemetry | GenAI semantic conventions (`neuron-otel`) | Full integration | None |
 
 **Where others lead today:** Rig has a larger provider and vector store
 ecosystem with an extensive example set. genai covers many providers in one
 
@@ -51,6 +51,10 @@ What ships today:
 - **Property-based tests** -- proptest for serde roundtrips, error classification, token monotonicity, middleware ordering
 - **Criterion benchmarks** -- serialization throughput, token counting, agent loop latency
 - **Fuzz targets** -- cargo-fuzz for all 3 provider response parsers
+- **`UsageLimits`** -- token/request budget enforcement in the agentic loop; `LoopConfig.usage_limits` field, `LoopError::UsageLimitExceeded` variant; inspired by Pydantic AI's usage limit pattern
+- **`TimeoutMiddleware`** -- per-tool execution timeouts via `tokio::time::timeout`; register as global or per-tool middleware to prevent runaway tool calls
+- **`StructuredOutputValidator` + `RetryLimitedValidator`** -- JSON Schema validation middleware returning `ToolError::ModelRetry` for self-correction; validates tool input against schemas and gives the model a chance to retry with a hint, with configurable retry limits
+- **`neuron-otel`** -- OpenTelemetry instrumentation crate implementing `ObservabilityHook` with `tracing` spans following GenAI semantic conventions (`gen_ai.loop.iteration`, `gen_ai.chat`, `gen_ai.execute_tool`, `gen_ai.context.compaction`); opt-in content capture via `OtelConfig`
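The `UsageLimits` entry describes a budget that the agentic loop checks and that ends the run with `LoopError::UsageLimitExceeded` once it is exceeded. A minimal sketch of that control flow follows, assuming a request budget and a total-token budget. Every type here is a hypothetical stand-in: neuron's actual `UsageLimits`, `LoopConfig`, and `LoopError` may differ in fields, variants, and where the check happens.

```rust
/// Hypothetical stand-in for a usage budget; `None` means "no limit".
#[derive(Debug, Default, Clone, Copy)]
struct UsageLimits {
    max_requests: Option<u32>,
    max_total_tokens: Option<u64>,
}

/// Running totals accumulated by the simulated loop.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Usage {
    requests: u32,
    total_tokens: u64,
}

/// Hypothetical stand-in for the loop's error type.
#[derive(Debug, PartialEq)]
enum LoopError {
    UsageLimitExceeded(String),
}

impl UsageLimits {
    /// Fails as soon as any configured budget is exceeded.
    fn check(&self, usage: &Usage) -> Result<(), LoopError> {
        if let Some(max) = self.max_requests {
            if usage.requests > max {
                return Err(LoopError::UsageLimitExceeded(format!(
                    "{} requests exceeds limit of {}",
                    usage.requests, max
                )));
            }
        }
        if let Some(max) = self.max_total_tokens {
            if usage.total_tokens > max {
                return Err(LoopError::UsageLimitExceeded(format!(
                    "{} tokens exceeds limit of {}",
                    usage.total_tokens, max
                )));
            }
        }
        Ok(())
    }
}

/// Simulated loop: each turn makes one model request costing
/// `tokens_per_turn`, then checks the budget before continuing.
fn run_loop(limits: UsageLimits, turns: u32, tokens_per_turn: u64) -> Result<Usage, LoopError> {
    let mut usage = Usage::default();
    for _ in 0..turns {
        usage.requests += 1;
        usage.total_tokens += tokens_per_turn;
        limits.check(&usage)?;
    }
    Ok(usage)
}
```

Because this sketch checks after each response, the last request can overshoot the token budget by one turn; checking the request budget before sending is an alternative design.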
 
 ## Next
 
 
@@ -18,6 +18,7 @@
 - [Runtime](guides/runtime.md)
 - [Embeddings](guides/embeddings.md)
 - [Testing Agents](guides/testing.md)
+- [Observability](guides/observability.md)
 
 # Architecture