Observability + budget guardrails for Hermes Agent
Budget enforcement + observability for Hermes Agent. The only plugin that can stop a run before it overspends.
A comprehensive telemetry plugin that captures real usage data, enforces budget limits, and provides detailed cost analysis for AI agent operations. Built for the Hermes Agent Challenge by Nadia Ujovich.
The differentiator: it can stop work that's about to overspend — not just report it after the fact. Set a daily cap below current spend, and the next cron run is blocked by the budget:
/budget set global daily 0.001 writes the cap to budget.yaml; current spend ($0.0102) already exceeds it, so /budget re-renders at 1020% [daily] — a hard breach — and the next marketing cron run is blocked by the budget.
Hermes Agent runs autonomously — across sessions, platforms, and cron jobs — which
means it can keep spending even when you're not watching.
hermes-telemetry lives inside the runtime and enforces hard budget limits before
the next LLM call is made.
This plugin addresses NousResearch/hermes-agent#6642 — the open feature request for a first-class telemetry and budget subsystem for Hermes Agent.
Your Hermes session
↓ every API call
hermes-telemetry (native plugin)
→ tracks tokens + cost in real time
→ enforces budget limits mid-session
→ logs to SQLite with WAL mode
→ syncs OpenRouter pricing automatically
↓ if budget OK
LLM provider
Not a log reader. TokenTelemetry and similar tools read what already happened. hermes-telemetry hooks into the Hermes runtime and can stop what’s about to happen.
Design principle: observability is invisible to the model. Everything goes through hooks. The only user-facing surface is /stats and /budget.
- Screenshots
- What It Measures
- Installation
- Quick Start
- Standalone CLI
- Setup Wizard
- Dashboard (Web UI)
- Slash Commands
- Configuration
- Pricing Auto-Refresh
- Architecture
- Budget Enforcement
- Provider Probe: Verifying Your Provider
- Proof of Concept
- Comparison
- Running Tests
- Data Location
- Known Limitations
- Troubleshooting
- License
- Hermes Agent Challenge
A standalone HTML dashboard for users who prefer a visual interface over slash commands. Served locally, reads directly from the telemetry SQLite database.
Current dashboard home view with the tabbed layout (Home / Breakdown / Request / Tool / Error), header auto-refresh controls, budget windows rendered in the viewer's local timezone, and the refreshed recent-session tables.
| Metric | Source | Real or Estimated |
|---|---|---|
| Tokens in / out per API call | post_api_request.usage | ✅ Real (from provider) |
| Cache read / write tokens | post_api_request.usage | ✅ Real (from provider) |
| Reasoning tokens | post_api_request.usage | ✅ Real (from provider) |
| API call latency | post_api_request.api_duration | ✅ Real (ms) |
| Tool call latency & success/failure | post_tool_call | ✅ Real |
| Session / cron job wall time | started_at → ended_at | ✅ Real |
| Model & provider name | post_api_request | ✅ Real |
| Platform (cli / cron / telegram / …) | on_session_start.platform | ✅ Real |
| Cron job ID | Parsed from session_id | ✅ Real |
| Subagent invocation count | subagent_stop hook | ✅ Real (proxy) |
| Cost (USD) | Local pricing table × tokens | Estimated |
| Tokens when provider returns usage=None | Fallback approximation | Estimated |
Cost is always an estimate computed from a locally-maintained pricing table. No external pricing API is called. When the provider returns no usage data, tokens are estimated from a pre-request approximation + response length and the row is flagged as estimated=1, so /stats and /budget show a ~ prefix and an “estimated data” percentage.
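As a rough illustration of the "pricing table × tokens" estimate, the sketch below multiplies each token bucket by a per-1M-token price. The helper name `estimate_cost_usd` and the dictionary shapes are assumptions made for this example, not the plugin's internals. It also assumes the input count excludes cached tokens and that reasoning tokens bill at the output rate unless a price is given.

```python
def estimate_cost_usd(tokens: dict, price: dict) -> float:
    """Estimate USD cost from token counts and per-1M-token prices (hypothetical helper)."""
    input_price = price["input"]
    # Cache prices fall back to multiples of the input price when not specified.
    cache_read_price = price.get("cache_read", input_price * 0.10)
    cache_write_price = price.get("cache_write", input_price * 1.25)
    # Assumption: reasoning tokens use the output rate unless overridden.
    reasoning_price = price.get("reasoning", price["output"])
    total = (
        tokens.get("input", 0) * input_price
        + tokens.get("output", 0) * price["output"]
        + tokens.get("cache_read", 0) * cache_read_price
        + tokens.get("cache_write", 0) * cache_write_price
        + tokens.get("reasoning", 0) * reasoning_price
    )
    return total / 1_000_000  # prices are quoted per 1 million tokens


sonnet = {"input": 3.00, "output": 15.00}
cost = estimate_cost_usd({"input": 1_000_000, "output": 100_000}, sonnet)
```

With the prices shown, one million input tokens and one hundred thousand output tokens come to $3.00 + $1.50.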
Hermes plugins are opt-in — you must both install and enable the plugin.
hermes plugins install nujovich/hermes-telemetry
hermes plugins enable hermes-telemetry
To use hermes-telemetry from the command line outside of sessions (one-time setup):
chmod +x ~/.hermes/plugins/hermes-telemetry/hermes-telemetry
ln -s ~/.hermes/plugins/hermes-telemetry/hermes-telemetry ~/.local/bin/hermes-telemetry
A future git pull updates the CLI automatically; no re-linking is needed.
git clone https://github.com/nujovich/hermes-telemetry ~/.hermes/plugins/hermes-telemetry
hermes plugins enable hermes-telemetry
To use hermes-telemetry from the command line outside of sessions (one-time setup):
chmod +x ~/.hermes/plugins/hermes-telemetry/hermes-telemetry
ln -s ~/.hermes/plugins/hermes-telemetry/hermes-telemetry ~/.local/bin/hermes-telemetry
A future git pull updates the CLI automatically; no re-linking is needed.
Important: restart the Hermes gateway after enabling:
hermes gateway restart
Note: Plugin changes only take effect after a gateway restart. The gateway loads the plugin registry at startup. If you enable a plugin and cron jobs don’t appear in /stats cron week, this is the most likely cause.
- Install and enable the plugin (see above)
- Restart the gateway
- Run any session, then type /stats to see captured data
- Optionally configure pricing.yaml and budget.yaml (see below)
That’s it. The plugin captures data automatically — no agent action required.
Query telemetry data outside of an active Hermes session:
# Session summary
hermes-telemetry stats today
hermes-telemetry stats week
hermes-telemetry stats month
# Per-cron-job breakdown
hermes-telemetry stats cron
hermes-telemetry stats cron-week
# By provider / model
hermes-telemetry stats providers
hermes-telemetry stats models
# Budget status
hermes-telemetry budget
hermes-telemetry budget cron
# JSON output (for scripting)
hermes-telemetry stats today --json | jq '.cost_usd'
hermes-telemetry budget --json | jq '.global'
All subcommands read from the same SQLite database as the in-session /stats and /budget slash commands. The gateway does not need to be running.
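For scripts written in Python rather than shell, the same JSON output can be consumed with the standard library. This is a minimal sketch: the function names are made up for illustration, and the only field it relies on is the top-level cost_usd key used in the jq example.

```python
import json
import subprocess


def cost_from_json(raw: str) -> float:
    """Extract cost_usd from the JSON text printed by `stats today --json`."""
    return float(json.loads(raw)["cost_usd"])


def todays_cost() -> float:
    """Run the CLI and return today's estimated cost (needs hermes-telemetry on PATH)."""
    result = subprocess.run(
        ["hermes-telemetry", "stats", "today", "--json"],
        capture_output=True, text=True, check=True,
    )
    return cost_from_json(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the parsing step testable without a live installation.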
hermes-telemetry includes a first-time setup wizard that runs automatically on first
plugin load when pricing.yaml and/or budget.yaml are missing. It can also be
triggered manually at any time with the /setup slash command.
On first load, if either config file is missing, the plugin auto-generates defaults:
- Pricing: fetches all models with fixed pricing from the OpenRouter API and merges them with ~30 built-in defaults (Anthropic, OpenAI, DeepSeek, Google, Meta, Nous). New prices take effect immediately — no gateway restart needed.
- Budget: writes a conservative global budget ($5.00/day, $100.00/month) with an 80% soft warning and 100% hard cap.
Use /setup to check configuration status or reconfigure individual files.
/setup → show current status (which files exist)
/setup pricing auto → built-in defaults + fetch from OpenRouter API
/setup pricing minimal → built-in defaults only (~30 models, no network)
/setup pricing skip → skip (unrecognized models will record $0.00 cost)
/setup budget default → recommended global budget ($5/day, $100/month)
/setup budget custom → instructions for setting your own limits manually
/setup budget skip → no enforcement (costs still tracked)
| Option | Models | Network |
|---|---|---|
| auto | ~30 built-in + all OpenRouter fixed-price models | Yes (OpenRouter API) |
| minimal | ~30 built-in only | No |
| skip | None (models will record $0.00 cost) | No |
| Option | Behavior |
|---|---|
| default | Global: $5.00/day, $100.00/month. Soft warning at 80%, hard block at 100% |
| custom | Prints the /budget set commands for manual configuration |
| skip | Costs tracked but never enforced |
Setup skips files that already exist. To reconfigure:
# Reprice from scratch
rm ~/.hermes/telemetry/pricing.yaml
/setup pricing auto
# Reset budget
rm ~/.hermes/telemetry/budget.yaml
/setup budget default
Note: Pricing changes take effect immediately without a gateway restart. Budget changes require a restart.
/stats → last 24h summary (sessions, tokens, cost, top tools)
/stats today → same as /stats
/stats week → last 7 days
/stats month → last 30 days
/stats cron → breakdown by cron_job_id (last 7 days)
/stats cron week → cron breakdown, last 7 days
/stats cron month → cron breakdown, last 30 days
/stats cron today → cron breakdown, last 24 hours
/stats providers → per-provider: real vs estimated calls + cost (last 24h)
/stats providers week → provider breakdown, last 7 days
/stats models → per-model breakdown within each provider (last 24h)
/stats models week → per-model breakdown, last 7 days
/stats raw [N] → last N raw run records (default 20, max 200)
Example output (/stats):
hermes-telemetry — last 24 h
============================================
Sessions : 14
Success rate : 92.9% (ok=13, failed=1)
API calls : 47
Tool calls : 183
Tokens in : 1,240,500
Tokens out : 87,300
Cost (est.) : $0.004822
Avg latency : 1.2s
Avg duration : 48.3s
Top tools:
Tool Calls Failures Avg ms
--------------------------------------------------------
read_file 92 0 12ms
terminal 51 3 340ms
write_file 28 0 18ms
Example output (/stats cron week):
hermes-telemetry — cron jobs (last 7 days)
========================================================================
Job ID Runs OK Fail Tok-in Tok-out Cost Avg dur
--------------------------------------------------------------------------
09dd0c24f29b 3 3 0 892,341 12,405 $0.314378 2.1m
d68c2728b513 1 1 0 445,119 8,200 $2.225595 4.7m
Example output (/stats providers):
hermes-telemetry — providers (last 24 h)
========================================================================
Provider Calls Real Est Est% Cost
-------------------------------------------------------------------
openrouter 66 66 0 0% $0.916782
Est% = share of calls where the provider returned no usage data
(tokens estimated locally).
If Est% > 0 for your main provider, budget hard-verdicts may be
degraded to soft under on_estimated.mode: warn_only.
Example output (/stats models):
hermes-telemetry — models (last 24 h)
================================================================================================
Provider Model Calls Real Est Cost
----------------------------------------------------------------------------------------------
openrouter owl-alpha 66 66 0 $0.000000
openrouter anthropic/claude-sonnet-4-6 42 42 0 $0.314378
openrouter anthropic/claude-opus-4-7 8 8 0 $2.225595
Rows are grouped by provider, then by calls (desc). A model showing $0.00 has no price entry
in pricing.yaml — run /setup pricing auto to refresh, or add it manually.
Breaks each provider's spend down to individual models. Rows are grouped by provider (ascending), then ordered by call count within each provider; the Model column is kept wide so dated model keys stay readable. Columns: Calls (total), Real (calls with provider-reported usage), Est (calls with locally estimated tokens), and Cost. A model showing $0.000000 has no price entry in pricing.yaml.
/budget → status of every scope (spent / limit / %)
/budget cron → per-cron-job budgets, with soft/hard flags
/budget set global daily 5.00 → set or raise a limit (persists + hot-reloads)
/budget set cron_job daily 1.00 → set default per-cron-job limit
/budget set sender daily 2.00 → set default per-sender limit
Example output (/budget):
hermes-telemetry — budget status
============================================================
global $ 0.1812 / $ 2.00 9% [daily]
Legend: (blank)=ok !=soft (≥80%) █=hard (≥100%) ~est=estimated data
Status flags:
| Flag | Meaning |
|---|---|
| (blank) | Within budget (< 80%) |
| ! | Soft warning (≥ 80%): notice injected into conversation |
| █ | Hard breach (≥ 100%): tool calls blocked, cron jobs paused |
| ~est | Verdict based partly on estimated (usage=None) data |
A standalone HTML dashboard for users who prefer a visual interface over slash commands. Zero dependencies — uses only Python stdlib.
The dashboard includes a header auto-refresh selector with Off / 5s / 10s / 20s / 1min options. The selected interval is saved in localStorage, and background refreshes keep the current page visible instead of blanking the whole UI.
- Home: summary cards, editable budget bars, daily cost, top tools, cron cost, provider distribution, cron jobs, and recent sessions
- Breakdown: token breakdown, provider cost breakdown, cache efficiency, model efficiency, model usage trends, model share delta, daily token table, and the investigation workspace
- Request: provider health/anomaly signals, request forensics, and request detail drawer
- Tool: tool analytics and tool failure heatmap
- Error: run-status groups, failed tools, recent incidents, and cron failure / waste center
- Investigation workspace: click-through drilldown by provider, model, day, status, platform, cron job, tool, and free-text search
- Drawers: session detail and request detail side drawers with click-back chips into filtered investigation views
- Viewer-local timestamps: rendered dates/times follow the browser's timezone; budget windows are computed for that viewer timezone too
- Soft-hidden deleted sessions: sessions marked deleted in Hermes metadata are hidden from session-facing tables by default, but aggregate historical totals remain intact
- Time range selector: Last 24h / Last 7 days / Last 30 days / Last 90 days / All time
cd ~/.hermes/plugins/hermes-telemetry/dashboard
python3 serve.py # http://localhost:8765 (loopback only)
python3 serve.py --port 9090 # custom port, still loopback
python3 serve.py 9090 # positional port (back-compat)
Then open http://localhost:8765 in your browser.
The dashboard has no authentication — anyone who can reach the port sees
every captured token, cost, and tool-call detail. By default it binds to
127.0.0.1, which is unreachable from other machines.
If your Hermes server is headless (Pi, VPS, NAS) and you browse from a laptop, two options:
Recommended — SSH tunnel (no server-side change, leaves the safe default in place):
# Start the dashboard on the server as usual
ssh server "cd ~/.hermes/plugins/hermes-telemetry/dashboard && python3 serve.py &"
# Tunnel from your client
ssh -L 8765:localhost:8765 -N server &
# Browse on the client
open http://localhost:8765
Trusted-LAN shortcut (--host 0.0.0.0):
python3 serve.py --host 0.0.0.0
The script prints a warning when binding to any non-loopback interface. Only use this on a network where you trust every host. Do not expose the dashboard to the public internet or to networks that include untrusted hosts: it ships without an auth layer by design (see CONTRIBUTING.md if you want to add one).
Configuration lives in ~/.hermes/telemetry/:
~/.hermes/telemetry/
├── telemetry.db ← SQLite database (WAL mode)
├── telemetry.log ← plugin log (errors / debug)
├── pricing.yaml ← optional pricing overrides
└── budget.yaml ← optional spend budgets
If these files don’t exist, the plugin still works — it just uses defaults (all models at $0.00, budgets disabled).
Override model prices in USD per 1 million tokens. Without overrides, unknown models log a one-time warning and record cost as $0.00.
Full format:
models:
  # Free model
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00
  # Paid model with full cache/reasoning split
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
    reasoning: 15.00
  # Minimal override (cache prices derived from multipliers)
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00
defaults:
  cache_read_multiplier: 0.10   # cache_read = input * 0.10 if not specified
  cache_write_multiplier: 1.25  # cache_write = input * 1.25 if not specified
Matching rules (in order):
- Exact match (case-insensitive) against models: keys in your YAML
- Exact match against the built-in pricing table (~35 models)
- Longest-prefix match (e.g. claude-sonnet matches claude-sonnet-4-6-future)
- Unknown → $0.00 with a one-time warning in telemetry.log
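The matching order above can be sketched as a small lookup function. This is an illustration under assumptions, not the plugin's code: `find_price` and its arguments are hypothetical names, both tables are compared case-insensitively, and the prefix step searches both tables with user entries taking precedence.

```python
def find_price(model: str, user_prices: dict, builtin_prices: dict):
    """Resolve a price entry in the documented order; None means unknown."""
    name = model.lower()
    user = {key.lower(): value for key, value in user_prices.items()}
    builtin = {key.lower(): value for key, value in builtin_prices.items()}
    if name in user:        # 1. exact match in pricing.yaml
        return user[name]
    if name in builtin:     # 2. exact match in the built-in table
        return builtin[name]
    # 3. longest key that is a prefix of the model name
    merged = {**builtin, **user}
    prefixes = [key for key in merged if name.startswith(key)]
    if prefixes:
        return merged[max(prefixes, key=len)]
    return None             # 4. unknown: caller records $0.00 and warns once


user_table = {"claude-sonnet": {"input": 3.0, "output": 15.0}}
builtin_table = {"claude": {"input": 1.0, "output": 5.0}}
match = find_price("claude-sonnet-4-6-future", user_table, builtin_table)
```

Choosing the longest prefix means a specific key such as claude-sonnet wins over a broader one such as claude.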
Prices are auto-fetched from the OpenRouter API and cached locally.
Provider-aware lookup. Each candidate is filtered by the call's provider so
an OpenRouter-sourced price is never applied to a call another provider served
(e.g. the OpenRouter Qwen rate must not cost a Nous Portal call, and a NIM call
of nvidia/... must not borrow OpenRouter's rate for the same id). Entries
auto-fetched from OpenRouter carry _source: openrouter and are skipped for
non-OpenRouter calls; built-in and hand-added entries (no _source) are
provider-neutral.
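The provider filter described above reduces to one check per candidate entry. A minimal sketch, assuming entries are plain dictionaries and that provider names are compared in lowercase (the function name is hypothetical):

```python
def entry_applies(entry: dict, provider: str) -> bool:
    """True if a price entry may be used for a call served by `provider`."""
    source = entry.get("_source")
    # Entries without _source (built-in or hand-added) are provider-neutral.
    return source is None or source == provider.lower()


auto_fetched = {"input": 0.20, "output": 1.15, "_source": "openrouter"}
hand_added = {"input": 0.0, "output": 0.0}
```

Under this rule an auto-fetched OpenRouter entry is skipped for a call served by any other provider, which then falls through to the remaining candidates.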
Subscription / flat-rate models. If a provider serves a model on a flat subscription or free tier (incremental per-token cost = $0), declare it under the provider's native model id so it stays distinct from a lookup miss:
models:
  qwen3.7-plus:          # Nous Portal's native id (not the OpenRouter qwen/ form)
    input: 0.0
    output: 0.0
    _subscription: true  # declared $0; survives every OpenRouter refresh
Configure spend guardrails. No file → budgets disabled.
budgets:
global:
daily_usd: 2.00
monthly_usd: 50.00
per_cron_job:
default:
daily_usd: 1.00
overrides:
daily_email_report:
daily_usd: 3.00
per_sender:
default:
daily_usd: 2.00
overrides:
premium_user_123:
daily_usd: 5.00
thresholds:
soft_pct: 0.80 # warn at 80% of limit
hard_pct: 1.00 # enforce at 100%
on_estimated:
mode: enforce # warn_only | enforce
Scope resolution:
| Scope | How spend is calculated |
|---|---|
| global | All sessions + all cron jobs combined |
| per_cron_job | Sessions where cron_job_id matches (excludes subagent cost) |
| per_sender | Sessions from a specific sender (multi-user gateways) |
Window math: daily and monthly windows are computed in the user’s local timezone. A cron job that runs at 11:59 PM and another at 12:01 AM count against different daily windows.
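To make the window math concrete, the sketch below derives daily and monthly window keys from a UTC timestamp converted to local time. The function name and key formats are assumptions for illustration; a fixed UTC-3 offset stands in for the user's timezone.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def period_keys(ts_utc: datetime, local_tz: tzinfo) -> tuple[str, str]:
    """Return (daily_key, monthly_key) for a UTC timestamp in the local timezone."""
    local = ts_utc.astimezone(local_tz)
    return local.strftime("%Y-%m-%d"), local.strftime("%Y-%m")


local_tz = timezone(timedelta(hours=-3))
late_run = datetime(2025, 1, 2, 2, 59, tzinfo=timezone.utc)   # 23:59 local, Jan 1
early_run = datetime(2025, 1, 2, 3, 1, tzinfo=timezone.utc)   # 00:01 local, Jan 2
late_keys = period_keys(late_run, local_tz)
early_keys = period_keys(early_run, local_tz)
```

The two runs are two minutes apart and share a UTC date, yet they land in different daily windows and the same monthly window.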
The plugin can automatically fetch model pricing from OpenRouter’s public API, eliminating the need to manually maintain pricing.yaml for hundreds of models.
- Source: OpenRouter public API (https://openrouter.ai/api/v1/models), no auth required
- Frequency: Once per 24 hours (tracked via sentinel file)
- Trigger: Automatically on plugin load (gateway startup), or manually via CLI
- Merge strategy:
  - User overrides in pricing.yaml are always preserved; manual entries take priority over auto-fetched ones
  - New models from the API are added automatically
  - Previously auto-fetched models are updated when prices change
  - Models are tagged with _auto: true and _source: openrouter. The _source tag is load-bearing: it drives the provider-aware guard above
NVIDIA NIM (build.nvidia.com) is supported out of the box: the Nemotron lineup ships as built-in seed prices, so NIM-served calls cost correctly even though NIM has no auto-refresh source. The seeds are immune to OpenRouter syncs, and a NIM call never borrows OpenRouter's rate for a colliding model id. nvidia/...:free promo variants resolve to $0.00.
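The merge strategy can be sketched in a few lines. This is an assumption-laden illustration rather than the plugin's implementation: `merge_prices` is a hypothetical name, and an entry counts as a manual override whenever it lacks the _auto tag.

```python
def merge_prices(existing: dict, fetched: dict) -> dict:
    """Merge auto-fetched prices into existing ones without touching manual entries."""
    merged = dict(existing)
    for model, price in fetched.items():
        current = merged.get(model)
        if current is not None and not current.get("_auto"):
            continue  # hand-written entry: never overwritten
        # New or previously auto-fetched: add or update, and tag the origin.
        merged[model] = {**price, "_auto": True, "_source": "openrouter"}
    return merged


existing = {
    "manual/model": {"input": 1.0, "output": 2.0},
    "auto/model": {"input": 2.0, "output": 4.0, "_auto": True, "_source": "openrouter"},
}
fetched = {
    "manual/model": {"input": 9.0, "output": 9.0},
    "auto/model": {"input": 3.0, "output": 6.0},
    "new/model": {"input": 4.0, "output": 8.0},
}
result = merge_prices(existing, fetched)
```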
Some OpenRouter models have no fixed pricing (e.g. auto routing, experimental models). These are represented with negative prices in the API.
The plugin handles these safely:
- Prices are normalized to $0.00 (they don’t inflate cost calculations)
- Flagged with _estimated_price: true in pricing.yaml
- The budget engine detects when spend uses these models
Budget degradation logic:
| Condition | Effect |
|---|---|
| on_estimated.mode: warn_only (default) | If any calls use estimated-price models, hard verdicts are degraded to soft: the user gets a warning but tools aren’t blocked |
| on_estimated.mode: enforce | Hard verdicts take effect regardless |
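The normalization of negative prices might look like the sketch below. The function name is hypothetical, and only the input and output fields are handled; which fields the API actually reports as negative is an assumption here.

```python
def normalize_price(entry: dict) -> dict:
    """Clamp negative (no fixed price) values to 0.0 and flag the entry."""
    cleaned = dict(entry)
    for field in ("input", "output"):
        if cleaned.get(field, 0.0) < 0:
            cleaned[field] = 0.0
            cleaned["_estimated_price"] = True
    return cleaned


auto_router = normalize_price({"input": -1.0, "output": -1.0})
fixed_price = normalize_price({"input": 3.0, "output": 15.0})
```

Entries with ordinary prices pass through unchanged, so the flag only appears where a price was actually replaced.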
# Dry run — see what would change
python -m hermes_telemetry.pricing_refresh --check
# Apply changes
python -m hermes_telemetry.pricing_refresh
# Verbose output
python -m hermes_telemetry.pricing_refresh --verbose
Example output:
INFO OpenRouterSource: fetched 320 models
Updated 3 model(s):
~ stepfun/step-3.7-flash (openrouter)
input: 0.9999 → 0.2000
output: 9.9999 → 1.1500
+ anthropic/claude-opus-4.8 (openrouter)
input=5.0000 output=25.0000
⚠ Model(s) with estimated pricing: openrouter/auto, openrouter/bodybuilder, openrouter/pareto-code
Add new pricing providers by subclassing PricingSource:
from hermes_telemetry.pricing_refresh import PricingSource, register_source

class AnthropicSource(PricingSource):
    name = "anthropic"

    def fetch(self) -> dict[str, dict]:
        # Fetch from Anthropic's pricing page or API
        ...

register_source(AnthropicSource)
Sources are registered in pricing_refresh.py and fetched in parallel on each refresh cycle.
The plugin registers 10 hooks (out of 16 available in Hermes) plus 2 slash commands:
Hook Purpose
─────────────────────────────────────────────────────────────
on_session_start Create run row, extract cron_job_id
pre_api_request Stash approx_input_tokens for fallback
post_api_request PRIMARY: record tokens, cost, latency
post_tool_call Record tool name, success, duration
post_llm_call Refresh session end timestamp
subagent_stop Record delegate_task proxy on parent
on_session_end Set final status (ok/error/interrupted)
on_session_finalize Safety net: ensure run is closed
pre_llm_call Soft budget alerts + capture sender_id
pre_tool_call Hard budget enforcement (tool-gate)
Why post_api_request is the primary hook for tokens: The Hermes conversation loop can make multiple API calls per turn (retries, reasoning models, tool calls). Only post_api_request carries the canonical usage dict with token counts and cost data. pre_llm_call fires once per turn with no token data. post_llm_call fires after the tool loop with no token data.
Cron job identification: There is no cron_job_id in any hook. The plugin extracts it from the session_id, which follows the format cron_{job_id}_{YYYYMMDD_HHMMSS} (confirmed in Hermes source). An anchored regex handles job IDs that contain underscores.
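An anchored pattern of the kind described could look like this. The exact regex used by the plugin is not shown in this README, so the pattern and function name below are assumptions that follow the documented cron_{job_id}_{YYYYMMDD_HHMMSS} format.

```python
import re

# Anchored at both ends; the greedy job_id group backtracks until the
# trailing _YYYYMMDD_HHMMSS suffix matches, so underscores in the ID survive.
CRON_SESSION_RE = re.compile(r"^cron_(?P<job_id>.+)_(?P<date>\d{8})_(?P<time>\d{6})$")


def extract_cron_job_id(session_id: str):
    """Return the cron job ID embedded in a session_id, or None for non-cron sessions."""
    match = CRON_SESSION_RE.match(session_id)
    return match.group("job_id") if match else None


simple = extract_cron_job_id("cron_09dd0c24f29b_20250101_120000")
with_underscores = extract_cron_job_id("cron_daily_email_report_20250101_120000")
cli_session = extract_cron_job_id("20250101_120000_ab12cd")
```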
SQLite with WAL mode, per-thread connections, schema v3:
runs — one row per session (CLI session or cron job execution):
| Column | Description |
|---|---|
| session_id | Primary key ({YYYYMMDD_HHMMSS}_{uuid6} for CLI, cron_{job_id}_{ts} for cron) |
| platform | cli, cron, telegram, discord, etc. |
| cron_job_id | Extracted from session_id when platform=cron |
| model | Model name (updated from last API call) |
| provider | Provider name (e.g. openrouter, anthropic) |
| started_at / ended_at | ISO-8601 UTC timestamps |
| status | running, ok, error, interrupted |
| tokens_in / tokens_out | Accumulated across all API calls in the session |
| cost_usd | Accumulated estimated cost |
| duration_ms | Wall time (ms) via julianday() |
| api_calls / tool_calls | Counters |
| parent_session_id | Reserved for future parent-child linking (not populated in v0.2) |
| estimated_llm_calls | Count of calls where provider returned usage=None |
| sender_id | For per-sender budgets (set via pre_llm_call) |
llm_calls — one row per individual API call:
All of runs token/cost columns, plus cache_read_tokens, cache_write_tokens, reasoning_tokens, estimated (boolean).
tool_calls — one row per tool execution:
session_id, ts, tool_name, ok (boolean), latency_ms.
budget_alerts — anti-spam ledger:
scope, scope_id, window, period_key, level, fired_at, spent_usd, limit_usd. Unique constraint prevents duplicate alerts.
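One way a unique constraint can act as an anti-spam ledger is INSERT OR IGNORE plus a row-count check. The sketch below uses the column names listed above, but the choice of columns in the unique key and both function names are assumptions for illustration.

```python
import sqlite3


def make_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger with a uniqueness rule per scope, window, and level."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE budget_alerts (
            scope TEXT, scope_id TEXT, "window" TEXT, period_key TEXT,
            level TEXT, fired_at TEXT, spent_usd REAL, limit_usd REAL,
            UNIQUE (scope, scope_id, "window", period_key, level)
        )
        """
    )
    return conn


def should_alert(conn, scope, scope_id, window, period_key, level, spent, limit) -> bool:
    """Record the alert; return True only the first time this key is seen."""
    # Use "" rather than NULL for scope_id: SQLite treats NULLs as distinct.
    cur = conn.execute(
        "INSERT OR IGNORE INTO budget_alerts VALUES (?, ?, ?, ?, ?, datetime('now'), ?, ?)",
        (scope, scope_id, window, period_key, level, spent, limit),
    )
    conn.commit()
    return cur.rowcount == 1


ledger = make_ledger()
first = should_alert(ledger, "global", "", "daily", "2025-01-01", "soft", 1.70, 2.00)
repeat = should_alert(ledger, "global", "", "daily", "2025-01-01", "soft", 1.75, 2.00)
escalated = should_alert(ledger, "global", "", "daily", "2025-01-01", "hard", 2.10, 2.00)
```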
Cron jobs run in a ThreadPoolExecutor (Hermes cron/scheduler.py). Multiple jobs can write to the DB simultaneously from different threads.
Design: per-thread SQLite connections via threading.local(). Each thread opens its own connection to the same WAL-mode DB file. A serializable _schema_lock protects DDL migrations on first connect (WAL mode switch requires a brief lock that busy_timeout alone doesn’t handle).
busy_timeout=5000 ensures write collisions retry for 5 seconds before raising. synchronous=NORMAL balances durability with write performance (safe for WAL mode).
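The connection pattern described above can be sketched with the standard library alone. Names such as `get_conn` and `record_tool_call`, and the trimmed-down tool_calls table, are stand-ins chosen for this example; the pragmas are the ones named in the text.

```python
import os
import sqlite3
import tempfile
import threading

_local = threading.local()       # holds one connection per thread
_schema_lock = threading.Lock()  # serializes first-connect setup


def get_conn(db_path: str) -> sqlite3.Connection:
    """Return this thread's connection, opening and configuring it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(db_path, timeout=5.0)
        conn.execute("PRAGMA busy_timeout=5000")  # retry write collisions for 5 s
        with _schema_lock:
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tool_calls "
                "(session_id TEXT, tool_name TEXT, ok INTEGER)"
            )
            conn.commit()
        conn.execute("PRAGMA synchronous=NORMAL")
        _local.conn = conn
    return conn


def record_tool_call(db_path: str, session_id: str, tool_name: str, ok: bool) -> None:
    conn = get_conn(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tool_calls VALUES (?, ?, ?)", (session_id, tool_name, int(ok))
        )


# Demo: 10 threads x 5 writes against one temporary database file.
db_path = os.path.join(tempfile.mkdtemp(), "telemetry.db")


def worker(n: int) -> None:
    for _ in range(5):
        record_tool_call(db_path, f"cron_job{n}", "read_file", True)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = get_conn(db_path).execute("SELECT COUNT(*) FROM tool_calls").fetchone()[0]
```

Because each thread owns its connection, no Python-level lock is needed around ordinary writes; SQLite's own locking plus the busy timeout handles collisions.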
See the budget enforcement demo at the top of this README for an end-to-end walkthrough.
Every time the agent is about to do work, the plugin checks:
- pre_llm_call (fires once per turn): evaluates all applicable budget scopes. If any has a soft or hard verdict that hasn’t been alerted yet this window, injects a one-time notice into the conversation context (anti-spam via budget_alerts table). Captures sender_id.
- pre_tool_call (fires before every tool): re-evaluates budgets. If any scope is in hard breach, returns {"action":"block","message":...} which aborts the tool call.
- For cron jobs with hard breach: additionally calls cron.jobs.pause_job to pause future runs.
Hermes does not expose a way for a plugin to abort an in-flight model call: return values from pre_llm_call / pre_api_request cannot cancel it. So enforcement is honest about its reach:
| Level | Trigger | Effect | Repeat? |
|---|---|---|---|
| Soft (≥ soft_pct) | Spend reaches 80% of limit (configurable) | One-time notice injected into conversation | Once per window per scope |
| Hard (≥ hard_pct) | Spend reaches 100% of limit | Every subsequent tool call is blocked | Every tool call until window resets |
| Cron pause | Any hard cron_job verdict | Job is paused for future runs | Once per window per scope |
The model response already in flight still completes and is billed. What’s prevented is further tool-driven work.
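The tool-gate decision can be pictured as a pure function over per-scope verdicts. In this sketch the return shape follows the {"action":"block","message":...} form quoted above; the function name, the verdicts dictionary, and the message wording are assumptions for illustration.

```python
def pre_tool_call_gate(verdicts: dict):
    """Return a block action if any scope is in hard breach, else None (allow)."""
    breached = sorted(scope for scope, level in verdicts.items() if level == "hard")
    if not breached:
        return None
    return {
        "action": "block",
        "message": f"Budget exceeded for: {', '.join(breached)}",
    }


allowed = pre_tool_call_gate({"global": "soft", "cron_job": "ok"})
blocked = pre_tool_call_gate({"global": "hard", "cron_job": "ok"})
```

A soft verdict never blocks; only a hard breach in at least one scope stops the tool call.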
When the provider returns usage=None, the plugin estimates tokens and flags the row as estimated=1. Since these estimates may be inaccurate, the budget engine offers a safety valve:
on_estimated.mode: warn_only (default): If a hard verdict rests partly on estimated rows, it is degraded to soft — the user gets a warning but tools aren’t blocked. Rationale: a budget built on estimates shouldn’t hard-stop work.
on_estimated.mode: enforce: Hard verdicts take effect regardless of estimate quality. Use this when you trust your provider’s usage data (Est% = 0) or when estimates are acceptable.
The /stats providers command shows the Est% column so you can see at a glance whether your provider returns real usage data.
Estimated-price models: Some models (e.g. OpenRouter auto routing) have no fixed pricing. These are flagged with _estimated_price: true in pricing.yaml and normalized to $0.00. If >0% of calls use these models, budget hard-verdicts are also degraded to soft under warn_only mode. See Pricing Auto-Refresh for details.
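Putting the thresholds and the estimated-data rule together, a verdict function might look like the following. It is a sketch under assumptions: the function name and the has_estimated flag are hypothetical, and the limit is assumed to be greater than zero.

```python
def budget_verdict(spent: float, limit: float, has_estimated: bool = False,
                   mode: str = "warn_only", soft_pct: float = 0.80,
                   hard_pct: float = 1.00) -> str:
    """Classify spend as ok / soft / hard, degrading hard to soft on estimated data."""
    ratio = spent / limit  # assumes limit > 0
    if ratio >= hard_pct:
        level = "hard"
    elif ratio >= soft_pct:
        level = "soft"
    else:
        level = "ok"
    # Under warn_only, a hard verdict resting on estimates only warns.
    if level == "hard" and has_estimated and mode == "warn_only":
        level = "soft"
    return level


within = budget_verdict(0.1812, 2.00)
warning = budget_verdict(1.70, 2.00)
breach = budget_verdict(0.1812, 0.001)
degraded = budget_verdict(0.1812, 0.001, has_estimated=True)
enforced = budget_verdict(0.1812, 0.001, has_estimated=True, mode="enforce")
```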
Run this once after enabling the plugin:
- Run one short session (any minimal task works)
- Execute /stats providers
- Look at the Est% column for your provider:
  - 0% → provider returns real usage data. Budget verdicts are based on real numbers. Set on_estimated.mode: enforce for strict enforcement. ✅
  - Above 0% → provider omits usage in some responses. Those calls are estimated and flagged. Budget hard-verdicts will be degraded to soft under warn_only. The telemetry.log will have a one-time WARNING per provider. ⚠️
The following PoC was executed live to validate the plugin end-to-end.
- Hermes gateway running on Linux (WSL), model openrouter/owl-alpha (free tier)
- Plugin: hermes-telemetry v0.2.0, loaded in gateway process
- DB: /home/nujovich/.hermes/telemetry/telemetry.db (schema v3, WAL mode)
- 6 cron jobs configured, 2 used for this PoC
Added models to ~/.hermes/telemetry/pricing.yaml:
models:
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00
    cache_read: 0.50
    cache_write: 6.25
Set on_estimated.mode: enforce for deterministic enforcement.
Step 1 — Trigger a hard breach:
- Budget: global.daily_usd: 0.001 ($0.001/day)
- Ran MCP Lead Gen job (model: claude-sonnet-4-6, ~$3/$15 per 1M)
- Result: job spent $0.1812 on first run → 18,120% of daily limit → █ hard breach → job auto-paused
█ global $0.1812 / $0.00 18120% [daily]
↑ (0.001 rounded to 0.00 in display)
Step 2 — Raise budget and resume:
/budget set global daily 2.00
Result after /budget set:
global $0.1812 / $2.00 9% [daily]
Step 3 — Verify job runs normally:
- MCP Lead Gen re-ran successfully under the $2.00 daily budget
- Second run confirmed: state: scheduled, paused_at: null
| Job | Model | Price (input/output) |
|---|---|---|
| MCP Lead Gen | claude-sonnet-4-6 | $3.00 / $15.00 per 1M |
| Marketing Highlights | claude-opus-4-7 | $5.00 / $25.00 per 1M |
| Base sessions (CLI) | owl-alpha | $0.00 / $0.00 (free) |
Results from SQLite (/stats after all runs):
- CLI sessions (owl-alpha, free): ~1M tokens in → $0.00
- MCP Lead Gen (claude-sonnet-4-6): ~892K tokens in → $0.314
- Marketing Highlights (claude-opus-4-7): ~445K tokens in → $2.23 (at the prices configured above, Opus costs about 1.7x as much per token as Sonnet)
| Component | Status |
|---|---|
| Token capture from provider | ✅ Real usage (estimated=0) |
| Cost estimation with pricing table | ✅ Accurate to pricing YAML |
| Cron job session tracking | ✅ Captured via session_id regex |
| Budget soft alerts | ✅ One-time context injection |
| Budget hard enforcement | ✅ Paused job at $0.001/day |
| Budget hot-reload via /budget set | ✅ Cache cleared, new limit active |
| Multi-model cost comparison | ✅ Sonnet vs Opus vs Free |
| Pricing auto-refresh (OpenRouter API) | ✅ 320 models fetched, manual overrides preserved |
| Estimated-price model handling | ✅ Negative prices → $0.00, budget degradation |
| Dashboard (HTML, auto-refresh 30s) | ✅ Charts, tables, budget bar, provider distribution |
| 233 tests pass | ✅ |
| | hermes-telemetry | TokenTelemetry | Martin Loop |
|---|---|---|---|
| Hermes-native | ✅ Native plugin | ❌ Reads external logs | ❌ No Hermes support |
| Budget enforcement | ✅ Stops the run | ❌ Observe only | ✅ But not for Hermes |
| Real-time | ✅ Pre-call | ❌ Post-hoc | ✅ Pre-attempt |
| Requires Hermes | ✅ Hermes only | Any agent | Claude Code / Codex |
| Local dashboard | ✅ | ✅ (more complete) | ❌ |
| Open source | ✅ MIT | ✅ MIT | ✅ MIT |
When to use TokenTelemetry instead: if you need a multi-agent dashboard (Claude Code + Codex + Hermes in one place), TokenTelemetry is the right choice. hermes-telemetry is purpose-built for Hermes operators who need budget enforcement, not just visibility.
cd hermes-telemetry
pip install pytest pyyaml
pytest tests/ -v
Test suite (233 tests):
| File | Tests | Coverage |
|---|---|---|
| test_pricing.py | 50 | Cache/reasoning split, no double-counting of prompt_tokens, YAML overrides, prefix matching, provider-aware source guard, NIM seeds, subscription tag, unknown model handling |
| test_telemetry_cli.py | 32 | CLI subcommands (stats/budget), all window variants, text + --json output, entry point smoke test |
| test_db.py | 29 | Schema v1→v4 migrations, CRUD, aggregations, concurrent WAL writes (10 threads × 5 writes) |
| test_setup.py | 21 | First-time setup wizard, pricing/budget file generation, interactive + non-interactive paths |
| test_dashboard.py | 21 | HTML dashboard rendering, auto-refresh, chart data endpoints, viewer-timezone budget windows |
| test_budget.py | 20 | ok/soft/hard verdicts, estimated-to-soft degradation, anti-spam ledger, cron pause, per-scope routing, /budget set hot-reload |
| test_stats_providers.py | 14 | Real vs estimated per provider, /stats providers output format, Nous warning dedup |
| test_pricing_refresh.py | 14 | Auto-refresh from OpenRouter API, change detection, manual override preservation, subscription-model metadata |
| test_init.py | 10 | Cron session ID regex, tool success/failure parsing |
| test_subagent_reconciliation.py | 9 | Parent + child hook sequence, token reconciliation, no double-counting |
| test_stats_models.py | 8 | Per-model breakdown, /stats models output format |
| test_pricing_hot_reload.py | 3 | In-process cache invalidation on pricing update |
| test_isolation.py | 2 | HERMES_HOME redirect, no writes to real ~/.hermes |
No live Hermes is required — all tests are self-contained with in-memory SQLite.
~/.hermes/telemetry/
├── telemetry.db ← SQLite (WAL mode, ~70KB base + growth)
├── telemetry.log ← Plugin log (errors, debug, one-time warnings)
├── pricing.yaml ← Your model price overrides
└── budget.yaml ← Your spend guardrails
The DB grows over time. For high-frequency cron jobs, consider periodic cleanup of old rows (not yet automated — see Known Limitations).
Enforcement gaps:
- No true mid-call abort. pre_llm_call / pre_api_request cannot cancel an in-flight model call. The response that’s already generating will complete and be billed. The tool-gate (pre_tool_call) stops subsequent work at the next tool boundary.
- Runaway text-only sessions. A session that generates text without calling any tools never hits the tool-gate. If this becomes a problem, a pre-flight check in on_session_start for cron jobs could abort before the first LLM call.
Subagent attribution:
- Child agents (delegate_task) run as their own sessions. Their tokens are captured independently and included in global totals. But there is no parent→child link in any hook, so per_cron_job budgets exclude subagent cost. Use the global budget for a cap that captures delegated work.
Pricing refresh only for OpenRouter models:
- pricing.yaml is refreshed only from the OpenRouter API; entries added manually by the user are preserved.
DB retention:
- telemetry.db grows without bound. No automatic purge of old rows. For >100K rows, consider manual cleanup or a retention policy (not yet implemented).
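Until retention is built in, a manual cleanup could follow the shape below. Treat it as a sketch only: it assumes llm_calls and tool_calls both carry a session_id column, that started_at is stored as an ISO-8601 UTC string with a +00:00 offset so that string comparison orders correctly, and that you back up telemetry.db first. The function name and the 90-day default are arbitrary.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_old_runs(conn: sqlite3.Connection, keep_days: int = 90, now=None) -> int:
    """Delete runs older than keep_days and their child rows; return runs removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=keep_days)).isoformat()
    old_sessions = "SELECT session_id FROM runs WHERE started_at < ?"
    with conn:  # one transaction: all deletes commit together or not at all
        conn.execute(f"DELETE FROM tool_calls WHERE session_id IN ({old_sessions})", (cutoff,))
        conn.execute(f"DELETE FROM llm_calls WHERE session_id IN ({old_sessions})", (cutoff,))
        deleted = conn.execute("DELETE FROM runs WHERE started_at < ?", (cutoff,)).rowcount
    return deleted
```

Child rows are removed before their parent runs so that the subquery can still find the old session IDs.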
Gateway restart required:
- Enabling the plugin takes effect only after gateway restart. Cron runs that started before the restart won’t have telemetry.
/stats cron week shows “No cron runs in the last 7 days”:
The gateway loaded before the plugin was enabled. Restart the gateway:
hermes gateway restart
Then re-run a cron job.
/budget shows $0.00 as the limit:
The limit is cached in memory at gateway start. If you edited budget.yaml directly, the cache is stale. Use /budget set global daily <amount> to hot-reload, or restart the gateway.
Cost is $0.00 for all sessions:
Your model isn’t in the pricing table. Check telemetry.log for a one-time warning like:
hermes-telemetry: unknown model 'openrouter/some-model' — cost recorded as $0.00
Add it to pricing.yaml.
Provider Est% > 0:
Your provider returns usage=None for some/all calls. Tokens are estimated. Check /stats providers to see which providers are affected. If Est% is 100% for your main provider, all spend is estimated and budget hard-verdicts degrade to soft under warn_only mode.
Plugin not loading at all:
Check telemetry.log for errors. Common causes:
- Missing pyyaml in the gateway’s venv: pip install pyyaml
- Plugin not in plugins.enabled in config.yaml
- Syntax error in pricing.yaml or budget.yaml
MIT — see LICENSE.
This plugin was built for the Hermes Agent Challenge — a $1,000 competition to build the most useful Hermes Agent plugins and extensions.
🔗 Challenge Entry: hermes-telemetry on dev.to
🛠️ Built by: Nadia Ujovich
💡 Why this plugin: Every AI system needs observability and cost control. This plugin gives Hermes Agent users the visibility to optimize their workflows and the guardrails to prevent bill shock — essential for production deployments and automated cron jobs.
Made with ☕ for the Hermes Agent ecosystem






