nujovich/hermes-telemetry

hermes-telemetry ☤

Observability + budget guardrails for Hermes Agent

Budget enforcement + observability for Hermes Agent. The only plugin that can stop a run before it overspends.

A comprehensive telemetry plugin that captures real usage data, enforces budget limits, and provides detailed cost analysis for AI agent operations. Built for the Hermes Agent Challenge by Nadia Ujovich.

The differentiator: it can stop work that's about to overspend — not just report it after the fact. Set a daily cap below current spend, and the next cron run is blocked by the budget:

Budget enforcement demo: a $0.001 daily global cap is set, current spend already exceeds it, and the next marketing cron run is blocked by the resulting hard breach

/budget set global daily 0.001 writes the cap to budget.yaml; current spend ($0.0102) already exceeds it, so /budget re-renders at 1020% [daily] — a hard breach — and the next marketing cron run is blocked by the budget.

Hermes Agent

License: MIT | Tests: 233 passing | Provider Support | Challenge Entry


Hermes Agent runs autonomously — across sessions, platforms, and cron jobs — which means it can keep spending even when you're not watching.
hermes-telemetry lives inside the runtime and enforces hard budget limits before the next LLM call is made.

This plugin addresses NousResearch/hermes-agent#6642 — the open feature request for a first-class telemetry and budget subsystem for Hermes Agent.

Your Hermes session
  ↓ every API call
hermes-telemetry (native plugin)
  → tracks tokens + cost in real time
  → enforces budget limits mid-session
  → logs to SQLite with WAL mode
  → syncs OpenRouter pricing automatically
  ↓ if budget OK
LLM provider

Not a log reader. TokenTelemetry and similar tools read what already happened. hermes-telemetry hooks into the Hermes runtime and can stop what’s about to happen.


Design principle: observability is invisible to the model. Everything goes through hooks. The only user-facing surface is /stats and /budget.



Screenshots

Dashboard (Web UI)

A standalone HTML dashboard for users who prefer a visual interface over slash commands. Served locally, reads directly from the telemetry SQLite database.

Dashboard overview

Current dashboard home view with the tabbed layout (Home / Breakdown / Request / Tool / Error), header auto-refresh controls, budget windows rendered in the viewer's local timezone, and the refreshed recent-session tables.

Slash Commands

/stats — Session analytics

Stats output

/budget — Current spending vs limits

Budget output

/stats cron week — Cron job cost breakdown

Cron output

/stats providers — Real vs estimated usage + estimated-price warning

Providers output


What It Measures

| Metric | Source | Real or Estimated |
| --- | --- | --- |
| Tokens in / out per API call | post_api_request.usage | ✅ Real (from provider) |
| Cache read / write tokens | post_api_request.usage | ✅ Real (from provider) |
| Reasoning tokens | post_api_request.usage | ✅ Real (from provider) |
| API call latency | post_api_request.api_duration | ✅ Real (ms) |
| Tool call latency & success/failure | post_tool_call | ✅ Real |
| Session / cron job wall time | started_at / ended_at | ✅ Real |
| Model & provider name | post_api_request | ✅ Real |
| Platform (cli / cron / telegram / …) | on_session_start.platform | ✅ Real |
| Cron job ID | Parsed from session_id | ✅ Real |
| Subagent invocation count | subagent_stop hook | ✅ Real (proxy) |
| Cost (USD) | Local pricing table × tokens | ⚠️ Estimated |
| Tokens when provider returns usage=None | Fallback approximation | ⚠️ Estimated, flagged |

Cost is always an estimate computed from a locally maintained pricing table. No external pricing API is called when a cost is computed; the table itself can be refreshed from OpenRouter (see Pricing Auto-Refresh). When the provider returns no usage data, tokens are estimated from a pre-request approximation plus response length, and the row is flagged as estimated=1, so /stats and /budget show a ~ prefix and an “estimated data” percentage.


Installation

Hermes plugins are opt-in — you must both install and enable the plugin.

Option A: Install from GitHub

hermes plugins install nujovich/hermes-telemetry
hermes plugins enable hermes-telemetry

To use hermes-telemetry from the command line outside of sessions (one-time setup):

chmod +x ~/.hermes/plugins/hermes-telemetry/hermes-telemetry
ln -s ~/.hermes/plugins/hermes-telemetry/hermes-telemetry ~/.local/bin/hermes-telemetry

Future git pull updates the CLI automatically — no re-linking needed.

Option B: Manual install

git clone https://github.com/nujovich/hermes-telemetry ~/.hermes/plugins/hermes-telemetry
hermes plugins enable hermes-telemetry

To use hermes-telemetry from the command line outside of sessions (one-time setup):

chmod +x ~/.hermes/plugins/hermes-telemetry/hermes-telemetry
ln -s ~/.hermes/plugins/hermes-telemetry/hermes-telemetry ~/.local/bin/hermes-telemetry

Future git pull updates the CLI automatically — no re-linking needed.

Important: restart the Hermes gateway after enabling:

hermes gateway restart

Note: Plugin changes only take effect after a gateway restart. The gateway loads the plugin registry at startup. If you enable a plugin and cron jobs don’t appear in /stats cron week, this is the most likely cause.


Quick Start

  1. Install and enable the plugin (see above)
  2. Restart the gateway
  3. Run any session, then type /stats to see captured data
  4. Optionally configure pricing.yaml and budget.yaml (see below)

That’s it. The plugin captures data automatically — no agent action required.


Standalone CLI

Query telemetry data outside of an active Hermes session:

# Session summary
hermes-telemetry stats today
hermes-telemetry stats week
hermes-telemetry stats month

# Per-cron-job breakdown
hermes-telemetry stats cron
hermes-telemetry stats cron-week

# By provider / model
hermes-telemetry stats providers
hermes-telemetry stats models

# Budget status
hermes-telemetry budget
hermes-telemetry budget cron

# JSON output (for scripting)
hermes-telemetry stats today --json | jq '.cost_usd'
hermes-telemetry budget --json | jq '.global'

All subcommands read from the same SQLite database as the in-session /stats and /budget slash commands. The gateway does not need to be running.


Setup Wizard

hermes-telemetry includes a first-time setup wizard that runs automatically on first plugin load when pricing.yaml and/or budget.yaml are missing. It can also be triggered manually at any time with the /setup slash command.

Auto-setup (first load)

On first load, if either config file is missing, the plugin auto-generates defaults:

  • Pricing: fetches all models with fixed pricing from the OpenRouter API and merges them with ~30 built-in defaults (Anthropic, OpenAI, DeepSeek, Google, Meta, Nous). New prices take effect immediately — no gateway restart needed.
  • Budget: writes a conservative global budget ($5.00/day, $100.00/month) with an 80% soft warning and 100% hard cap.

/setup slash command

Use /setup to check configuration status or reconfigure individual files.

/setup                     → show current status (which files exist)
/setup pricing auto        → built-in defaults + fetch from OpenRouter API
/setup pricing minimal     → built-in defaults only (~30 models, no network)
/setup pricing skip        → skip (unrecognized models will record $0.00 cost)
/setup budget default      → recommended global budget ($5/day, $100/month)
/setup budget custom       → instructions for setting your own limits manually
/setup budget skip         → no enforcement (costs still tracked)

Pricing options

| Option | Models | Network |
| --- | --- | --- |
| auto | ~30 built-in + all OpenRouter fixed-price models | Yes (OpenRouter API) |
| minimal | ~30 built-in only | No |
| skip | None: models will record $0.00 cost | No |

Budget options

| Option | Behavior |
| --- | --- |
| default | Global: $5.00/day, $100.00/month. Soft warning at 80%, hard block at 100% |
| custom | Prints the /budget set commands for manual configuration |
| skip | Costs tracked but never enforced |

Re-running setup

Setup skips files that already exist. To reconfigure:

# Reprice from scratch
rm ~/.hermes/telemetry/pricing.yaml
/setup pricing auto

# Reset budget
rm ~/.hermes/telemetry/budget.yaml
/setup budget default

Note: Pricing changes take effect immediately without a gateway restart. Budget changes made by regenerating budget.yaml require a restart; limits changed with /budget set are hot-reloaded.


Slash Commands

/stats

/stats                  → last 24h summary (sessions, tokens, cost, top tools)
/stats today            → same as /stats
/stats week             → last 7 days
/stats month            → last 30 days
/stats cron             → breakdown by cron_job_id (last 7 days)
/stats cron week        → cron breakdown, last 7 days
/stats cron month       → cron breakdown, last 30 days
/stats cron today       → cron breakdown, last 24 hours
/stats providers        → per-provider: real vs estimated calls + cost (last 24h)
/stats providers week   → provider breakdown, last 7 days
/stats models           → per-model breakdown within each provider (last 24h)
/stats models week      → per-model breakdown, last 7 days
/stats raw [N]          → last N raw run records (default 20, max 200)

Example output (/stats):

hermes-telemetry — last 24 h
============================================
  Sessions      : 14
  Success rate  : 92.9%  (ok=13, failed=1)
  API calls     : 47
  Tool calls    : 183
  Tokens in     : 1,240,500
  Tokens out    : 87,300
  Cost (est.)   : $0.004822
  Avg latency   : 1.2s
  Avg duration  : 48.3s

  Top tools:
  Tool                            Calls  Failures   Avg ms
  --------------------------------------------------------
  read_file                          92         0      12ms
  terminal                           51         3     340ms
  write_file                         28         0      18ms

Example output (/stats cron week):

hermes-telemetry — cron jobs (last 7 days)
========================================================================
  Job ID               Runs    OK  Fail     Tok-in    Tok-out         Cost   Avg dur
  --------------------------------------------------------------------------
  09dd0c24f29b            3     3     0   892,341    12,405    $0.314378     2.1m
  d68c2728b513            1     1     0   445,119     8,200    $2.225595     4.7m

Example output (/stats providers):

hermes-telemetry — providers (last 24 h)
========================================================================
  Provider                     Calls   Real   Est   Est%         Cost
  -------------------------------------------------------------------
  openrouter                      66     66      0     0%    $0.916782

  Est% = share of calls where the provider returned no usage data
  (tokens estimated locally).
  If Est% > 0 for your main provider, budget hard-verdicts may be
  degraded to soft under on_estimated.mode: warn_only.

Example output (/stats models):

hermes-telemetry — models (last 24 h)
================================================================================================
  Provider             Model                                           Calls   Real   Est         Cost
  ----------------------------------------------------------------------------------------------
  openrouter           owl-alpha                                          66     66     0    $0.000000
  openrouter           anthropic/claude-sonnet-4-6                        42     42     0    $0.314378
  openrouter           anthropic/claude-opus-4-7                           8      8     0    $2.225595

  Rows are grouped by provider, then by calls (desc). A model showing $0.00 has no price entry
  in pricing.yaml — run /setup pricing auto to refresh, or add it manually.

Breaks each provider's spend down to individual models. Rows are grouped by provider (ascending), then ordered by call count within each provider; the Model column is kept wide so dated model keys stay readable. Columns: Calls (total), Real (calls with provider-reported usage), Est (calls with locally estimated tokens), and Cost. A model showing $0.000000 has no price entry in pricing.yaml.

/budget

/budget                             → status of every scope (spent / limit / %)
/budget cron                        → per-cron-job budgets, with soft/hard flags
/budget set global daily 5.00       → set or raise a limit (persists + hot-reloads)
/budget set cron_job daily 1.00     → set default per-cron-job limit
/budget set sender daily 2.00       → set default per-sender limit

Example output (/budget):

hermes-telemetry — budget status
============================================================
  global                       $   0.1812 / $    2.00      9%  [daily]

  Legend:  (blank)=ok  !=soft (≥80%)  █=hard (≥100%)  ~est=estimated data

Status flags:

| Flag | Meaning |
| --- | --- |
| (blank) | Within budget (< 80%) |
| ! | Soft warning (≥ 80%): notice injected into conversation |
| █ | Hard breach (≥ 100%): tool calls blocked, cron jobs paused |
| ~est | Verdict based partly on estimated (usage=None) data |

Dashboard (Web UI)

A standalone HTML dashboard for users who prefer a visual interface over slash commands. Zero dependencies — uses only Python stdlib.

Auto-Refresh

The dashboard includes a header auto-refresh selector with Off / 5s / 10s / 20s / 1min options. The selected interval is saved in localStorage, and background refreshes keep the current page visible instead of blanking the whole UI.

Features

  • Home: summary cards, editable budget bars, daily cost, top tools, cron cost, provider distribution, cron jobs, and recent sessions
  • Breakdown: token breakdown, provider cost breakdown, cache efficiency, model efficiency, model usage trends, model share delta, daily token table, and the investigation workspace
  • Request: provider health/anomaly signals, request forensics, and request detail drawer
  • Tool: tool analytics and tool failure heatmap
  • Error: run-status groups, failed tools, recent incidents, and cron failure / waste center
  • Investigation workspace: click-through drilldown by provider, model, day, status, platform, cron job, tool, and free-text search
  • Drawers: session detail and request detail side drawers with click-back chips into filtered investigation views
  • Viewer-local timestamps: rendered dates/times follow the browser's timezone; budget windows are computed for that viewer timezone too
  • Soft-hidden deleted sessions: sessions marked deleted in Hermes metadata are hidden from session-facing tables by default, but aggregate historical totals remain intact
  • Time range selector: Last 24h / Last 7 days / Last 30 days / Last 90 days / All time

Usage

cd ~/.hermes/plugins/hermes-telemetry/dashboard
python3 serve.py                  # http://localhost:8765 (loopback only)
python3 serve.py --port 9090      # custom port, still loopback
python3 serve.py 9090             # positional port (back-compat)

Then open http://localhost:8765 in your browser.

Accessing the dashboard from another host

The dashboard has no authentication — anyone who can reach the port sees every captured token, cost, and tool-call detail. By default it binds to 127.0.0.1, which is unreachable from other machines.

If your Hermes server is headless (Pi, VPS, NAS) and you browse from a laptop, two options:

Recommended — SSH tunnel (no server-side change, leaves the safe default in place):

# Start the dashboard on the server as usual
ssh server "cd ~/.hermes/plugins/hermes-telemetry/dashboard && python3 serve.py &"

# Tunnel from your client
ssh -L 8765:localhost:8765 -N server &

# Browse on the client
open http://localhost:8765

Trusted-LAN shortcut — --host 0.0.0.0:

python3 serve.py --host 0.0.0.0

The script prints a warning when binding to any non-loopback interface. Only use this on a network where you trust every host. Do not expose to the public internet or to networks that include untrusted hosts — the dashboard ships without an auth layer by design (see CONTRIBUTING.md if you want to add one).


Configuration

Configuration lives in ~/.hermes/telemetry/:

~/.hermes/telemetry/
├── telemetry.db      ← SQLite database (WAL mode)
├── telemetry.log     ← plugin log (errors / debug)
├── pricing.yaml      ← optional pricing overrides
└── budget.yaml       ← optional spend budgets

If these files don’t exist, the plugin still works: it falls back to defaults (models missing from the built-in pricing table are costed at $0.00, budgets disabled).

pricing.yaml

Override model prices in USD per 1 million tokens. Without overrides, unknown models log a one-time warning and record cost as $0.00.

Full format:

models:
  # Free model
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00

  # Paid model with full cache/reasoning split
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
    reasoning: 15.00

  # Minimal override (cache prices derived from multipliers)
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00

defaults:
  cache_read_multiplier: 0.10   # cache_read = input * 0.10 if not specified
  cache_write_multiplier: 1.25  # cache_write = input * 1.25 if not specified
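
The multiplier rule can be illustrated with a minimal sketch. The function names (`resolve_prices`, `estimate_cost_usd`), the fallback of reasoning tokens to the output price, and the assumption that the token buckets do not overlap are all illustrative choices, not the plugin's actual code.

```python
DEFAULTS = {"cache_read_multiplier": 0.10, "cache_write_multiplier": 1.25}


def resolve_prices(entry: dict, defaults: dict = DEFAULTS) -> dict:
    """Return a full price dict (USD per 1M tokens), deriving missing cache prices."""
    prices = dict(entry)
    prices.setdefault("cache_read", entry["input"] * defaults["cache_read_multiplier"])
    prices.setdefault("cache_write", entry["input"] * defaults["cache_write_multiplier"])
    # Assumption: reasoning tokens are billed at the output rate when not listed.
    prices.setdefault("reasoning", entry["output"])
    return prices


def estimate_cost_usd(prices: dict, tokens: dict) -> float:
    """Sum tokens x price per bucket; assumes the buckets do not overlap."""
    buckets = ("input", "output", "cache_read", "cache_write", "reasoning")
    return sum(tokens.get(b, 0) * prices[b] for b in buckets) / 1_000_000


# The "minimal override" entry from the YAML above.
opus = resolve_prices({"input": 5.00, "output": 25.00})
print(opus["cache_read"], opus["cache_write"])
print(estimate_cost_usd(opus, {"input": 1_000_000, "output": 100_000}))
```

For the minimal Opus entry this derives cache prices of 0.50 and 6.25, the same values that are written out explicitly in the Proof of Concept pricing file.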

Matching rules (in order):

  1. Exact match (case-insensitive) against models: keys in your YAML
  2. Exact match against the built-in pricing table (~35 models)
  3. Longest-prefix match (e.g. claude-sonnet matches claude-sonnet-4-6-future)
  4. Unknown → $0.00 with a one-time warning in telemetry.log

Prices are auto-fetched from the OpenRouter API and cached locally.

Provider-aware lookup. Each candidate is filtered by the call's provider so an OpenRouter-sourced price is never applied to a call another provider served (e.g. the OpenRouter Qwen rate must not cost a Nous Portal call, and a NIM call of nvidia/... must not borrow OpenRouter's rate for the same id). Entries auto-fetched from OpenRouter carry _source: openrouter and are skipped for non-OpenRouter calls; built-in and hand-added entries (no _source) are provider-neutral.

Subscription / flat-rate models. If a provider serves a model on a flat subscription or free tier (incremental per-token cost = $0), declare it under the provider's native model id so it stays distinct from a lookup miss:

models:
  qwen3.7-plus:          # Nous Portal's native id (not the OpenRouter qwen/ form)
    input: 0.0
    output: 0.0
    _subscription: true  # declared $0 — survives every OpenRouter refresh

budget.yaml

Configure spend guardrails. No file → budgets disabled.

budgets:
  global:
    daily_usd: 2.00
    monthly_usd: 50.00
  per_cron_job:
    default:
      daily_usd: 1.00
    overrides:
      daily_email_report:
        daily_usd: 3.00
  per_sender:
    default:
      daily_usd: 2.00
    overrides:
      premium_user_123:
        daily_usd: 5.00

thresholds:
  soft_pct: 0.80    # warn at 80% of limit
  hard_pct: 1.00    # enforce at 100%

on_estimated:
  mode: enforce     # warn_only | enforce

Scope resolution:

| Scope | How spend is calculated |
| --- | --- |
| global | All sessions + all cron jobs combined |
| per_cron_job | Sessions where cron_job_id matches (excludes subagent cost) |
| per_sender | Sessions from a specific sender (multi-user gateways) |

Window math: daily and monthly windows are computed in the user’s local timezone. A cron job that runs at 11:59 PM and another at 12:01 AM count against different daily windows.
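
A minimal sketch of the window math, assuming each window is identified by a string key derived from the local date. The `period_key` function and its key format are hypothetical; the plugin's real key format may differ.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def period_key(ts: datetime, window: str, local_tz: tzinfo) -> str:
    """Map an aware timestamp to the budget window it counts against."""
    local = ts.astimezone(local_tz)
    if window == "daily":
        return local.strftime("%Y-%m-%d")
    if window == "monthly":
        return local.strftime("%Y-%m")
    raise ValueError(f"unknown window: {window}")


# Two cron runs two minutes apart, seen from a UTC-3 user.
tz = timezone(timedelta(hours=-3))
late = datetime(2025, 1, 2, 2, 59, tzinfo=timezone.utc)   # 23:59 local, Jan 1
early = datetime(2025, 1, 2, 3, 1, tzinfo=timezone.utc)   # 00:01 local, Jan 2
print(period_key(late, "daily", tz), period_key(early, "daily", tz))
```

The two runs land in different daily windows even though both fall on the same UTC date, which is the behavior described above.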


Pricing Auto-Refresh

The plugin can automatically fetch model pricing from OpenRouter’s public API, eliminating the need to manually maintain pricing.yaml for hundreds of models.

How It Works

  • Source: OpenRouter public API (https://openrouter.ai/api/v1/models) — no auth required
  • Frequency: Once per 24 hours (tracked via sentinel file)
  • Trigger: Automatically on plugin load (gateway startup), or manually via CLI
  • Merge strategy:
    • User overrides in pricing.yaml are always preserved — manual entries take priority over auto-fetched ones
    • New models from the API are added automatically
    • Previously auto-fetched models are updated when prices change
    • Models are tagged with _auto: true and _source: openrouter — the _source tag is load-bearing: it drives the provider-aware guard above
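
The merge strategy can be sketched roughly as below. `merge_pricing` is a hypothetical helper, and treating "no `_auto` tag" as the marker for a manual entry is an assumption based on the tagging described above.

```python
def merge_pricing(existing: dict, fetched: dict, source: str = "openrouter") -> dict:
    """Merge freshly fetched prices into the current table without touching manual entries."""
    merged = dict(existing)
    for model, prices in fetched.items():
        current = merged.get(model)
        if current is not None and not current.get("_auto"):
            continue  # hand-written entry: the user's override always wins
        # New model, or a previously auto-fetched one: (re)write it with tags.
        merged[model] = {**prices, "_auto": True, "_source": source}
    return merged
```

A model that the user priced by hand keeps its values across refreshes, while an entry that already carries `_auto: true` is overwritten with the latest fetched price.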

NVIDIA NIM (build.nvidia.com) is supported out of the box: the Nemotron lineup ships as built-in seed prices, so NIM-served calls cost correctly even though NIM has no auto-refresh source. The seeds are immune to OpenRouter syncs, and a NIM call never borrows OpenRouter's rate for a colliding model id. nvidia/...:free promo variants resolve to $0.00.

Estimated-Price Models

Some OpenRouter models have no fixed pricing (e.g. auto routing, experimental models). These are represented with negative prices in the API.

The plugin handles these safely:

  • Prices are normalized to $0.00 (they don’t inflate cost calculations)
  • Flagged with _estimated_price: true in pricing.yaml
  • The budget engine detects when spend uses these models
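
As an illustration of the normalization step, here is a small sketch. `normalize_entry` is a hypothetical name, and it only looks at the `input` and `output` fields; the real refresh code may inspect more.

```python
def normalize_entry(raw: dict) -> dict:
    """Clamp negative (no-fixed-price) values to 0.0 and flag the entry."""
    entry = dict(raw)
    estimated = False
    for field in ("input", "output"):
        if entry.get(field, 0.0) < 0:
            entry[field] = 0.0
            estimated = True
    if estimated:
        entry["_estimated_price"] = True
    return entry


print(normalize_entry({"input": -1.0, "output": -1.0}))
print(normalize_entry({"input": 3.0, "output": 15.0}))
```

Only entries that actually carried a negative price receive the `_estimated_price` flag, so fixed-price models are left untouched.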

Budget degradation logic:

| Condition | Effect |
| --- | --- |
| on_estimated.mode: warn_only (default) | If >0% of calls use estimated-price models, hard verdicts are degraded to soft: the user gets a warning but tools aren’t blocked |
| on_estimated.mode: enforce | Hard verdicts take effect regardless |

CLI Usage

# Dry run — see what would change
python -m hermes_telemetry.pricing_refresh --check

# Apply changes
python -m hermes_telemetry.pricing_refresh

# Verbose output
python -m hermes_telemetry.pricing_refresh --verbose

Example output:

INFO OpenRouterSource: fetched 320 models
Updated 3 model(s):

  ~ stepfun/step-3.7-flash  (openrouter)
      input: 0.9999 → 0.2000
      output: 9.9999 → 1.1500

  + anthropic/claude-opus-4.8  (openrouter)
      input=5.0000 output=25.0000

  ⚠  Model(s) with estimated pricing: openrouter/auto, openrouter/bodybuilder, openrouter/pareto-code

Extending with New Sources

Add new pricing providers by subclassing PricingSource:

from hermes_telemetry.pricing_refresh import PricingSource, register_source

class AnthropicSource(PricingSource):
    name = "anthropic"

    def fetch(self) -> dict[str, dict]:
        # Fetch from Anthropic's pricing page or API
        ...

register_source(AnthropicSource)

Sources are registered in pricing_refresh.py and fetched in parallel on each refresh cycle.


Architecture

Hook Plugin

The plugin registers 10 hooks (out of 16 available in Hermes) plus 2 slash commands:

Hook                      Purpose
─────────────────────────────────────────────────────────────
on_session_start          Create run row, extract cron_job_id
pre_api_request           Stash approx_input_tokens for fallback
post_api_request          PRIMARY: record tokens, cost, latency
post_tool_call            Record tool name, success, duration
post_llm_call             Refresh session end timestamp
subagent_stop             Record delegate_task proxy on parent
on_session_end            Set final status (ok/error/interrupted)
on_session_finalize       Safety net: ensure run is closed
pre_llm_call              Soft budget alerts + capture sender_id
pre_tool_call             Hard budget enforcement (tool-gate)

Why post_api_request is the primary hook for tokens: The Hermes conversation loop can make multiple API calls per turn (retries, reasoning models, tool calls). Only post_api_request carries the canonical usage dict with token counts and cost data. pre_llm_call fires once per turn with no token data. post_llm_call fires after the tool loop with no token data.

Cron job identification: There is no cron_job_id in any hook. The plugin extracts it from the session_id, which follows the format cron_{job_id}_{YYYYMMDD_HHMMSS} (confirmed in Hermes source). An anchored regex handles job IDs that contain underscores.
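
A minimal sketch of such an anchored regex, assuming the `cron_{job_id}_{YYYYMMDD_HHMMSS}` format stated above. The pattern and the `extract_cron_job_id` name are illustrative, not copied from the plugin.

```python
import re
from typing import Optional

# The timestamp is pinned to the end of the string, so everything between the
# "cron_" prefix and the final "_YYYYMMDD_HHMMSS" is the job id, underscores included.
_CRON_SESSION = re.compile(r"cron_(?P<job_id>.+)_(?P<ts>\d{8}_\d{6})")


def extract_cron_job_id(session_id: str) -> Optional[str]:
    match = _CRON_SESSION.fullmatch(session_id)
    return match.group("job_id") if match else None


print(extract_cron_job_id("cron_daily_email_report_20250101_120000"))
print(extract_cron_job_id("20250101_120000_ab12cd"))
```

Because the match must cover the whole string, a CLI session id that merely contains a timestamp is not mistaken for a cron run.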

Database Schema

SQLite with WAL mode, per-thread connections, schema v3:

runs — one row per session (CLI session or cron job execution):

| Column | Description |
| --- | --- |
| session_id | Primary key ({YYYYMMDD_HHMMSS}_{uuid6} for CLI, cron_{job_id}_{ts} for cron) |
| platform | cli, cron, telegram, discord, etc. |
| cron_job_id | Extracted from session_id when platform=cron |
| model | Model name (updated from last API call) |
| provider | Provider name (e.g. openrouter, anthropic) |
| started_at / ended_at | ISO-8601 UTC timestamps |
| status | running, ok, error, interrupted |
| tokens_in / tokens_out | Accumulated across all API calls in the session |
| cost_usd | Accumulated estimated cost |
| duration_ms | Wall time (ms) via julianday() |
| api_calls / tool_calls | Counters |
| parent_session_id | Reserved for future parent-child linking (not populated in v0.2) |
| estimated_llm_calls | Count of calls where provider returned usage=None |
| sender_id | For per-sender budgets (set via pre_llm_call) |
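
For illustration, wall time can be derived from two ISO-8601 timestamps with SQLite's julianday(), which returns fractional days. The `wall_time_ms` helper and the exact SQL expression are assumptions; the plugin's own query may differ.

```python
import sqlite3


def wall_time_ms(started_at: str, ended_at: str) -> int:
    """Difference between two ISO-8601 timestamps in milliseconds, computed by SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        (ms,) = conn.execute(
            # julianday() yields days; 86,400,000 ms per day. ROUND absorbs float noise.
            "SELECT CAST(ROUND((julianday(?) - julianday(?)) * 86400000) AS INTEGER)",
            (ended_at, started_at),
        ).fetchone()
        return ms
    finally:
        conn.close()


print(wall_time_ms("2025-01-01T12:00:00Z", "2025-01-01T12:00:48Z"))
```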

llm_calls — one row per individual API call:

All of runs token/cost columns, plus cache_read_tokens, cache_write_tokens, reasoning_tokens, estimated (boolean).

tool_calls — one row per tool execution:

session_id, ts, tool_name, ok (boolean), latency_ms.

budget_alerts — anti-spam ledger:

scope, scope_id, window, period_key, level, fired_at, spent_usd, limit_usd. Unique constraint prevents duplicate alerts.
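
One way such an anti-spam ledger can work is sketched below. The column types, the exact set of columns in the unique constraint, and the `should_alert` helper are assumptions for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS budget_alerts (
    scope      TEXT NOT NULL,
    scope_id   TEXT NOT NULL,   -- use '' for the global scope (NULLs never collide)
    "window"   TEXT NOT NULL,   -- quoted: WINDOW is an SQL keyword
    period_key TEXT NOT NULL,
    level      TEXT NOT NULL,
    fired_at   TEXT NOT NULL,
    spent_usd  REAL NOT NULL,
    limit_usd  REAL NOT NULL,
    UNIQUE (scope, scope_id, "window", period_key, level)
)
"""


def should_alert(conn, scope, scope_id, window, period_key, level, spent_usd, limit_usd) -> bool:
    """Record the alert; return True only the first time it fires for this period."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO budget_alerts "
        "VALUES (?, ?, ?, ?, ?, datetime('now'), ?, ?)",
        (scope, scope_id, window, period_key, level, spent_usd, limit_usd),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means the unique constraint swallowed a duplicate


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
print(should_alert(conn, "global", "", "daily", "2025-01-01", "soft", 1.7, 2.0))
print(should_alert(conn, "global", "", "daily", "2025-01-01", "soft", 1.8, 2.0))
```

Letting the database reject duplicates keeps the check atomic even when several cron threads evaluate the same scope at once.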

Concurrency Model

Cron jobs run in a ThreadPoolExecutor (Hermes cron/scheduler.py). Multiple jobs can write to the DB simultaneously from different threads.

Design: per-thread SQLite connections via threading.local(). Each thread opens its own connection to the same WAL-mode DB file. A _schema_lock serializes DDL migrations on first connect (the WAL mode switch requires a brief lock that busy_timeout alone doesn’t handle).

busy_timeout=5000 ensures write collisions retry for 5 seconds before raising. synchronous=NORMAL balances durability with write performance (safe for WAL mode).
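
The concurrency design above can be sketched as follows. The `get_conn` helper, the per-path cache, and the one-table schema are illustrative simplifications; only the PRAGMA values and the `_schema_lock` idea come from the description.

```python
import sqlite3
import threading

_local = threading.local()        # each thread sees its own attributes
_schema_lock = threading.Lock()   # serializes the WAL switch and DDL


def get_conn(db_path: str) -> sqlite3.Connection:
    """Return this thread's connection to db_path, creating it on first use."""
    conns = getattr(_local, "conns", None)
    if conns is None:
        conns = _local.conns = {}
    conn = conns.get(db_path)
    if conn is None:
        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA busy_timeout=5000")   # retry locked writes for up to 5 s
        with _schema_lock:
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tool_calls ("
                "session_id TEXT, ts TEXT, tool_name TEXT, ok INTEGER, latency_ms INTEGER)"
            )
            conn.commit()
        conn.execute("PRAGMA synchronous=NORMAL")
        conns[db_path] = conn
    return conn
```

Each worker thread calls `get_conn(path)` and writes through its own connection, so no connection object is ever shared across threads.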


Budget Enforcement

See the budget enforcement demo at the top of this README for an end-to-end walkthrough.

How It Works

Every time the agent is about to do work, the plugin checks:

  1. pre_llm_call (fires once per turn): evaluates all applicable budget scopes. If any has a soft or hard verdict that hasn’t been alerted yet this window, injects a one-time notice into the conversation context (anti-spam via budget_alerts table). Captures sender_id.
  2. pre_tool_call (fires before every tool): re-evaluates budgets. If any scope is in hard breach, returns {"action":"block","message":...} which aborts the tool call.
  3. For cron jobs with hard breach: additionally calls cron.jobs.pause_job to pause future runs.
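
A rough sketch of the tool gate in step 2. The hook signature, the `make_pre_tool_call` factory, and the convention that returning None lets the call proceed are assumptions; only the `{"action": "block", "message": ...}` return shape is taken from the description above.

```python
from typing import Callable, List, Optional


def make_pre_tool_call(hard_breaches: Callable[[], List[str]]):
    """Build a pre_tool_call handler around a function that lists scopes in hard breach."""

    def pre_tool_call(tool_name: str, **_kwargs) -> Optional[dict]:
        breached = hard_breaches()   # re-evaluated before every tool call
        if not breached:
            return None              # assumed: no return value means "allow"
        return {
            "action": "block",
            "message": f"Budget exceeded for {', '.join(breached)}; '{tool_name}' was blocked.",
        }

    return pre_tool_call


gate = make_pre_tool_call(lambda: ["global [daily]"])
print(gate("terminal"))
```

Passing the breach check in as a callable keeps the gate itself free of database access, which makes it easy to exercise in isolation.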

Enforcement Levels

Hermes does not expose a way to abort an in-flight model call from a plugin: the return values of pre_llm_call and pre_api_request cannot cancel a call. So enforcement is honest about its reach:

| Level | Trigger | Effect | Repeat? |
| --- | --- | --- | --- |
| Soft (≥ soft_pct) | Spend reaches 80% of limit (configurable) | One-time notice injected into conversation | Once per window per scope |
| Hard (≥ hard_pct) | Spend reaches 100% of limit | Every subsequent tool call is blocked | Every tool call until window resets |
| Cron pause | Any hard cron_job verdict | Job is paused for future runs | Once per window per scope |

The model response already in flight still completes and is billed. What’s prevented is further tool-driven work.
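
The threshold logic can be illustrated with a short sketch. `classify` is a hypothetical name; the default thresholds are the soft_pct / hard_pct values from budget.yaml.

```python
from typing import Tuple


def classify(spent_usd: float, limit_usd: float,
             soft_pct: float = 0.80, hard_pct: float = 1.00) -> Tuple[str, float]:
    """Return (verdict, spent/limit ratio) for one budget scope."""
    if limit_usd <= 0:
        raise ValueError("limit must be positive")
    ratio = spent_usd / limit_usd
    if ratio >= hard_pct:
        return "hard", ratio
    if ratio >= soft_pct:
        return "soft", ratio
    return "ok", ratio


# Figures from the demo: $0.0102 spent against a $0.001 daily cap.
verdict, ratio = classify(0.0102, 0.001)
print(verdict, f"{ratio:.0%}")
```

Run against the demo figures, this reproduces the hard breach at 1020%; after the cap is raised to $2.00 in the Proof of Concept, $0.1812 of spend classifies as ok at 9%.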

Estimated Data and Budget Degradation

When the provider returns usage=None, the plugin estimates tokens and flags the row as estimated=1. Since these estimates may be inaccurate, the budget engine offers a safety valve:

on_estimated.mode: warn_only (default): If a hard verdict rests partly on estimated rows, it is degraded to soft — the user gets a warning but tools aren’t blocked. Rationale: a budget built on estimates shouldn’t hard-stop work.

on_estimated.mode: enforce: Hard verdicts take effect regardless of estimate quality. Use this when you trust your provider’s usage data (Est% = 0) or when estimates are acceptable.
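
The two modes reduce to a small rule, sketched here. `apply_estimated_policy` and its `estimated_calls` argument are hypothetical; how the plugin counts estimated rows per window is not shown.

```python
def apply_estimated_policy(verdict: str, estimated_calls: int, mode: str = "warn_only") -> str:
    """Downgrade a hard verdict to soft when it rests partly on estimated rows."""
    if verdict == "hard" and mode == "warn_only" and estimated_calls > 0:
        return "soft"
    return verdict   # enforce mode, or no estimated rows: verdict stands


print(apply_estimated_policy("hard", estimated_calls=3))
print(apply_estimated_policy("hard", estimated_calls=3, mode="enforce"))
```

Soft and ok verdicts pass through unchanged in both modes; only a hard verdict is ever downgraded.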

The /stats providers command shows the Est% column so you can see at a glance whether your provider returns real usage data.

Estimated-price models: Some models (e.g. OpenRouter auto routing) have no fixed pricing. These are flagged with _estimated_price: true in pricing.yaml and normalized to $0.00. If >0% of calls use these models, budget hard-verdicts are also degraded to soft under warn_only mode. See Pricing Auto-Refresh for details.


Provider Probe: Verifying Your Provider Returns Real Usage

Run this once after enabling the plugin:

  1. Run one short session (any minimal task works)
  2. Execute /stats providers
  3. Look at the Est% column for your provider:
  • 0% → provider returns real usage data. Budget verdicts are based on real numbers. Set on_estimated.mode: enforce for strict enforcement. ✅
  • > 0% → provider omits usage in some responses. Those calls are estimated and flagged. Budget hard-verdicts will be degraded to soft under warn_only. The telemetry.log will have a one-time WARNING per provider. ⚠️

Proof of Concept

The following PoC was executed live to validate the plugin end-to-end.

Setup

  • Hermes gateway running on Linux (WSL), model openrouter/owl-alpha (free tier)
  • Plugin: hermes-telemetry v0.2.0, loaded in gateway process
  • DB: /home/nujovich/.hermes/telemetry/telemetry.db (schema v3, WAL mode)
  • 6 cron jobs configured, 2 used for this PoC

Pricing Capture

Added models to ~/.hermes/telemetry/pricing.yaml:

models:
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00
    cache_read: 0.50
    cache_write: 6.25

Set on_estimated.mode: enforce for deterministic enforcement.

Budget Enforcement Test

Step 1 — Trigger a hard breach:

  • Budget: global.daily_usd: 0.001 ($0.001/day)
  • Ran MCP Lead Gen job (model: claude-sonnet-4-6, ~$3/$15 per 1M)
  • Result: job spent $0.1812 on first run → 18,120% of daily limit → █ hard breach → job auto-paused

█ global    $0.1812 / $0.00    18120%  [daily]
                         ↑ (0.001 rounded to 0.00 in display)

Step 2 — Raise budget and resume:

/budget set global daily 2.00

Result after /budget set:

global    $0.1812 / $2.00    9%  [daily]

Step 3 — Verify job runs normally:

  • MCP Lead Gen re-ran successfully under the $2.00 daily budget
  • Second run confirmed: state: scheduled, paused_at: null

Cron Job Cost Comparison

| Job | Model | Price (input/output) |
| --- | --- | --- |
| MCP Lead Gen | claude-sonnet-4-6 | $3.00 / $15.00 per 1M |
| Marketing Highlights | claude-opus-4-7 | $5.00 / $25.00 per 1M |
| Base sessions (CLI) | owl-alpha | $0.00 / $0.00 (free) |

Results from SQLite (/stats after all runs):

  • CLI sessions (owl-alpha, free): ~1M tokens in → $0.00
  • MCP Lead Gen (claude-sonnet-4-6): ~892K tokens in → $0.314
  • Marketing Highlights (claude-opus-4-7): ~445K tokens in → $2.23 (at list price, Opus costs about 1.7x Sonnet per token: $5/$25 vs $3/$15 per 1M)

Results Summary

| Component | Status |
| --- | --- |
| Token capture from provider | ✅ Real usage (estimated=0) |
| Cost estimation with pricing table | ✅ Accurate to pricing YAML |
| Cron job session tracking | ✅ Captured via session_id regex |
| Budget soft alerts | ✅ One-time context injection |
| Budget hard enforcement | ✅ Paused job at $0.001/day |
| Budget hot-reload via /budget set | ✅ Cache cleared, new limit active |
| Multi-model cost comparison | ✅ Sonnet vs Opus vs Free |
| Pricing auto-refresh (OpenRouter API) | ✅ 320 models fetched, manual overrides preserved |
| Estimated-price model handling | ✅ Negative prices → $0.00, budget degradation |
| Dashboard (HTML, auto-refresh 30s) | ✅ Charts, tables, budget bar, provider distribution |

233 tests pass

Comparison

hermes-telemetry TokenTelemetry Martin Loop
Hermes-native ✅ Native plugin ❌ Reads external logs ❌ No Hermes support
Budget enforcement ✅ Stops the run ❌ Observe only ✅ But not for Hermes
Real-time ✅ Pre-call ❌ Post-hoc ✅ Pre-attempt
Requires Hermes ✅ Hermes only Any agent Claude Code / Codex
Local dashboard ✅ (more complete)
Open source ✅ MIT ✅ MIT ✅ MIT

When to use TokenTelemetry instead: if you need a multi-agent dashboard (Claude Code + Codex + Hermes in one place), TokenTelemetry is the right choice. hermes-telemetry is purpose-built for Hermes operators who need budget enforcement, not just visibility.


Running Tests

cd hermes-telemetry
pip install pytest pyyaml
pytest tests/ -v

Test suite (233 tests):

| File | Tests | Coverage |
|---|---|---|
| test_pricing.py | 50 | Cache/reasoning split, no double-counting of prompt_tokens, YAML overrides, prefix matching, provider-aware source guard, NIM seeds, subscription tag, unknown model handling |
| test_telemetry_cli.py | 32 | CLI subcommands (stats/budget), all window variants, text + --json output, entry point smoke test |
| test_db.py | 29 | Schema v1→v4 migrations, CRUD, aggregations, concurrent WAL writes (10 threads × 5 writes) |
| test_setup.py | 21 | First-time setup wizard, pricing/budget file generation, interactive + non-interactive paths |
| test_dashboard.py | 21 | HTML dashboard rendering, auto-refresh, chart data endpoints, viewer-timezone budget windows |
| test_budget.py | 20 | ok/soft/hard verdicts, estimated-to-soft degradation, anti-spam ledger, cron pause, per-scope routing, /budget set hot-reload |
| test_stats_providers.py | 14 | Real vs estimated per provider, /stats providers output format, Nous warning dedup |
| test_pricing_refresh.py | 14 | Auto-refresh from OpenRouter API, change detection, manual override preservation, subscription-model metadata |
| test_init.py | 10 | Cron session ID regex, tool success/failure parsing |
| test_subagent_reconciliation.py | 9 | Parent + child hook sequence, token reconciliation, no double-counting |
| test_stats_models.py | 8 | Per-model breakdown, /stats models output format |
| test_pricing_hot_reload.py | 3 | In-process cache invalidation on pricing update |
| test_isolation.py | 2 | HERMES_HOME redirect, no writes to real ~/.hermes |

No live Hermes is required — all tests are self-contained with in-memory SQLite.
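
To give a feel for the concurrent-write case listed for test_db.py (10 threads × 5 writes), here is a minimal standalone sketch. It is not taken from the test suite: the `events` table and the function names are hypothetical. It uses a temporary file because SQLite's WAL mode only applies to file-backed databases.

```python
import os
import sqlite3
import tempfile
import threading


def write_rows(db_path: str, worker_id: int, n_writes: int) -> None:
    """Each worker opens its own connection and commits one row per write."""
    conn = sqlite3.connect(db_path, timeout=30)  # wait if another writer holds the lock
    try:
        for seq in range(n_writes):
            with conn:  # commits on success
                conn.execute(
                    "INSERT INTO events (worker, seq) VALUES (?, ?)",
                    (worker_id, seq),
                )
    finally:
        conn.close()


def run_concurrent_writes(n_threads: int = 10, n_writes: int = 5):
    """Return (journal_mode, row_count) after all workers finish."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "telemetry_test.db")
        conn = sqlite3.connect(db_path)
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE events (worker INTEGER, seq INTEGER)")
        conn.commit()

        threads = [
            threading.Thread(target=write_rows, args=(db_path, w, n_writes))
            for w in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        conn.close()
        return mode, count


if __name__ == "__main__":
    print(run_concurrent_writes())
```

With the defaults, every one of the 50 inserts should be present once the threads have joined.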


Data Location

~/.hermes/telemetry/
├── telemetry.db        ← SQLite (WAL mode, ~70KB base + growth)
├── telemetry.log       ← Plugin log (errors, debug, one-time warnings)
├── pricing.yaml        ← Your model price overrides
└── budget.yaml         ← Your spend guardrails

The DB grows over time. For high-frequency cron jobs, consider periodic cleanup of old rows (not yet automated — see Known Limitations).


Known Limitations

Enforcement gaps:

  • No true mid-call abort. pre_llm_call / pre_api_request cannot cancel an in-flight model call. The response that’s already generating will complete and be billed. The tool-gate (pre_tool_call) stops subsequent work at the next tool boundary.
  • Runaway text-only sessions. A session that generates text without calling any tools never hits the tool-gate. If this becomes a problem, a pre-flight check in on_session_start for cron jobs could abort before the first LLM call.

Subagent attribution:

  • Child agents (delegate_task) run as their own sessions. Their tokens are captured independently and included in global totals. But there is no parent→child link in any hook — so per_cron_job budgets exclude subagent cost. Use the global budget for a cap that captures delegated work.

Pricing refresh only for OpenRouter models:

  • Auto-refresh only covers models published by the OpenRouter API. The refresh updates those entries in pricing.yaml and preserves any prices you entered manually; models from other providers are not refreshed.
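
The override-preserving behaviour can be pictured with a small merge sketch. This is an illustration under assumptions, not the plugin's implementation: the entry layout (`input`, `output`, `source`) and the `merge_pricing` name are hypothetical. Clamping negative prices to $0.00 mirrors the behaviour listed in the results summary.

```python
def merge_pricing(current: dict, fetched: dict) -> dict:
    """Merge fetched prices into the current table without touching manual entries.

    Both arguments map model name -> {"input": float, "output": float, ...}
    (a hypothetical layout). Entries tagged source="manual" are kept as-is.
    """
    merged = {model: dict(entry) for model, entry in current.items()}
    for model, price in fetched.items():
        if merged.get(model, {}).get("source") == "manual":
            continue  # the user's override wins
        merged[model] = {
            "input": max(price["input"], 0.0),    # negative price -> $0.00
            "output": max(price["output"], 0.0),
            "source": "openrouter",
        }
    return merged


current = {
    "model-a": {"input": 1.0, "output": 2.0, "source": "manual"},
    "model-b": {"input": 3.0, "output": 15.0, "source": "openrouter"},
}
fetched = {
    "model-a": {"input": 9.0, "output": 9.0},
    "model-b": {"input": 2.5, "output": 12.0},
    "model-c": {"input": -1.0, "output": -1.0},
}
merged = merge_pricing(current, fetched)
print(merged["model-a"]["input"], merged["model-b"]["input"], merged["model-c"]["input"])
```

Here `model-a` keeps its manual price, `model-b` picks up the fetched price, and `model-c` is added with its negative prices clamped to zero.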

DB retention:

  • telemetry.db grows without bound. No automatic purge of old rows. For >100K rows, consider manual cleanup or a retention policy (not yet implemented).
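
Until a retention policy exists, a manual cleanup could look roughly like the sketch below. The table name `llm_calls` and the epoch-seconds column `ts` are assumptions made for illustration; check the actual schema in telemetry.db and back up the file before deleting anything.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86_400


def purge_old_rows(conn: sqlite3.Connection, max_age_days: int, now=None) -> int:
    """Delete rows older than max_age_days and return how many were removed.

    Assumes a hypothetical table llm_calls with a ts column in epoch seconds.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    with conn:  # commit the delete as a single transaction
        cur = conn.execute("DELETE FROM llm_calls WHERE ts < ?", (cutoff,))
    return cur.rowcount


# Demo against an in-memory database with one old and one recent row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_calls (ts REAL, cost_usd REAL)")
now = time.time()
conn.executemany(
    "INSERT INTO llm_calls VALUES (?, ?)",
    [(now - 40 * SECONDS_PER_DAY, 0.01), (now - 1 * SECONDS_PER_DAY, 0.02)],
)
conn.commit()
removed = purge_old_rows(conn, max_age_days=30, now=now)
print(removed)  # only the 40-day-old row is removed
```

Deleting rows does not shrink the database file by itself; running SQLite's `VACUUM` afterwards reclaims the freed space.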

Gateway restart required:

  • Enabling the plugin takes effect only after gateway restart. Cron runs that started before the restart won’t have telemetry.

Troubleshooting

/stats cron week shows “No cron runs in the last 7 days”:

The gateway loaded before the plugin was enabled. Restart the gateway:

hermes gateway restart

Then re-run a cron job.

/budget shows $0.00 as the limit:

The limit is cached in memory at gateway start. If you edited budget.yaml directly, the cache is stale. Use /budget set global daily <amount> to hot-reload, or restart the gateway.

Cost is $0.00 for all sessions:

Your model isn’t in the pricing table. Check telemetry.log for a one-time warning like:

hermes-telemetry: unknown model 'openrouter/some-model' — cost recorded as $0.00

Add it to pricing.yaml.

Provider Est% > 0:

Your provider returns usage=None for some or all calls, so token counts for those calls are estimated rather than reported. Check /stats providers to see which providers are affected. If Est% is 100% for your main provider, all spend is estimated and budget hard-verdicts degrade to soft under warn_only mode.
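
The degradation rule can be sketched as a small verdict function. This is a simplified model rather than the plugin's logic: the `budget_verdict` name, the 80% soft threshold, and treating 100% of the limit as a hard breach are all assumptions made for illustration.

```python
def budget_verdict(
    spend: float,
    limit: float,
    *,
    soft_ratio: float = 0.8,   # assumed soft-alert threshold
    estimated: bool = False,   # True when the spend comes from estimated tokens
    warn_only: bool = False,   # True when estimated spend should only warn
):
    """Return (verdict, percent_of_limit) where verdict is 'ok', 'soft' or 'hard'."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = spend / limit
    if ratio >= 1.0:
        verdict = "hard"
    elif ratio >= soft_ratio:
        verdict = "soft"
    else:
        verdict = "ok"
    # Estimated spend is not trusted enough to block work in warn_only mode.
    if verdict == "hard" and estimated and warn_only:
        verdict = "soft"
    return verdict, round(ratio * 100)


print(budget_verdict(0.0102, 0.001))  # ('hard', 1020)
print(budget_verdict(0.1812, 2.00))   # ('ok', 9)
print(budget_verdict(0.0102, 0.001, estimated=True, warn_only=True))  # ('soft', 1020)
```

The first two calls use the spend and limit figures from the budget demo in this README; the third shows the same breach downgraded because the spend is estimated.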

Plugin not loading at all:

Check telemetry.log for errors. Common causes:

  • Missing pyyaml in the gateway’s venv: pip install pyyaml
  • Plugin not in plugins.enabled in config.yaml
  • Syntax error in pricing.yaml or budget.yaml

License

MIT — see LICENSE.


Hermes Agent Challenge

This plugin was built for the Hermes Agent Challenge — a $1,000 competition to build the most useful Hermes Agent plugins and extensions.

🔗 Challenge Entry: hermes-telemetry on dev.to

🛠️ Built by: Nadia Ujovich

💡 Why this plugin: Every AI system needs observability and cost control. This plugin gives Hermes Agent users the visibility to optimize their workflows and the guardrails to prevent bill shock — essential for production deployments and automated cron jobs.


Made with ☕ for the Hermes Agent ecosystem
