getlago · anassg-lago · Jun 10, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,9 @@ All notable changes to this project will be documented here. Format follows [Kee
 
 ## [Unreleased]
 
+### Added
+- **Price mode — emit computed dollar cost instead of token counts.** New `pricing_mode` config (`"tokens"` default | `"price"`), plus `markup`, `cost_metric_code` (default `llm_cost`), `pricing_ttl_seconds`, and `bedrock_default_region`. In price mode the SDK emits one `llm_cost` event per call carrying a top-level `precise_total_amount_cents` (cost in cents, after markup) for Lago's **dynamic charge model**, with a full per-field breakdown in `properties` (value in USD, base, markup, source, per-field tokens/unit_price/cost). Live unit prices come from public, no-auth sources: OpenRouter (`/api/v1/models`) for native anthropic/openai/mistral/gemini, and the AWS Bedrock Price List **Bulk** API for Bedrock. Prices are fetched + cached on the background queue thread (never blocking the customer's call); a missing price falls back to token events and calls `on_error` (never silently under-bills). Mode and markup are overridable per-call via `extra_lago={"mode": "price", "markup": 1.5}`. Money is computed with `Decimal` floored to 12 dp, identical to the JS implementation (cross-repo golden fixture). New `pricing.py` module + `PricingProvider`; default `pricing_mode="tokens"` keeps existing behavior unchanged.
+
 ### Fixed
 - **Anthropic `messages.create(stream=True)` under-billed input tokens.** The stream wrapper read only top-level `usage`, which on a basic stream appears only on `message_delta` as `{output_tokens: N}` — the authoritative `input_tokens` / `cache_*` counts arrive nested under `message.usage` on the `message_start` event and were ignored, so input billed 0. The wrapper now merges usage from `message_start` (input/cache) and `message_delta` (cumulative output). Sync + async paths; regression tests use the realistic wire shape (delta carries no input echo).
 - **Legacy `google-generativeai` SDK silently emitted no events.** The detector matched both the new `google-genai` and the deprecated `google-generativeai` SDKs, but the wrapper only instruments the unified `Client.models` / `.aio` surface — a legacy `GenerativeModel` routed through and wrapped nothing. `wrap()` now rejects legacy clients with a clear pointer to migrate to `google-genai`.

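Note on the changelog entry above: the cost arithmetic it describes (per-field `unit_price × tokens`, summed, multiplied by `markup`, with `Decimal` floored to 12 dp) can be sketched as follows. This is a minimal illustration, not the code in `pricing.py`; the helper names `floor_12dp` and `sketch_cost`, the sample prices, and the exact points at which flooring is applied are assumptions.

```python
from decimal import Decimal, ROUND_FLOOR

TWELVE_DP = Decimal("0.000000000001")  # quantum for 12 decimal places


def floor_12dp(value: Decimal) -> Decimal:
    """Floor a Decimal to 12 decimal places (assumed rounding point)."""
    return value.quantize(TWELVE_DP, rounding=ROUND_FLOOR)


def sketch_cost(tokens, unit_prices, markup):
    """Hypothetical helper: base cost, marked-up USD value, and cents for one call."""
    # Sum unit_price * tokens over every token field (input, output, ...).
    base = sum((unit_prices[name] * Decimal(count) for name, count in tokens.items()), Decimal(0))
    value_usd = floor_12dp(base * markup)
    return {
        "base_cost": floor_12dp(base),
        "value": value_usd,
        "precise_total_amount_cents": floor_12dp(value_usd * 100),
    }


# Sample numbers are made up for illustration, not real model prices.
result = sketch_cost(
    tokens={"input": 1000, "output": 500},
    unit_prices={"input": Decimal("0.000003"), "output": Decimal("0.000015")},
    markup=Decimal("1.2"),
)
```

Passing the markup as a `Decimal` built from a string keeps binary floating-point error out of the product; `LagoConfig.markup` is declared as `float` in this diff, so how the real implementation converts it is not shown here.
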
diff --git a/README.md b/README.md
@@ -189,6 +189,57 @@ For both OpenAI and Gemini, `cache_read`, `audio_input`, and `image_input` are *
 
 OpenAI's Predicted Outputs tokens (`accepted_prediction_tokens`, `rejected_prediction_tokens`) are not surfaced — see the OpenAI adapter docstring for details on this intentional gap.
 
+## Pricing mode — send dollar cost instead of tokens
+
+By default the SDK emits **token counts** (`pricing_mode="tokens"`). You can instead have it
+compute and emit the **dollar cost** of each call: `Σ(unit_price_per_token × tokens) × markup`.
+
+```python
+from lago_agent_sdk import LagoSDK, LagoConfig
+
+sdk = LagoSDK(api_key="...", config=LagoConfig(
+    api_key="...",
+    default_subscription_id="sub_123",
+    pricing_mode="price",     # "tokens" (default) | "price"
+    markup=1.2,               # optional cost multiplier (1.2 = +20%)
+))
+client = sdk.wrap(anthropic_client)
+# ... use the client normally ...
+```
+
+In **price mode** the SDK emits **one event per call** with code `llm_cost`. The event carries a
+top-level `precise_total_amount_cents` (the total cost in cents, after markup) for Lago's
+**dynamic charge model**, plus a breakdown in `properties`: `unit` (total tokens), `value` (USD
+total), `base_cost` (pre-markup), `markup`, `price_source`, and per-field `*_tokens` /
+`*_unit_price` / `*_cost`. In Lago, create a `sum`-aggregation billable metric `llm_cost` on
+`field_name: "unit"` and attach a **dynamic** charge to it. Lago then sums each event's
+`precise_total_amount_cents` into a single fee (`unit` is the displayed usage quantity). See
+`testing/lago_setup_pricing_plan.py` for a script that creates both.
+
+Per-call override via `extra_lago` (mode and markup, in addition to subscription/dimensions):
+
+```python
+client.messages.create(model="claude-...", messages=[...],
+                        extra_lago={"mode": "price", "markup": 1.5})
+```
+
+**Live, public pricing sources (no API keys):**
+- **OpenRouter** (`/api/v1/models`) for native `anthropic` / `openai` / `mistral` / `gemini`
+  clients — USD per token.
+- **AWS Bedrock Price List Bulk API** (public) for Bedrock — parsed per region.
+
+Prices are fetched and cached in the background (TTL `pricing_ttl_seconds`, default 1h); the
+refresh runs on the SDK's background thread, so **your LLM call is never blocked on pricing**.
+
+**Fallback (never under-bill):** if a price is unavailable (table not warm on the first call,
+or the model isn't found in the source), the SDK **falls back to emitting token-count events**
+and calls `on_error` so it's visible — it never silently drops the usage.
+
+**Bedrock note:** AWS's public bulk data lists many models (Titan, Llama, Mistral, Cohere, and
+older Claude) but, at the time of writing, **not the current Claude 3.5/3.7/4 models**. Bedrock
+calls for models absent from AWS's data fall back to token events. Native Anthropic clients are
+priced via OpenRouter and unaffected.
+
 ## Error policy
 
 The SDK never breaks your LLM call. If anything in instrumentation fails (adapter bug, Lago down, network error), the SDK swallows it, logs a warning, and your call returns normally.

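Note on the README section above: the "cached in the background, never blocking the call" behavior can be pictured with a small TTL cache whose lookup path never performs I/O. This is a sketch under assumptions, not the SDK's `PricingProvider`; the class `TtlPriceCache` and its methods are hypothetical names.

```python
import threading
import time
from decimal import Decimal
from typing import Callable, Dict, Optional


class TtlPriceCache:
    """Hypothetical TTL cache: lookups only read memory; refresh runs elsewhere."""

    def __init__(self, fetch: Callable[[], Dict[str, Decimal]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._table: Optional[Dict[str, Decimal]] = None
        self._fetched_at = 0.0

    def is_stale(self) -> bool:
        """True when no table has been loaded yet or the TTL has elapsed."""
        with self._lock:
            return self._table is None or self._clock() - self._fetched_at >= self._ttl

    def refresh(self) -> None:
        """Intended to be called from a background worker, not the request path."""
        table = self._fetch()  # the slow fetch happens outside the lock
        with self._lock:
            self._table = table
            self._fetched_at = self._clock()

    def lookup(self, model: str) -> Optional[Decimal]:
        """Return the cached price, or None for a cold table or unknown model."""
        with self._lock:
            return None if self._table is None else self._table.get(model)


# The lambda stands in for an HTTP fetch; the price is a made-up sample value.
cache = TtlPriceCache(lambda: {"model-a": Decimal("0.000003")}, ttl_seconds=3600.0)
cold = cache.lookup("model-a")   # table not warm yet
cache.refresh()                  # would run on the background thread
warm = cache.lookup("model-a")
```

A `None` from `lookup` corresponds to the two fallback cases the README names: the table is not warm on the first call, or the model is absent from the source.
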
diff --git a/src/lago_agent_sdk/__init__.py b/src/lago_agent_sdk/__init__.py
@@ -1,13 +1,15 @@
 """Lago Agent SDK — Python."""
 
 from .canonical import CanonicalUsage
-from .config import DEFAULT_METRIC_CODES, LagoConfig
+from .config import DEFAULT_COST_METRIC_CODE, DEFAULT_METRIC_CODES, LagoConfig
 from .exceptions import (
     LagoApiError,
     LagoConfigError,
     LagoSDKError,
+    PricingUnavailableError,
     UnknownClientError,
 )
+from .pricing import HttpPricingFetcher, ModelPrice, PricingProvider, compute_cost
 from .sdk import LagoSDK
 
 __all__ = [
@@ -17,7 +19,13 @@
     "LagoApiError",
     "LagoConfigError",
     "LagoSDKError",
+    "PricingUnavailableError",
     "UnknownClientError",
     "DEFAULT_METRIC_CODES",
+    "DEFAULT_COST_METRIC_CODE",
+    "PricingProvider",
+    "HttpPricingFetcher",
+    "ModelPrice",
+    "compute_cost",
 ]
 __version__ = "0.1.0"
diff --git a/src/lago_agent_sdk/config.py b/src/lago_agent_sdk/config.py
@@ -4,6 +4,7 @@
 
 from collections.abc import Callable
 from dataclasses import dataclass, field
+from typing import Any, Literal
 
 DEFAULT_METRIC_CODES: dict[str, str] = {
     "input": "llm_input_tokens",
@@ -19,6 +20,13 @@
     "audio_output": "llm_audio_output_tokens",
 }
 
+# Metric code for the single per-call dollar-cost event emitted in price mode.
+DEFAULT_COST_METRIC_CODE = "llm_cost"
+
+# Pricing mode: emit raw token counts (default, backward-compatible) or a single
+# computed dollar-cost event per call.
+PricingMode = Literal["tokens", "price"]
+
 
 def _mask_api_key(api_key: str) -> str:
     """Render an api key safe for logs/repr: keeps a 4-char tail for debuggability."""
@@ -42,6 +50,21 @@ class LagoConfig:
     max_retry_seconds: float = 60.0
     on_error: Callable[[Exception, str], None] | None = None
 
+    # --- pricing (price mode) ---
+    # Global default mode. "tokens" preserves the existing behavior exactly.
+    pricing_mode: PricingMode = "tokens"
+    # Multiplier applied to the computed cost (1.0 = no markup, 1.2 = +20%).
+    markup: float = 1.0
+    # Metric code for the single dollar-cost event emitted in price mode.
+    cost_metric_code: str = DEFAULT_COST_METRIC_CODE
+    # How long a fetched pricing table stays fresh before a background refresh.
+    pricing_ttl_seconds: float = 3600.0
+    # Region used for Bedrock pricing when the model id carries no region prefix.
+    bedrock_default_region: str = "us-east-1"
+    # Optional injected PricingProvider (or a stub) — primarily for tests/overrides.
+    # Typed Any to avoid a config→pricing import cycle.
+    pricing_provider: Any | None = field(default=None, repr=False)
+
     def __repr__(self) -> str:
         return (
             f"LagoConfig(api_key={_mask_api_key(self.api_key)!r}, "
@@ -51,5 +74,10 @@ def __repr__(self) -> str:
             f"max_batch_size={self.max_batch_size}, "
             f"max_buffer_size={self.max_buffer_size}, "
             f"request_timeout_seconds={self.request_timeout_seconds}, "
-            f"max_retry_seconds={self.max_retry_seconds})"
+            f"max_retry_seconds={self.max_retry_seconds}, "
+            f"pricing_mode={self.pricing_mode!r}, "
+            f"markup={self.markup}, "
+            f"cost_metric_code={self.cost_metric_code!r}, "
+            f"pricing_ttl_seconds={self.pricing_ttl_seconds}, "
+            f"bedrock_default_region={self.bedrock_default_region!r})"
         )
diff --git a/src/lago_agent_sdk/exceptions.py b/src/lago_agent_sdk/exceptions.py
@@ -22,3 +22,14 @@ def __init__(self, status: int, body: str) -> None:
 
 class UnknownClientError(LagoConfigError):
     """`wrap()` received a client kind the SDK does not recognize."""
+
+
+class PricingUnavailableError(LagoSDKError):
+    """Price mode could not resolve a price (table not warm yet, or model not
+    matched). Surfaced via on_error; the SDK falls back to emitting token events."""
+
+    def __init__(self, provider: str, model: str, api: str) -> None:
+        super().__init__(f"no price for provider={provider!r} model={model!r} api={api!r}")
+        self.provider = provider
+        self.model = model
+        self.api = api
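Note on `PricingUnavailableError`: the fallback path described in its docstring (report via `on_error`, then emit token events instead of a cost event) might look roughly like the sketch below. The exception is redefined locally so the example is self-contained; `build_events`, the event dictionaries, the `"pricing"` context string, and the use of one unit price for all token fields are simplifying assumptions, not the SDK's actual internals.

```python
from decimal import Decimal


class PricingUnavailableError(Exception):
    """Local stand-in mirroring the exception added in this diff."""

    def __init__(self, provider: str, model: str, api: str) -> None:
        super().__init__(f"no price for provider={provider!r} model={model!r} api={api!r}")
        self.provider = provider
        self.model = model
        self.api = api


def build_events(provider, model, api, tokens, unit_price, markup, on_error):
    """Hypothetical: one llm_cost event when priced, else per-field token events."""
    if unit_price is None:
        # Surface the missing price, then keep billing by tokens.
        if on_error is not None:
            on_error(PricingUnavailableError(provider, model, api), "pricing")
        return [{"code": f"llm_{name}_tokens", "tokens": count} for name, count in tokens.items()]
    total_tokens = sum(tokens.values())
    cents = unit_price * total_tokens * markup * 100
    return [{"code": "llm_cost", "precise_total_amount_cents": cents, "unit": total_tokens}]


errors = []
usage = {"input": 10, "output": 5}
fallback = build_events("anthropic", "model-x", "messages", usage,
                        None, Decimal("1.2"), lambda exc, ctx: errors.append((exc, ctx)))
priced = build_events("anthropic", "model-x", "messages", usage,
                      Decimal("0.000002"), Decimal("1.2"), None)
```

Carrying `provider`, `model`, and `api` as attributes lets an `on_error` handler aggregate misses per model without parsing the message string.
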