3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,9 @@ All notable changes to this project will be documented here. Format follows [Kee

## [Unreleased]

### Added
- **Price mode — emit computed dollar cost instead of token counts.** New `pricing_mode` config (`"tokens"` default | `"price"`), plus `markup`, `cost_metric_code` (default `llm_cost`), `pricing_ttl_seconds`, and `bedrock_default_region`. In price mode the SDK emits one `llm_cost` event per call carrying a top-level `precise_total_amount_cents` (cost in cents, after markup) for Lago's **dynamic charge model**, with a full per-field breakdown in `properties` (value in USD, base, markup, source, per-field tokens/unit_price/cost). Live unit prices come from public, no-auth sources: OpenRouter (`/api/v1/models`) for native anthropic/openai/mistral/gemini, and the AWS Bedrock Price List **Bulk** API for Bedrock. Prices are fetched + cached on the background queue thread (never blocking the customer's call); a missing price falls back to token events and calls `on_error` (never silently under-bills). Mode and markup are overridable per-call via `extra_lago={"mode": "price", "markup": 1.5}`. Money is computed with `Decimal` floored to 12 dp, identical to the JS implementation (cross-repo golden fixture). New `pricing.py` module + `PricingProvider`; default `pricing_mode="tokens"` keeps existing behavior unchanged.

### Fixed
- **Anthropic `messages.create(stream=True)` under-billed input tokens.** The stream wrapper read only top-level `usage`, which on a basic stream appears only on `message_delta` as `{output_tokens: N}` — the authoritative `input_tokens` / `cache_*` counts arrive nested under `message.usage` on the `message_start` event and were ignored, so input billed 0. The wrapper now merges usage from `message_start` (input/cache) and `message_delta` (cumulative output). Sync + async paths; regression tests use the realistic wire shape (delta carries no input echo).
- **Legacy `google-generativeai` SDK silently emitted no events.** The detector matched both the new `google-genai` and the deprecated `google-generativeai` SDKs, but the wrapper only instruments the unified `Client.models` / `.aio` surface, so a legacy `GenerativeModel` passed detection and then had nothing wrapped. `wrap()` now rejects legacy clients with a clear pointer to migrate to `google-genai`.
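
The Anthropic streaming fix above can be illustrated with a minimal sketch. It assumes stream events are plain dicts shaped like the wire format (the real wrapper receives typed SDK objects), and `merge_stream_usage` is a hypothetical name, not the wrapper's actual function.

```python
from typing import Any


def merge_stream_usage(events: list[dict[str, Any]]) -> dict[str, int]:
    """Merge usage across one stream (hypothetical helper, dict-shaped events assumed)."""
    usage: dict[str, int] = {}
    for event in events:
        if event.get("type") == "message_start":
            # Input and cache counts arrive nested under message.usage.
            nested = event.get("message", {}).get("usage") or {}
            for key, value in nested.items():
                if isinstance(value, int):
                    usage[key] = value
        elif event.get("type") == "message_delta":
            # output_tokens on message_delta is cumulative, so overwrite instead of adding.
            delta = event.get("usage") or {}
            for key, value in delta.items():
                if isinstance(value, int):
                    usage[key] = value
    return usage


# Realistic shape: the delta carries no echo of the input counts.
events = [
    {"type": "message_start",
     "message": {"usage": {"input_tokens": 25, "cache_read_input_tokens": 10, "output_tokens": 1}}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "message_delta", "usage": {"output_tokens": 12}},
]
merged = merge_stream_usage(events)
```

Reading only the top-level `usage` of these events would yield `output_tokens` alone, which is the under-billing the entry describes.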
51 changes: 51 additions & 0 deletions README.md
@@ -189,6 +189,57 @@ For both OpenAI and Gemini, `cache_read`, `audio_input`, and `image_input` are *

OpenAI's Predicted Outputs tokens (`accepted_prediction_tokens`, `rejected_prediction_tokens`) are not surfaced — see the OpenAI adapter docstring for details on this intentional gap.

## Pricing mode — send dollar cost instead of tokens

By default the SDK emits **token counts** (`pricing_mode="tokens"`). You can instead have it
compute and emit the **dollar cost** of each call: `Σ(unit_price_per_token × tokens) × markup`.

```python
from lago_agent_sdk import LagoSDK, LagoConfig

sdk = LagoSDK(api_key="...", config=LagoConfig(
    api_key="...",
    default_subscription_id="sub_123",
    pricing_mode="price",  # "tokens" (default) | "price"
    markup=1.2,            # optional cost multiplier (1.2 = +20%)
))
client = sdk.wrap(anthropic_client)
# ... use the client normally ...
```
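
As a rough illustration of the formula above, the following sketch computes `Σ(unit_price_per_token × tokens) × markup` with `Decimal`. The function name `sketch_cost`, the input shapes, and the point at which the 12-decimal floor is applied are assumptions for illustration; this is not the SDK's `compute_cost`.

```python
from decimal import ROUND_FLOOR, Decimal

TWELVE_DP = Decimal("0.000000000001")  # quantum for 12 decimal places


def sketch_cost(tokens: dict[str, int], unit_prices: dict[str, str],
                markup: str = "1.0") -> dict[str, Decimal]:
    """Hypothetical helper: sum per-field cost, apply markup, floor to 12 dp."""
    base = Decimal(0)
    for field, count in tokens.items():
        # Prices are passed as strings so no binary float error enters the sum.
        base += Decimal(unit_prices[field]) * count
    # Assumption: the floor is applied once, to the marked-up total.
    total_usd = (base * Decimal(markup)).quantize(TWELVE_DP, rounding=ROUND_FLOOR)
    return {"base_cost": base, "value": total_usd, "cents": total_usd * 100}


result = sketch_cost(
    {"input": 1000, "output": 500},
    {"input": "0.000003", "output": "0.000015"},  # illustrative USD-per-token prices
    markup="1.2",
)
```

With these illustrative prices the base cost is 0.0105 USD, and a markup of 1.2 gives 0.0126 USD, or 1.26 cents.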

In **price mode** the SDK emits **one event per call** with code `llm_cost` (configurable via
`cost_metric_code`). The event carries a top-level `precise_total_amount_cents` (the total cost
in cents, after markup) for Lago's **dynamic charge model**, plus a breakdown in `properties`:
`unit` (total tokens), `value` (USD total), `base_cost` (pre-markup), `markup`, `price_source`,
and per-field `*_tokens` / `*_unit_price` / `*_cost`. In Lago, create a `sum`-aggregation
billable metric with code `llm_cost` on `field_name: "unit"`, then attach a **dynamic** charge
to it. Lago sums each event's `precise_total_amount_cents` into a single fee, and `unit` is the
displayed usage quantity. See `testing/lago_setup_pricing_plan.py` for a script that creates
this setup.
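
To make the event shape described above concrete, here is a hedged sketch that assembles such an event as a plain dict. The property names follow the list above. The envelope keys (`transaction_id`, `external_subscription_id`), the string encoding of amounts, and the helper name `sketch_cost_event` are assumptions, since the SDK's actual payload builder is not shown here.

```python
import uuid
from decimal import Decimal


def sketch_cost_event(subscription_id: str, tokens: dict[str, int],
                      unit_prices: dict[str, str], markup: str,
                      price_source: str) -> dict:
    """Hypothetical builder for one price-mode event with a per-field breakdown."""
    properties: dict = {}
    base = Decimal(0)
    for field, count in tokens.items():
        price = Decimal(unit_prices[field])
        cost = price * count
        base += cost
        properties[f"{field}_tokens"] = count
        properties[f"{field}_unit_price"] = str(price)
        properties[f"{field}_cost"] = str(cost)
    total_usd = base * Decimal(markup)
    properties.update(
        unit=sum(tokens.values()),  # displayed usage quantity
        value=str(total_usd),       # USD total, after markup
        base_cost=str(base),        # USD total, before markup
        markup=markup,
        price_source=price_source,
    )
    return {
        "transaction_id": str(uuid.uuid4()),          # assumed envelope key
        "external_subscription_id": subscription_id,  # assumed envelope key
        "code": "llm_cost",
        "precise_total_amount_cents": str(total_usd * 100),
        "properties": properties,
    }


event = sketch_cost_event(
    "sub_123",
    {"input": 1000, "output": 500},
    {"input": "0.000003", "output": "0.000015"},  # illustrative prices
    markup="1.2",
    price_source="openrouter",
)
```

Here `unit` is 1500 tokens while the fee comes from `precise_total_amount_cents`, which is why the billable metric aggregates on `unit` and the charge is dynamic.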

Per-call override via `extra_lago` (mode and markup, in addition to subscription/dimensions):

```python
client.messages.create(model="claude-...", messages=[...],
                       extra_lago={"mode": "price", "markup": 1.5})
```

**Live, public pricing sources (no API keys):**
- **OpenRouter** (`/api/v1/models`) for native `anthropic` / `openai` / `mistral` / `gemini`
clients — USD per token.
- **AWS Bedrock Price List Bulk API** (public) for Bedrock — parsed per region.

Prices are fetched and cached in the background (TTL `pricing_ttl_seconds`, default 1h); the
refresh runs on the SDK's background thread, so **your LLM call is never blocked on pricing**.
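
The non-blocking behavior described above can be sketched as a TTL-bounded table whose lookup never performs I/O. The class `TtlPriceTable`, its method names, and the injectable `fetch` and `clock` parameters are hypothetical. The SDK's `PricingProvider` may be structured differently.

```python
import threading
import time
from typing import Callable, Optional


class TtlPriceTable:
    """Hypothetical cache: lookups read memory only, refreshes run off the request path."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._prices: dict = {}
        self._fetched_at: Optional[float] = None

    def lookup(self, model: str) -> Optional[str]:
        """Request path: returns the cached price or None, never fetches."""
        with self._lock:
            return self._prices.get(model)

    def is_stale(self) -> bool:
        with self._lock:
            return self._fetched_at is None or self._clock() - self._fetched_at >= self._ttl

    def refresh_if_stale(self) -> bool:
        """Background thread: fetch outside the lock, then swap the whole table."""
        if not self.is_stale():
            return False
        fresh = self._fetch()  # network I/O would happen here
        with self._lock:
            self._prices = fresh
            self._fetched_at = self._clock()
        return True
```

In this sketch a cold table simply returns `None` from `lookup`, which corresponds to the "table not warm on the first call" case.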

**Fallback (never under-bill):** if a price is unavailable (table not warm on the first call,
or the model isn't found in the source), the SDK **falls back to emitting token-count events**
and calls `on_error` so it's visible — it never silently drops the usage.
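
A minimal sketch of the fallback rule above: when no price is available, emit token-count events and report the miss instead of dropping usage. The function `events_for_call`, the token event shape, the `"pricing"` context string passed to `on_error`, and the `output` metric code are illustrative assumptions.

```python
from decimal import Decimal
from typing import Callable, Optional

# Illustrative subset; only the "input" code is confirmed by the config shown in this PR.
TOKEN_METRIC_CODES = {"input": "llm_input_tokens", "output": "llm_output_tokens"}


def events_for_call(model: str, tokens: dict,
                    lookup_price: Callable[[str], Optional[dict]],
                    on_error: Callable[[Exception, str], None]) -> list:
    """Hypothetical dispatcher: one cost event if priced, token events otherwise."""
    prices = lookup_price(model)
    if prices is None or any(field not in prices for field in tokens):
        # Surface the miss, then fall back so the usage is still billed.
        on_error(LookupError(f"no price for model={model!r}"), "pricing")
        return [
            {"code": TOKEN_METRIC_CODES[field], "properties": {"tokens": count}}
            for field, count in tokens.items()
        ]
    cost_usd = sum((Decimal(prices[f]) * n for f, n in tokens.items()), Decimal(0))
    return [{"code": "llm_cost", "precise_total_amount_cents": str(cost_usd * 100)}]


errors = []
table = {"known-model": {"input": "0.000003", "output": "0.000015"}}
usage = {"input": 1000, "output": 500}
priced = events_for_call("known-model", usage, table.get, lambda e, c: errors.append((e, c)))
fallback = events_for_call("unknown-model", usage, table.get, lambda e, c: errors.append((e, c)))
```

The priced call yields a single `llm_cost` event, while the unknown model yields one token event per field plus one `on_error` callback.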

**Bedrock note:** AWS's public bulk data lists many models (Titan, Llama, Mistral, Cohere, and
older Claude) but, at time of writing, **not the current Claude 3.5/3.7/4 models**. Bedrock
calls for models absent from AWS's data fall back to token events. Native Anthropic clients are
priced via OpenRouter and unaffected.

## Error policy

The SDK never breaks your LLM call. If anything in instrumentation fails (adapter bug, Lago down, network error), the SDK swallows it, logs a warning, and your call returns normally.
10 changes: 9 additions & 1 deletion src/lago_agent_sdk/__init__.py
@@ -1,13 +1,15 @@
"""Lago Agent SDK — Python."""

from .canonical import CanonicalUsage
-from .config import DEFAULT_METRIC_CODES, LagoConfig
+from .config import DEFAULT_COST_METRIC_CODE, DEFAULT_METRIC_CODES, LagoConfig
from .exceptions import (
    LagoApiError,
    LagoConfigError,
    LagoSDKError,
    PricingUnavailableError,
    UnknownClientError,
)
from .pricing import HttpPricingFetcher, ModelPrice, PricingProvider, compute_cost
from .sdk import LagoSDK

__all__ = [
@@ -17,7 +19,13 @@
    "LagoApiError",
    "LagoConfigError",
    "LagoSDKError",
    "PricingUnavailableError",
    "UnknownClientError",
    "DEFAULT_METRIC_CODES",
    "DEFAULT_COST_METRIC_CODE",
    "PricingProvider",
    "HttpPricingFetcher",
    "ModelPrice",
    "compute_cost",
]
__version__ = "0.1.0"
30 changes: 29 additions & 1 deletion src/lago_agent_sdk/config.py
@@ -4,6 +4,7 @@

from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any, Literal

DEFAULT_METRIC_CODES: dict[str, str] = {
    "input": "llm_input_tokens",
@@ -19,6 +20,13 @@
    "audio_output": "llm_audio_output_tokens",
}

# Metric code for the single per-call dollar-cost event emitted in price mode.
DEFAULT_COST_METRIC_CODE = "llm_cost"

# Pricing mode: emit raw token counts (default, backward-compatible) or a single
# computed dollar-cost event per call.
PricingMode = Literal["tokens", "price"]


def _mask_api_key(api_key: str) -> str:
    """Render an api key safe for logs/repr: keeps a 4-char tail for debuggability."""
@@ -42,6 +50,21 @@ class LagoConfig:
    max_retry_seconds: float = 60.0
    on_error: Callable[[Exception, str], None] | None = None

    # --- pricing (price mode) ---
    # Global default mode. "tokens" preserves the existing behavior exactly.
    pricing_mode: PricingMode = "tokens"
    # Multiplier applied to the computed cost (1.0 = no markup, 1.2 = +20%).
    markup: float = 1.0
    # Metric code for the single dollar-cost event emitted in price mode.
    cost_metric_code: str = DEFAULT_COST_METRIC_CODE
    # How long a fetched pricing table stays fresh before a background refresh.
    pricing_ttl_seconds: float = 3600.0
    # Region used for Bedrock pricing when the model id carries no region prefix.
    bedrock_default_region: str = "us-east-1"
    # Optional injected PricingProvider (or a stub), primarily for tests/overrides.
    # Typed Any to avoid a config→pricing import cycle.
    pricing_provider: Any | None = field(default=None, repr=False)

    def __repr__(self) -> str:
        return (
            f"LagoConfig(api_key={_mask_api_key(self.api_key)!r}, "
@@ -51,5 +74,10 @@ def __repr__(self) -> str:
            f"max_batch_size={self.max_batch_size}, "
            f"max_buffer_size={self.max_buffer_size}, "
            f"request_timeout_seconds={self.request_timeout_seconds}, "
-            f"max_retry_seconds={self.max_retry_seconds})"
+            f"max_retry_seconds={self.max_retry_seconds}, "
+            f"pricing_mode={self.pricing_mode!r}, "
+            f"markup={self.markup}, "
+            f"cost_metric_code={self.cost_metric_code!r}, "
+            f"pricing_ttl_seconds={self.pricing_ttl_seconds}, "
+            f"bedrock_default_region={self.bedrock_default_region!r})"
        )
11 changes: 11 additions & 0 deletions src/lago_agent_sdk/exceptions.py
@@ -22,3 +22,14 @@ def __init__(self, status: int, body: str) -> None:

class UnknownClientError(LagoConfigError):
    """`wrap()` received a client kind the SDK does not recognize."""


class PricingUnavailableError(LagoSDKError):
    """Price mode could not resolve a price (table not warm yet, or model not
    matched). Surfaced via on_error; the SDK falls back to emitting token events."""

    def __init__(self, provider: str, model: str, api: str) -> None:
        super().__init__(f"no price for provider={provider!r} model={model!r} api={api!r}")
        self.provider = provider
        self.model = model
        self.api = api
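
Since the exception above is surfaced through `on_error`, a caller could use its attributes to track which models lack prices. The sketch below redefines stand-in copies of the two exception classes so it runs without the SDK installed. The handler body, the meaning of the second `on_error` argument, and the sample provider/model/api values are assumptions.

```python
from collections import Counter


class LagoSDKError(Exception):
    """Local stand-in so the sketch is self-contained."""


class PricingUnavailableError(LagoSDKError):
    """Local copy mirroring the class added in this diff."""

    def __init__(self, provider: str, model: str, api: str) -> None:
        super().__init__(f"no price for provider={provider!r} model={model!r} api={api!r}")
        self.provider = provider
        self.model = model
        self.api = api


pricing_misses = Counter()  # (provider, model) -> number of fallbacks to token events
other_errors = []


def on_error(exc: Exception, context: str) -> None:
    """Hypothetical handler matching Callable[[Exception, str], None]."""
    if isinstance(exc, PricingUnavailableError):
        pricing_misses[(exc.provider, exc.model)] += 1
    else:
        other_errors.append((context, exc))


# Illustrative values only; real provider/model/api strings may differ.
on_error(PricingUnavailableError("bedrock", "example-model", "converse"), "pricing")
on_error(ValueError("unrelated failure"), "queue")
```

A handler like this would be passed as `LagoConfig(on_error=...)`, which keeps pricing misses visible without treating them as fatal.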