
LLM Tracking

Most customers don't need this. Inside botanu.event(...), the OTel auto-instrumentors for OpenAI, Anthropic, Vertex AI, and LangChain already produce GenAI semantic-convention spans with gen_ai.* attributes and run-context stamping. Reach for track_llm_call only when the library you're calling isn't auto-instrumented (custom inference endpoint, self-hosted model server, proprietary SDK) or when you need to set content for eval manually.

track_llm_call

from botanu.tracking.llm import track_llm_call

with track_llm_call(provider="openai", model="gpt-4") as tracker:
    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
    tracker.set_tokens(
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
    )
    tracker.set_request_id(response.id)

What Gets Recorded

| Attribute | Example | Description |
| --- | --- | --- |
| `gen_ai.operation.name` | `chat` | Type of operation |
| `gen_ai.provider.name` | `openai` | Normalized provider name |
| `gen_ai.request.model` | `gpt-4` | Requested model |
| `gen_ai.response.model` | `gpt-4-0613` | Actual model used |
| `gen_ai.usage.input_tokens` | `150` | Input/prompt tokens |
| `gen_ai.usage.output_tokens` | `200` | Output/completion tokens |
| `gen_ai.response.id` | `chatcmpl-...` | Provider request ID |

LLMTracker Methods

set_tokens()

Record token usage from the response:

tracker.set_tokens(
    input_tokens=150,
    output_tokens=200,
    cached_tokens=50,        # For providers with caching
    cache_read_tokens=50,    # Anthropic-style cache read
    cache_write_tokens=100,  # Anthropic-style cache write
)

set_request_id()

Record provider and client request IDs for billing reconciliation:

tracker.set_request_id(
    provider_request_id=response.id,      # From provider response
    client_request_id="my-client-123",    # Your tracking ID
)

set_response_model()

When the response uses a different model than requested:

tracker.set_response_model("gpt-4-0613")

set_input_content() / set_output_content()

Capture the prompt text and response text for downstream evaluation.

tracker.set_input_content(prompt_text)
tracker.set_output_content(response_text)

Both methods are gated by BotanuConfig.content_capture_rate:

  • Default rate is 0.0 — both calls no-op. Nothing is written to the span.
  • Set the rate to roughly 0.10 to 0.20 in production (or 1.0 in a sandbox) to start capturing. The gate is a simple random.random() < rate check, so the decision is made per call.
  • Text is truncated at max_chars (default 4096) before being stamped.
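In plain Python, the gate amounts to something like this (an illustrative sketch of the behavior described above, not the SDK's internal code; `rate` and `max_chars` stand in for the config fields):

```python
import random

def maybe_capture(text: str, rate: float = 0.0, max_chars: int = 4096):
    """Return the text to stamp on the span, or None to no-op.

    Mirrors the documented behavior: a per-call random gate
    followed by truncation at max_chars.
    """
    if random.random() >= rate:  # rate=0.0 never fires, rate=1.0 always does
        return None
    return text[:max_chars]  # truncate before stamping
```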

When capture fires, the SDK writes:

| Attribute | Source |
| --- | --- |
| `botanu.eval.input_content` | `set_input_content(text)` |
| `botanu.eval.output_content` | `set_output_content(text)` |

PII is scrubbed in-process by default before the attribute is written — regex patterns for email, phone, SSN, credit card, IPs, JWTs, and common API keys. Optional Presidio NER adds name/address/medical-term detection (install with pip install botanu[pii-nlp]). Collector regex + evaluator Presidio remain downstream as belt-and-suspenders. See Content Capture for the full pipeline, opt-out knobs, and the event-level auto-capture path that botanu.event(...) provides.
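As a rough illustration of the in-process regex pass (a sketch with a deliberately tiny pattern set; the SDK's real patterns also cover SSNs, credit cards, JWTs, and API keys as noted above):

```python
import re

# Illustrative subset only; the SDK ships a much broader pattern set.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def scrub(text: str) -> str:
    """Replace each PII match with a typed placeholder like <email>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```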

set_request_params()

Record request parameters for analysis:

tracker.set_request_params(
    temperature=0.7,
    top_p=0.9,
    max_tokens=1000,
    stop_sequences=["END"],
    frequency_penalty=0.5,
    presence_penalty=0.3,
)

set_streaming()

Mark as a streaming request:

tracker.set_streaming(True)

set_cache_hit()

Mark as a cache hit (for semantic caching):

tracker.set_cache_hit(True)

set_attempt()

Track retry attempts:

tracker.set_attempt(2)  # Second attempt

set_finish_reason()

Record the stop reason:

tracker.set_finish_reason("stop")  # or "length", "content_filter", etc.

set_error()

Record errors (automatically called on exceptions):

try:
    response = await client.chat(...)
except openai.RateLimitError as e:
    tracker.set_error(e)
    raise

add_metadata()

Add custom attributes:

tracker.add_metadata(
    prompt_version="v2.1",
    experiment_id="exp-123",
)

Operation Types

Use ModelOperation constants for the operation parameter:

from botanu.tracking.llm import track_llm_call, ModelOperation

# Chat completion
with track_llm_call(provider="openai", model="gpt-4", operation=ModelOperation.CHAT):
    ...

# Embeddings
with track_llm_call(provider="openai", model="text-embedding-3-small", operation=ModelOperation.EMBEDDINGS):
    ...

# Text completion (legacy)
with track_llm_call(provider="openai", model="davinci", operation=ModelOperation.TEXT_COMPLETION):
    ...

Available operations:

| Constant | Value | Use Case |
| --- | --- | --- |
| `CHAT` | `chat` | Chat completions (default) |
| `TEXT_COMPLETION` | `text_completion` | Legacy completions |
| `EMBEDDINGS` | `embeddings` | Embedding generation |
| `GENERATE_CONTENT` | `generate_content` | Generic content generation |
| `EXECUTE_TOOL` | `execute_tool` | Tool/function execution |
| `CREATE_AGENT` | `create_agent` | Agent creation |
| `INVOKE_AGENT` | `invoke_agent` | Agent invocation |
| `RERANK` | `rerank` | Reranking |
| `IMAGE_GENERATION` | `image_generation` | Image generation |
| `SPEECH_TO_TEXT` | `speech_to_text` | Transcription |
| `TEXT_TO_SPEECH` | `text_to_speech` | Speech synthesis |

Provider Normalization

Provider names are automatically normalized:

| Input | Normalized |
| --- | --- |
| `openai`, `OpenAI` | `openai` |
| `azure_openai`, `azure-openai` | `azure.openai` |
| `anthropic`, `claude` | `anthropic` |
| `bedrock`, `aws_bedrock` | `aws.bedrock` |
| `vertex`, `vertexai`, `gemini` | `gcp.vertex_ai` |
| `cohere` | `cohere` |
| `mistral`, `mistralai` | `mistral` |
| `together`, `togetherai` | `together` |
| `groq` | `groq` |
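The table above boils down to a case-insensitive alias lookup. A minimal sketch (illustrative only, not the SDK's internal table, which may cover additional aliases):

```python
# Alias table transcribed from the normalization table above.
_PROVIDER_ALIASES = {
    "openai": "openai",
    "azure_openai": "azure.openai",
    "azure-openai": "azure.openai",
    "anthropic": "anthropic",
    "claude": "anthropic",
    "bedrock": "aws.bedrock",
    "aws_bedrock": "aws.bedrock",
    "vertex": "gcp.vertex_ai",
    "vertexai": "gcp.vertex_ai",
    "gemini": "gcp.vertex_ai",
    "cohere": "cohere",
    "mistral": "mistral",
    "mistralai": "mistral",
    "together": "together",
    "togetherai": "together",
    "groq": "groq",
}

def normalize_provider(name: str) -> str:
    """Lower-case the input and map known aliases; pass unknowns through."""
    key = name.strip().lower()
    return _PROVIDER_ALIASES.get(key, key)
```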

Tool/Function Tracking

Track tool calls triggered by LLMs:

from botanu.tracking.llm import track_tool_call

with track_tool_call(tool_name="search_database", tool_call_id="call_abc123") as tool:
    results = await do_work(query)
    tool.set_result(
        success=True,
        items_returned=len(results),
        bytes_processed=1024,
    )

ToolTracker Methods

# Set execution result
tool.set_result(
    success=True,
    items_returned=10,
    bytes_processed=2048,
)

# Set tool call ID from LLM response
tool.set_tool_call_id("call_abc123")

# Record error
tool.set_error(exception)

# Add custom metadata
tool.add_metadata(query_type="semantic")

Standalone Helpers

For cases where you can't use context managers:

set_llm_attributes()

from botanu.tracking.llm import set_llm_attributes

set_llm_attributes(
    provider="openai",
    model="gpt-4",
    operation="chat",
    input_tokens=150,
    output_tokens=200,
    streaming=True,
    provider_request_id="chatcmpl-...",
)

set_token_usage()

from botanu.tracking.llm import set_token_usage

set_token_usage(
    input_tokens=150,
    output_tokens=200,
    cached_tokens=50,
)

Decorator for Auto-Instrumentation

For wrapping existing client methods:

from botanu.tracking.llm import llm_instrumented

class MyOpenAIClient:
    @llm_instrumented(provider="openai", tokens_from_response=True)
    def chat(self, model: str, messages: list):
        return openai.chat.completions.create(model=model, messages=messages)
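The decorator pattern itself is straightforward. A self-contained sketch of its general shape (illustrative only; the real `llm_instrumented` stamps span attributes rather than appending to a list, and with `tokens_from_response=True` it also extracts usage from the return value):

```python
import functools

RECORDED_CALLS = []  # stand-in for span attribute stamping

def instrumented(provider: str):
    """Wrap a client method, record call metadata, and re-raise errors
    so callers still see the original exception."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            call = {"provider": provider, "model": kwargs.get("model")}
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                call["error"] = type(exc).__name__
                raise
            finally:
                RECORDED_CALLS.append(call)  # always record, success or not
        return wrapper
    return decorator

@instrumented(provider="openai")
def fake_chat(model: str, messages: list):
    return {"choices": ["ok"]}
```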

Metrics

The SDK automatically records these metrics:

| Metric | Type | Description |
| --- | --- | --- |
| `gen_ai.client.token.usage` | Histogram | Token counts by type |
| `gen_ai.client.operation.duration` | Histogram | Operation duration in seconds |
| `botanu.gen_ai.attempts` | Counter | Request attempts (including retries) |

Example: multi-provider fallback

from anthropic import AsyncAnthropic, RateLimitError
from openai import AsyncOpenAI

import botanu
from botanu.tracking.llm import track_llm_call

anthropic = AsyncAnthropic()
openai = AsyncOpenAI()


@botanu.event(
    workflow="process-with-fallback",
    event_id=lambda data: data["id"],
    customer_id=lambda data: data["customer_id"],
)
async def process_with_fallback(data):
    try:
        with track_llm_call(provider="anthropic", model="claude-3-opus") as tracker:
            tracker.set_attempt(1)
            response = await anthropic.messages.create(
                model="claude-3-opus-20240229",
                max_tokens=1024,
                messages=[{"role": "user", "content": data["prompt"]}],
            )
            tracker.set_tokens(
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
            )
            botanu.emit_outcome(value_type="items_processed", value_amount=1)
            return response.content[0].text

    except RateLimitError:
        with track_llm_call(provider="openai", model="gpt-4") as tracker:
            tracker.set_attempt(2)
            response = await openai.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": data["prompt"]}],
            )
            tracker.set_tokens(
                input_tokens=response.usage.prompt_tokens,
                output_tokens=response.usage.completion_tokens,
            )
            botanu.emit_outcome(value_type="items_processed", value_amount=1)
            return response.choices[0].message.content

See Also