Most customers don't need this. Inside `botanu.event(...)`, the OTel auto-instrumentors for OpenAI, Anthropic, Vertex AI, and LangChain already produce GenAI semantic-convention spans with `gen_ai.*` attributes and run-context stamping. Reach for `track_llm_call` only when the library you're calling isn't auto-instrumented (custom inference endpoint, self-hosted model server, proprietary SDK) or when you need to set content for eval manually.
```python
from botanu.tracking.llm import track_llm_call

with track_llm_call(provider="openai", model="gpt-4") as tracker:
    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
    tracker.set_tokens(
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
    )
    tracker.set_request_id(response.id)
```

| Attribute | Example | Description |
|---|---|---|
| `gen_ai.operation.name` | `chat` | Type of operation |
| `gen_ai.provider.name` | `openai` | Normalized provider name |
| `gen_ai.request.model` | `gpt-4` | Requested model |
| `gen_ai.response.model` | `gpt-4-0613` | Actual model used |
| `gen_ai.usage.input_tokens` | `150` | Input/prompt tokens |
| `gen_ai.usage.output_tokens` | `200` | Output/completion tokens |
| `gen_ai.response.id` | `chatcmpl-...` | Provider request ID |
Record token usage from the response:

```python
tracker.set_tokens(
    input_tokens=150,
    output_tokens=200,
    cached_tokens=50,        # For providers with caching
    cache_read_tokens=50,    # Anthropic-style cache read
    cache_write_tokens=100,  # Anthropic-style cache write
)
```

Record provider and client request IDs for billing reconciliation:

```python
tracker.set_request_id(
    provider_request_id=response.id,    # From provider response
    client_request_id="my-client-123",  # Your tracking ID
)
```

When the response uses a different model than requested:

```python
tracker.set_response_model("gpt-4-0613")
```

Capture the prompt text and response text for downstream evaluation:

```python
tracker.set_input_content(prompt_text)
tracker.set_output_content(response_text)
```

Both methods are gated by `BotanuConfig.content_capture_rate`:

- Default rate is `0.0`: both calls no-op, and nothing is written to the span.
- Set the rate to `0.10`–`0.20` in production (or `1.0` in a sandbox) to start capturing. The gate is a simple `random.random() < rate` check, so the decision is per-call.
- Text is truncated at `max_chars` (default 4096) before being stamped.
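The gating and truncation behavior described above can be sketched in plain Python (`should_capture` and `truncate` are illustrative names, not SDK internals):

```python
import random

def should_capture(rate: float) -> bool:
    # Per-call sampling gate: a uniform draw under the configured rate fires capture.
    return random.random() < rate

def truncate(text: str, max_chars: int = 4096) -> str:
    # Content is clipped to max_chars before being stamped on the span.
    return text[:max_chars]
```

Because the draw is per-call, individual calls within the same request may capture independently; there is no per-trace stickiness implied by the rate alone.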
When capture fires, the SDK writes:
| Attribute | Source |
|---|---|
| `botanu.eval.input_content` | `set_input_content(text)` |
| `botanu.eval.output_content` | `set_output_content(text)` |
PII is scrubbed in-process by default before the attribute is written: regex patterns cover email, phone, SSN, credit card, IPs, JWTs, and common API keys. Optional Presidio NER adds name/address/medical-term detection (install with `pip install botanu[pii-nlp]`). Collector regex and evaluator Presidio remain downstream as belt-and-suspenders. See Content Capture for the full pipeline, opt-out knobs, and the event-level auto-capture path that `botanu.event(...)` provides.
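For intuition, the in-process regex pass can be sketched like this (the pattern set here is a simplified, illustrative subset; the SDK's actual patterns are broader and more robust):

```python
import re

# Illustrative subset of the in-process scrubbing patterns.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    # Replace each match with a typed placeholder before the span attribute is written.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```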
Record request parameters for analysis:

```python
tracker.set_request_params(
    temperature=0.7,
    top_p=0.9,
    max_tokens=1000,
    stop_sequences=["END"],
    frequency_penalty=0.5,
    presence_penalty=0.3,
)
```

Mark as a streaming request:

```python
tracker.set_streaming(True)
```

Mark as a cache hit (for semantic caching):

```python
tracker.set_cache_hit(True)
```

Track retry attempts:

```python
tracker.set_attempt(2)  # Second attempt
```

Record the stop reason:

```python
tracker.set_finish_reason("stop")  # or "length", "content_filter", etc.
```

Record errors (automatically called on exceptions):

```python
try:
    response = await client.chat(...)
except openai.RateLimitError as e:
    tracker.set_error(e)
    raise
```

Add custom attributes:

```python
tracker.add_metadata(
    prompt_version="v2.1",
    experiment_id="exp-123",
)
```

Use `ModelOperation` constants for the `operation` parameter:
```python
from botanu.tracking.llm import track_llm_call, ModelOperation

# Chat completion
with track_llm_call(provider="openai", model="gpt-4", operation=ModelOperation.CHAT):
    ...

# Embeddings
with track_llm_call(provider="openai", model="text-embedding-3-small", operation=ModelOperation.EMBEDDINGS):
    ...

# Text completion (legacy)
with track_llm_call(provider="openai", model="davinci", operation=ModelOperation.TEXT_COMPLETION):
    ...
```

Available operations:
| Constant | Value | Use Case |
|---|---|---|
| `CHAT` | `chat` | Chat completions (default) |
| `TEXT_COMPLETION` | `text_completion` | Legacy completions |
| `EMBEDDINGS` | `embeddings` | Embedding generation |
| `GENERATE_CONTENT` | `generate_content` | Generic content generation |
| `EXECUTE_TOOL` | `execute_tool` | Tool/function execution |
| `CREATE_AGENT` | `create_agent` | Agent creation |
| `INVOKE_AGENT` | `invoke_agent` | Agent invocation |
| `RERANK` | `rerank` | Reranking |
| `IMAGE_GENERATION` | `image_generation` | Image generation |
| `SPEECH_TO_TEXT` | `speech_to_text` | Transcription |
| `TEXT_TO_SPEECH` | `text_to_speech` | Speech synthesis |
Provider names are automatically normalized:
| Input | Normalized |
|---|---|
| `openai`, `OpenAI` | `openai` |
| `azure_openai`, `azure-openai` | `azure.openai` |
| `anthropic`, `claude` | `anthropic` |
| `bedrock`, `aws_bedrock` | `aws.bedrock` |
| `vertex`, `vertexai`, `gemini` | `gcp.vertex_ai` |
| `cohere` | `cohere` |
| `mistral`, `mistralai` | `mistral` |
| `together`, `togetherai` | `together` |
| `groq` | `groq` |
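Normalization along these lines can be expressed as a simple alias map (a hypothetical reimplementation for illustration; the SDK's actual mapping may cover more aliases):

```python
# Hypothetical alias map mirroring the normalization table above.
PROVIDER_ALIASES = {
    "azure_openai": "azure.openai",
    "azure-openai": "azure.openai",
    "claude": "anthropic",
    "bedrock": "aws.bedrock",
    "aws_bedrock": "aws.bedrock",
    "vertex": "gcp.vertex_ai",
    "vertexai": "gcp.vertex_ai",
    "gemini": "gcp.vertex_ai",
    "mistralai": "mistral",
    "togetherai": "together",
}

def normalize_provider(name: str) -> str:
    key = name.strip().lower()
    # Unknown providers pass through lowercased rather than raising.
    return PROVIDER_ALIASES.get(key, key)
```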
Track tool calls triggered by LLMs:

```python
from botanu.tracking.llm import track_tool_call

with track_tool_call(tool_name="search_database", tool_call_id="call_abc123") as tool:
    results = await do_work(query)
    tool.set_result(
        success=True,
        items_returned=len(results),
        bytes_processed=1024,
    )
```

```python
# Set execution result
tool.set_result(
    success=True,
    items_returned=10,
    bytes_processed=2048,
)

# Set tool call ID from LLM response
tool.set_tool_call_id("call_abc123")

# Record error
tool.set_error(exception)

# Add custom metadata
tool.add_metadata(query_type="semantic")
```

For cases where you can't use context managers:
```python
from botanu.tracking.llm import set_llm_attributes

set_llm_attributes(
    provider="openai",
    model="gpt-4",
    operation="chat",
    input_tokens=150,
    output_tokens=200,
    streaming=True,
    provider_request_id="chatcmpl-...",
)
```

```python
from botanu.tracking.llm import set_token_usage

set_token_usage(
    input_tokens=150,
    output_tokens=200,
    cached_tokens=50,
)
```

For wrapping existing client methods:

```python
from botanu.tracking.llm import llm_instrumented

class MyOpenAIClient:
    @llm_instrumented(provider="openai", tokens_from_response=True)
    def chat(self, model: str, messages: list):
        return openai.chat.completions.create(model=model, messages=messages)
```

The SDK automatically records these metrics:
| Metric | Type | Description |
|---|---|---|
| `gen_ai.client.token.usage` | Histogram | Token counts by type |
| `gen_ai.client.operation.duration` | Histogram | Operation duration in seconds |
| `botanu.gen_ai.attempts` | Counter | Request attempts (including retries) |
```python
from anthropic import AsyncAnthropic, RateLimitError
from openai import AsyncOpenAI

import botanu
from botanu.tracking.llm import track_llm_call

anthropic = AsyncAnthropic()
openai = AsyncOpenAI()

@botanu.event(
    workflow="process-with-fallback",
    event_id=lambda data: data["id"],
    customer_id=lambda data: data["customer_id"],
)
async def process_with_fallback(data):
    try:
        with track_llm_call(provider="anthropic", model="claude-3-opus") as tracker:
            tracker.set_attempt(1)
            response = await anthropic.messages.create(
                model="claude-3-opus-20240229",
                max_tokens=1024,
                messages=[{"role": "user", "content": data["prompt"]}],
            )
            tracker.set_tokens(
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
            )
        botanu.emit_outcome(value_type="items_processed", value_amount=1)
        return response.content[0].text
    except RateLimitError:
        with track_llm_call(provider="openai", model="gpt-4") as tracker:
            tracker.set_attempt(2)
            response = await openai.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": data["prompt"]}],
            )
            tracker.set_tokens(
                input_tokens=response.usage.prompt_tokens,
                output_tokens=response.usage.completion_tokens,
            )
        botanu.emit_outcome(value_type="items_processed", value_amount=1)
        return response.choices[0].message.content
```

- Auto-Instrumentation - Automatic LLM tracking
- Data Tracking - Database and storage tracking
- Outcomes - Recording business outcomes