PHP SDK for Opik - an LLM observability and evaluation platform.
NOTE: This is a community-maintained SDK, not an official Comet ML product. For official SDKs, see Python and TypeScript.
This table compares feature coverage between the official SDKs and this community PHP SDK.
| Category | Feature | Python | TypeScript | PHP | Notes |
|---|---|---|---|---|---|
| Tracing | Traces & Spans | ✅ | ✅ | ✅ | Full support |
| | Nested Spans | ✅ | ✅ | ✅ | Full support |
| | Search (OQL) | ✅ | ✅ | ✅ | Full support |
| | Span Types | ✅ | ✅ | ✅ | Full support |
| | Usage Tracking | ✅ | ✅ | ✅ | Full support |
| | Cost Calculation | ✅ | ✅ | ✅ | User-provided pricing |
| | `@track` Decorator | ✅ | ✅ | ❌ | PHP lacks decorators |
| Feedback | Feedback Scores | ✅ | ✅ | ✅ | Full support |
| | Batch Feedback | ✅ | ✅ | ✅ | Full support |
| | Threads | ✅ | ❌ | ✅ | Full support |
| Datasets | CRUD Operations | ✅ | ✅ | ✅ | Full support |
| | Flexible Schema | ✅ | ✅ | ✅ | Full support |
| | JSON Import/Export | ✅ | ✅ | ✅ | Full support |
| Experiments | Create & Manage | ✅ | ✅ | ✅ | Full support |
| | Log Items | ✅ | ✅ | ✅ | Full support |
| Prompts | Text Prompts | ✅ | ✅ | ✅ | Full support |
| | Chat Prompts | ✅ | ✅ | ✅ | Full support |
| | Version History | ✅ | ✅ | ✅ | Full support |
| Attachments | Upload/Download | ✅ | ❌ | ✅ | Full support |
| Evaluation | Heuristic Metrics | ✅ | ✅ | ✅ | ExactMatch, Contains, RegexMatch, IsJson, Equals, LevenshteinRatio |
| | LLM Judge Metrics | ✅ | ✅ | ❌ | Not implemented |
| | `evaluate()` | ✅ | ✅ | ✅ | Full support |
| Integrations | OpenAI | ✅ | ✅ | ❌ | Not implemented |
| | LangChain | ✅ | ✅ | ❌ | Not implemented |
| | Other Frameworks | ✅ | ✅ | ❌ | Not implemented |
| Advanced | Guardrails | ✅ | ❌ | ❌ | Not implemented |
| | Simulation | ✅ | ❌ | ❌ | Not implemented |
| | CLI Commands | ✅ | ❌ | ❌ | Not implemented |
Approximate coverage by SDK:

| SDK | Core Features | Advanced Features | Overall |
|---|---|---|---|
| Python (Official) | 100% | 100% | 100% |
| TypeScript (Official) | ~90% | ~60% | ~80% |
| PHP (Community) | ~95% | ~25% | ~75% |
High Priority (Core Functionality):
- LLM Judge Metrics (AnswerRelevance, Hallucination, etc.)
Medium Priority (Integrations):
- OpenAI integration for automatic tracing
- Other LLM provider integrations
Low Priority (Advanced):
- Guardrails (PII detection, topic filtering)
- Simulation framework
- CLI commands
- Local recording for testing
Contributions are welcome! If you'd like to help implement missing features, please see the Development section.
Requirements: PHP 8.1+, Composer
```bash
composer require klipitkas/opik-php
```

Quick start:

```php
<?php
use Opik\OpikClient;
use Opik\Tracer\SpanType;
$client = new OpikClient();
// Create a trace
$trace = $client->trace(
name: 'chat-completion',
input: ['messages' => [['role' => 'user', 'content' => 'Hello!']]],
);
// Create an LLM span within the trace
$span = $trace->span(name: 'openai-call', type: SpanType::LLM);
$span->update(
output: ['response' => 'Hi there!'],
model: 'gpt-4',
provider: 'openai',
usage: new \Opik\Tracer\Usage(promptTokens: 10, completionTokens: 5, totalTokens: 15),
);
$span->end();
// End trace and flush
$trace->update(output: ['response' => 'Hi there!']);
$trace->end();
$client->flush();
```

Configuration is read from environment variables:

| Variable | Description | Required | Default |
|---|---|---|---|
| `OPIK_API_KEY` | API key | Yes (cloud) | - |
| `OPIK_WORKSPACE` | Workspace name | Yes (cloud) | - |
| `OPIK_PROJECT_NAME` | Project name | No | `Default Project` |
| `OPIK_URL_OVERRIDE` | Custom API URL | No | - |
| `OPIK_DEBUG` | Enable debug mode | No | `false` |
| `OPIK_ENABLE_COMPRESSION` | Enable gzip compression | No | `true` |

```bash
# Cloud (recommended)
export OPIK_API_KEY=your-api-key
export OPIK_WORKSPACE=your-workspace
export OPIK_PROJECT_NAME=your-project-name
```

The client can also be configured in code:

```php
// From environment (recommended)
$client = new OpikClient();
// Explicit parameters
$client = new OpikClient(
apiKey: 'your-api-key',
workspace: 'your-workspace',
projectName: 'my-project',
);
// Local development
$client = new OpikClient(baseUrl: 'http://localhost:5173/api/');
// Verify credentials
if ($client->authCheck()) {
echo "Connected!";
}
```

Basic tracing:

```php
$trace = $client->trace(name: 'my-trace', input: ['query' => 'Hello']);
$span = $trace->span(name: 'process', type: SpanType::LLM);
$span->update(output: ['result' => 'Done']);
$span->end();
$trace->end();
$client->flush();
```

Nested spans:

```php
$trace = $client->trace(name: 'multi-step');
$parent = $trace->span(name: 'parent');
$child1 = $parent->span(name: 'step-1', type: SpanType::TOOL);
$child1->end();
$child2 = $parent->span(name: 'step-2', type: SpanType::LLM);
$child2->end();
$parent->end();
$trace->end();
```

```php
// Search traces with OQL filter
$traces = $client->searchTraces(
projectName: 'my-project',
filter: 'name = "chat-completion"',
);
// Get specific trace/span
$trace = $client->getTraceContent('trace-id');
$span = $client->getSpanContent('span-id');
```

Available span types:

| Type | Description |
|---|---|
| `SpanType::GENERAL` | General-purpose span |
| `SpanType::LLM` | LLM API call |
| `SpanType::TOOL` | Tool/function call |
| `SpanType::GUARDRAIL` | Guardrail check |
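As a sketch of how the types map onto a pipeline (the stage names below are hypothetical; only the documented `trace()`/`span()` calls are used), a retrieval step fits `TOOL`, the model call `LLM`, and a safety check `GUARDRAIL`:

```php
use Opik\Tracer\SpanType;

// Hypothetical RAG pipeline: each stage gets the span type matching its role.
$trace = $client->trace(name: 'rag-pipeline', input: ['query' => 'What is Opik?']);

$retrieve = $trace->span(name: 'retrieve-docs', type: SpanType::TOOL);
$retrieve->end();

$generate = $trace->span(name: 'generate-answer', type: SpanType::LLM);
$generate->end();

$guard = $trace->span(name: 'safety-check', type: SpanType::GUARDRAIL);
$guard->end();

$trace->end();
$client->flush();
```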
Calculate and track LLM costs using your own pricing:

```php
use Opik\Cost\CostCalculator;
use Opik\Tracer\Usage;
$usage = new Usage(promptTokens: 1000, completionTokens: 500);
// Using per-million token pricing (common format)
$cost = CostCalculator::calculateFromMillionPricing(
$usage,
inputCostPerMillion: 2.50, // $2.50 per 1M input tokens
outputCostPerMillion: 10.00, // $10.00 per 1M output tokens
);
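// Sanity check for the usage above: 1000/1M × $2.50 + 500/1M × $10.00 = $0.0075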
// Or using per-token pricing
$cost = CostCalculator::calculate(
$usage,
inputCostPerToken: 0.0000025,
outputCostPerToken: 0.00001,
);
// Attach cost to span
$span->update(totalCost: $cost);
```

Log feedback scores directly on traces and spans:

```php
$trace = $client->trace(name: 'scored-trace');
// Numeric score
$trace->logFeedbackScore(name: 'relevance', value: 0.95, reason: 'Good answer');
// Categorical score
$span = $trace->span(name: 'llm-call', type: SpanType::LLM);
$span->logFeedbackScore(name: 'sentiment', value: 1.0, categoryName: 'positive');
```

Log scores in batches:

```php
use Opik\Feedback\FeedbackScore;
// For traces
$client->logTracesFeedbackScores([
FeedbackScore::forTrace('trace-1', 'quality', value: 0.9),
FeedbackScore::forTrace('trace-2', 'quality', value: 0.85, reason: 'Good'),
]);
// For spans
$client->logSpansFeedbackScores([
FeedbackScore::forSpan('span-1', 'accuracy', value: 0.95),
FeedbackScore::forSpan('span-2', 'accuracy', categoryName: 'high'),
]);
// Delete feedback scores
$client->deleteTraceFeedbackScore('trace-id', 'quality');
$client->deleteSpanFeedbackScore('span-id', 'accuracy');
```

Group related traces into conversations:

```php
use Opik\Feedback\FeedbackScore;
// Create traces in a thread
$trace1 = $client->trace(name: 'user-msg-1', threadId: 'conversation-123');
$trace1->end();
$trace2 = $client->trace(name: 'user-msg-2', threadId: 'conversation-123');
$trace2->end();
$client->flush();
// Close thread before scoring
$client->closeThread('conversation-123');
// Score the thread
$client->logThreadsFeedbackScores([
FeedbackScore::forThread('conversation-123', 'satisfaction', value: 0.95),
]);
```

Create datasets and insert items:

```php
use Opik\Dataset\DatasetItem;
$dataset = $client->getOrCreateDataset(
name: 'eval-dataset',
description: 'Test cases',
);
// Standard schema
$dataset->insert([
new DatasetItem(
input: ['question' => 'What is PHP?'],
expectedOutput: ['answer' => 'A programming language'],
metadata: ['difficulty' => 'easy'],
),
]);
// Flexible schema
$dataset->insert([
new DatasetItem(data: [
'prompt' => 'Translate: Hello',
'expected' => 'Bonjour',
]),
]);
```

```php
// Get items
$items = $dataset->getItems(page: 1, size: 100);
foreach ($items as $item) {
$input = $item->getInput();
$output = $item->getExpectedOutput();
}
// Update/delete
$dataset->update($items);
$dataset->delete(['item-id-1', 'item-id-2']);
$dataset->clear(); // Delete all
// List/delete datasets
$datasets = $client->getDatasets();
$client->deleteDataset('dataset-name');
```

```php
// Import from JSON string
$json = '[{"input": "question 1", "output": "answer 1"}, {"input": "question 2", "output": "answer 2"}]';
$dataset->insertFromJson($json);
// Import with key mapping (rename keys)
$json = '[{"Question": "What is PHP?", "Expected Answer": "A language"}]';
$dataset->insertFromJson($json, keysMapping: [
'Question' => 'input',
'Expected Answer' => 'expected_output',
]);
// Import while ignoring certain keys
$dataset->insertFromJson($json, ignoreKeys: ['internal_id', 'debug_info']);
// Export to JSON string
$json = $dataset->toJson();
// Export with key mapping
$json = $dataset->toJson(keysMapping: [
'input' => 'Question',
'expected_output' => 'Expected Answer',
]);
```

```php
use Opik\Experiment\ExperimentItem;
// Create experiment
$experiment = $client->createExperiment(
name: 'gpt-4-eval',
datasetName: 'eval-dataset',
);
// Log results
$experiment->logItems([
new ExperimentItem(
datasetItemId: 'item-1',
traceId: 'trace-1',
output: ['result' => 'Answer'],
feedbackScores: [['name' => 'accuracy', 'value' => 0.9]],
),
]);
// Manage experiments
$experiment = $client->getExperimentById('experiment-id');
$client->updateExperiment(id: 'experiment-id', name: 'new-name');
$client->deleteExperiment('experiment-name');
```

Opik supports two types of prompts: text prompts (simple string templates) and chat prompts (arrays of messages following OpenAI's chat format).

```php
// Create a text prompt
$prompt = $client->createPrompt(
name: 'greeting',
template: 'Hello {{name}}, you asked: {{question}}',
);
// Get and format
$prompt = $client->getPrompt('greeting');
$text = $prompt->format(['name' => 'John', 'question' => 'How are you?']);
// Returns: "Hello John, you asked: How are you?"use Opik\Prompt\ChatMessage;
// Create a chat prompt with messages array
$prompt = $client->createPrompt(
name: 'assistant-prompt',
template: [
ChatMessage::system('You are a helpful assistant specializing in {{domain}}.'),
ChatMessage::user('{{question}}'),
],
);
// Format returns array of messages
$messages = $prompt->format(['domain' => 'physics', 'question' => 'What is gravity?']);
// Returns:
// [
// ['role' => 'system', 'content' => 'You are a helpful assistant specializing in physics.'],
// ['role' => 'user', 'content' => 'What is gravity?'],
// ]
```

`ChatMessage` factory methods:

| Method | Description |
|---|---|
| `ChatMessage::system($content)` | Create a system message |
| `ChatMessage::user($content)` | Create a user message |
| `ChatMessage::assistant($content)` | Create an assistant message |
| `ChatMessage::tool($content)` | Create a tool message |
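Since the factories cover all four OpenAI chat roles, few-shot templates work too. A sketch using only the documented `createPrompt()`/`format()` calls (the prompt name and variables are illustrative):

```php
use Opik\Prompt\ChatMessage;

// Few-shot chat prompt: an assistant message embeds an example reply.
$prompt = $client->createPrompt(
    name: 'few-shot-translator',
    template: [
        ChatMessage::system('Translate user messages into {{language}}.'),
        ChatMessage::user('Good morning'),
        ChatMessage::assistant('Bonjour'),
        ChatMessage::user('{{text}}'),
    ],
);

// format() substitutes variables across every message in the template
$messages = $prompt->format(['language' => 'French', 'text' => 'See you soon']);
```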
```php
// Get version history
$history = $client->getPromptHistory('greeting');
// Get specific version
$version = $prompt->getVersion('commit-hash');
// Check prompt type
if ($version->isChat()) {
$messages = $version->format($variables);
} else {
$text = $version->format($variables);
}
```

Delete prompts by ID:

```php
$client->deletePrompts(['prompt-id-1', 'prompt-id-2']);
```

Upload files to traces or spans:

```php
use Opik\Attachment\AttachmentEntityType;
$attachmentClient = $client->getAttachmentClient();
// Upload
$attachmentClient->uploadAttachment(
projectName: 'my-project',
entityType: AttachmentEntityType::TRACE,
entityId: $trace->getId(),
filePath: '/path/to/file.pdf',
);
// List
$attachments = $attachmentClient->getAttachmentList(
projectName: 'my-project',
entityType: AttachmentEntityType::TRACE,
entityId: $trace->getId(),
);
// Download
$content = $attachmentClient->downloadAttachment(
projectName: 'my-project',
entityType: AttachmentEntityType::TRACE,
entityId: $trace->getId(),
fileName: 'file.pdf',
mimeType: 'application/pdf',
);
```

The SDK provides heuristic metrics for evaluating LLM outputs:

```php
use Opik\Evaluation\Metrics\ExactMatch;
use Opik\Evaluation\Metrics\Contains;
use Opik\Evaluation\Metrics\RegexMatch;
use Opik\Evaluation\Metrics\IsJson;
// ExactMatch - checks for exact equality
$metric = new ExactMatch();
$result = $metric->score([
'output' => 'hello world',
'expected' => 'hello world',
]);
echo $result->value; // 1.0 (match) or 0.0 (no match)
// Contains - checks if output contains expected substring
$metric = new Contains(caseSensitive: false);
$result = $metric->score([
'output' => 'Hello World',
'expected' => 'hello',
]);
echo $result->value; // 1.0
// RegexMatch - checks if output matches a regex pattern
$metric = new RegexMatch();
$result = $metric->score([
'output' => 'Contact: [email protected]',
'pattern' => '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/',
]);
echo $result->value; // 1.0
// IsJson - checks if output is valid JSON
$metric = new IsJson();
$result = $metric->score([
'output' => '{"key": "value"}',
]);
echo $result->value; // 1.0
```

| Metric | Description |
|---|---|
| `ExactMatch` | Checks if output exactly equals expected (strict comparison) |
| `Contains` | Checks if output contains an expected substring (supports case-insensitive matching) |
| `RegexMatch` | Checks if output matches a regex pattern |
| `IsJson` | Checks if output is valid JSON |
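The feature table at the top also lists `Equals` and `LevenshteinRatio`. Assuming they live in the same namespace and follow the same `score(array)` interface as the metrics above (an assumption worth verifying against the SDK source), usage would look like:

```php
// Assumed API — mirrors the documented metrics; verify class name and keys.
use Opik\Evaluation\Metrics\LevenshteinRatio;

$metric = new LevenshteinRatio();
$result = $metric->score([
    'output' => 'hello wrld',
    'expected' => 'hello world',
]);
echo $result->value; // similarity ratio in [0, 1] — here roughly 0.9
```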
Run evaluations against datasets with automatic experiment tracking:

```php
use Opik\Evaluation\Metrics\ExactMatch;
use Opik\Evaluation\Metrics\Contains;
// Get or create a dataset
$dataset = $client->getOrCreateDataset('qa-dataset');
$dataset->insert([
new DatasetItem(data: [
'input' => 'What is PHP?',
'expected' => 'programming language',
]),
new DatasetItem(data: [
'input' => 'What is Python?',
'expected' => 'programming language',
]),
]);
// Define your task function
$task = function (array $item) use ($llm): array {
// Your LLM call or processing logic here ($llm stands in for your own client)
$response = $llm->complete($item['input']);
return ['output' => $response];
};
// Run evaluation
$result = $client->evaluate(
dataset: $dataset,
task: $task,
scoringMetrics: [
new ExactMatch(),
new Contains(),
],
experimentName: 'my-evaluation',
);
// Access results
echo "Evaluated {$result->count()} items in {$result->durationSeconds}s\n";
echo "Average exact_match: {$result->getAverageScore('exact_match')}\n";
echo "Average contains: {$result->getAverageScore('contains')}\n";
// Get all average scores
$averages = $result->getAverageScores();
foreach ($averages as $metric => $score) {
echo "{$metric}: {$score}\n";
}
```

The `evaluate()` function:
- Creates an experiment for tracking results
- Runs the task function on each dataset item
- Calculates scores using the provided metrics
- Logs feedback scores to traces
- Returns detailed results with averages
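A cheap way to sanity-check this pipeline without calling an LLM is an echo task that returns each item's expected value (a sketch built only from the API shown above):

```php
use Opik\Evaluation\Metrics\ExactMatch;

// Echo task: output equals the expected value, so ExactMatch should average 1.0.
$echoTask = function (array $item): array {
    return ['output' => $item['expected'] ?? ''];
};

$result = $client->evaluate(
    dataset: $dataset,
    task: $echoTask,
    scoringMetrics: [new ExactMatch()],
    experimentName: 'pipeline-smoke-test',
);

echo $result->getAverageScore('exact_match'); // expect 1.0
```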
`OpikClient` methods:

| Category | Method | Description |
|---|---|---|
| Tracing | `trace(...)` | Create a trace |
| | `span(...)` | Create a standalone span |
| | `searchTraces(...)` | Search traces with OQL |
| | `searchSpans(...)` | Search spans with OQL |
| | `getTraceContent(id)` | Get trace by ID |
| | `getSpanContent(id)` | Get span by ID |
| Feedback | `logTracesFeedbackScores(scores)` | Batch log trace scores |
| | `logSpansFeedbackScores(scores)` | Batch log span scores |
| | `logThreadsFeedbackScores(scores)` | Batch log thread scores |
| | `deleteTraceFeedbackScore(id, name)` | Delete trace score |
| | `deleteSpanFeedbackScore(id, name)` | Delete span score |
| Threads | `closeThread(id)` | Close a thread |
| | `closeThreads(ids)` | Close multiple threads |
| Datasets | `getDataset(name)` | Get dataset |
| | `getDatasets()` | List datasets |
| | `createDataset(name)` | Create dataset |
| | `getOrCreateDataset(name)` | Get or create dataset |
| | `deleteDataset(name)` | Delete dataset |
| Experiments | `createExperiment(name, datasetName)` | Create experiment |
| | `getExperiment(name)` | Get experiment by name |
| | `getExperimentById(id)` | Get experiment by ID |
| | `updateExperiment(id, ...)` | Update experiment |
| | `deleteExperiment(name)` | Delete experiment |
| Prompts | `createPrompt(name, template)` | Create text or chat prompt |
| | `getPrompt(name)` | Get prompt |
| | `getPrompts()` | List prompts |
| | `getPromptHistory(name)` | Get version history |
| | `deletePrompts(ids)` | Delete prompts |
| Attachments | `getAttachmentClient()` | Get attachment client |
| Evaluation | `evaluate(dataset, task, ...)` | Run evaluation with metrics |
| Utilities | `authCheck()` | Verify credentials |
| | `flush()` | Send pending data |
| | `getConfig()` | Get configuration |
| | `getProjectUrl()` | Get project URL |
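The utility methods round out the client; a minimal sketch (assuming `getProjectUrl()` returns a string, which the table does not spell out):

```php
// Verify credentials, print the dashboard URL, and flush buffered data.
if ($client->authCheck()) {
    echo "Project dashboard: {$client->getProjectUrl()}\n";
}
$client->flush(); // send any pending traces/spans before shutdown
```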
`Trace` methods:

| Method | Description |
|---|---|
| `span(name, type?, ...)` | Create a child span |
| `update(output?, ...)` | Update trace data |
| `end()` | End the trace |
| `logFeedbackScore(name, value, ...)` | Log a feedback score |
| `getId()` | Get the trace ID |
`Span` methods:

| Method | Description |
|---|---|
| `span(name, type?, ...)` | Create a child span |
| `update(output?, model?, usage?, ...)` | Update span data |
| `end()` | End the span |
| `logFeedbackScore(name, value, ...)` | Log a feedback score |
| `getId()` | Get the span ID |
Development:

```bash
# Install dependencies
composer install
# Run tests
composer test
# Run with coverage (requires pcov/xdebug)
composer test:coverage
# Static analysis
composer analyse
# Code formatting
composer format
composer format:check
```

Licensed under the MIT License.
Opik and Comet ML are trademarks of Comet ML, Inc. This project is not affiliated with, endorsed by, or sponsored by Comet ML, Inc.