API Reference

This page provides comprehensive API documentation for all Karenina classes, methods, and functions.

!!! info "Autogenerated Documentation"
    This API reference is automatically generated from Python docstrings using mkdocstrings. All signatures, parameters, and return types are extracted from the source code.

Core Classes

Benchmark

The main class for creating and managing LLM benchmarks.

::: karenina.benchmark.Benchmark
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
      members_order: source
      group_by_category: true

Question

Question data model with metadata support.

::: karenina.schemas.Question
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

BaseAnswer

Base answer template model.

::: karenina.schemas.BaseAnswer
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

Configuration

ModelConfig

LLM configuration for answering and parsing.

::: karenina.schemas.ModelConfig
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

VerificationConfig

Complete verification configuration with all options.

::: karenina.schemas.VerificationConfig
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
      members_order: source

FewShotConfig

Few-shot prompting configuration.

::: karenina.schemas.FewShotConfig
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

Results

VerificationResult

Complete verification result with all evaluation data.

::: karenina.schemas.VerificationResult
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
      members_order: source

FinishedTemplate

Template that has been marked as finished and ready for verification.

::: karenina.schemas.FinishedTemplate
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

Rubrics

Rubric

Rubric container with multiple evaluation traits.

::: karenina.schemas.Rubric
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

LLMRubricTrait

LLM-based evaluation trait (score or binary).

::: karenina.schemas.LLMRubricTrait
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

RegexTrait

Regex pattern-matching evaluation trait.

::: karenina.schemas.RegexTrait
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
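
To make the idea of a regex pattern-matching trait concrete, here is a minimal standalone sketch using only Python's standard `re` module. It does not use or reproduce Karenina's `RegexTrait`; the function name `regex_trait_passes` and its `case_sensitive` and `invert` parameters are hypothetical and chosen purely for illustration. Refer to the generated documentation above for the actual fields.

```python
import re


def regex_trait_passes(
    pattern: str,
    text: str,
    *,
    case_sensitive: bool = True,
    invert: bool = False,
) -> bool:
    """Return True when `text` satisfies a regex-based check.

    Hypothetical helper, not part of Karenina: it only illustrates
    what a binary pattern-matching evaluation could look like.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    matched = re.search(pattern, text, flags) is not None
    # With invert=True the check passes only when the pattern is absent.
    return not matched if invert else matched


# Example: require a four-digit year somewhere in the answer.
has_year = regex_trait_passes(r"\b\d{4}\b", "The company was founded in 1998.")
```

A check like this is deterministic and cheap, which is the usual reason to prefer a regex trait over an LLM-judged one when the criterion is purely syntactic.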

CallableTrait

Custom Python function-based evaluation trait.

::: karenina.schemas.CallableTrait
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

MetricRubricTrait

Metric-based evaluation trait (precision, recall, F1, accuracy).

::: karenina.schemas.MetricRubricTrait
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
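
For readers unfamiliar with the four metrics named above, the sketch below shows their standard definitions in terms of confusion-matrix counts. It is a standalone illustration, not Karenina's implementation: the function name `classification_metrics` and the choice to return `0.0` when a denominator is zero are assumptions made for this example.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int = 0) -> dict[str, float]:
    """Compute precision, recall, F1, and accuracy from raw counts.

    Hypothetical helper, not part of Karenina. Undefined ratios
    (zero denominators) are reported as 0.0 by assumption.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Example: 3 true positives, 1 false positive, 1 false negative, 5 true negatives.
metrics = classification_metrics(tp=3, fp=1, fn=1, tn=5)
```

Note that accuracy is the only one of the four that depends on true negatives, so it is only meaningful when those can be counted.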

RubricEvaluation

Rubric evaluation result for a single trait.

::: karenina.schemas.RubricEvaluation
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

TaskEval

Classes for evaluating pre-logged agent workflow traces.

TaskEval

Main class for evaluating agent workflow traces against rubrics.

::: karenina.benchmark.task_eval.TaskEval
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

TaskEvalResult

Result container for TaskEval evaluations.

::: karenina.benchmark.task_eval.TaskEvalResult
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

StepEval

Step-specific evaluation result.

::: karenina.benchmark.task_eval.StepEval
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

Question Extraction

Functions for extracting questions from files.

::: karenina.domain.questions.extractor
    options:
      show_source: false
      heading_level: 4
      members:
        - extract_questions_from_file
        - QuestionExtractionConfig

Template Generation

Functions for generating answer templates.

::: karenina.domain.answers.generator
    options:
      show_source: false
      heading_level: 4
      members:
        - generate_answer_template

Export Functions

Functions for exporting verification results.

::: karenina.benchmark.exporter
    options:
      show_source: false
      heading_level: 4
      members:
        - export_verification_results_csv
        - export_verification_results_json

Checkpoint Format

JSON-LD checkpoint model for saving/loading benchmarks.

::: karenina.schemas.JsonLdCheckpoint
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false

Database Functions

Functions for database persistence.

::: karenina.storage
    options:
      show_source: false
      heading_level: 4
      members:
        - save_benchmark
        - load_benchmark
        - DBConfig
