This page provides comprehensive API documentation for all Karenina classes, methods, and functions.
!!! info "Autogenerated Documentation"
    This API reference is automatically generated from Python docstrings using mkdocstrings. All signatures, parameters, and return types are extracted from the source code.
The main class for creating and managing LLM benchmarks.
::: karenina.benchmark.Benchmark
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
      members_order: source
      group_by_category: true
Question data model with metadata support.
::: karenina.schemas.Question
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Base answer template model.
::: karenina.schemas.BaseAnswer
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
LLM model configuration for answering and parsing.
::: karenina.schemas.ModelConfig
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Complete verification configuration with all options.
::: karenina.schemas.VerificationConfig
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
      members_order: source
Few-shot prompting configuration.
::: karenina.schemas.FewShotConfig
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Complete verification result with all evaluation data.
::: karenina.schemas.VerificationResult
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
      members_order: source
Template that has been marked as finished and ready for verification.
::: karenina.schemas.FinishedTemplate
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Rubric container with multiple evaluation traits.
::: karenina.schemas.Rubric
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
LLM-based evaluation trait (score or binary).
::: karenina.schemas.LLMRubricTrait
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Regex pattern-matching evaluation trait.
::: karenina.schemas.RegexTrait
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
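
To illustrate the general idea behind a regex pattern-matching trait, the sketch below checks whether a response matches a pattern using only the standard-library `re` module. It does not use Karenina's `RegexTrait` class; the function name `regex_trait_passes` and its `case_sensitive` and `invert` parameters are hypothetical, chosen for illustration only.

```python
import re


def regex_trait_passes(
    pattern: str,
    text: str,
    *,
    case_sensitive: bool = True,
    invert: bool = False,
) -> bool:
    """Return True if `text` satisfies the pattern check.

    Hypothetical helper, not part of Karenina. With `invert=True`
    the check passes only when the pattern is absent.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    matched = re.search(pattern, text, flags) is not None
    return not matched if invert else matched


# Does the response contain an ISO-style date?
has_date = regex_trait_passes(r"\b\d{4}-\d{2}-\d{2}\b", "Released on 2024-01-15.")
```

Consult the generated reference above for the actual fields and behavior of `RegexTrait`.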
Custom Python function-based evaluation trait.
::: karenina.schemas.CallableTrait
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
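
As a rough illustration of what a function-based check can look like, the sketch below builds a plain Python callable that accepts a response string and returns a boolean. It assumes, without confirmation from the source, that a check maps `str` to `bool`; the names `max_words` and `is_concise` are hypothetical, and how `CallableTrait` actually wraps a function is described only in the generated reference above.

```python
from typing import Callable


def max_words(limit: int) -> Callable[[str], bool]:
    """Build a check that passes when a response has at most `limit` words.

    Hypothetical helper, not part of Karenina.
    """

    def check(response: str) -> bool:
        # Split on any whitespace and count the resulting tokens.
        return len(response.split()) <= limit

    return check


# A check that accepts responses of five words or fewer.
is_concise = max_words(5)
```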
Metric-based evaluation trait (precision, recall, F1, accuracy).
::: karenina.schemas.MetricRubricTrait
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
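
For readers unfamiliar with the four metrics named above, the sketch below computes precision, recall, F1, and accuracy from sets of expected and predicted items using their standard definitions. It is not Karenina's implementation; the function `classification_metrics` and its `universe` argument (the full set of candidate items, needed to count true negatives) are hypothetical.

```python
def classification_metrics(expected, predicted, universe):
    """Compute precision, recall, F1 and accuracy for set-valued predictions.

    Hypothetical helper, not part of Karenina. Ratios with a zero
    denominator are reported as 0.0.
    """
    expected, predicted, universe = set(expected), set(predicted), set(universe)

    tp = len(expected & predicted)             # correctly predicted items
    fp = len(predicted - expected)             # predicted but not expected
    fn = len(expected - predicted)             # expected but missed
    tn = len(universe - expected - predicted)  # correctly left out

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


metrics = classification_metrics(
    expected={"a", "b"},
    predicted={"b", "c"},
    universe={"a", "b", "c", "d"},
)
```

In this example each of the four counts is 1, so all four metrics come out to 0.5.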
Rubric evaluation result for a single trait.
::: karenina.schemas.RubricEvaluation
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Classes for evaluating pre-logged agent workflow traces.
Main class for evaluating agent workflow traces against rubrics.
::: karenina.benchmark.task_eval.TaskEval
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Result container for TaskEval evaluations.
::: karenina.benchmark.task_eval.TaskEvalResult
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Step-specific evaluation result.
::: karenina.benchmark.task_eval.StepEval
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Functions for extracting questions from files.
::: karenina.domain.questions.extractor
    options:
      show_source: false
      heading_level: 4
      members:
        - extract_questions_from_file
        - QuestionExtractionConfig
Functions for generating answer templates.
::: karenina.domain.answers.generator
    options:
      show_source: false
      heading_level: 4
      members:
        - generate_answer_template
Functions for exporting verification results.
::: karenina.benchmark.exporter
    options:
      show_source: false
      heading_level: 4
      members:
        - export_verification_results_csv
        - export_verification_results_json
JSON-LD checkpoint model for saving/loading benchmarks.
::: karenina.schemas.JsonLdCheckpoint
    options:
      show_source: false
      heading_level: 4
      show_root_heading: false
Functions for database persistence.
::: karenina.storage
    options:
      show_source: false
      heading_level: 4
      members:
        - save_benchmark
        - load_benchmark
        - DBConfig