'Audit' and pending change system, fluent API, better text model handling by tazlin · Pull Request #206 · Haidra-Org/horde-model-reference

tazlin · 2026-04-07T16:44:00Z

Introduces a persistent audit and pending-change system, expands query and search capabilities, and refactors core model metadata, caching, and text model handling to support richer APIs and more robust infrastructure.

Key Changes

Audit & Pending Changes - Better atomic operations, traceability, and role-based permissions for editing.
- Adds an audit trail subsystem with event writing, reading, and replay support
- Introduces a pending queue with diff previews and apply/delete workflows
- Integrates audit-aware write paths across backends with optional request/user context
Query & API Enhancements
- Adds a typed, fluent ModelQuery API with a field-reference DSL
- Expands query support across model categories with stronger typing
- Introduces v2 search and popularity endpoints with pagination and filtering
- Adds text utility endpoints for parsing, grouping, and managing model names
Caching & Performance
- Implements background cache hydration with stale-while-revalidate behavior
- Adds cache support for audit/statistics (deletion risk) data
- Improves cache keying and backend-aware statistics handling
Model Metadata & Registries
- Replaces global constants with structured registries and descriptors
- Moves model constants into a dedicated model_consts package
- Standardizes category/baseline/tag definitions and validation
Text Model Handling
- Improves model name parsing, normalization, and grouping
- Adds base model name extraction and variant handling
- Introduces CSV-based serialization for text generation models (with legacy compatibility)
Infrastructure & Reliability
- Adds shared HTTP retry utilities and circuit breaker support
- Refactors settings, CORS, and general infrastructure
- Makes Redis an optional dependency
Typing, Validation, and Cleanup
- Strengthens type safety and validation across models and services
- Updates codebase to Python 3.12+ conventions
- Renames and standardizes APIs (e.g., `_unsafe` → `_or_none`)
- Improves documentation, tests, and CI coverage
Duplicate "Audit"-naming refactor
- Renames the model usage statistics "audit" subsystems to the more accurate "ModelDeletionRisk" (and similar). This helps avoid confusion with the model list change "audit" feature introduced in this PR.

Notes

Several APIs and imports were renamed or reorganized (notably around registries, query fields, and text helpers). This will be a major version update.

- Implements background cache hydration for audit/statistics caches with a new CacheHydrator service, enabling proactive cache warming and stale-while-revalidate behavior.
- Adds per-backend statistics (backend variations) to audit analysis and API, including new query parameters and cache key logic.
- Updates text model grouping, model name variant generation, and refactors cache and service startup/shutdown to support these features.
- Includes new tests for pending queue audit endpoints.
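The stale-while-revalidate behavior mentioned above can be illustrated with a minimal, stdlib-only sketch. This is not the CacheHydrator from this PR; the class name `StaleWhileRevalidateCache`, its `loader` callback, and the `wait()` helper are all hypothetical and exist only to show the idea: an expired entry is still served immediately while a background thread refreshes it.

```python
import threading
import time
from collections.abc import Callable


class StaleWhileRevalidateCache:
    """Hypothetical sketch: serve cached values at once, refresh expired ones in the background."""

    def __init__(self, loader: Callable[[str], object], ttl_seconds: float) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[object, float]] = {}  # key -> (value, stored_at)
        self._refreshing: set[str] = set()
        self._workers: list[threading.Thread] = []
        self._lock = threading.Lock()

    def get(self, key: str) -> object:
        with self._lock:
            entry = self._entries.get(key)
        if entry is None:
            return self._load(key)  # cold miss: nothing to serve, so load synchronously
        value, stored_at = entry
        if time.monotonic() - stored_at > self._ttl:
            self._refresh_in_background(key)
        return value  # may be stale; the refresh replaces it for later callers

    def wait(self) -> None:
        """Block until in-flight refreshes finish (useful at shutdown and in tests)."""
        with self._lock:
            workers, self._workers = self._workers, []
        for worker in workers:
            worker.join()

    def _load(self, key: str) -> object:
        value = self._loader(key)
        with self._lock:
            self._entries[key] = (value, time.monotonic())
        return value

    def _refresh_in_background(self, key: str) -> None:
        with self._lock:
            if key in self._refreshing:
                return  # a refresh for this key is already running
            self._refreshing.add(key)

        def work() -> None:
            try:
                self._load(key)
            finally:
                with self._lock:
                    self._refreshing.discard(key)

        worker = threading.Thread(target=work, daemon=True)
        with self._lock:
            self._workers.append(worker)
        worker.start()
```

Proactive warming would amount to calling `get` (or `_load`) for known keys at startup, so that the first real request never takes the synchronous cold-miss path.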

Introduce structured registries and descriptors for categories, baselines, enums, and tags; extract text-backend helpers into a new text_backend_names module.

Key changes:
- Added new modules/files: model_kind_validation.py, settings.py (stub), text_backend_names (new helper imports referenced across code).
- Reworked meta_consts: replaced many global lists/vars with DescriptorRegistry/EnumRegistry-backed registries, CategoryDescriptor and BaselineDescriptor dataclasses, and registration helpers (register_category, register_image_baseline, get_category_descriptor, get_all_registered_categories, get_all_registered_baselines, get_baseline_descriptor, is_known_image_baseline, etc.). Finalizes registries and validates enum members are registered.
- Moved legacy text backend prefix logic and name-variant helpers out of meta_consts into text_backend_names; updated imports across the codebase.
- Updated model_reference_records to use the new registries and validation framework: record type registration changes, model classification derived from category descriptors, normalized tag/style types, per-category kind policies, and get_record_type_for_category helper.
- Updated GitHub sync and backend code to use new getter APIs for github image/text categories and legacy backend prefixes.
- Small script and analytics updates to use moved helpers and imports.
- Added/updated tests for constants, records, registries, and model kind validation.

Notes:
- Several public names were renamed or moved (e.g., github_image_model_reference_categories -> get_github_image_categories(), strip_backend_prefix and legacy prefix constants moved to text_backend_names). This may require callers to update imports.

Introduce a persistent audit subsystem and pending-change tooling.
- Adds a new audit package (events, reader, writer, replayer) and wires an AuditTrailWriter into ModelReferenceManager and FileSystem/Redis backends to emit create/update/delete events with optional logical_user_id/request_id context.
- Add a Pending Queue feature (package, service, models, routers) and a DiffService to compute preview diffs for pending changes.
- Implement audit-aware write paths, backend duplicate handling for text models, and cascade behaviors for legacy format operations.
- Also add several analytics typing/import fixes, default data files, generation parameter fixtures, and corresponding tests to exercise the new functionality.

- Introduce a typed, in-memory fluent query builder and field-reference DSL.
- Adds new modules: src/horde_model_reference/query.py (ModelQuery, ImageGenerationQuery, TextModelQuery, factory builders and cross-category queries) and src/horde_model_reference/query_fields.py (FieldRef, Predicate, OrderSpec and per-category field namespaces).
- Update ModelReferenceManager to expose typed query() helpers (including query_all, query_image_generation, query_text_generation, query_controlnet), add overloads for better type narrowing, and fix a debug log's formatting to use an f-string.
- Add scripts/verify_query_fields.py to detect drift between query field refs and Pydantic record fields, and comprehensive tests in tests/test_query.py.
- Export the new query and field symbols from package __init__.py.
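To show how a field-reference DSL and a fluent, immutable query builder can fit together, here is a toy sketch. It deliberately uses `Toy*` names: the real ModelQuery, FieldRef, and Predicate signatures are not shown in this PR description, so nothing below should be read as the library's API. The record fields (`baseline`, `nsfw`, `size_gb`) are also made up for the example.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from typing import Any

Record = dict[str, Any]


class ToyPredicate:
    """Wraps a boolean test over a single record."""

    def __init__(self, test: Callable[[Record], bool]) -> None:
        self.test = test


class ToyFieldRef:
    """Comparison operators return predicates instead of booleans."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> ToyPredicate:  # type: ignore[override]
        return ToyPredicate(lambda record: record.get(self.name) == other)

    def __ge__(self, other: Any) -> ToyPredicate:
        return ToyPredicate(lambda record: self.name in record and record[self.name] >= other)


class ToyImageFields:
    """Hypothetical per-category field namespace."""

    baseline = ToyFieldRef("baseline")
    nsfw = ToyFieldRef("nsfw")
    size_gb = ToyFieldRef("size_gb")


class ToyQuery:
    def __init__(self, records: Iterable[Record], predicates: tuple[ToyPredicate, ...] = (),
                 order_field: ToyFieldRef | None = None, descending: bool = False) -> None:
        self._records = tuple(records)
        self._predicates = predicates
        self._order_field = order_field
        self._descending = descending

    def where(self, *predicates: ToyPredicate) -> ToyQuery:
        # Each call returns a new query, so partially built queries can be reused.
        return ToyQuery(self._records, self._predicates + predicates, self._order_field, self._descending)

    def order_by(self, field: ToyFieldRef, descending: bool = False) -> ToyQuery:
        return ToyQuery(self._records, self._predicates, field, descending)

    def to_list(self) -> list[Record]:
        matched = [r for r in self._records if all(p.test(r) for p in self._predicates)]
        if self._order_field is not None:
            key = self._order_field.name
            matched.sort(key=lambda r: r[key], reverse=self._descending)
        return matched
```

With this sketch, a call such as `ToyQuery(records).where(ToyImageFields.baseline == "x").order_by(ToyImageFields.size_gb).to_list()` reads left to right, and a drift check like the one in scripts/verify_query_fields.py would compare the names in the field namespace against the record model's fields.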

- Typing and style cleanups across the codebase: replace typing.TypeAlias usages with native type aliases, use StrEnum for string enums, and simplify generic declarations and type annotations. This matches Python 3.12+ conventions.
- Add optional auditing/tracing parameters (logical_user_id, request_id) to backend APIs and filesystem backend methods, and expand protocol docstrings for pending-queue apply/delete callables.
- Improve docstrings, small formatting/PEP8 fixes, and minor logic cleanups (e.g. set() usage).
- Tests updated to match formatting and include descriptive docstrings.
- Changes are primarily non-functional and focused on typing, documentation, and audit-related parameters.

Apply stronger type safety, input validation, and small bug fixes across code and tests.

Key changes:
- Add explicit checks/conversions for parameters, generation params, and legacy config entries; ensure defaults (e.g. nsfw False).
- Strengthen typed generics and computed_field decorators; remove unnecessary type-ignore annotations.
- Hardening for callbacks/logging and pubsub iteration (safer name lookup and iterable check).
- Improve integer coercion and payload parsing with clearer handling of int/str inputs.
- Update tests to use concrete config model (GenericModelRecordConfig), call model_validate for settings, remove many type:ignore usages, and ensure singletons are reset/cleared safely.

- Rename all query field classes from *F to *Fields and add FieldRef type annotations; update imports/exports accordingly.
- Introduce a field-name type parameter to ModelQuery and tighten internal types (predicates -> Sequence, improved _clone).
- Add ControlNetQuery and build_controlnet_query, plus typed query entrypoints in ModelReferenceManager (controlnet, blip, clip, codeformer, esrgan, gfpgan, safety_checker, audio_generation, video_generation, miscellaneous).
- Update return types and overloads to reflect the new per-category field name generics.
- Also adjust pending-queue imports (split service/store) and remove redundant typed overloads in Image/Text query classes.

- Introduce a more robust site under docs/ (nav .pages, Concepts, Reference, Guides, Tutorials) with many new conceptual and reference pages (architecture, analytics pipeline, canonical format, request lifecycle, sync system, integrations, design decisions, audit trail, pending queue plan, model reference docs, etc.).
- Update README.md to add Query API examples and surface the new Getting Started / Querying Models / Tutorials and Deployment/Operations links.
- Several legacy docs were moved into docs/reference for better organization.

This change provides end-user guides, how-tos, and architecture/reference material to support library usage, deployment, and the new fluent query API.

"unsafe" has a potentially confusing other meaning ("memory safety"). To avoid confusion, every method using this name has been renamed to `_or_none` to reflect the actual effect: `None` is returned instead of an exception being raised (at the cost of static type safety).

- Introduce a new v2 search router with paginated search and a popular-models endpoint backed by live Horde stats.
- Add PopularModelResult and merge logic to expose ranked results from ModelReferenceManager.get_popular_models.
- Register the new router in the app.
- Add convenience properties to GenericModelRecord (primary_download_url, all_download_urls, download_count).
- Make Redis backend an optional extra by deferring import (module __getattr__) and raising a clear error if redis.asyncio is missing.
- Add ModelReferenceManager.reset to allow tearing down the singleton for tests, and perform minor typing/overload cleanups.
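The deferred-import technique (a module-level `__getattr__`, as defined by PEP 562) can be sketched as follows. To keep the example self-contained it builds a module object at runtime; in a real package the `__getattr__` function would simply be defined at the top level of the module. The helper name `make_lazy_module`, the attribute names, and the extra name `redis` are assumptions for illustration.

```python
import importlib
import types


def make_lazy_module(name: str, lazy_attrs: dict[str, tuple[str, str]]) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access.

    lazy_attrs maps an attribute name to (dependency module, extra name to suggest).
    """
    module = types.ModuleType(name)

    def __getattr__(attr: str) -> object:
        # Only called when normal attribute lookup on the module fails.
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        dependency, extra = lazy_attrs[attr]
        try:
            value = importlib.import_module(dependency)
        except ImportError as exc:
            raise ImportError(
                f"{attr} requires the optional dependency {dependency!r}; "
                f"install the {extra!r} extra to use it."
            ) from exc
        setattr(module, attr, value)  # cache so later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__  # stored in the module's __dict__, as PEP 562 requires
    return module
```

Importing the package therefore succeeds without the optional dependency installed, and the error only surfaces, with an actionable message, when the Redis-backed name is actually touched.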

- Add better filter handling and cross-category filtering for v2 search endpoints, returning 400 for unsupported filters instead of 500.
- Switch cross-category filters to safe lambda-based checks for optional fields.
- Tighten typing and imports in ModelReferenceManager (use HordeModelType, inline overloads) and adjust related imports.
- Small formatting/PEP-style changes in analytics, diff service, and redis backend error message.
- Rename 'unsafe' API mentions to '*_or_none' in docs.
- Add multiple tests: v2 search/popular endpoint tests, backend lazy-import tests, additional data_merger/popular model tests, ModelReferenceManager reset/popular tests, and GenericModelRecord download convenience property tests.

Analytics:
- Refactor analytics terminology and implementation: replace audit analysis with "deletion risk" across the codebase and docs.
- Renamed modules and classes (e.g. audit_analysis.py -> deletion_risk_analysis.py, AuditCache -> DeletionRiskCache, ModelAuditInfo -> ModelDeletionRiskInfo, CategoryAuditResponse -> CategoryDeletionRiskResponse), updated cache hydrator to hydrate deletion risk cache, updated settings and .env.example to use deletion_risk_cache_ttl, and adjusted related constants, filter presets, text-model grouping, and base cache docs/strings.
- Updated service routers, tests, and other references to match the new naming.
- Also added pending_queue/audit_events.py.

Audit:
- Removes redundant AuditDomain in favor of CanonicalFormat
- Removes most magic strings in favor of a StrEnum
- Improves the intended atomic behavior of changes
- Adds an "Applying" state to better isolate issues/crashes when attempting to apply a change

- Introduce shared HTTP retry utilities and a lightweight circuit breaker (src/horde_model_reference/http_retry.py) using tenacity; wire them into the sync script, GitHub and HTTP backends, Horde API integration, and Redis retry logic.
- Replace custom retry loops with structured sync/async retry contexts and handle retryable HTTP status codes (5xx/429) and transient network errors; record successes/failures to the circuit breaker and provide stale-cache fallbacks when Horde API is degraded.
- Enhance heartbeat to report AI Horde API status, make cache hydration tolerant to Horde API failures, and update default PRIMARY API URLs in docs and settings to models.aihorde.net.
- Misc: remove unused imports and adjust exception handling to catch tenacity.RetryError where appropriate.
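The interaction between retries, a circuit breaker, and a stale-cache fallback can be sketched without any third-party dependency. Note that the PR itself uses tenacity; the version below is a stdlib-only approximation, and `CircuitBreaker`, `fetch_with_retry`, their parameters, and the `(status, body)` shape of `send` are all assumptions made for the example.

```python
import time
from collections.abc import Callable

# Status codes treated as retryable, per the description above: 429 and any 5xx.
RETRYABLE_STATUS = frozenset({429, *range(500, 600)})


class CircuitBreaker:
    """Opens after repeated failures and allows a trial call once the reset window passes."""

    def __init__(self, failure_threshold: int = 3, reset_after_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self._reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()


def fetch_with_retry(send, breaker, stale=None, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() -> (status, body) with exponential backoff; fall back to stale data."""
    if not breaker.allow():
        return stale  # circuit open: skip the upstream call entirely
    for attempt in range(max_attempts):
        try:
            status, body = send()
        except (ConnectionError, TimeoutError):
            status, body = None, None  # transient network error: treat as retryable
        if status is not None and status not in RETRYABLE_STATUS:
            breaker.record_success()
            return body
        if attempt + 1 < max_attempts:
            sleep(base_delay * 2 ** attempt)
    breaker.record_failure()
    return stale
```

In this sketch a failure is recorded once per exhausted call rather than once per attempt, so the threshold counts degraded requests, not individual retries.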

- Add missing tenacity to project dependencies (relied on by previous changes)
- Extra logging and validation: a field validator for primary_api_url and an after-model validator that logs replicate_mode and canonical_format.
- Change AIHordeStatus to inherit ContainsStatus.
- Add debug logging to pending queue allowlist builders.
- Tests: ensure HORDE_MODEL_REFERENCE_PRIMARY_API_URL is set during test runs (with informational prints), remove a flaky cache-load log assertion, and adjust expected model-name variant counts/values to match updated variant behavior.

- Enhance text model name parsing (size/version extraction, normalized cleanup) and add name-format inference and group summary computation.
- Introduces NameFormatSchema, TextModelGroupSummary, infer_name_format, and compute_group_summaries to aggregate sizes, quants, baselines, tags and naming conventions for model groups.
- Adds a new v2 /text_utils router with endpoints to parse names, list group members, get distinct baselines, compose names, and batch-update common group fields.
- Integrates the utilities into the app (include router) and into v1/v2 reference flows (embed group summaries in v1 legacy responses and auto-set text_model_group for v2 text_generation creates).
- Also cleans up some logging/comment clutter in http_retry and updates/extends tests to cover parsing, grouping, composition, and baseline discovery; plus small test adjustments (AsyncMock ignore fixes and expectation tweaks).

These changes enable a frontend group-editing UX and more robust model-name handling.

Having learned my lesson as to why it's the default, this change adjusts the default and updates usages and documentation accordingly. The key issue is ambiguity in resolving nested models when their names contain underscores: there needs to be a way to disambiguate between names with underscores and the marker for a nested model. See https://docs.pydantic.dev/latest/concepts/pydantic_settings/#parsing-environment-variable-values for a detailed explanation.
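The ambiguity can be made concrete with a small stdlib-only sketch. It does not use pydantic-settings; the function names and the `REDIS_CACHE_TTL` variable are hypothetical, and the double underscore is assumed here only as an example of a delimiter that does not occur inside field names.

```python
from itertools import product


def readings(env_body: str, delimiter: str = "_") -> set[tuple[str, ...]]:
    """Every nested-field path an env var name could mean if the delimiter may also appear in field names."""
    parts = env_body.lower().split(delimiter)
    results: set[tuple[str, ...]] = set()
    # Each delimiter occurrence is either a nesting boundary or part of a field name.
    for choices in product((True, False), repeat=len(parts) - 1):
        path = [parts[0]]
        for is_boundary, part in zip(choices, parts[1:]):
            if is_boundary:
                path.append(part)
            else:
                path[-1] = path[-1] + delimiter + part
        results.add(tuple(path))
    return results


def split_nested(env_body: str, delimiter: str = "__") -> tuple[str, ...]:
    """With a delimiter that never occurs inside a field name, there is exactly one reading."""
    return tuple(env_body.lower().split(delimiter))
```

With a single underscore, `REDIS_CACHE_TTL` has four possible readings, including both `redis.cache_ttl` and `redis_cache.ttl`; with the double underscore, `REDIS__CACHE_TTL` can only mean `redis.cache_ttl`.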

- Add # type: ignore annotations for computed_field decorator usages and other mypy complaints (prop-decorator, untyped calls, assignment, overload overlaps).
- Make Redis pub/sub listen call explicitly ignored for untyped call to satisfy type checkers.
- Improve PendingChangeRecord schema by converting several integer fields to Pydantic Field(...) with descriptive documentation (change_id, batch_id, applied_job_id) to clarify semantics and lifecycle.
- Adjust query field dunder methods' comment ordering to satisfy linters and keep DSL behavior intact.
- Replace closure-default lambda captures in v2 search filters with outer variables to avoid late-binding issues and clarify intent.
- Update service signatures and models: remove ContainsStatus usage and make AIHordeStatus a BaseModel; endpoint return types widened to include fastapi.Response where appropriate.
- Fix pending-change handling in v2 text utils: pending_change_ids typed as list[int] and use change.change_id when collecting IDs.
- Minor typing clarifications in the text generation serializer (frozenset literal formatting and explicit row typing).

These are non-functional clarifications and type-safety improvements intended to reduce linter/mypy warnings and improve schema documentation for maintainers.
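The late-binding issue mentioned for the search filters is a general Python closure pitfall, illustrated below. The function names and record fields are hypothetical, and the factory-function fix shown is just one common way to bind values eagerly; it is not necessarily the exact form used in the v2 search code.

```python
from collections.abc import Callable

Record = dict[str, object]


def build_filters_late_binding(required: Record) -> list[Callable[[Record], bool]]:
    """Pitfall: every lambda closes over the same loop variables and sees their final values."""
    filters = []
    for field, expected in required.items():
        filters.append(lambda record: record.get(field) == expected)
    return filters


def build_filters_bound(required: Record) -> list[Callable[[Record], bool]]:
    """Fix: each filter is created in its own scope, so its variables are bound per call."""

    def make_filter(field: str, expected: object) -> Callable[[Record], bool]:
        return lambda record: record.get(field) == expected

    return [make_filter(field, expected) for field, expected in required.items()]
```

In the first version all filters end up checking the last field only, which can silently pass or fail records for the wrong reason; the second version checks each field as intended.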

- Add a new .hadolint.yaml to enforce hadolint rules (error threshold, trusted registries, and an override for DL3015) and ignore DL3008.
- Bump pre-commit mypy hook to v1.19.1 and add tenacity to the mypy additional deps.
- Update Dockerfile to unpin curl (with a hadolint ignore comment for DL3008) to avoid repeated CI breakage from Debian repo churn and clean up related whitespace.

Ensure any existing empty or invalid hadolint-results.sarif is removed before writing a default SARIF payload. This prevents corrupted/zero-byte files from being left in the workflow artifacts and ensures the generated SARIF is valid for downstream consumers.

- Switch GitHub fetch to raw repo URLs and json/httpx to avoid ujson-vs-json parsing mismatches that produced false-positive diffs.
- Removed reliance on the legacy GitHubBackend and added path_consts URL resolution, retrying HTTP fetches and clearer error handling/logging.
- Prevent PR creation when the comparator reports changes but no actual file changes are committed: _commit_changes and _commit_multi_category_changes now return bool and callers skip PR creation when False. Update run_sync_once to record None PRs and log warnings.
- Add unit tests covering commit guard behavior, multi-category behavior, and GitHub URL resolution.

tazlin added 30 commits April 7, 2026 13:12

fix: more base model name extraction and add tests

3a868dc

- Enhanced get_base_model_name to strip backend and author prefixes before extracting the base name.
- Added tests to verify correct handling of backend and author prefixes, as well as consistency across model name variations.

fix: imports in shared.py

c6d17c6

Added ModelReferenceManager and horde_model_reference_settings to the import statement for use in shared.py.

lint: fix

b6c4f87

style: fix

dc78084

docs: rebuild mkdocs module stubs

8ef79db

feat: add instruct_format to text generation models, make controlnet_…

b07dfa2

…style optional

refactor: settings, CORS configuration, and infrastructure cleanup

a73c714

ci/chore: docs tweaks, 3.12+, check query field drift on ci

ad17a40

lint: autofixes from 3.12+ and updated ruff

3115cbb

style/lint: formatting + type hint fixes

e3894a6

- Auto applied formatting fixes
- Removed quotes from RedisCache child classes generic selector
- Also made the `_instance` override explicit for static type clarity

lint: auto-applied formatting fixes

lint/chore: fix lingering typing issues

f7b9bc3

This quiets all but one mypy issue, and should not reflect any functional change

fix/docs: remove import causes doc to recurse

b871712

This prevented the docs from building.

fix: @computed_field @property ordering fix

d06fa0e

Reversing the ordering causes failures at runtime, while @computed_field causes certain linters to complain. Favoring the runnable version for now.

fix: ttl regression in base_cache

7fae678

lint: autoapplied formatting fixes

7211c5a

build: make redis optional, remove accidental mkdocs dep

84f3b02

build: latest deps uv.lock

b825c6b

docs: CONTRIBUTING.md add testing & git workflow

08046c8

Expand contributor guide with testing and workflow instructions.

chore: remove moot comments, apply formatter

656ef19

tazlin and others added 10 commits April 7, 2026 13:14

docs: rebuild mkdocstrings stubs

06c5dbd

docs: fix wrong aihorde.net references

6db226f

[pre-commit.ci] auto fixes from pre-commit.com hooks

323f244

for more information, see https://pre-commit.ci

chore: update uv.lock; trivy pin

380c2b2

ci: fix duplicate job entry, bump lagging tool versions

59222df

tazlin force-pushed the analytics-vs-statistics branch from 4d1cdcc to 59222df April 7, 2026 17:17

tazlin and others added 17 commits April 7, 2026 13:21

chore: uv.lock w/o local deps

ae748a3

[pre-commit.ci] auto fixes from pre-commit.com hooks

17ee8da

for more information, see https://pre-commit.ci

fix: remove 204 from delete; set no model for delete content

416bf61

[pre-commit.ci] auto fixes from pre-commit.com hooks

406bef3

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

759c02d

for more information, see https://pre-commit.ci

chore/ci: legacy -> LEGACY

eccaa9d

tests: remove dated test

ba26d48

This test doesn't meaningfully test/validate anything not captured elsewhere in the tests.

ci: scope mypy hook to src/ in pre-commit

663d363

Add 'src/' to the mypy hook args in .pre-commit-config.yaml so pre-commit runs type checks only against the project's source directory

ci: only upload hadolint sarif if it has content

f927578

Revert "ci: scope mypy hook to src/ in pre-commit"

61964e9

This reverts commit 663d363.

ci: skip mypy on pre-commit

f2be457

Set SKIP=mypy for the pre-commit step and enable --show-diff-on-failure so lint failures include diffs. This avoids running mypy in this workflow (handled elsewhere) and makes CI failures easier to debug.

[pre-commit.ci] auto fixes from pre-commit.com hooks

83581d8

for more information, see https://pre-commit.ci

lint: declarative docstring in test

117eef0

tazlin merged commit 7957e0d into main Apr 12, 2026
25 of 26 checks passed

tazlin mentioned this pull request Apr 14, 2026

v3 - Fluent API, Audit system, better service, and more #207

Merged

tazlin merged 64 commits into main from analytics-vs-statistics
