feat(proforge): measured supervisor quality gates + review-required label by qnbs · Pull Request #213 · qnbs/WorldScript-Studio

qnbs · 2026-06-23T09:20:06Z

User description

Wave B · PR6 — ProForge measured quality gates (P0)

Stacked on #212 → … → #208.

Deepens the ProForge supervisor: scores were hard-coded constants and thresholds were fixed in code.

Changes

Measured scoring: replaces the flat 80/85/88/90 pass-scores with a confidence score that scales with how much signal a stage produced relative to manuscript size (findings per ~1000 words), within a pass band. "Suspect"/fail scores scale with size too. The supervisor still does no AI calls — it's heuristic confidence, not editorial quality — but the score now actually varies with the work done instead of always reporting the same number.
Configurable thresholds: new QualityThresholds (largeManuscriptWords + intakeHardGate) on PipelineConfig, defaulted via DEFAULT_QUALITY_THRESHOLDS, threaded from run config → SupervisorAgent + the orchestrator's intake hard gate.
Dashboard: an explicit "Experimental — your review is required" line, so the human-in-the-loop expectation is stated, not just implied in help.
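The measured scoring and threshold defaults described above could look roughly like the sketch below. Only the names QualityThresholds, largeManuscriptWords, intakeHardGate and DEFAULT_QUALITY_THRESHOLDS come from this PR; the default values, the 80..90 band, the saturation density, and the helper names resolveThresholds and measuredConfidence are assumptions made for illustration, not the merged implementation.

```typescript
// Names taken from the PR description; every numeric value here is an assumption.
interface QualityThresholds {
  largeManuscriptWords: number;
  intakeHardGate: number;
}

const DEFAULT_QUALITY_THRESHOLDS: QualityThresholds = {
  largeManuscriptWords: 50000, // assumed default
  intakeHardGate: 40, // assumed default
};

// Hypothetical helper: per-run overrides win, defaults fill the gaps.
function resolveThresholds(
  overrides: Partial<QualityThresholds> = {},
): QualityThresholds {
  return { ...DEFAULT_QUALITY_THRESHOLDS, ...overrides };
}

// Hypothetical helper: map findings density (findings per ~1000 words)
// linearly into a pass band, capping at the top of the band.
function measuredConfidence(
  findings: number,
  wordCount: number,
  band: { min: number; max: number } = { min: 80, max: 90 }, // assumed band
  saturationPerThousand = 5, // assumed density at which the score tops out
): number {
  const thousands = Math.max(wordCount, 1) / 1000;
  const density = findings / thousands;
  const ratio = Math.min(density / saturationPerThousand, 1);
  return Math.round(band.min + ratio * (band.max - band.min));
}

console.log(measuredConfidence(0, 10000)); // bottom of the band
console.log(measuredConfidence(25, 10000)); // mid-band
console.log(resolveThresholds({ intakeHardGate: 25 }));
```

Because the score is a function of density rather than raw count, a short and a long manuscript with the same findings-per-1000-words ratio land on the same score.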

i18n

proforge.pipeline.reviewRequired across all 19 locales + bundles.

Tests

Measured-score-varies-with-findings, configurable-threshold behaviour; existing exact-score assertions relaxed to pass-band ranges. 61 ProForge tests green; typecheck + lint + i18n clean.

Note on Wave B / PR7: the remaining ProForge item (off-main-thread/non-blocking execution) needs deep WorkerBus-v2 integration of the orchestrator loop; given the pipeline is network-bound (AI calls), it's better scoped as its own focused PR off main rather than added to this 6-deep stack.

🤖 Generated with Claude Code

CodeAnt-AI Description

Make ProForge quality checks configurable and show that review is required

What Changed

ProForge now shows an explicit “review required” notice in the dashboard so users know the manuscript will not change without approval.
Supervisor quality gates now use configurable thresholds, so runs can be tuned instead of always using the same built-in limits.
Intake now fails only when the supervisor flags the manuscript as unanalyzable; low-but-valid scores no longer trigger a false provider failure.
Quality checks now treat a wider set of proof findings as signal, and their scores vary with the amount of work done instead of staying fixed.
Added coverage for configurable thresholds, the new review notice, and the revised gate behavior.

Impact

✅ Clearer review expectations
✅ Fewer false intake failures
✅ More consistent quality gate behavior

codeant-ai · 2026-06-23T09:20:13Z

Thanks for using CodeAnt! 🎉

We're free for open-source projects. If you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

vercel · 2026-06-23T09:20:13Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
worldscript-studio	Ready	Preview, Comment	Jun 23, 2026 10:06pm

…abel

Wave B · deepens the ProForge supervisor (audit P0): scores were hard-coded constants and thresholds were fixed in code.

- Measured scoring: replace the flat 80/85/88/90 pass-scores with a confidence score that scales with how much signal a stage produced relative to manuscript size (findings per ~1000 words), within a pass band. Fail/"suspect" scores now scale with manuscript size too. The supervisor still does NO AI calls (this is heuristic confidence, not editorial quality), but the score now actually varies with the work done instead of always reporting the same number.
- Configurable thresholds: new QualityThresholds (largeManuscriptWords + intakeHardGate) on PipelineConfig, defaulted via DEFAULT_QUALITY_THRESHOLDS and threaded from run config → SupervisorAgent + the orchestrator's intake hard gate.
- Dashboard: an explicit "Experimental — your review is required" line, so the human-in-the-loop expectation is stated, not just implied in help.

i18n: proforge.pipeline.reviewRequired across all 19 locales + bundles.

Tests: measured-score-varies-with-findings, configurable-threshold behaviour; existing exact-score assertions relaxed to pass-band ranges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

qnbs · 2026-06-23T20:23:56Z

@CodeAnt-AI review

codeant-ai · 2026-06-23T20:25:34Z

Sequence Diagram

This diagram shows how the ProForge orchestrator now threads configurable quality thresholds into the SupervisorAgent to compute measured confidence scores per stage and enforce a hard intake gate, while reporting results back to the dashboard.

sequenceDiagram
    participant Author
    participant Dashboard
    participant Orchestrator
    participant Supervisor

    Author->>Dashboard: Start ProForge pipeline
    Dashboard->>Orchestrator: Run pipeline with config and quality thresholds
    Orchestrator->>Supervisor: Initialize with thresholds (defaults overridden by config)

    loop For each stage
        Orchestrator->>Orchestrator: Run stage agent and collect findings
        Orchestrator->>Supervisor: Evaluate stage using findings and manuscript size
        Supervisor->>Supervisor: Compute confidenceScore and suspectScore
        Supervisor-->>Orchestrator: Return pass flag and qualityScore

        alt Intake stage below intake hard gate
            Orchestrator-->>Dashboard: Mark intake failed with diagnostic message
            Note over Orchestrator: Stop the pipeline
        else Stage passes quality gate
            Orchestrator-->>Dashboard: Mark stage complete and advance
        end
    end

Generated by CodeAnt AI

codecov · 2026-06-23T20:36:54Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

CodeAnt: partialPipelineConfigSchema lacked qualityThresholds, so Zod silently stripped the key for Node/MCP capability callers: overrides were dropped and the supervisor always used DEFAULT_QUALITY_THRESHOLDS. Add qualityThresholdsSchema to the validator and pass config.qualityThresholds into the SupervisorAgent so non-Redux entry points honor the config contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
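The failure mode in this commit can be reproduced without any dependency. The sketch below is a hand-rolled stand-in for a validator that drops keys it does not know about, which is what Zod object schemas do by default; parseWithAllowedKeys and the stages key are hypothetical and exist only for this illustration.

```typescript
type Config = Record<string, unknown>;

// Stand-in for a schema parse that strips unknown keys (not the real Zod schema).
function parseWithAllowedKeys(
  input: Config,
  allowedKeys: readonly string[],
): Config {
  const out: Config = {};
  for (const key of allowedKeys) {
    if (key in input) out[key] = input[key];
  }
  return out;
}

const callerConfig: Config = {
  stages: ["intake"], // hypothetical unrelated config key
  qualityThresholds: { intakeHardGate: 25 },
};

// Before the fix: the schema does not list qualityThresholds, so the override vanishes silently.
const before = parseWithAllowedKeys(callerConfig, ["stages"]);

// After the fix: the key is part of the schema and survives validation.
const after = parseWithAllowedKeys(callerConfig, ["stages", "qualityThresholds"]);

console.log("qualityThresholds" in before, "qualityThresholds" in after);
```

No error is raised in the "before" case, which is why the dropped override was easy to miss: the supervisor simply fell back to its defaults.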

qnbs · 2026-06-23T20:52:20Z

@CodeAnt-AI review

qnbs · 2026-06-23T21:22:30Z

@CodeAnt-AI review

codeant-ai · 2026-06-23T21:23:44Z

Sequence Diagram

This PR updates the ProForge pipeline so each stage is evaluated by a supervisor that computes a measured confidence score based on findings and manuscript size, using configurable quality thresholds and an intake hard gate.

sequenceDiagram
    participant User
    participant ProForgePipeline
    participant StageAgent
    participant Supervisor

    User->>ProForgePipeline: Start stage with config and manuscript
    ProForgePipeline->>StageAgent: Execute stage work
    StageAgent-->>ProForgePipeline: Stage result and review items
    ProForgePipeline->>Supervisor: Evaluate quality with thresholds and word count
    Supervisor-->>ProForgePipeline: pass flag, quality score, retry suggestion

    alt Intake stage below hard gate
        ProForgePipeline-->>User: Fail run with diagnostic message
    else Pass or soft flag
        ProForgePipeline-->>User: Return stage result and supervisor decision
    end

Generated by CodeAnt AI

…-100

Three CodeAnt findings on the quality-gate config contract:

- Gate intake failure on the supervisor actually flagging it (!decision.pass) plus a sub-floor score, not score alone, so a legitimately weak-but-analyzed manuscript is no longer mislabeled as an AI-provider failure.
- Centralize the rule in SupervisorAgent.intakeHardGateFailed so the orchestrator AND the capability layer (Node/MCP runStage) enforce identical behavior; the capability layer now throws STAGE_FAILED on a fallback intake instead of returning a misleading success.
- Bound intakeHardGate to 0..100 in the Zod schema so an impossible (>100) threshold can't make every intake fail via misconfiguration.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
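The gate rule from this commit is small enough to sketch directly. The name intakeHardGateFailed and the !decision.pass condition come from the commit message; the SupervisorDecision shape, the free-function signature, and the assertValidIntakeHardGate helper (a plain TypeScript stand-in for the 0..100 schema bound) are assumptions.

```typescript
// Assumed shape: the diagrams mention a pass flag and a quality score.
interface SupervisorDecision {
  pass: boolean;
  qualityScore: number;
}

// Stand-in for the schema bound: reject thresholds outside 0..100.
function assertValidIntakeHardGate(gate: number): number {
  if (!Number.isFinite(gate) || gate < 0 || gate > 100) {
    throw new RangeError(`intakeHardGate must be within 0..100, got ${gate}`);
  }
  return gate;
}

// Hard failure requires BOTH a supervisor flag and a sub-floor score.
function intakeHardGateFailed(
  decision: SupervisorDecision,
  intakeHardGate: number,
): boolean {
  return !decision.pass && decision.qualityScore < intakeHardGate;
}

const gate = assertValidIntakeHardGate(40); // assumed threshold value

// Weak but analyzed manuscript: low score, yet the supervisor passed it.
console.log(intakeHardGateFailed({ pass: true, qualityScore: 10 }, gate)); // false
// Fallback intake: flagged by the supervisor and below the floor.
console.log(intakeHardGateFailed({ pass: false, qualityScore: 10 }, gate)); // true
```

Keeping the rule in one function is what lets the orchestrator and the capability layer share identical behavior instead of each comparing scores on its own.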

qnbs · 2026-06-23T21:38:37Z

@CodeAnt-AI review

codeant-ai · 2026-06-23T21:40:16Z

Sequence Diagram

This PR makes the ProForge supervisor use measured, configurable quality thresholds for intake and editing stages, and enforces a shared intake hard gate while clearly signaling that human review is required.

sequenceDiagram
    participant Author
    participant Dashboard
    participant ProForgeBackend
    participant StageAgent
    participant SupervisorAgent

    Author->>Dashboard: Start ProForge intake with run settings
    Dashboard->>ProForgeBackend: Run intake with config (quality thresholds)
    ProForgeBackend->>StageAgent: Execute intake agent
    StageAgent-->>ProForgeBackend: Return diagnostic and review items
    ProForgeBackend->>SupervisorAgent: Evaluate intake using thresholds

    alt Intake fallback and score below intake gate
        SupervisorAgent-->>ProForgeBackend: Hard gate failure decision
        ProForgeBackend-->>Dashboard: Report intake failed (check AI provider)
    else Intake analyzed or above gate
        SupervisorAgent-->>ProForgeBackend: Pass or soft fail decision with score
        ProForgeBackend-->>Dashboard: Show intake result and mark stage for review
    end

    Dashboard-->>Author: Display experimental review required notice

Generated by CodeAnt AI

…f findings

Four CodeAnt findings on the measured confidence scores:

- structural/lineProse/copyEdit summed agentOutput edits AND reviewItems, but reviewItems are derived 1:1 from those same edits, inflating confidence. Use Math.max(editCount, reviewItems.length) as the canonical single-source signal so the score is proportional to real work, never doubled.
- proof scored only grammar issues, ignoring style/technical/legal findings. Count all proof-stage signal so reports with substantial non-grammar findings aren't mis-scored (and aren't mislabeled as fallbacks).

Adds a double-count regression guard + a non-grammar-proof test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
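The two signal-counting fixes in this commit can be sketched as follows. Math.max(editCount, reviewItems.length) is quoted from the commit message; the EditStageOutput and ProofReport shapes and the function names editStageSignal and proofSignal are hypothetical.

```typescript
// Assumed shape for structural/lineProse/copyEdit stage output.
interface EditStageOutput {
  editCount: number;
  reviewItems: unknown[]; // derived 1:1 from the edits
}

// Single-source signal: take the larger count instead of summing both.
function editStageSignal(output: EditStageOutput): number {
  return Math.max(output.editCount, output.reviewItems.length);
}

// Assumed shape for a proof-stage report, one array per finding category.
interface ProofReport {
  grammar: unknown[];
  style: unknown[];
  technical: unknown[];
  legal: unknown[];
}

// Count every category, not only grammar.
function proofSignal(report: ProofReport): number {
  return (
    report.grammar.length +
    report.style.length +
    report.technical.length +
    report.legal.length
  );
}

// Three edits mirrored by three review items count as 3, not 6.
console.log(editStageSignal({ editCount: 3, reviewItems: ["a", "b", "c"] }));
// A report with no grammar issues still carries signal.
console.log(proofSignal({ grammar: [], style: ["s1", "s2"], technical: ["t1"], legal: ["l1"] }));
```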

qnbs · 2026-06-23T22:06:42Z

@CodeAnt-AI review

codeant-ai · 2026-06-23T22:07:57Z

Sequence Diagram

This PR updates the ProForge pipeline so stage results are scored by a measured supervisor and intake failures are consistently hard-gated using configurable quality thresholds, with callers clearly informed that human review is required.

sequenceDiagram
    participant User
    participant ProForgeRunner
    participant StageAgent
    participant Supervisor

    User->>ProForgeRunner: Start stage with optional quality thresholds
    ProForgeRunner->>StageAgent: Execute stage on manuscript
    StageAgent-->>ProForgeRunner: Return agent output and review items
    ProForgeRunner->>Supervisor: Evaluate stage result with thresholds
    Supervisor-->>ProForgeRunner: Decision with measured quality score

    alt Intake hard gate fails
        ProForgeRunner-->>User: Report intake failure and do not advance pipeline
    else Stage passes or soft fails
        ProForgeRunner-->>User: Return stage result and supervisor decision for review
    end

Generated by CodeAnt AI

codeant-ai Bot added the size:L This PR changes 100-499 lines, ignoring generated files label Jun 23, 2026

qnbs force-pushed the feat/voice-feedback branch from fbd2528 to 3896e87 Compare June 23, 2026 16:59

Base automatically changed from feat/voice-feedback to main June 23, 2026 20:17

qnbs force-pushed the feat/proforge-quality-gates branch from 451d233 to 426da66 Compare June 23, 2026 20:23

codeant-ai Bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Jun 23, 2026

codeant-ai Bot reviewed Jun 23, 2026

Comment thread features/proForge/types.ts

codeant-ai Bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Jun 23, 2026

codeant-ai Bot reviewed Jun 23, 2026

Comment thread services/proForge/proForgeCapabilityLayer.ts

Comment thread services/proForge/proForgeCapabilitySchemas.ts Outdated

codeant-ai Bot reviewed Jun 23, 2026

Comment thread services/proForge/proForgeOrchestrator.ts Outdated

codeant-ai Bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Jun 23, 2026

codeant-ai Bot reviewed Jun 23, 2026

codeant-ai Bot added size:XL This PR changes 500-999 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Jun 23, 2026

qnbs merged commit 5852270 into main Jun 23, 2026
19 checks passed

qnbs deleted the feat/proforge-quality-gates branch June 23, 2026 22:28
