feat(proforge): measured supervisor quality gates + review-required label #213
fbd2528 to 3896e87
…abel

Wave B · deepens the ProForge supervisor (audit P0): scores were hard-coded constants and thresholds were fixed in code.

- Measured scoring: replace the flat 80/85/88/90 pass-scores with a confidence score that scales with how much signal a stage produced relative to manuscript size (findings per ~1000 words), within a pass band. Fail/"suspect" scores now scale with manuscript size too. The supervisor still makes NO AI calls, so this is heuristic confidence, not editorial quality, but the score now varies with the work done instead of always reporting the same number.
- Configurable thresholds: new QualityThresholds (largeManuscriptWords + intakeHardGate) on PipelineConfig, defaulted via DEFAULT_QUALITY_THRESHOLDS and threaded from run config → SupervisorAgent + the orchestrator's intake hard gate.
- Dashboard: an explicit "Experimental — your review is required" line, so the human-in-the-loop expectation is stated, not just implied in help.

i18n: proforge.pipeline.reviewRequired across all 19 locales + bundles.

Tests: measured-score-varies-with-findings, configurable-threshold behaviour; existing exact-score assertions relaxed to pass-band ranges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
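For readers who want a concrete picture of "findings per ~1000 words, within a pass band", here is a minimal sketch. It is not the code from this PR: the function name `measuredConfidence`, the `PassBand` shape, the 80..95 band, and the saturation constant are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch: names and numbers below are assumptions, not the PR's code.
interface PassBand {
  min: number; // lowest score a passing stage can report
  max: number; // highest score a passing stage can report
}

// Assumed saturation point: findings per 1000 words at which confidence tops out.
const SATURATION_PER_1000_WORDS = 5;

function measuredConfidence(
  findingCount: number,
  wordCount: number,
  band: PassBand,
): number {
  // Guard against empty manuscripts so the density stays finite.
  const thousands = Math.max(wordCount, 1) / 1000;
  const density = Math.max(findingCount, 0) / thousands;
  // Map density onto 0..1, capping at the assumed saturation point.
  const ratio = Math.min(density / SATURATION_PER_1000_WORDS, 1);
  // Interpolate inside the pass band so the score varies but never leaves it.
  return Math.round(band.min + ratio * (band.max - band.min));
}

const band: PassBand = { min: 80, max: 95 };
// Same manuscript size, different amounts of stage signal.
const sparse = measuredConfidence(2, 10000, band);
const dense = measuredConfidence(60, 10000, band);
console.log(sparse, dense);
```

The point of the interpolation is that two runs over the same manuscript report different scores when the stage produced different amounts of signal, while a passing score still cannot drift outside the band.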
451d233 to 426da66
@CodeAnt-AI review
Sequence Diagram
This diagram shows how the ProForge orchestrator now threads configurable quality thresholds into the SupervisorAgent to compute measured confidence scores per stage and enforce a hard intake gate, while reporting results back to the dashboard.
sequenceDiagram
participant Author
participant Dashboard
participant Orchestrator
participant Supervisor
Author->>Dashboard: Start ProForge pipeline
Dashboard->>Orchestrator: Run pipeline with config and quality thresholds
Orchestrator->>Supervisor: Initialize with thresholds (defaults overridden by config)
loop For each stage
Orchestrator->>Orchestrator: Run stage agent and collect findings
Orchestrator->>Supervisor: Evaluate stage using findings and manuscript size
Supervisor->>Supervisor: Compute confidenceScore and suspectScore
Supervisor-->>Orchestrator: Return pass flag and qualityScore
alt Intake stage below intake hard gate
Orchestrator-->>Dashboard: Mark intake failed with diagnostic message
break
else Stage passes quality gate
Orchestrator-->>Dashboard: Mark stage complete and advance
end
end
Generated by CodeAnt AI
Codecov Report
✅ All modified and coverable lines are covered by tests.
CodeAnt: partialPipelineConfigSchema lacked qualityThresholds, so Zod silently stripped the key for Node/MCP capability callers: overrides were dropped and the supervisor always used DEFAULT_QUALITY_THRESHOLDS. Add qualityThresholdsSchema to the validator and pass config.qualityThresholds into the SupervisorAgent so non-Redux entry points honor the config contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
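The failure mode in this commit can be reproduced without Zod. The sketch below uses a hand-written `parseKnownKeys` helper as a stand-in for a schema that strips undeclared keys (Zod object schemas do this by default). The helper, `resolveThresholds`, the `model` key, and the default values are assumptions for illustration; the real DEFAULT_QUALITY_THRESHOLDS values do not appear in this conversation.

```typescript
// Hypothetical stand-in for a schema that drops keys it does not declare.
function parseKnownKeys(
  input: Record<string, unknown>,
  knownKeys: readonly string[],
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of knownKeys) {
    if (key in input) out[key] = input[key];
  }
  return out;
}

interface QualityThresholds {
  largeManuscriptWords: number;
  intakeHardGate: number;
}

// Placeholder defaults, assumed for this sketch only.
const DEFAULT_QUALITY_THRESHOLDS: QualityThresholds = {
  largeManuscriptWords: 100000,
  intakeHardGate: 50,
};

// Overlay any caller overrides on top of the defaults.
function resolveThresholds(config: Record<string, unknown>): QualityThresholds {
  const overrides = (config["qualityThresholds"] ?? {}) as Partial<QualityThresholds>;
  return { ...DEFAULT_QUALITY_THRESHOLDS, ...overrides };
}

const callerConfig = { model: "any", qualityThresholds: { intakeHardGate: 30 } };

// Validator that does not declare qualityThresholds: the override is silently lost.
const before = resolveThresholds(parseKnownKeys(callerConfig, ["model"]));
// Validator that declares it: the override survives and defaults fill the rest.
const after = resolveThresholds(
  parseKnownKeys(callerConfig, ["model", "qualityThresholds"]),
);
console.log(before.intakeHardGate, after.intakeHardGate);
```

Nothing throws in the broken case, which is why the dropped override is easy to miss: the caller simply gets default behavior.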
@CodeAnt-AI review
Sequence Diagram
This PR updates the ProForge pipeline so each stage is evaluated by a supervisor that computes a measured confidence score based on findings and manuscript size, using configurable quality thresholds and an intake hard gate.
sequenceDiagram
participant User
participant ProForgePipeline
participant StageAgent
participant Supervisor
User->>ProForgePipeline: Start stage with config and manuscript
ProForgePipeline->>StageAgent: Execute stage work
StageAgent-->>ProForgePipeline: Stage result and review items
ProForgePipeline->>Supervisor: Evaluate quality with thresholds and word count
Supervisor-->>ProForgePipeline: pass flag, quality score, retry suggestion
alt Intake stage below hard gate
ProForgePipeline-->>User: Fail run with diagnostic message
else Pass or soft flag
ProForgePipeline-->>User: Return stage result and supervisor decision
end
Generated by CodeAnt AI
…-100

Three CodeAnt findings on the quality-gate config contract:

- Gate intake failure on the supervisor actually flagging it (!decision.pass) plus a sub-floor score, not score alone, so a legitimately weak-but-analyzed manuscript is no longer mislabeled as an AI-provider failure.
- Centralize the rule in SupervisorAgent.intakeHardGateFailed so the orchestrator AND the capability layer (Node/MCP runStage) enforce identical behavior; the capability layer now throws STAGE_FAILED on a fallback intake instead of returning a misleading success.
- Bound intakeHardGate to 0..100 in the Zod schema so an impossible (>100) threshold can't make every intake fail via misconfiguration.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
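The two-condition gate described in this commit can be sketched as a single pure function. The commit names `SupervisorAgent.intakeHardGateFailed`, but the free-function signature, the `SupervisorDecision` shape, and the `assertValidGate` helper below are assumptions; in the PR the 0..100 bound lives in the Zod schema rather than in a runtime check like this one.

```typescript
// Hypothetical shape of a supervisor decision; field names are assumed.
interface SupervisorDecision {
  pass: boolean;
  qualityScore: number; // 0..100
}

// Stand-in for the schema bound: reject thresholds outside 0..100.
function assertValidGate(intakeHardGate: number): void {
  if (!(intakeHardGate >= 0 && intakeHardGate <= 100)) {
    throw new RangeError(`intakeHardGate must be within 0..100, got ${intakeHardGate}`);
  }
}

// Hard-fail only when the supervisor flagged the stage AND the score is below the floor.
function intakeHardGateFailed(
  decision: SupervisorDecision,
  intakeHardGate: number,
): boolean {
  assertValidGate(intakeHardGate);
  return !decision.pass && decision.qualityScore < intakeHardGate;
}

// Weak but analyzed manuscript: low score, yet the supervisor passed it.
const weakButAnalyzed = intakeHardGateFailed({ pass: true, qualityScore: 40 }, 50);
// Fallback intake: flagged by the supervisor and below the floor.
const fallbackIntake = intakeHardGateFailed({ pass: false, qualityScore: 20 }, 50);
// Flagged but above the floor: a soft failure, not a hard gate.
const softFail = intakeHardGateFailed({ pass: false, qualityScore: 60 }, 50);
console.log(weakButAnalyzed, fallbackIntake, softFail);
```

Keeping the rule in one function is what lets the orchestrator and the capability layer agree: both call the same predicate instead of each comparing scores on its own.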
@CodeAnt-AI review
Sequence Diagram
This PR makes the ProForge supervisor use measured, configurable quality thresholds for intake and editing stages, and enforces a shared intake hard gate while clearly signaling that human review is required.
sequenceDiagram
participant Author
participant Dashboard
participant ProForgeBackend
participant StageAgent
participant SupervisorAgent
Author->>Dashboard: Start ProForge intake with run settings
Dashboard->>ProForgeBackend: Run intake with config (quality thresholds)
ProForgeBackend->>StageAgent: Execute intake agent
StageAgent-->>ProForgeBackend: Return diagnostic and review items
ProForgeBackend->>SupervisorAgent: Evaluate intake using thresholds
alt Intake fallback and score below intake gate
SupervisorAgent-->>ProForgeBackend: Hard gate failure decision
ProForgeBackend-->>Dashboard: Report intake failed (check AI provider)
else Intake analyzed or above gate
SupervisorAgent-->>ProForgeBackend: Pass or soft fail decision with score
ProForgeBackend-->>Dashboard: Show intake result and mark stage for review
end
Dashboard-->>Author: Display experimental review required notice
Generated by CodeAnt AI
…f findings

Four CodeAnt findings on the measured confidence scores:

- structural/lineProse/copyEdit summed agentOutput edits AND reviewItems, but reviewItems are derived 1:1 from those same edits, which inflated confidence. Use Math.max(editCount, reviewItems.length) as the canonical single-source signal so the score is proportional to real work, never doubled.
- proof scored only grammar issues, ignoring style/technical/legal findings. Count all proof-stage signal so reports with substantial non-grammar findings aren't mis-scored (and aren't mislabeled as fallbacks).

Adds a double-count regression guard + a non-grammar-proof test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
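A short sketch of the two counting rules from this commit, under assumed data shapes. `EditStageResult`, `ProofReport`, and both function names are hypothetical; only the `Math.max(editCount, reviewItems.length)` rule and the grammar/style/technical/legal categories come from the commit message.

```typescript
// Hypothetical result shapes; the real agent output types are not shown in this PR page.
interface EditStageResult {
  edits: unknown[];
  reviewItems: unknown[]; // derived 1:1 from edits, per the commit message
}

interface ProofReport {
  grammar: unknown[];
  style: unknown[];
  technical: unknown[];
  legal: unknown[];
}

// Take the larger of the two counts so mirrored items are never summed twice.
function editStageSignal(result: EditStageResult): number {
  return Math.max(result.edits.length, result.reviewItems.length);
}

// Count every proof category, not just grammar.
function proofStageSignal(report: ProofReport): number {
  return (
    report.grammar.length +
    report.style.length +
    report.technical.length +
    report.legal.length
  );
}

// Three edits mirrored as three review items count as three findings.
const mirrored = editStageSignal({ edits: [1, 2, 3], reviewItems: [1, 2, 3] });
// A proof report with no grammar issues still carries signal.
const nonGrammar = proofStageSignal({
  grammar: [],
  style: ["s1"],
  technical: ["t1", "t2"],
  legal: ["l1"],
});
console.log(mirrored, nonGrammar);
```

Using the maximum instead of the sum also tolerates a stage that returns only one of the two lists, since the non-empty list still drives the count.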
@CodeAnt-AI review
Sequence Diagram
This PR updates the ProForge pipeline so stage results are scored by a measured supervisor and intake failures are consistently hard-gated using configurable quality thresholds, with callers clearly informed that human review is required.
sequenceDiagram
participant User
participant ProForgeRunner
participant StageAgent
participant Supervisor
User->>ProForgeRunner: Start stage with optional quality thresholds
ProForgeRunner->>StageAgent: Execute stage on manuscript
StageAgent-->>ProForgeRunner: Return agent output and review items
ProForgeRunner->>Supervisor: Evaluate stage result with thresholds
Supervisor-->>ProForgeRunner: Decision with measured quality score
alt Intake hard gate fails
ProForgeRunner-->>User: Report intake failure and do not advance pipeline
else Stage passes or soft fails
ProForgeRunner-->>User: Return stage result and supervisor decision for review
end
Generated by CodeAnt AI
User description
Wave B · PR6 — ProForge measured quality gates (P0)
Stacked on #212 → … → #208.
Deepens the ProForge supervisor: scores were hard-coded constants and thresholds were fixed in code.
Changes
QualityThresholds (largeManuscriptWords + intakeHardGate) on PipelineConfig, defaulted via DEFAULT_QUALITY_THRESHOLDS, threaded from run config → SupervisorAgent + the orchestrator's intake hard gate.
i18n
proforge.pipeline.reviewRequired across all 19 locales + bundles.
Tests
Measured-score-varies-with-findings, configurable-threshold behaviour; existing exact-score assertions relaxed to pass-band ranges. 61 ProForge tests green; typecheck + lint + i18n clean.
🤖 Generated with Claude Code
CodeAnt-AI Description
Make ProForge quality checks configurable and show that review is required
What Changed
Impact
✅ Clearer review expectations
✅ Fewer false intake failures
✅ More consistent quality gate behavior