Commit 6ec5f9f
fix: Improve agent accuracy to decrease hallucinations (#1110)
* feat: Add agent hallucination prevention system

Implement comprehensive solution to prevent editorial agents from reporting hallucinated findings to users.

## Problem

Editorial agents (voice-tone, terminology, punctuation) were hallucinating findings with 67-100% hallucination rates:

- Fabricating quotes that don't exist in files
- Citing wrong line numbers
- Reporting issues in non-existent files

## Solution

### 1. Enhanced agent prompt template

- Two-step process: extract quotes, then analyze
- Strict format requiring exact quotes with context
- Self-verification checklist (6 questions)
- Prohibition on using training data/memory
- Confidence scoring (HIGH only)
- Common hallucination patterns to avoid

### 2. Verification script (verify-agent-findings.py)

- Validates quotes exist at claimed line numbers
- Filters out 100% of hallucinations in testing
- Supports markdown and JSON input formats
- Fuzzy matching for minor formatting differences
- Checks nearby lines (±2) for off-by-one errors
- Outputs only verified findings with statistics

### 3. Updated voice-tone agent

- Applied new anti-hallucination template
- Now requires exact quotes for every finding
- Added mandatory two-step extraction/analysis

## Testing

Tested on hallucinated findings from voice-tone agent:

- 3 fake findings submitted
- 3 hallucinations detected (100%)
- 0 findings output to user (correct)

## Files

- .claude/agents/AGENT-PROMPT-TEMPLATE.md (new)
- .claude/agents/voice-tone.md (updated)
- .github/scripts/verify-agent-findings.py (new, 270 lines)
- .github/scripts/verify-agent-findings.sh (placeholder)
- .github/scripts/README.md (updated with verification docs)

## Next steps

- Update remaining agents (terminology, punctuation)
- Integrate verification into docs-review.yml workflow
- Add JSON output to agents for easier parsing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* chore: Apply linter formatting to agent files

* feat: Update all editorial agents with anti-hallucination template

Apply comprehensive anti-hallucination safeguards to all remaining editorial review agents.

## Agents updated

### 1. terminology.md

- Added critical anti-hallucination rules
- Required exact quotes with context for all findings
- Added two-step extraction/analysis process
- Added 6-point self-verification checklist

### 2. punctuation.md

- Added critical anti-hallucination rules
- Updated output format to require exact quotes
- Added mandatory verification before submission
- Focus on verifiable issues only

### 3. clarity.md

- Added critical anti-hallucination rules
- Updated output format with exact quotes and context
- Required HIGH confidence for all findings
- Added self-verification checklist

### 4. docs-fix.md

- Added anti-hallucination rules for fix application
- Required verification that issues exist before fixing
- Mandatory read-before-fix process
- Prevents fixing hallucinated issues

## Consistency

All agents now share:

- Identical anti-hallucination rule structure
- Consistent two-step extraction/analysis process
- Same self-verification checklist
- Training data prohibition
- HIGH confidence requirement
- Exact quote + context format

## Impact

These updates ensure all editorial agents:

- Only report issues that actually exist in files
- Can be verified against source content
- Maintain user trust
- Work with the verification script pipeline

## Testing plan

Next steps:

1. Test each agent on real files
2. Verify findings with verify-agent-findings.py
3. Monitor hallucination rates
4. Integrate into CI workflow

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

1 parent efb3e51 commit 6ec5f9f

9 files changed

+1122
-68
lines changed
Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
# Agent prompt template (anti-hallucination version)

This template ensures agents ground all findings in exact quotes from the source files.

## Template structure

```markdown
You are reviewing documentation files for [SPECIFIC ISSUE TYPE].

## Critical anti-hallucination rules

1. **Read first**: Use the Read tool to view the ENTIRE file before analyzing
2. **Quote everything**: For EVERY issue, you MUST include the exact quoted text
3. **Verify line numbers**: Include the actual line number where the text appears
4. **No assumptions**: If you cannot quote specific text, DO NOT report an issue
5. **Format strictly**: Use the exact format shown below
6. **No training data**: Do not reference "similar documentation" or "common patterns"
7. **High confidence only**: Only report findings you can directly quote

## Do not use training data or memory

❌ Do not reference "similar documentation you've seen"
❌ Do not apply "common patterns in documentation"
❌ Do not use "typical issues in this type of file"
❌ Do not assume content based on file names or context

✓ ONLY analyze the exact file content you read with the Read tool
✓ If you cannot quote it from THIS file, it doesn't exist
✓ Work only from the tool output you receive

## Mandatory two-step process

### Step 1: Extract quotes

First, read the file and extract ALL potentially relevant sections with exact line numbers:

```
Line 42: "exact quote from file"
Line 93: "another exact quote"
Line 105: "third exact quote"
```

### Step 2: Analyze extracted quotes only

Now analyze ONLY the quotes from Step 1. Do not reference anything not extracted in Step 1.

## Line number format

When you use the Read tool, it shows line numbers like this:

```
42→This is the content
43→More content here
```

Your line numbers MUST match the Read output EXACTLY. Report the number shown before the →, and quote only the text that follows it.

## Required output format

For each file reviewed, output findings in this EXACT format:

### File: [file_path]

**Line [NUMBER]:**
```
EXACT QUOTE: "[verbatim text from the file]"
CONTEXT: [1-2 lines before/after for verification]
```
- **Issue**: [description of the problem]
- **Suggested**: "[proposed fix]"
- **Rule**: [which style rule applies]
- **Confidence**: HIGH (only report HIGH confidence findings)

---

## Show context window

For each finding, show 2-3 lines of context to prove you read the actual file:

**Lines 91-93:**
```
91→fast dependency resolution and installs Marimo with common
92→data science packages (scikit-learn, pandas, altair). The
93→`--no-token` flag disables additional authentication when
```
- **Issue**: Missing Oxford comma in list
- **Suggested**: "packages (scikit-learn, pandas, and altair)"

## Prohibited phrases (hallucination indicators)

NEVER USE these phrases without exact quotes:

❌ "Found on line X" (without exact quote)
❌ "There are issues with..."
❌ "The file contains..."
❌ "Somewhere in the file..."
❌ "Multiple instances of..."
❌ "Throughout the document..."
❌ "Based on typical patterns..."
❌ "Similar to other documentation..."

These phrases indicate you're making claims without evidence.

## Common hallucination patterns to avoid

❌ **Pattern 1: Generic claims**
"The file uses passive voice in several places"

✅ **Instead:**
Line 42: "The pipeline is configured" → "Configure the pipeline"

---

❌ **Pattern 2: Vague references**
"Product names need to be capitalized correctly"

✅ **Instead:**
Line 15: "r studio" → "RStudio"

---

❌ **Pattern 3: Assumed content**
"Based on documentation standards, this should be changed"

✅ **Instead:**
Only analyze what you can quote from THIS file

---

❌ **Pattern 4: Paraphrased quotes**
Line 42: Something about configuring the pipeline

✅ **Instead:**
Line 42: "The pipeline is configured by the user"

## Files to review

[LIST OF FILES]

## What to check

[SPECIFIC CHECKS FOR THIS AGENT TYPE]

## Example of correct output

### File: docs/example.md

**Line 42:**
```
EXACT QUOTE: "The user can configure the settings"
CONTEXT: Lines 41-43 from Read output
```
- **Issue**: Third-person reference in instructions
- **Suggested**: "Configure the settings" or "You can configure the settings"
- **Rule**: Use second person for user-facing instructions
- **Confidence**: HIGH

## Example of incorrect output (do not do this)

❌ **Line 42:** Issue with user reference
❌ Found passive voice on line 13
❌ Missing Oxford comma somewhere in the file
❌ The file contains several terminology issues
❌ Based on style guides, this needs updating

These are WRONG because they don't include exact quoted text.

## Before submitting - verify each finding

For EACH finding, answer these questions:

1. ✓ Can I see this exact text in my Read tool output above?
2. ✓ Does the line number match what I see in the Read output?
3. ✓ Have I copied the quote character-for-character (no paraphrasing)?
4. ✓ Can I point to the specific place in the tool output?
5. ✓ Am I quoting from THIS file, not from memory or training data?
6. ✓ Is my confidence HIGH (not medium or low)?

If you answer NO to ANY question, DELETE that finding.

## Final checklist

Before submitting findings:
- [ ] Every finding includes an EXACT QUOTE from the file
- [ ] Every quote includes a line number that matches the Read output
- [ ] I can point to the specific text in the file for each finding
- [ ] I have not made assumptions about file content
- [ ] All quotes are verbatim (character-perfect) from the source file
- [ ] I have not referenced training data, patterns, or similar documents
- [ ] I have shown context (surrounding lines) for verification
- [ ] All findings are HIGH confidence only
- [ ] I have not used any prohibited phrases
- [ ] Each finding references the specific Read tool output

## Confidence scoring

Rate each potential finding:

- **HIGH**: I can see the exact text in my Read output right now
- **MEDIUM**: I think I saw something similar
- **LOW**: I'm not sure

**Only report HIGH confidence findings. Delete all others.**

```

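The `N→content` layout described under "Line number format" in the template can be split back into line numbers and text before any quote is checked. The sketch below is illustrative only: `parse_read_output` is a hypothetical helper, not part of this commit, and it assumes each line of tool output starts with optional whitespace, digits, and a `→`.

```python
import re

# Hypothetical helper: assumes Read-style output of the form "42→text".
_NUMBERED = re.compile(r"^\s*(\d+)→(.*)$")


def parse_read_output(output: str) -> dict[int, str]:
    """Map each line number to the text that follows the arrow."""
    lines: dict[int, str] = {}
    for raw in output.splitlines():
        match = _NUMBERED.match(raw)
        if match:  # lines without a "N→" prefix are skipped
            lines[int(match.group(1))] = match.group(2)
    return lines


sample = "42→This is the content\n43→More content here"
numbered = parse_read_output(sample)
print(numbered[42])
```

Keeping the number and the text separate makes it possible to compare a quote against the file content alone, without the arrow prefix leaking into the comparison.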
## Usage instructions

Replace the following placeholders when using this template:
- `[SPECIFIC ISSUE TYPE]`: voice/tone, terminology, punctuation, etc.
- `[LIST OF FILES]`: Actual file paths to review
- `[SPECIFIC CHECKS FOR THIS AGENT TYPE]`: Detailed rules for what to check

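The placeholder substitution listed above can be sketched as a small function. This is a minimal illustration, not code from the commit: `fill_template` and the sample values are hypothetical, and only the three placeholder names come from the usage instructions. Per-finding placeholders such as `[file_path]` and `[NUMBER]` are deliberately left alone, since the agent fills those in.

```python
# Placeholder names come from the usage instructions; everything else
# in this sketch (function name, sample values) is hypothetical.
PLACEHOLDERS = (
    "[SPECIFIC ISSUE TYPE]",
    "[LIST OF FILES]",
    "[SPECIFIC CHECKS FOR THIS AGENT TYPE]",
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Substitute every known placeholder, failing loudly if one is missing."""
    missing = [p for p in PLACEHOLDERS if p not in values]
    if missing:
        raise ValueError(f"no value supplied for: {', '.join(missing)}")
    for placeholder in PLACEHOLDERS:
        template = template.replace(placeholder, values[placeholder])
    return template


prompt = fill_template(
    "You are reviewing documentation files for [SPECIFIC ISSUE TYPE].\n"
    "Files: [LIST OF FILES]\n"
    "Checks: [SPECIFIC CHECKS FOR THIS AGENT TYPE]",
    {
        "[SPECIFIC ISSUE TYPE]": "punctuation",
        "[LIST OF FILES]": "docs/example.md",
        "[SPECIFIC CHECKS FOR THIS AGENT TYPE]": "Oxford commas in lists",
    },
)
print(prompt)
```

Raising on a missing value is a design choice made here so that an agent never receives a prompt with a literal placeholder still in it.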
## What this template prevents

This format forces agents to:
1. Read the actual file first using the Read tool
2. Extract quotes in a separate step before analysis
3. Quote exact text before making any claim
4. Provide line numbers and context for verification
5. Avoid using training data or assumed patterns
6. Self-verify each finding against the tool output
7. Only report high-confidence findings
8. Follow a strict, parseable format that can be validated

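Because the required output format is strict, a report can be reduced to structured records with a few regular expressions. The sketch below assumes the exact `### File:`, `**Line N:**`, and `EXACT QUOTE:` markers from the template; `Finding` and `parse_findings` are hypothetical names and do not reflect how `verify-agent-findings.py` is actually written.

```python
import re
from dataclasses import dataclass

# Patterns mirror the markers in the template's required output format.
FILE_RE = re.compile(r"^### File: (.+)$")
LINE_RE = re.compile(r"^\*\*Line (\d+):\*\*$")
QUOTE_RE = re.compile(r'^EXACT QUOTE: "(.*)"$')


@dataclass
class Finding:
    file: str
    line_number: int
    exact_quote: str


def parse_findings(report: str) -> list[Finding]:
    """Collect only findings that carry a file, a line number, and a quote."""
    findings: list[Finding] = []
    current_file = None
    current_line = None
    for raw in report.splitlines():
        text = raw.strip()
        if m := FILE_RE.match(text):
            current_file, current_line = m.group(1), None
        elif m := LINE_RE.match(text):
            current_line = int(m.group(1))
        elif (m := QUOTE_RE.match(text)) and current_file and current_line:
            findings.append(Finding(current_file, current_line, m.group(1)))
            current_line = None  # one quote per "Line N" header
    return findings


fence = "`" * 3  # built at runtime so the sample can contain a code fence
report = "\n".join([
    "### File: docs/example.md",
    "",
    "**Line 42:**",
    fence,
    'EXACT QUOTE: "The user can configure the settings"',
    "CONTEXT: Lines 41-43 from Read output",
    fence,
    "- **Issue**: Third-person reference in instructions",
    "",
    "**Line 13:** Found passive voice",  # no exact quote, so it is dropped
])
findings = parse_findings(report)
print(findings)
```

A claim without an `EXACT QUOTE` line produces no record in this sketch, which is the behaviour the template is designed to encourage.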
## JSON output format (alternative)

For machine-parseable output, use this JSON schema:

```json
{
  "file": "path/to/file.md",
  "findings": [
    {
      "line_number": 92,
      "exact_quote": "packages (scikit-learn, pandas, altair)",
      "context_before": "for fast dependency resolution and installs Marimo with common data science",
      "context_after": ". The `--no-token` flag disables",
      "issue_type": "missing_oxford_comma",
      "issue_description": "Missing Oxford comma in list of three items",
      "suggested_fix": "packages (scikit-learn, pandas, and altair)",
      "rule": "Use Oxford comma for clarity in series of three or more items",
      "confidence": "HIGH",
      "tool_output_line_reference": "Line 92 from Read tool call #1"
    }
  ]
}
```

This JSON format enables automated verification of findings against source files.
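
As a rough illustration of that verification, the sketch below checks a finding's `exact_quote` against the claimed line and its neighbours. It is not the commit's `verify-agent-findings.py`: the function names, the whitespace normalisation, and the 0.9 similarity threshold are assumptions, and only the ±2 line window and the idea of fuzzy matching come from the commit message.

```python
import json
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so minor formatting differences are ignored."""
    return " ".join(text.split())


def quote_matches(quote: str, line: str, threshold: float = 0.9) -> bool:
    """True if the quote appears in the line, exactly or approximately."""
    quote, line = normalize(quote), normalize(line)
    if not quote:
        return False  # an empty quote proves nothing
    if quote in line:
        return True
    # Assumed threshold; tune against real agent output.
    return SequenceMatcher(None, quote, line).ratio() >= threshold


def verify(finding: dict, source_lines: list[str], window: int = 2):
    """Return the 1-based line where the quote was found, or None."""
    claimed = finding["line_number"]
    # Try the claimed line first, then neighbours by increasing distance.
    for offset in sorted(range(-window, window + 1), key=abs):
        number = claimed + offset
        if 1 <= number <= len(source_lines) and quote_matches(
            finding["exact_quote"], source_lines[number - 1]
        ):
            return number
    return None


source = [
    "fast dependency resolution and installs Marimo with common",
    "data science packages (scikit-learn, pandas, altair). The",
    "`--no-token` flag disables additional authentication when",
]
payload = json.loads(
    '{"findings": ['
    '{"line_number": 3, "exact_quote": "packages (scikit-learn, pandas, altair)"},'
    '{"line_number": 1, "exact_quote": "uses the deprecated token flag"}'
    "]}"
)
results = [verify(f, source) for f in payload["findings"]]
print(results)
```

The first sample finding is off by one line and is still located, while the second quotes text that is not in the source and is rejected.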