feat(mcp): leadbay_scan_portfolio_signals — bulk portfolio signal scan + stale_at honesty fix (v0.19.1, product#3704) by ArtyETH06 · Pull Request #94 · leadbay/leadclaw

ArtyETH06 · 2026-06-08T23:55:32Z

Fixes the two root causes in leadbay/product#3704 — JM couldn't scan his 497-lead Monitor portfolio for an M&A signal without one full-profile API call per lead, and the agent then hallucinated coverage off stale_at.

Summary

New read-only composite leadbay_scan_portfolio_signals — scans a Monitor portfolio (or explicit leadIds) and bulk-reads cached web_fetch signals: a GET-only fan-out (no POST, no AI-qualification quota burn), capped at 5 concurrent by the client semaphore. Filters signal entries by a case- and accent-folded query (OR terms) + optional since date. Returns the matched cohort campaign-ready (lead_id, name, location, quoted matched_signals), feeding straight into add_leads_to_campaign. Separates not_researched[] (no cached content) from "no match" — the structural antidote to the hallucination.
stale_at honesty guardrail — new snippets/gates/signal-honesty.md, included in pull_followups, research_lead_by_id, and the followup_check_in prompt: freshness fields are not signal indicators; portfolio-wide signal questions route to the bulk tool; unresearched leads are reported honestly.
Shared _web-fetch-helpers.ts — splitEmojiSection / reshapeWebFetchContent / SECTION_PRIORITY extracted from research-lead-by-id.ts, now shared by both consumers (verbatim move, no divergent copy).
Registered in index.ts, _composite-file-names.ts, TOOLS_WITH_ROUTING, output-schema conformance, WORKFLOWS.md. Bumps @leadbay/mcp 0.18.2 → 0.18.3.
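The query filter described above (case- and accent-folded OR terms plus an optional since date) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `SignalEntry` shape, the `fold` and `filterSignals` names, and the handling of undated entries and empty queries are all assumptions.

```typescript
// Hypothetical shape of one cached signal entry; the field names are assumptions.
interface SignalEntry {
  text: string;
  date?: string; // ISO date; may be missing on undated mentions
}

// Lowercase and strip combining diacritics so "Société" matches "societe".
function fold(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Keep entries containing ANY query term and, when `since` is given, dated on or after it.
function filterSignals(entries: SignalEntry[], terms: string[], since?: string): SignalEntry[] {
  const folded = terms.map(fold).filter((t) => t.length > 0);
  if (folded.length === 0) return []; // assumption: an empty query matches nothing
  const sinceMs = since === undefined ? undefined : Date.parse(since);
  return entries.filter((entry) => {
    const haystack = fold(entry.text);
    if (!folded.some((t) => haystack.includes(t))) return false;
    if (sinceMs === undefined) return true;
    // Assumption: an undated entry cannot prove it is recent, so it is excluded.
    return entry.date !== undefined && Date.parse(entry.date) >= sinceMs;
  });
}
```

Folding both the query and the signal text with the same function is what makes the match symmetric; substring matching keeps the sketch short but over-matches on very short terms.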

Pre-Landing Review

Ran the full /review pipeline (critical pass + testing & maintainability specialists + Claude adversarial + Codex adversarial). Three critical findings fixed, all on the #3704 coverage-honesty axis — each was a swallowed error path that reported partial coverage as complete:

Failed web_fetch read dropped silently — a non-quota read failure (404/500/network) was dropped from both matched[] and not_researched[] while still counted in scanned_count, violating the documented scanned_count = matched + non-matching + not_researched invariant. Now restructured to catch per-lead so failed reads land in not_researched with the lead name.
429 while paging /monitor never set quota_exceeded — a quota wall during portfolio enumeration returned an empty result as if "no matches". Now flagged.
Swallowed POST /monitor/filter still sent filtered=true — scanning against a stale server-side cohort. Now falls back to an honest unfiltered scan.
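The per-lead catch and the scanned_count invariant from the first two findings can be illustrated with a sketch like the one below. It is written under stated assumptions: `scanLeads`, `HttpError`, the `no_match` bucket name, and the choice to stop picking up new leads after a 429 are hypothetical and are not taken from the actual diff.

```typescript
type Lead = { id: string; name: string };

// Hypothetical error type carrying the HTTP status of a failed read.
class HttpError extends Error {
  status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

interface ScanResult {
  scanned_count: number;
  matched: Lead[];
  no_match: Lead[];
  not_researched: Lead[];
  quota_exceeded: boolean;
}

// Fan out cached-signal reads with a fixed number of workers. Every lead that is
// picked up lands in exactly one bucket, so
// scanned_count = matched + no_match + not_researched always holds.
async function scanLeads(
  leads: Lead[],
  read: (lead: Lead) => Promise<string[]>,
  matches: (signals: string[]) => boolean,
  concurrency = 5,
): Promise<ScanResult> {
  const result: ScanResult = {
    scanned_count: 0, matched: [], no_match: [], not_researched: [], quota_exceeded: false,
  };
  const queue = [...leads];
  const worker = async (): Promise<void> => {
    while (!result.quota_exceeded) {
      const lead = queue.shift();
      if (lead === undefined) break;
      result.scanned_count++;
      try {
        const signals = await read(lead);
        if (signals.length === 0) result.not_researched.push(lead);
        else if (matches(signals)) result.matched.push(lead);
        else result.no_match.push(lead);
      } catch (err) {
        // A failed read is never dropped: it is reported as not researched.
        if (err instanceof HttpError && err.status === 429) result.quota_exceeded = true;
        result.not_researched.push(lead);
      }
    }
  };
  await Promise.all(Array.from({ length: Math.min(concurrency, leads.length) }, worker));
  return result;
}
```

Catching inside the worker, rather than around the whole fan-out, is what keeps one failed read from erasing a lead from the result.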

Plus two mechanical cleanups (dead sinceMs binding, redundant double-Boolean). No security / SQL / injection findings (read-only GET fan-out, no user-controlled SQL).

Test Coverage

New unit suite at packages/core/test/unit/composite/scan-portfolio-signals.test.ts (13 tests): happy path, not_researched separation, since filter, diacritic/case folding, no-match, empty query, max_leads cap, 429 mid-read, 404 read → not_researched, Monitor pagination path (geo resolve → filter → paginate → bulk-read), 429 while paging → quota_exceeded, filter-POST failure → unfiltered fallback, ambiguous_locations early-return. The last 5 tests in that list (from 404 read → not_researched onward) were added during review.

Verification

pnpm prompts:build, pnpm -r typecheck, pnpm -r test all green on the merged-with-main state: core 357, promptforge 16, mcp 376.
server.json version-alignment audit passes (0.18.3 across package.json + both server.json fields; @leadbay/mcp@0.18 npx pins unchanged on the 0.18 line).
Verified live end-to-end through the MCP server (US test account): one scan_portfolio_signals call over 169 leads surfaced genuine post-2025 acquirers (e.g. QUEST DRAPE, LLC — acquired Drape Kings, March 2025), reading signal text to discard false-positive senses of "acquisition", reporting 0 unresearched. No per-lead research loop.
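As an illustration of what a version-alignment audit checks, here is a minimal sketch. The PR does not name the two server.json fields or show the audit script, so `auditVersions`, the labels passed to it, and the pin format are assumptions made for this example.

```typescript
// Returns a list of problems; an empty list means the versions are aligned.
// `versions` maps a human-readable label to the version string found there.
// `npxPin` is the major.minor line the npx pins point at, e.g. "0.18".
function auditVersions(versions: Record<string, string>, npxPin: string): string[] {
  const problems: string[] = [];
  const values = Object.values(versions);
  const distinct = new Set(values);
  if (distinct.size > 1) {
    problems.push(`version mismatch: ${JSON.stringify(versions)}`);
  }
  for (const v of distinct) {
    // The pin only names major.minor, so compare against that prefix.
    const line = v.split(".").slice(0, 2).join(".");
    if (line !== npxPin) problems.push(`${v} is not on the ${npxPin} line`);
  }
  return problems;
}
```

For example, `auditVersions({ "package.json": "0.18.3", "server.json (field A)": "0.18.3", "server.json (field B)": "0.18.3" }, "0.18")` returns an empty list under these assumptions.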

Adversarial Review

Codex (gpt-5.5, high reasoning) found the quota_exceeded-on-pagination and stale-filter issues independently; both fixed above. Remaining Codex notes (query-term substring over-match on very short terms, max_leads NaN/negative) are agent-controlled inputs constrained by the input schema — informational, not blocking.
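If the remaining Codex notes were ever promoted from informational to blocking, a defensive normalization step could look roughly like this. It is a sketch only: `clampMaxLeads`, `usableTerms`, and every default passed to them are hypothetical, and the PR relies on the input schema instead.

```typescript
// Coerce max_leads to a positive integer within [1, ceiling]; fall back on NaN,
// non-numbers, infinities, and values below 1.
function clampMaxLeads(value: unknown, fallback: number, ceiling: number): number {
  const n = typeof value === "number" ? Math.floor(value) : NaN;
  if (!Number.isFinite(n) || n < 1) return fallback;
  return Math.min(n, ceiling);
}

// Drop query terms too short to be meaningful as substrings, which limits
// over-matching (a two-letter term can match inside many unrelated words).
function usableTerms(terms: string[], minLength: number): string[] {
  return terms.map((t) => t.trim()).filter((t) => t.length >= minLength);
}
```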

Plan Completion

All #3704 deliverables shipped: bulk scan tool, honesty separation of not_researched, freshness-vs-signal guardrail across the three prompts. Eval scenarios are fixture-ready (underdeliver + honesty) but not wired to run — the eval scenario-execution glue does not exist on this branch yet (documented in the scenario folder README).

closes https://github.com/leadbay/product/issues/3704

… scan + stale_at honesty fix

Closes leadbay/product#3704. Two root causes, both fixed:

1. Capability gap: no way to scan a portfolio for a signal in bulk. JM built a 497-lead Monitor portfolio, asked which had an M&A signal since 2025, and the agent fell back to one research_lead_by_id call per lead (~60 calls) before abandoning the task. New composite leadbay_scan_portfolio_signals bulk-reads CACHED web_fetch signals across the portfolio (read-only GET fan-out, no quota burn), filters entries by a diacritic/case-folded query + optional `since` date, and returns the matched cohort campaign-ready.

2. stale_at hallucination: the agent inferred signal presence/absence from freshness fields and reported confident-but-wrong results. New snippets/gates/signal-honesty.md guardrail, included in pull-followups, research-lead-by-id, and the followup_check_in prompt: freshness markers are not signal indicators; route portfolio-wide signal questions to the bulk tool; unresearched leads are reported (not_researched), never fabricated.

The tool separates "no matching signal" (researched, no match) from "not yet researched" (no data to search) via a not_researched[] bucket, the structural antidote to the original hallucination.

Verified live against the US test account end-to-end through the MCP server: one scan_portfolio_signals call over 169 leads correctly surfaced genuine post-2025 acquirers, with the agent reading the signal text to discard false-positive senses of "acquisition" and reporting 0 unresearched.

- Extracts web_fetch reshaping into shared _web-fetch-helpers.ts (reused by research-lead-by-id).
- Registered in compositeReadTools, _composite-file-names, TOOLS_WITH_ROUTING, output-schema conformance, WORKFLOWS.md.
- Unit tests (new file) cover match/since/diacritics/not_researched/429/cap.
- Two eval scenarios authored (underdeliver + honesty); runner glue not yet on this branch, so they are fixture-ready, not wired.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…b.com-leadbay-product-issues-3704

Pre-landing review surfaced three places where partial coverage was silently reported as complete, all on the #3704 honesty axis:

- a non-quota web_fetch read failure (404/500/network) was dropped from both matched[] and not_researched[] while still counted in scanned_count. Restructured the fan-out to catch per-lead so failed reads land in not_researched with the lead name.
- a 429 while paging /monitor never set quota_exceeded, so a quota wall during portfolio enumeration returned an empty result as if 'no matches'. Now flagged.
- a swallowed POST /monitor/filter still sent filtered=true, scanning a stale server-side cohort. Now falls back to an unfiltered scan.

Also dropped a dead sinceMs binding and a redundant double-Boolean. Adds 5 regression tests (404 read, Monitor pagination path, 429 while paging, filter-POST failure fallback, ambiguous_locations).

leadbay_scan_portfolio_signals (bulk portfolio signal scan) + the signal-honesty guardrail across pull_followups / research_lead_by_id / followup_check_in. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

milstan

This is a suggestion, not a requirement. See if you want to follow it - and feel free to merge it once you decide what to do (I agree on everything else, and this comment being optional...)

milstan · 2026-06-09T19:13:06Z

+filters them for you. Do NOT loop \`leadbay_research_lead_by_id\` one lead at a
+time, and do NOT guess from list-level freshness flags.
+
+If a lead has no cached signal content, say so honestly — "not yet researched,


Here you are giving instructions to the user's agent, which presumably has internet access and could also be leveraged to run some extra web searches and complete the signal Leadbay already has.

You may consider making it more complex here: tell it to identify where such extra web searches are needed, run them, and refine. Like an extra step of the prompt.

Good call — done in 0918f85. PHASE 3b now turns the coverage gap into a refinement loop instead of just reporting it:

Names the gap precisely — "N matched; K have no cached signal and J more have only a thin/undated mention".

Targeted live pass — if the agent has web-search tools, it researches only the not_researched / thin-signal leads for the exact signal asked about (<Company> acquisition 2025), not the whole portfolio.

Folds findings back in, clearly labelled — live results are shown as agent-sourced (not Leadbay-verified), in a section separate from the campaign-ready cached cohort, with source URL/date cited. They never silently merge into the verified cohort.

Offers the durable path — leadbay_bulk_qualify_leads for gap leads worth persisting, so Leadbay runs its own web_fetch and the signal lands in cached signals[] on the next scan.

The labelling is deliberate: it keeps the #3704 honesty guarantee intact (Leadbay's cached signals[] stay the source of truth) while still answering the question for leads Leadbay hasn't researched yet.
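The separation between the verified cohort and agent-sourced findings can be pictured with a small sketch. The real change is prompt text in the .md.tmpl template, not code, so the `Report` shape, `buildReport`, and the rule of dropping uncited findings are assumptions used only to make the labelling concrete.

```typescript
// Cohort entry backed by Leadbay's cached signals[] (field names follow the PR summary).
interface CachedMatch {
  lead_id: string;
  name: string;
  matched_signals: string[];
}

// Finding from the agent's own web search; shape is hypothetical.
interface LiveFinding {
  lead_id: string;
  name: string;
  claim: string;
  source_url: string;
  source_date: string;
}

interface Report {
  summary: string;
  verified: CachedMatch[];      // campaign-ready, Leadbay-verified
  agent_sourced: LiveFinding[]; // labelled separately, never merged into `verified`
}

function buildReport(
  matched: CachedMatch[],
  notResearchedCount: number,
  thinCount: number,
  live: LiveFinding[],
): Report {
  // Assumption: a live finding without a source URL is not reportable.
  const cited = live.filter((f) => f.source_url.trim() !== "");
  return {
    summary:
      `${matched.length} matched; ${notResearchedCount} have no cached signal ` +
      `and ${thinCount} more have only a thin/undated mention`,
    verified: matched,
    agent_sourced: cited,
  };
}
```

Keeping two arrays instead of one list with a flag makes it hard to pass agent-sourced leads to add_leads_to_campaign by accident.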

Note: your comment landed on prompts.generated.ts, which is emitted by promptforge — the real edit is in packages/promptforge/prompts/leadbay_followup_check_in.md.tmpl and regenerated from there.

Addresses Milan's PR review: the agent has web tools and can complete the signal Leadbay hasn't cached yet. PHASE 3b now tells the agent to name the coverage gap precisely, run a TARGETED live web pass on only the not_researched / thin-signal leads for the exact signal asked about, and fold findings back in clearly labelled as agent-sourced (not Leadbay-verified) — separate from the campaign-ready cached cohort, with source citations. Also offers leadbay_bulk_qualify_leads as the durable path that writes the signal into Leadbay's cached signals[]. Keeps the #3704 honesty guarantee intact: Leadbay's cached signals[] remain the source of truth; live findings never silently merge into the verified cohort.

…E 3b gap-fill

…b.com-leadbay-product-issues-3704

# Conflicts:
#	WORKFLOWS.md
#	packages/mcp/CHANGELOG.md
#	packages/mcp/package.json
#	packages/mcp/server.json

Resolve conflicts after main advanced to 0.19.0 (#93 contacts, #95 server-json):

- WORKFLOWS.md: keep main's contact rows (15-23) + append new scan_portfolio_signals row (24); renumber.
- research_lead_by_id description: trim 50 chars of redundant prose to absorb main's new add_contact anti-trigger (auto-emitted into WHEN TO USE), restoring the 17000-char budget (now 16973).
- Regenerate tool-descriptions.generated.ts.

Version stays 0.19.1 (main 0.19.0 + patch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

milstan · 2026-06-10T10:51:47Z

[Claude]: Resolved merge conflicts with main (now 0.19.0; #93 contacts + #95 server-json) and pushed to this branch.

WORKFLOWS.md — kept main's contact rows (15–23), appended scan_portfolio_signals as row 24, renumbered.
research_lead_by_id description — after main's feat(contacts): manage contacts on a company from Claude (add / update / remove / pin / unpin) #93 added an add_contact anti-trigger to its routing (auto-emitted into WHEN TO USE), the generated description hit 17050 vs the 17000 budget. Trimmed 50 chars of redundant prose (companion-tool restatement + the "granular tool stays available" clause); now 16973. The signal-honesty gate and main's anti-trigger are both intact.
Version — resolves to 0.19.1 (= main 0.19.0 + patch), aligned across package.json + both server.json fields; npx pins on @0.19. Note: the body's "0.18.2 → 0.18.3" line is stale (branch was already rebumped to 0.19.1 per the title).

Full gate green on the merged state: core 380, promptforge 16, mcp 376; typecheck + build clean. Squash-merging once CI is green; that fires auto-tag → release.yml for mcp-v0.19.1.

ArtyETH06 added the feature label Jun 8, 2026

ArtyETH06 self-assigned this Jun 8, 2026

ArtyETH06 force-pushed the ArtyETH06/https-github.com-leadbay-product-issues-3704 branch from 3ab7be1 to c5df963 June 9, 2026 16:36

ArtyETH06 and others added 3 commits June 9, 2026 10:07

Merge remote-tracking branch 'origin/main' into ArtyETH06/https-githu…

496d427

chore(release): bump @leadbay/mcp to 0.18.3

bcd6d16

ArtyETH06 changed the title ~~feat(mcp): bulk portfolio signal scan + stale_at honesty fix (product#3704)~~ feat(mcp): leadbay_scan_portfolio_signals — bulk portfolio signal scan + stale_at honesty fix (v0.18.3, product#3704) Jun 9, 2026

ArtyETH06 marked this pull request as ready for review June 9, 2026 17:13

ArtyETH06 requested a review from milstan June 9, 2026 17:24

milstan reviewed Jun 9, 2026

View reviewed changes

ArtyETH06 added 3 commits June 9, 2026 13:24

chore(skills): regenerate leadbay_followup_check_in SKILL.md for PHAS…

28d76ce

Merge remote-tracking branch 'origin/main' into ArtyETH06/https-githu…

c999c20

ArtyETH06 changed the title ~~feat(mcp): leadbay_scan_portfolio_signals — bulk portfolio signal scan + stale_at honesty fix (v0.18.3, product#3704)~~ feat(mcp): leadbay_scan_portfolio_signals — bulk portfolio signal scan + stale_at honesty fix (v0.19.1, product#3704) Jun 9, 2026

milstan merged commit 3eaf4c3 into main Jun 10, 2026
1 check passed

milstan deleted the ArtyETH06/https-github.com-leadbay-product-issues-3704 branch June 10, 2026 10:52

milstan merged 8 commits into main from ArtyETH06/https-github.com-leadbay-product-issues-3704
