Skip to content

fix(adapters): drop silent-sentinel row fallbacks across 6 read commands#1631

Open
Benjamin-eecs wants to merge 2 commits into
jackwener:mainfrom
Benjamin-eecs:chore/silent-sentinel-cluster-2
Open

fix(adapters): drop silent-sentinel row fallbacks across 6 read commands#1631
Benjamin-eecs wants to merge 2 commits into
jackwener:mainfrom
Benjamin-eecs:chore/silent-sentinel-cluster-2

Conversation

@Benjamin-eecs
Copy link
Copy Markdown
Contributor

@Benjamin-eecs Benjamin-eecs commented May 17, 2026

Description

Continues the audit-baseline cleanup started in #1611 (lesswrong) and the direction set by #1599 / #1603 / #1604. Replaces the silent-sentinel row-data fallbacks ('Unknown' / '-' / 'unknown' that mask missing fields) with the empty-string signal so agents can tell apart "field really has the value Unknown" from "upstream returned no value".

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 🌐 New site adapter
  • 📝 Documentation
  • ♻️ Refactor
  • 🔧 CI / build / tooling

Checklist

  • I ran the checks relevant to this PR
  • I updated tests or docs if needed
  • I included output or screenshots when useful

Fix

Same pattern as #1611: replace flagged sentinel strings with the empty-signal form. Walks the lint baseline (scripts/check-typed-error-lint.mjs:323 regex catches unknown / Unknown / UNKNOWN / N/A / n/a / NA / 未知 / -) and clears 10 entries across 6 read adapters:

Adapter Lines Pattern
wikipedia/trending title, description ?? '-' to ?? ''
36kr/article author, date, body || '-' to || ''
xiaoyuzhou/download podcast || '-' to || ''
xiaoyuzhou/transcript podcast || '-' to || ''
zhihu/collection dedup key + type field || 'unknown' to || ''
zhihu/download author || 'unknown' to || ''

The zhihu/collection change keeps the dedup key unique-per-content; the empty prefix is :${content.id} instead of unknown:${content.id}, semantically identical for dedup.

Intentionally skipped

  • v2ex/me.js:38: 'Unknown' is an in-band control-flow sentinel. Line 35 initialises let username = 'Unknown', line 41 uses if (username === 'Unknown') to trigger the profileEl fallback selector, line 75 uses the same check to raise the auth error. Empty would silently bypass both checks. An earlier commit in this PR attempted the swap then reverted (commit 163b08dd).
  • v2ex/daily.js:93: '未知' sits inside a 签到 success message rendered to the user (当前余额: 未知). Empty would render a broken sentence. Stays on baseline.
  • weibo/comments.js:33, weibo/feed.js:45: 'unknown' sits inside an in-IIFE error-message string composition ('API error: ' + (data.msg || 'unknown')), not in a returned row. Empty would silently truncate diagnostic output. Stay on baseline.

Screenshots / Output

Smoke test on two of the touched commands (read-only, public):

$ opencli wikipedia trending --limit 3 -f json
[
  { "rank": 1, "title": "Iceman_(Drake_album)", "description": "2026 studio album by Drake", "views": 259180 },
  { "rank": 2, "title": "Andy_Burnham", "description": "British politician (born 1970)", "views": 202483 },
  { "rank": 3, "title": "Obsession_(2025_film)", "description": "Horror film by Curry Barker", "views": 199269 }
]

$ opencli 36kr hot --limit 2 -f json
[
  { "rank": 1, "title": "突发:OpenAI大规模重组,总裁Brockman夺权挂帅", "url": "https://36kr.com/p/3811615591292680" },
  { "rank": 2, "title": "Claude为什么早晨8:30催你睡觉?", "url": "https://36kr.com/p/3811413738594050" }
]

These outputs come from the v1.7.22 published code path; populated rows are byte-identical between the published code and this PR. The empty-signal path only fires when the upstream Wikipedia/36kr/xiaoyuzhou/zhihu API returns a null or missing field, which doesn't surface in this sample. That path is a mechanical 1-char fallback swap.

Test plan

  • npm run build: manifest compiles, no path leaks
  • npm run check:typed-error-lint: passes (current=163, baseline=163, new=0, resolved=0)
  • npm run check:silent-column-drop: passes
  • Smoke test: opencli wikipedia trending and opencli 36kr hot still return populated rows (populated-row path is byte-identical to this PR; not a direct exercise of the changed fallback).
  • Downstream sentinel audit: grep -nE "=== ?['\"](-|unknown|Unknown)['\"]" across the 6 touched files returns zero hits, so the swapped value is never consumed as a control-flow marker downstream (this is the audit step that caught the v2ex/me.js regression in the first commit of this PR before push).
  • Reviewer: confirm none of the touched adapters have row-shape tests that asserted the literal '-' / 'unknown' fallback value (a grep for those strings inside *.test.js was clean here, but worth a second look in CI).

Copilot AI review requested due to automatic review settings May 17, 2026 06:36
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR continues the adapter audit cleanup by replacing silent sentinel row fallbacks with empty-string signals so callers can distinguish missing upstream data from literal values.

Changes:

  • Replaced '-', 'unknown', and 'Unknown' row fallbacks with '' across affected read adapters.
  • Updated the typed-error lint baseline to remove the resolved silent-sentinel entries.
  • Kept existing populated-row behavior unchanged for the touched adapters.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
clis/36kr/article.js Emits empty values for missing author/date/body fields.
clis/v2ex/me.js Changes username fallback handling in profile extraction.
clis/wikipedia/trending.js Emits empty title/description values when missing.
clis/xiaoyuzhou/download.js Emits empty podcast title when missing.
clis/xiaoyuzhou/transcript.js Emits empty podcast title when missing.
clis/zhihu/collection.js Emits empty type values and updates collection dedupe key fallback.
clis/zhihu/download.js Emits empty author value when missing.
scripts/typed-error-lint-baseline.json Removes resolved silent-sentinel baseline entries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread clis/v2ex/me.js Outdated
const navLinks = Array.from(document.querySelectorAll('a.top')).map(a => a.textContent?.trim());
if (navLinks.length > 1 && navLinks[0] === '首页') {
username = navLinks[1] || 'Unknown';
username = navLinks[1] || '';
return [{
title,
podcast: ep.podcast?.title || '-',
podcast: ep.podcast?.title || '',
return [{
title: episode.title || 'episode',
podcast: episode.podcast?.title || '-',
podcast: episode.podcast?.title || '',
Comment thread clis/zhihu/collection.js
Comment on lines +59 to +64
return `${content.type || ''}:${content.id || content.url || JSON.stringify(content).slice(0, 80)}`;
}

function mapCollectionItem(item, rank) {
const content = item.content || {};
const type = content.type || 'unknown';
const type = content.type || '';
@Benjamin-eecs Benjamin-eecs force-pushed the chore/silent-sentinel-cluster-2 branch from ad05353 to 163b08d Compare May 17, 2026 06:46
Continues the audit-baseline cleanup started in jackwener#1611 (lesswrong) and
the direction set by jackwener#1599 / jackwener#1603 / jackwener#1604. Replaces the
`silent-sentinel` row-data fallbacks (`'Unknown'` / `'-'` / `'unknown'`
that mask missing fields) with the empty-string signal so agents can
tell apart "field really has the value Unknown" from "upstream returned
no value".

Touched 6 read adapters, 10 baseline entries:
- wikipedia/trending: title, description
- 36kr/article: author, date, body
- xiaoyuzhou/download: podcast
- xiaoyuzhou/transcript: podcast
- zhihu/collection: dedup key + type field (the empty prefix still
  produces a unique-per-content dedup key, just without the `unknown:`
  noise)
- zhihu/download: author

Intentionally skipped (line-by-line audited):
- v2ex/me.js: `'Unknown'` is an in-band control-flow sentinel. Line 35
  initialises `let username = 'Unknown';`, line 41 uses
  `if (username === 'Unknown')` to trigger the profileEl fallback
  selector, line 75 uses the same check to raise the auth error.
  Empty would silently bypass both checks and return a row with an
  empty username as if auth succeeded.
- v2ex/daily.js: `'未知'` is user-facing 签到 success text in the
  rendered status message, not a row field. Empty would render a
  broken sentence.
- weibo/comments.js, weibo/feed.js: the sentinel sits inside an in-IIFE
  error-message string composition (`'API error: ' + (data.msg || 'unknown')`),
  not in a returned row. Empty would silently truncate diagnostic
  output. Both stay on baseline.

Verified live: `opencli wikipedia trending --limit 3` and `opencli 36kr
hot --limit 2` both return populated rows; the empty-string signal only
kicks in when the upstream value is actually missing.
@Benjamin-eecs Benjamin-eecs changed the title fix(adapters): drop silent-sentinel row fallbacks across 7 read commands fix(adapters): drop silent-sentinel row fallbacks across 6 read commands May 17, 2026
@Benjamin-eecs Benjamin-eecs force-pushed the chore/silent-sentinel-cluster-2 branch from 163b08d to f7573aa Compare May 17, 2026 06:47
…swap

Per owner's pattern in 7164615 (douyin/user-videos.test.js +
jike/read.test.js + weread/search-regression.test.js), pairs the
silent-sentinel value swap in this PR with focused unit tests that
mock the upstream to return null / missing fields and assert the row
surfaces an empty-string signal instead of the old fabricated
'Unknown' / '-' / 'unknown' sentinel.

Coverage:

- clis/wikipedia/trending.test.js (new): mocks wikiFetch to return
  three articles - one with both title + description populated, one
  with no title and no description, one with title only. Asserts the
  missing fields render as '' (was '-' before this PR).

- clis/36kr/article.test.js (new): mocks page.evaluate to return a
  scrape where title is present but author / date / body are empty.
  Asserts those three fields render as '' in the row pair output
  (was '-' before this PR). Also covers the NOT_FOUND and
  INVALID_ARGUMENT error paths that already existed.

- clis/zhihu/collection.test.js (+1 case): mocks the zhihu collection
  API to return an item with content.id but no content.type. Asserts
  type renders as '' (was 'unknown' before this PR); the new dedup
  key prefix is :id rather than unknown:id, semantically identical
  for dedup purposes.

The other three files in this PR (xiaoyuzhou/download,
xiaoyuzhou/transcript, zhihu/download) use the same `|| 'unknown'` ->
`|| ''` value swap with no downstream sentinel consumer. They are
covered by the same JS language semantics the three tests above
demonstrate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants