feat(12306): add full read adapter (stations / trains / train / price / me / passengers / orders)#1637

Merged
jackwener merged 4 commits into
jackwener:mainfrom
Benjamin-eecs:feat/12306-public-read
May 18, 2026

@Benjamin-eecs Benjamin-eecs commented May 17, 2026

Description

Closes #1589 with the full 12306 (中国铁路) adapter the issue proposes. Seven commands across two strategies:

| Strategy | Command | What |
| --- | --- | --- |
| PUBLIC (no login) | `12306 stations <keyword>` | Search station bundle by Chinese name, telecode, full pinyin, or short alias |
| PUBLIC | `12306 trains <from> <to> --date YYYY-MM-DD` | Trains between two stations on a given date with per-seat-class availability |
| PUBLIC | `12306 train <train-no> --from --to --date` | Full stop list for one train segment |
| PUBLIC | `12306 price <train-no> --from --to --date` | Ticket prices keyed by seat class for one segment |
| COOKIE (login) | `12306 me` | Account summary, sensitive fields masked by default |
| COOKIE | `12306 passengers` | Saved-passenger list, sensitive fields masked |
| COOKIE | `12306 orders` | In-progress orders (not yet ridden / refunded / completed) |

No CAPTCHA / slider / SMS bypass, no credential storage, no ticket sniping, no order submission, no payment - per the issue's Non-goals list.

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 🌐 New site adapter
  • 📝 Documentation
  • ♻️ Refactor
  • 🔧 CI / build / tooling

Checklist

  • I ran the checks relevant to this PR
  • I updated tests or docs if needed
  • I included output or screenshots when useful

Implementation notes worth flagging

Anonymous session minting

12306 rejects bare anonymous query endpoints with HTTP 302 -> /mormhweb/logFiles/error.html. The adapter first hits /otn/leftTicket/init to mint JSESSIONID / route / BIGipServerotn cookies, then attaches them to subsequent queries. No CAPTCHA / slider path is invoked. Cookies are not persisted to disk; they live only for the lifetime of the command invocation.
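As a rough illustration of the session-minting step described above, the sketch below joins Set-Cookie lines into one Cookie request header. It is not the code from utils.js: the body of `buildCookieHeader` is a guess at what "join Set-Cookie lines" means, and `mintAnonymousSession` is a hypothetical wrapper that assumes Node's built-in `fetch` and `Headers.getSetCookie()` are available.

```javascript
// Hypothetical sketch: turn Set-Cookie lines into a single Cookie header value.
// Attributes such as Path or HttpOnly are dropped; only name=value pairs survive.
function buildCookieHeader(setCookieLines) {
  const pairs = [];
  for (const line of setCookieLines) {
    // The name=value pair is everything before the first ';'.
    const pair = String(line).split(';')[0].trim();
    if (pair.includes('=')) pairs.push(pair);
  }
  return pairs.join('; ');
}

// Assumed wrapper (not from the PR): hit the init page once and keep the
// cookies in memory only, for the lifetime of the command invocation.
async function mintAnonymousSession(fetchImpl = fetch) {
  const res = await fetchImpl('https://kyfw.12306.cn/otn/leftTicket/init');
  return buildCookieHeader(res.headers.getSetCookie());
}
```

The returned string would then be sent as the `Cookie` header on the follow-up query requests; nothing is written to disk.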

Endpoint name rotation (queryO / queryZ / queryA / queryG / ...)

12306 rotates the train-query endpoint name every few weeks. When the wrong name is hit the server returns {"c_url": "leftTicket/queryG", "c_name": "CLeftTicketUrl", "status": false} pointing to the current correct name. trains walks a list of known names, captures the rotation hint, retries against the suggested name, and mutates the runtime list so subsequent calls in the same process skip the warm-up round trip.
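The retry-on-hint behaviour can be sketched roughly as follows. This is an illustration under stated assumptions rather than the adapter's actual code: `queryWithRotation`, `knownQueryNames`, and the injected `fetchJson(name)` helper are hypothetical names, and the initial ordering of the list is arbitrary. Only the `c_url` / `status: false` hint shape comes from the description above.

```javascript
// Hypothetical list of known endpoint names; mutated at runtime so later
// calls in the same process try the working name first.
const knownQueryNames = ['queryG', 'queryZ', 'queryA', 'queryO'];

// fetchJson is an injected, assumed helper: (endpointName) => Promise<object>.
async function queryWithRotation(fetchJson, names = knownQueryNames) {
  const tried = new Set();
  const queue = [...names];
  while (queue.length > 0) {
    const name = queue.shift();
    if (tried.has(name)) continue;
    tried.add(name);
    const body = await fetchJson(name);
    if (body && body.status !== false) {
      // Promote the working name to the front of the shared list.
      const idx = names.indexOf(name);
      if (idx >= 0) names.splice(idx, 1);
      names.unshift(name);
      return { name, body };
    }
    // Rotation hint looks like {"c_url": "leftTicket/queryG", "status": false}.
    const hinted = body && typeof body.c_url === 'string'
      ? body.c_url.split('/').pop()
      : null;
    // Try the server-suggested name next, ahead of the remaining guesses.
    if (hinted && !tried.has(hinted)) queue.unshift(hinted);
  }
  throw new Error('No working leftTicket query endpoint found');
}
```

Following the hint immediately means a stale list costs at most one extra round trip, and promoting the working name removes even that cost for later calls in the same process.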

Login detection via document.cookie, not page.getCookies({url})

12306 sets the auth cookie tk and the session cookie JSESSIONID with Path=/otn. CDP Network.getCookies filters by URL path: even with the page navigated to https://kyfw.12306.cn/otn/view/index.html and the user fully logged in, page.getCookies({ url: 'https://kyfw.12306.cn' }) returns 7 cookies without tk / JSESSIONID, because that URL has path /, not /otn. Switched the login check to read document.cookie via page.evaluate, which exposes the navigated page's full visible-cookie set regardless of path. Centralized as require12306Login in utils.js so all three authenticated commands share the same check (and so future maintainers know not to "fix" it back to page.getCookies).
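A minimal sketch of the cookie-string check, assuming the string comes from `document.cookie` read inside the page via `page.evaluate`. The helper names `parseCookieString` and `hasLoginCookie` are hypothetical, and checking only `tk` is an assumption; the real `require12306Login` may inspect more than this.

```javascript
// Parse a document.cookie-style string ("a=1; b=2") into a Map.
function parseCookieString(cookieString) {
  const jar = new Map();
  for (const part of String(cookieString || '').split(';')) {
    const eq = part.indexOf('=');
    if (eq < 0) continue; // skip fragments without a name=value shape
    jar.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  return jar;
}

// Assumption: a non-empty `tk` cookie is treated as the logged-in signal.
function hasLoginCookie(cookieString) {
  const tk = parseCookieString(cookieString).get('tk');
  return typeof tk === 'string' && tk.length > 0;
}
```

Because the check works on the string the page itself can see, it does not depend on how CDP filters cookies by URL path.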

Booking-handshake secret field is stripped

The |-separated train wire record carries a base64 secret token at position 0 used to construct booking handshakes. Since this PR is read-only and the issue rules out booking, parseTrainRecord parses but does not surface that field, and a regression unit test asserts it cannot leak via the adapter contract:

it('does not expose the booking-handshake secret token', () => {
    const row = parseTrainRecord(fields.join('|'), stationByCode);
    expect(Object.values(row)).not.toContain('SECRET_TOKEN_DO_NOT_LEAK');
    expect('secret' in row).toBe(false);
});

Sensitive-field masking on authenticated commands

  • me: real name (Chinese mask, e.g. 张*), email (e.g. xxx@xxx.xxx), mobile (12306 already masks server-side and the adapter preserves that, e.g. xxx-xxx-xxxx), birth date (year only).
  • passengers: passenger name (Chinese mask) + birth year (year only); 12306 already masks ID number (e.g. xxxxxxxxxxxxxxxxxxx) and mobile server-side and this adapter never decodes those.
  • Both expose --include-sensitive to opt back into the unmasked fields the user is entitled to see on their own account.
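The masking defaults listed above could look roughly like the helpers below. The helper names `maskEmail`, `maskMobile`, and `maskChineseName` appear in the test plan, but the exact rules here are assumptions inferred from the examples (`张*`, first and last character of the email local part kept); `birthYear` is a hypothetical name.

```javascript
// Keep the first character (the surname for most names), star out the rest.
// Assumption: a single-character name is masked entirely.
function maskChineseName(name) {
  const chars = Array.from(String(name || ''));
  if (chars.length === 0) return '';
  if (chars.length === 1) return '*';
  return chars[0] + '*'.repeat(chars.length - 1);
}

// Keep first and last character of the local part; assumed rule for short
// local parts (< 3 chars) is to mask them completely.
function maskEmail(email) {
  const value = String(email || '');
  const at = value.lastIndexOf('@');
  if (at <= 0) return value ? '***' : '';
  const local = value.slice(0, at);
  const domain = value.slice(at);
  if (local.length < 3) return '***' + domain;
  return local[0] + '***' + local[local.length - 1] + domain;
}

// Preserve a value 12306 has already masked server-side; otherwise hide
// the middle digits.
function maskMobile(mobile) {
  const value = String(mobile || '');
  if (value.includes('*')) return value;
  if (value.length < 7) return value ? '****' : '';
  return value.slice(0, 3) + '****' + value.slice(-4);
}

// Reduce a birth date to its leading four-digit year.
function birthYear(birthDate) {
  const match = /^(\d{4})/.exec(String(birthDate || ''));
  return match ? match[1] : '';
}
```

With `--include-sensitive`, the adapter would presumably skip these helpers and return whatever 12306 itself sent.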

orders scope

Returns the queryMyOrderNoComplete slice (orders that have not yet been ridden / refunded / completed). The historical queryMyOrderApi endpoint requires extra page-state handshakes that proved fragile when probed - the server returned 302s into the error landing page under repeated anonymous calls. Left as a follow-up so this command can ship reliably for the immediate "what's still on my account" use case.

price cascading

Cascades three anonymous API calls per invocation: init -> queryByTrainNo (to resolve segment station_no within the train route - the price endpoint addresses stops by station_no, not telecode) -> queryTicketPrice. 12306 returns prices keyed by one-or-two-letter seat codes (A9 商务座 / M 一等座 / O 二等座 / WZ 无座 / etc.) and additionally doubles some up as bare numeric codes (e.g. "9": "21580" mirrors "A9": "¥2158.0"); the bare-numeric duplicates are filtered out so the row set is one-per-seat-class.
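The duplicate filter and ordering can be sketched as below. This is a guess at the shape of `parsePriceData`, not a copy of it: the key regex, the `SEAT_NAMES` table (seeded only with the codes shown in this PR), and the treatment of non-string values are all assumptions.

```javascript
// Seat-code names taken from the examples in this PR; anything else falls
// back to the code itself.
const SEAT_NAMES = { A9: '商务座', P: '特等座', M: '一等座', O: '二等座', WZ: '无座' };

function parsePriceData(data) {
  const rows = [];
  for (const [key, value] of Object.entries(data || {})) {
    // Seat codes start with a letter; bare-numeric duplicates such as "9"
    // and metadata keys such as "train_no" fail this test and are skipped.
    if (!/^[A-Z][A-Z0-9]?$/.test(key)) continue;
    if (typeof value !== 'string') continue;
    const price = value.replace(/^¥/, '');
    if (!/^\d+(\.\d+)?$/.test(price)) continue;
    rows.push({
      seat_code: key,
      seat_name: SEAT_NAMES[key] || key,
      price,
      currency: 'CNY',
    });
  }
  // Descending price; Array.prototype.sort is stable, so ties keep input order.
  return rows.sort((a, b) => Number(b.price) - Number(a.price));
}
```

Filtering on the key shape rather than on the value avoids having to guess which numeric encoding (`21580` versus `2158.0`) a duplicate uses.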

Station resolution

resolveStation matches Chinese name (上海虹桥), telecode (AOH), full pinyin (shanghaihongqiao), and short alias (shhq). Anything else raises ArgumentError with a hint. Match order is exact-name first to prevent 北京 matching 北京北 by substring.
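A sketch of the match order, assuming station records shaped like the `stations` output in this PR (`name`, `code`, `pinyin`, `abbr`). The `ArgumentError` class here is a local stand-in for the project's typed error, and the hint text is invented for illustration.

```javascript
// Local stand-in for the project's typed ArgumentError.
class ArgumentError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ArgumentError';
  }
}

function resolveStation(input, stations) {
  const raw = String(input || '').trim();
  if (!raw) throw new ArgumentError('Station must not be empty');
  const lower = raw.toLowerCase();
  const upper = raw.toUpperCase();
  // Each matcher is an exact comparison, tried in priority order, so
  // "北京" can never resolve to "北京北" by substring.
  const matchers = [
    (s) => s.name === raw,     // Chinese name
    (s) => s.code === upper,   // telecode, e.g. AOH
    (s) => s.pinyin === lower, // full pinyin
    (s) => s.abbr === lower,   // short alias
  ];
  for (const matches of matchers) {
    const hit = stations.find(matches);
    if (hit) return hit;
  }
  throw new ArgumentError(
    `Unknown station "${raw}"; try a Chinese name, telecode, pinyin, or short alias`
  );
}
```

Running all stations through one matcher before moving to the next is what gives exact names priority regardless of the order of records in the bundle.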

Strict limit validation

limit arguments use a small inline validator that throws ArgumentError on non-integer / out-of-range input rather than silently clamping. This mirrors the typed-error pattern the owner used in #1397 (grok) and #1370 (coupang).
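A throw-instead-of-clamp validator might look like the sketch below. The function name `parseLimit`, the bounds, and the default are hypothetical; `ArgumentError` is a local stand-in for the project's typed error.

```javascript
// Local stand-in for the project's typed ArgumentError.
class ArgumentError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ArgumentError';
  }
}

// Hypothetical bounds and default; the real adapter's values may differ.
function parseLimit(value, { min = 1, max = 100, fallback = 20 } = {}) {
  if (value === undefined || value === null || value === '') return fallback;
  const text = String(value).trim();
  // Reject "5.5", "abc", "-1", "1e3" and similar instead of coercing them.
  if (!/^\d+$/.test(text)) {
    throw new ArgumentError(`--limit must be an integer, got "${value}"`);
  }
  const n = Number(text);
  if (n < min || n > max) {
    throw new ArgumentError(`--limit must be between ${min} and ${max}, got ${n}`);
  }
  return n;
}
```

Throwing keeps a typo such as `--limit 1000` from silently returning a truncated result that looks complete.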

Screenshots / Output

Verified live, both anonymously and authenticated, against kyfw.12306.cn, sleeping 15-25 seconds between requests to stay under 12306's anti-abuse throttle. All commands were run via node ./dist/src/main.js from this branch's worktree so that the local clis/12306/ adapters in this PR are loaded.

Anonymous commands (real API output)

$ node ./dist/src/main.js 12306 stations 上海 --limit 5 -f json
[
  { "name": "练塘",     "code": "LTU", "pinyin": "liantang",         "abbr": "lt",   "city": "上海" },
  { "name": "上海",     "code": "SHH", "pinyin": "shanghai",         "abbr": "sh",   "city": "上海" },
  { "name": "上海南",   "code": "SNH", "pinyin": "shanghainan",      "abbr": "shn",  "city": "上海" },
  { "name": "上海虹桥", "code": "AOH", "pinyin": "shanghaihongqiao", "abbr": "shhq", "city": "上海" },
  { "name": "上海西",   "code": "SXH", "pinyin": "shanghaixi",       "abbr": "shx",  "city": "上海" }
]

$ node ./dist/src/main.js 12306 trains 北京 上海 --date 2026-05-22 --limit 1 -f json
[
  {
    "train_no": "240000G54700", "code": "G547",
    "from_station": "北京南", "to_station": "上海虹桥",
    "from_code": "VNP", "to_code": "AOH",
    "start_time": "06:18", "arrive_time": "12:11", "duration": "05:53",
    "available": true,
    "business_seat": "", "first_seat": "", "second_seat": "",
    "soft_sleeper": "", "hard_sleeper": "", "hard_seat": "", "no_seat": ""
  }
]

$ node ./dist/src/main.js 12306 train 24000000G10L --from 北京南 --to 上海虹桥 --date 2026-05-22 -f json
[
  { "station_no": "01", "station_name": "北京南",   "arrive_time": "",      "start_time": "06:30", "stopover_time": ""      },
  { "station_no": "02", "station_name": "沧州西",   "arrive_time": "07:18", "start_time": "07:20", "stopover_time": "2分钟" },
  { "station_no": "03", "station_name": "德州东",   "arrive_time": "07:45", "start_time": "07:47", "stopover_time": "2分钟" },
  { "station_no": "04", "station_name": "曲阜东",   "arrive_time": "08:34", "start_time": "08:36", "stopover_time": "2分钟" },
  { "station_no": "05", "station_name": "南京南",   "arrive_time": "10:13", "start_time": "10:15", "stopover_time": "2分钟" },
  { "station_no": "06", "station_name": "苏州北",   "arrive_time": "10:59", "start_time": "11:01", "stopover_time": "2分钟" },
  { "station_no": "07", "station_name": "上海虹桥", "arrive_time": "11:24", "start_time": "11:24", "stopover_time": ""      }
]

$ node ./dist/src/main.js 12306 price 24000000G10L --from 北京南 --to 上海虹桥 --date 2026-05-22 -f json
[
  { "seat_code": "A9", "seat_name": "商务座", "price": "2158.0", "currency": "CNY" },
  { "seat_code": "P",  "seat_name": "特等座", "price": "1163.0", "currency": "CNY" },
  { "seat_code": "M",  "seat_name": "一等座", "price": "1035.0", "currency": "CNY" },
  { "seat_code": "WZ", "seat_name": "无座",   "price": "626.0",  "currency": "CNY" },
  { "seat_code": "O",  "seat_name": "二等座", "price": "626.0",  "currency": "CNY" }
]

Authenticated commands (shape-only; real values redacted for this PR body)

The three authenticated commands were exercised live on a Mac mini against a logged-in 12306 account. Output shape below uses synthetic placeholder values; the real PII (12306 username, family name, partial ID prefix, partial mobile, email) is not posted here. The actual run was verified locally and the masking logic was tested against the real response (see "Sensitive-field masking" above).

$ node ./dist/src/main.js 12306 me -f json
[
  {
    "username":   "<account>",
    "real_name":  "<masked>",
    "email":      "<x***x>@<domain>",
    "mobile":     "<3-digits>****<4-digits>",
    "birth_date": "<YYYY>",
    "sex":        "<男|女>",
    "country":    "CHN",
    "user_type":  "成人",
    "member":     false,
    "active":     <bool>
  }
]

$ node ./dist/src/main.js 12306 passengers -f json
[
  {
    "name":           "<surname>*<...>",
    "sex":            "<男|女>",
    "born_year":      "<YYYY>",
    "id_type":        "居民身份证",
    "id_no":          "<4-digits>***********<3-digits>",
    "mobile":         "<3-digits>****<4-digits>",
    "passenger_type": "成人",
    "country":        "CHN"
  },
  ...
]

12306 orders was also exercised live; the account currently has no in-progress orders, so the command correctly surfaces EMPTY_RESULT: No in-progress 12306 orders on this account rather than fabricating an empty array.

If you want a non-redacted copy of the authenticated-command output to confirm masking behaviour, ping me on the PR and I'll DM the raw output.

Test plan

  • npm run build: manifest compiles, 828 entries, git diff cli-manifest.json contains only new 12306 entries (no absolute filesystem paths)
  • npm run check:typed-error-lint: passes (current=173, baseline=173, new=0, resolved=0)
  • npm run check:silent-column-drop: passes (current=102, baseline=102, new=0, resolved=0)
  • npx vitest run --project adapter clis/12306/utils.test.js: 23 tests passing across
    • parseStationBundle: structured-record extraction + skip records without telecode
    • resolveStation: name / telecode / pinyin / short-alias matches + empty / unknown / unknown-but-telecode-shaped failure modes
    • validateDate: format check + impossible calendar dates
    • buildCookieHeader: join Set-Cookie lines
    • parseTrainRecord: full-shape extraction, secret-field non-leak regression, telecode fallback when bundle is missing, short-record null
    • mask helpers: maskEmail (3+ char local-part + edge cases), maskMobile (preserves 12306's own mask), maskChineseName (1 / 2 / 3+ char names)
    • parsePriceData: bare-numeric-duplicate filter, sort by descending price, unknown letter codes fall back to letter as name
  • Live anonymous: stations / trains / train / price against kyfw.12306.cn from a Mac mini session with 15-25 s sleeps between hits
  • Live authenticated: me / passengers / orders after the user logged in to 12306 in the OpenCLI bridge Chrome on Mac mini; cookies surfaced via document.cookie correctly (see require12306Login note above)
  • Reviewer to confirm masking defaults match the project's expectation. The current defaults: real name -> Chinese-mask, email -> local-part mask, mobile -> preserve 12306's own mask, birth date -> year only. --include-sensitive opts in to the unmasked-by-12306 fields.
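For the validateDate item in the test plan above ("format check + impossible calendar dates"), one common way to cover both cases is a regex followed by a round trip through `Date`. This is a sketch under that assumption, not the code from utils.js, and it throws a plain `Error` where the adapter would presumably use its typed ArgumentError.

```javascript
function validateDate(value) {
  const text = String(value || '');
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!match) {
    throw new Error(`--date must be YYYY-MM-DD, got "${text}"`);
  }
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  // Date.UTC normalizes overflow (Feb 30 becomes Mar 2), so a date is real
  // only if it survives the round trip unchanged.
  const date = new Date(Date.UTC(year, month - 1, day));
  const roundTrips =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  if (!roundTrips) {
    throw new Error(`--date is not a real calendar date: "${text}"`);
  }
  return text;
}
```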

Out of scope

  • Order history (queryMyOrderApi): needs extra page-state handshakes; left as a follow-up so the immediate "in-progress orders" path can ship without flakiness.
  • Ticket sniping / order submission / payment / CAPTCHA bypass / password storage: explicitly out of scope per the issue's Non-goals.

@Benjamin-eecs Benjamin-eecs changed the title feat(12306): add stations / trains / train read commands (no login required) feat(12306): add full read adapter (stations / trains / train / price / me / passengers / orders) May 17, 2026
@Benjamin-eecs Benjamin-eecs marked this pull request as ready for review May 17, 2026 12:11
Copilot AI review requested due to automatic review settings May 17, 2026 12:11
@Benjamin-eecs Benjamin-eecs force-pushed the feat/12306-public-read branch 2 times, most recently from 9f00a7b to 6f0867e Compare May 17, 2026 12:17

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Benjamin-eecs Benjamin-eecs force-pushed the feat/12306-public-read branch from 6f0867e to 16ed5da Compare May 17, 2026 12:19
…quired)

Adds a first-pass 12306 (中国铁路) adapter for the public anonymous
query endpoints. Closes the no-login slice of jackwener#1589. The
authenticated `me / passengers / orders` commands the issue
proposes are explicitly left as a follow-up.

Commands:
- 12306 stations <keyword>             search station bundle
- 12306 trains <from> <to> --date YYYY-MM-DD  availability between stations
- 12306 train <train-no> --from <s> --to <s> --date  stop list

All three use Strategy.PUBLIC + browser: false, anonymous, no cookie
storage, no CAPTCHA bypass. Sensitive behaviors the issue rules out
(ticket sniping, order submission, payment, anti-abuse circumvention,
password storage) are not implemented.

Notes worth flagging for review:

- 12306 rejects anonymous query endpoints with HTTP 302 to
  /mormhweb/logFiles/error.html. The adapter first hits
  /otn/leftTicket/init to mint JSESSIONID / route / BIGipServerotn
  cookies, then attaches them to subsequent queries. No CAPTCHA path.

- 12306 rotates the train-query endpoint name (queryO / queryZ /
  queryA / queryG) every few weeks. When the wrong name is hit the
  server returns `{c_url: "leftTicket/queryX", status: false}`
  pointing to the current correct name. The adapter walks a list of
  known names, captures the rotation hint, and retries; the runtime
  list is also mutated so subsequent calls in the same process skip
  the warm-up round trip.

- The `|`-separated train wire format includes a booking-handshake
  `secret` field at position 0. Since this PR is read-only and the
  issue explicitly rules out booking, that field is parsed but not
  surfaced in the returned row, and a unit test asserts it cannot
  leak via the public adapter contract.

- Station resolution accepts Chinese name (`上海虹桥`), telecode
  (`AOH`), full pinyin (`shanghaihongqiao`), or short alias (`shhq`).
  Anything else raises ArgumentError with a hint.

- `limit` arguments use a tight validator that throws ArgumentError
  on non-integer / out-of-range input rather than silently clamping,
  matching the typed-error pattern used in jackwener#1397 (grok) and jackwener#1370
  (coupang).

Live verified anonymously against kyfw.12306.cn:
- `12306 stations 上海 --limit 5` returns 5 stations including
  上海 (SHH) / 上海南 (SNH) / 上海虹桥 (AOH).
- `12306 trains 北京 上海 --date 2026-05-22 --limit 1` returns
  G547 06:18 -> 12:11 with first / second / business / no-seat
  availability columns populated.
- `12306 train 24000000G10L --from 北京南 --to 上海虹桥 --date 2026-05-22`
  returns the 7-stop G1 route from 北京南 through 沧州西 / 德州东 /
  曲阜东 / 南京南 / 苏州北 to 上海虹桥, with arrival / departure /
  stopover times.

Tests: 18 unit tests covering parseStationBundle, resolveStation
(including ambiguous / case-insensitive cases), validateDate,
buildCookieHeader, parseTrainRecord (including a regression test
asserting the `secret` field cannot leak into the row).

Deliberately deferred to a follow-up: `12306 price`. The
queryTicketPrice endpoint needs train_no + per-stop station_no +
per-train seat-type letters, so an ergonomic `12306 price <code>`
would cascade three API calls (trains -> stops -> price) per
invocation. Wanted to keep this PR's blast radius small. If the
maintainer prefers a Phase 1 that includes price even with the
cascading-call cost, happy to add it.
…ce read commands

Completes the jackwener#1589 12306 (中国铁路) adapter on top of the
stations / trains / train slice landed in the prior commit of this
branch. The full command set is now:

  Anonymous (no login):
    12306 stations  search station bundle by Chinese / telecode / pinyin
    12306 trains    list trains between two stations on a date
    12306 train     list stops of one train
    12306 price     ticket prices for one train segment + date

  Authenticated (cookie session):
    12306 me        account summary (sensitive fields masked by default)
    12306 passengers  saved-passenger list (sensitive fields masked)
    12306 orders    in-progress orders (not yet ridden / refunded)

Notes worth flagging for review:

- 12306 sets the auth cookie `tk` and the session cookie `JSESSIONID`
  with `Path=/otn`. CDP `Network.getCookies` filters by URL path, so
  `page.getCookies({ url: 'https://kyfw.12306.cn' })` returns 7
  cookies without `tk` / `JSESSIONID`, even on a freshly-navigated
  logged-in tab. Switched the login check to read `document.cookie`
  via `page.evaluate`, which the current navigated page exposes
  regardless of cookie path. Centralized as `require12306Login` in
  utils.js so all three authenticated commands share the same check.

- All authenticated commands mask sensitive fields by default:
  - `me`: real name (Chinese mask), email, mobile (12306 already
    masks server-side), birth date (year only).
  - `passengers`: name + birth year by default; 12306 already masks
    ID number and mobile server-side and this adapter never decodes
    those.
  - Both expose `--include-sensitive` to opt back into the unmasked
    fields the user is entitled to see on their own account.

- `orders` returns the `queryMyOrderNoComplete` slice (orders that
  have not yet been ridden / refunded / completed). The historical
  `queryMyOrderApi` endpoint requires extra page-state handshakes
  that proved fragile when probed; left as a follow-up so this
  command can ship reliably for the immediate "what's still on my
  account" use case.

- `price` cascades three anonymous API calls per invocation:
  init -> queryByTrainNo (to resolve segment station_no within the
  train route) -> queryTicketPrice. 12306 returns prices keyed by
  one-or-two-letter seat codes (`A9` 商务座 / `M` 一等座 /
  `O` 二等座 / `WZ` 无座 / etc.) and additionally doubles some up
  as bare numeric codes (e.g. `"9": "21580"` mirrors
  `"A9": "¥2158.0"`); the bare-numeric duplicates are filtered out
  so the row set is one-per-seat-class.

- Strictly anonymous queries; no CAPTCHA / slider / SMS bypass, no
  credential storage, no ticket sniping, no order submission, no
  payment - per the issue's Non-goals list.

Live verified anonymously and authenticated against kyfw.12306.cn,
sleeping 15-25 seconds between hits to keep 12306's anti-abuse
throttle gentle:

  - 12306 me: account summary returned with real_name / email /
    mobile / birth date all masked at the adapter level, on top of
    12306's own server-side mobile mask.
  - 12306 passengers: every saved passenger returned with name
    masked to `<surname>*<...>` and 12306-side ID/mobile masks
    preserved verbatim.
  - 12306 orders: empty for this test account (no in-progress
    orders), correct EmptyResultError surface.
  - 12306 price G1 北京南 -> 上海虹桥 2026-05-22: returns
    商务座 ¥2158 / 特等座 ¥1163 / 一等座 ¥1035 / 二等座 ¥626 /
    无座 ¥626, sorted desc.

Tests: 23 unit tests (5 new beyond the prior commit's 18) cover
the mask helpers (email / mobile / Chinese name) plus the
parsePriceData filter that drops the bare-numeric duplicates and
sorts by descending price.
@Benjamin-eecs Benjamin-eecs force-pushed the feat/12306-public-read branch from 16ed5da to f39d347 Compare May 18, 2026 14:17
@jackwener jackwener merged commit e82e32a into jackwener:main May 18, 2026
11 checks passed
huanghe added a commit to huanghe/OpenCLI that referenced this pull request May 18, 2026
* fix(electron-apps): move codex CDP port off 9222 to avoid browser-bridge collision (jackwener#1630)

* fix(electron-apps): move codex CDP port off 9222 to avoid browser-bridge collision

`src/electron-apps.ts` had `codex: { port: 9222 }`, but `9222` is the
default Chrome DevTools port that opencli's own browser-bridge Chrome
binds whenever `opencli doctor` is OK. On every normal opencli install
the bridge owns 9222 first, so Codex Desktop can never bind it, and
`opencli codex status` (plus every other codex command) fails with:

  App launched but CDP not available on port 9222 after 15s

`~/.opencli/apps.yaml` is documented as "additive only, does not
override builtins", so users have no supported way to relocate the
port from the user side.

Reported in jackwener#1626 with full repro (Codex Desktop + active opencli
browser-bridge Chrome) and root-cause pointer at
`dist/src/electron-apps.js:13`. Every other electron app in the
builtin registry already uses a distinct port in the 9224-9236
band (cursor 9226, doubao-app 9225, chatwise 9228, discord-app 9232,
antigravity 9234, chatgpt-app 9236); codex was the only one that
collided with the browser bridge.

Move codex to 9238 (the next free slot in that band, also the value
the reporter recommended). Update the test that asserts the port and
the two docs references that mention codex=9222. The pitfall entry
in `docs/advanced/electron.md` is also annotated to explicitly call
out 9222 as the bridge's port to avoid future collisions.

Closes jackwener#1626.

Verified live: `opencli codex status -v` now emits
`[verbose] [launcher] Probing CDP on port 9238...` (was 9222 before
the fix), confirming the code path picks up the new port. Full
end-to-end with a real Codex Desktop install is left to the reporter
and reviewer; the change here is a single-value config update plus
docs/tests sync.

Unit tests: 7 / 7 in `src/electron-apps.test.ts` pass (the codex-port
assertion updated to 9238). Both audit gates pass.

* docs(electron): sync codex CDP port guidance

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(adapters): drop silent-sentinel row fallbacks across 6 read commands (jackwener#1631)

* fix(adapters): drop silent-sentinel row fallbacks across 6 read commands

Continues the audit-baseline cleanup started in jackwener#1611 (lesswrong) and
the direction set by jackwener#1599 / jackwener#1603 / jackwener#1604. Replaces the
`silent-sentinel` row-data fallbacks (`'Unknown'` / `'-'` / `'unknown'`
that mask missing fields) with the empty-string signal so agents can
tell apart "field really has the value Unknown" from "upstream returned
no value".

Touched 6 read adapters, 10 baseline entries:
- wikipedia/trending: title, description
- 36kr/article: author, date, body
- xiaoyuzhou/download: podcast
- xiaoyuzhou/transcript: podcast
- zhihu/collection: dedup key + type field (the empty prefix still
  produces a unique-per-content dedup key, just without the `unknown:`
  noise)
- zhihu/download: author

Intentionally skipped (line-by-line audited):
- v2ex/me.js: `'Unknown'` is an in-band control-flow sentinel. Line 35
  initialises `let username = 'Unknown';`, line 41 uses
  `if (username === 'Unknown')` to trigger the profileEl fallback
  selector, line 75 uses the same check to raise the auth error.
  Empty would silently bypass both checks and return a row with an
  empty username as if auth succeeded.
- v2ex/daily.js: `'未知'` is user-facing 签到 success text in the
  rendered status message, not a row field. Empty would render a
  broken sentence.
- weibo/comments.js, weibo/feed.js: the sentinel sits inside an in-IIFE
  error-message string composition (`'API error: ' + (data.msg || 'unknown')`),
  not in a returned row. Empty would silently truncate diagnostic
  output. Both stay on baseline.

Verified live: `opencli wikipedia trending --limit 3` and `opencli 36kr
hot --limit 2` both return populated rows; the empty-string signal only
kicks in when the upstream value is actually missing.

* test(adapters): add empty-signal coverage for the cluster-2 sentinel swap

Per owner's pattern in 7164615 (douyin/user-videos.test.js +
jike/read.test.js + weread/search-regression.test.js), pairs the
silent-sentinel value swap in this PR with focused unit tests that
mock the upstream to return null / missing fields and assert the row
surfaces an empty-string signal instead of the old fabricated
'Unknown' / '-' / 'unknown' sentinel.

Coverage:

- clis/wikipedia/trending.test.js (new): mocks wikiFetch to return
  three articles - one with both title + description populated, one
  with no title and no description, one with title only. Asserts the
  missing fields render as '' (was '-' before this PR).

- clis/36kr/article.test.js (new): mocks page.evaluate to return a
  scrape where title is present but author / date / body are empty.
  Asserts those three fields render as '' in the row pair output
  (was '-' before this PR). Also covers the NOT_FOUND and
  INVALID_ARGUMENT error paths that already existed.

- clis/zhihu/collection.test.js (+1 case): mocks the zhihu collection
  API to return an item with content.id but no content.type. Asserts
  type renders as '' (was 'unknown' before this PR); the new dedup
  key prefix is :id rather than unknown:id, semantically identical
  for dedup purposes.

The other three files in this PR (xiaoyuzhou/download,
xiaoyuzhou/transcript, zhihu/download) use the same `|| 'unknown'` ->
`|| ''` value swap with no downstream sentinel consumer. They are
covered by the same JS language semantics the three tests above
demonstrate.

* fix(adapters): fail typed on missing row identity

* fix(adapters): tighten sentinel row identity guards

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(weibo/publish): replace brittle CSS-module hash with placeholder selector (jackwener#1625)

* fix(weibo/publish): replace brittle CSS-module hash with placeholder selector

`clis/weibo/publish.js` matched the compose textarea via
`textarea._input_13iqr_8`, where `_input_13iqr_8` is the Vite CSS-module
hash Weibo rebuilds on every frontend deploy. The hash drifted (current
build emits `_input_1f5hn_8`), so step 4 of the publish flow throws
"Weibo compose editor did not appear" before anything else can run.
Reported in jackwener#1602.

Replace the single hashed selector with a placeholder-text-based chain
that survives Weibo's CSS-module rebuilds:

  textarea[placeholder*="有什么新鲜事"]
  textarea[placeholder*="新鲜事"]
  textarea._input_13iqr_8     // legacy hash kept last for older variants

Two visible textareas can match on the home feed (the always-rendered
"home-strip" prompt + the post-click modal compose). Pick the LAST
visible candidate: the modal opens on top and is appended to DOM later,
so the last-visible textarea is the modal. Both the editor-visibility
poll (Step 4) and the text-insertion step (Step 6) use the same chain.

Also drops `evaluateWithArgs` from Step 8 success polling. The IIFE
there does not reference any outer args, but `evaluateWithArgs` injects
its `const`-bound parameter names into the page context, and re-running
on each iteration of the success-poll loop threw `Identifier
'maxIterations' has already been declared` after the first iteration.
This was masked previously because Step 4 always failed first; with the
selector fixed, the latent Step 8 bug surfaces. Switched to plain
`page.evaluate` to avoid re-declaring per loop.

Closes jackwener#1602.

Verified live on macOS / opencli built locally / extension v1.0.15,
weibo cookie session:
- `opencli weibo publish "明洞那家店真不错"` returned
  `status: success, message: 发布成功, text: 明洞那家店真不错`
- Confirmed via `/ajax/statuses/mymblog`: the post landed at
  `idstr=5299403716821218`, `mblogid=QFHWzsCvE`, text matches what
  was typed (proves selector chain picks the right textarea and the
  text insertion path works end-to-end)
- Cleaned up: deleted via the same `/ajax/statuses/destroy` path that
  PR jackwener#1620 exposes as `weibo delete`

Unit tests: 8 / 8 in `clis/weibo/publish.test.js` pass (mocks updated
to reflect the new `evaluate`-vs-`evaluateWithArgs` split for Step 8
and the longer poll window).

* test(weibo): lock publish placeholder selector path

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* feat(xiaohongshu): add delete-note command to remove published notes (jackwener#1624)

* fix(xiaohongshu/publish): invoke shadow-DOM publish handler directly

XHS creator center now wraps the publish/save-draft button in an
`<xhs-publish-btn>` web component backed by a CLOSED shadow root.
Calling `.click()` on the host element does not dispatch into the
internal handler, and CDP coordinate clicks cannot penetrate the
shadow boundary. The previous text-match `button.click()` loop hit
the host element, returned `ok`, and yet the note silently stayed
on the publish page as a draft, so the adapter reported the soft
`⚠️ 操作完成,请在浏览器中确认` status while nothing was actually
posted.

Invoke the publish/save method directly on the `<xhs-publish-btn>`
host (`_onPublish` / `_onSave` and a few candidate names XHS has
shipped historically). Fall back to the legacy
`<button>`/`[role="button"]` text-match click for older
creator-center variants that still expose plain buttons.

Patch shape suggested by the OpenCLI autofix report in jackwener#1606 from
@chcc-funny (who verified an end-to-end real publish locally).

Closes jackwener#1606.

Verified live on macOS / opencli v1.7.22 / extension v1.0.15,
with creator center logged in:
- `opencli xiaohongshu publish ... --draft` -> `✅ 暂存成功`,
  creator home shows "草稿箱中有未发布的作品"
- `opencli xiaohongshu publish ...` (real publish) -> `✅ 发布成功`,
  note appeared on the account feed (visible from mobile app);
  test note deleted after verification

Unit tests: 12 / 12 in `clis/xiaohongshu/publish.test.js` pass
(mocks updated to reflect the new `{ ok, via, name|text }` invoke
result shape).

* feat(xiaohongshu): add delete-note command to remove published notes

Adds `opencli xiaohongshu delete-note <note-id>` so the workflow that
creates a note can also remove one without leaving the CLI, mirroring
`weibo delete` (jackwener#1619 / jackwener#1620).

The creator-center HTTP delete API requires the `X-S-Common` signature
header that `publish.js` deliberately avoids, so this follows the same
UI automation route. Flow:

  1. Navigate to creator note-manager
  2. Switch to "已发布" tab (delete entry only appears there; "审核中"
     and "未通过" rows have no web delete action, mobile app only)
  3. Locate the `.note` row whose `data-impression` JSON contains the
     target noteId (exact JSON-parsed match, not substring, so values
     that happen to share the noteId prefix in other fields cannot
     match the wrong row)
  4. Click the inline `<span class="control data-del">` action
  5. Click "确定" in the `.d-modal-footer` confirmation modal
  6. Poll for the row disappearing (iteration-bounded so tests with
     mocked `page.wait` exhaust the loop quickly)

Typed errors:
- /login redirect after navigation: AuthRequiredError
- 已发布 tab not found / not clickable: CommandExecutionError (UI drift)
- target noteId not present in the rendered list: EmptyResultError with
  a hint about review-state limitation
- row found but no delete action visible: CommandExecutionError
- confirmation modal missing / no 确定 button: CommandExecutionError
- row still visible after the configured poll window: CommandExecutionError
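
The last error above depends on the iteration-bounded poll from step 6. A minimal sketch of that idea, with hypothetical names (`waitForRowGone`, `isRowVisible`) and an injected `wait` standing in for `page.wait`:

```javascript
// Hypothetical sketch: poll a fixed number of times instead of looping on
// wall-clock time, so a mocked `wait` that resolves instantly still ends.
async function waitForRowGone(isRowVisible, wait, maxAttempts = 10) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    if (!(await isRowVisible())) return true; // row disappeared: delete confirmed
    await wait(1); // injected stand-in for the adapter's page.wait
  }
  return false; // the caller would raise CommandExecutionError here
}
```

Because the bound is a count rather than a deadline, a test double for `wait` exhausts the loop in `maxAttempts` steps without any real delay.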

Closes jackwener#1623.

Verified live: published a test note, deleted via this adapter, follow-up
`xiaohongshu creator-notes` confirms it is gone. Unit tests: 8 / 8 cover
happy path, empty-id ArgumentError, login redirect AuthRequiredError,
tab-not-found CommandExecutionError, row-not-found EmptyResultError,
no-delete-action / no-modal / unverified-delete CommandExecutionError
paths.

Built on top of jackwener#1613 (xiaohongshu publish shadow-DOM fix) so the live
verify could exercise publish-then-delete end to end. Will rebase onto
main once jackwener#1613 lands.

* fix(xhs): make delete-note fail closed

* fix(xiaohongshu): harden delete-note boundary

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* feat(weibo): add delete command to remove user's own posts (jackwener#1620)

* feat(weibo): add delete command to remove user's own posts

Adds `opencli weibo delete <id>` so the same workflow that creates a
post can also remove one without leaving the CLI. The id positional
accepts either the numeric `idstr` (e.g. `5299336218674412`) or the
base62 `mblogid` (e.g. `QFGbHAoBS`) found in any weibo URL or in the
output of `weibo me` / `weibo feed` / `weibo post`.

Implementation lives in a single `page.evaluate` IIFE so cookies +
the XSRF-TOKEN double-submit token stay first-party:

  1. Resolve mblogid / idstr via `GET /ajax/statuses/show?id=<input>`,
     which returns the canonical `idstr`. Empty result -> 404 path.
  2. Read the `XSRF-TOKEN` cookie via `document.cookie`.
  3. `POST /ajax/statuses/destroy` with `id=<idstr>` body and the
     `X-Xsrf-Token` header.
  4. Return `[{ status: 'deleted', id, mblogid }]`.
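
The four steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the adapter's code: `fetchImpl` and `cookieString` are injected so the sketch runs outside a browser, the form-encoded content type is assumed, and plain `Error` stands in for the typed errors.

```javascript
// Read one cookie value out of a document.cookie-style string.
function readCookie(cookieString, name) {
  for (const part of cookieString.split(';')) {
    const [key, ...rest] = part.trim().split('=');
    if (key === name) return decodeURIComponent(rest.join('='));
  }
  return null;
}

// Hypothetical sketch of the flow; the adapter itself runs inside
// page.evaluate with the page's own fetch and document.cookie.
async function deleteWeiboPost(input, fetchImpl, cookieString) {
  // 1. Resolve mblogid / idstr to the canonical idstr.
  const showRes = await fetchImpl(`/ajax/statuses/show?id=${encodeURIComponent(input)}`);
  const show = await showRes.json();
  if (!show || !show.idstr) throw new Error('Post not found');

  // 2. Read the XSRF-TOKEN double-submit cookie (failing when absent is an assumption).
  const xsrf = readCookie(cookieString, 'XSRF-TOKEN');
  if (!xsrf) throw new Error('XSRF-TOKEN cookie missing');

  // 3. POST the destroy call; the form-encoded body format is assumed.
  const destroyRes = await fetchImpl('/ajax/statuses/destroy', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'X-Xsrf-Token': xsrf,
    },
    body: `id=${encodeURIComponent(show.idstr)}`,
  });
  const destroy = await destroyRes.json();
  if (destroy.ok !== 1) throw new Error(destroy.msg || 'destroy failed');

  // 4. Return the result row.
  return [{ status: 'deleted', id: show.idstr, mblogid: show.mblogid }];
}
```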

Typed errors:
- 401 / 403 from either show or destroy -> `AuthRequiredError`
- `show` returning no `idstr` -> `EmptyResultError`
- Non-2xx HTTP on either call -> `CommandExecutionError` with status
- API response `ok !== 1` -> `CommandExecutionError` with the API msg
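
One way to read the mapping above is as a single classification function. The sketch below is hypothetical: the error classes are local stand-ins for the repo's typed errors, the function name is invented, and applying the `ok !== 1` check to the destroy response only is an assumption.

```javascript
// Local stand-ins; the real typed errors live in the repo's error module.
class AuthRequiredError extends Error {}
class EmptyResultError extends Error {}
class CommandExecutionError extends Error {}

// Map one step's outcome to a typed error, or return null when it is fine.
// `step` is 'show' or 'destroy', `status` the HTTP status, `body` the parsed JSON.
function classifyWeiboFailure(step, status, body) {
  if (status === 401 || status === 403) {
    return new AuthRequiredError(`${step} returned HTTP ${status}`);
  }
  if (status < 200 || status >= 300) {
    return new CommandExecutionError(`${step} failed with HTTP ${status}`);
  }
  if (step === 'show' && !(body && body.idstr)) {
    return new EmptyResultError('Post not found');
  }
  if (step === 'destroy' && !(body && body.ok === 1)) {
    return new CommandExecutionError((body && body.msg) || 'destroy rejected');
  }
  return null;
}
```

Checking the auth statuses before the generic non-2xx branch keeps a 401 / 403 from being reported as a plain execution failure.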

Closes jackwener#1619.

Verified live on macOS / opencli v1.7.22, weibo cookie session:
- Deleted the lingering test post from jackwener#1602 verification
  (idstr=5299336218674412, mblogid=QFGbHAoBS):
  `weibo delete QFGbHAoBS` returned
  `[{ status: 'deleted', id: '5299336218674412', mblogid: 'QFGbHAoBS' }]`
- `weibo me` shows `statuses: 3` (was 4 before the delete)
- `weibo post QFGbHAoBS` now throws "Post not found"

Unit tests: 8 / 8 in `clis/weibo/delete.test.js` (happy path,
empty-id, auth, not-found, show-http, destroy-http, api-msg, envelope
unwrap). Full weibo suite: 38 / 38 pass.

* fix(weibo): require delete postcondition evidence

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(lesswrong): drop "Unknown" silent sentinel in author column (jackwener#1611)

* fix(lesswrong): drop "Unknown" silent sentinel in author column

Twelve lesswrong commands had `author: item.user?.displayName ?? 'Unknown'`
which masks the missing-author signal: an agent reading the result row
cannot distinguish "post has no associated user" from "author is literally
named Unknown". The repo's typed-error lint flags this pattern
(silent-sentinel rule, see scripts/check-typed-error-lint.mjs:323).

Replace `?? 'Unknown'` with `?? ''` so the missing-author case stays
visible as an empty string. Consistent with `clis/lesswrong/_helpers.js:68`
which was already using the empty-signal form.
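
A small before / after sketch of the swap; the `title` field and the function names are illustrative only, while the `author` expressions are the ones quoted above:

```javascript
// Before: a missing author was indistinguishable from one named "Unknown".
const toRowBefore = (item) => ({
  title: item.title,
  author: item.user?.displayName ?? 'Unknown',
});

// After: the missing-author case stays visible as an empty string.
const toRowAfter = (item) => ({
  title: item.title,
  author: item.user?.displayName ?? '',
});
```

With `user: null`, `toRowBefore` yields `author: 'Unknown'` and `toRowAfter` yields `author: ''`, so a consumer can test for the empty signal directly.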

Shrinks scripts/typed-error-lint-baseline.json from 173 to 161 entries.

Follows the same direction as jackwener#1603 (fix(adapters): surface silent empty
fallbacks).

Verified live: `opencli lesswrong frontpage --limit 2 -f json` returns
real posts with non-empty author values; empty-author rows would now
show `"author": ""` instead of fabricating `"Unknown"`.

* test(lesswrong): add empty-signal coverage for the author sentinel swap

Per owner's pattern in 7164615 (douyin/user-videos.test.js +
jike/read.test.js + weread/search-regression.test.js), pairs the
silent-sentinel value swap in this PR with a focused unit test that
mocks the upstream LessWrong GraphQL response to return posts where
`user` is null or `user.displayName` is missing, and asserts the row
surfaces `author: ''` instead of the old fabricated `'Unknown'`.

`clis/lesswrong/frontpage.test.js` is representative for the twelve
identical `author: item.user?.displayName ?? ''` swaps across
comments / curated / frontpage / new / read / sequences / shortform /
tag / top / top-month / top-week / top-year, all of which share the
exact same expression with no downstream sentinel consumer.

The empty-signal path is exercised live too: a deleted-account or
permission-restricted user shows up in the GraphQL response with
`user: null`, which surfaces as `author: ''` after this PR (was
'Unknown' before).

* feat(twitter): rewrite download profile path on GraphQL UserMedia with cursor pagination (jackwener#1636)

* fix(twitter): harden profile media download

* fix(twitter): fail closed on repeated media cursor

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* build: restore +x on dist/src/main.js after tsc rebuild (jackwener#1644)

clean-dist deletes dist/ and tsc --build re-emits files without preserving
the executable bit on the bin entry. Symlinked global install then hits
EACCES on spawn until manually chmod'd. Chain a chmodSync into the existing
prebuild-manifest hook so any future rebuild self-heals.

Uses `node -e` instead of a bare `chmod +x` to keep the script portable:
npm can run on Windows via Git Bash, where chmod is a no-op, and
fs.chmodSync silently no-ops there as well, so no extra branching is
needed.
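
For illustration, the chained step amounts to something like the CommonJS sketch below. The function name and the `0o755` mode are assumptions; the commit text only says a `chmodSync` call is chained into the hook.

```javascript
const fs = require('node:fs');

// Restore the executable bit on a bin entry after a rebuild.
// On Windows fs.chmodSync cannot set execute bits, so this has no effect there.
function restoreExecutable(file) {
  fs.chmodSync(file, 0o755); // assumed mode: rwxr-xr-x
}
```

In a build script this would be invoked once for `dist/src/main.js` after `tsc --build` has re-emitted the file.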

Co-authored-by: Kary <karyhe1019@gmail.com>

* feat(weread-official): add official gateway CLI

Add the WeRead official Agent Gateway as an in-tree pure HTTP adapter with 8 commands, typed errors, tests, and docs.

* feat(linkedin): consolidate messaging and Sales Navigator commands (jackwener#1647)

* fix(linkedin): harden sales navigator commands

* fix(linkedin): harden salesnav message boundaries

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* feat(xianyu): add inbox, messages, and reply commands (jackwener#1639)

* fix: tighten internal callback types

* feat(xianyu): add private message commands

* fix(xianyu): harden IM command contracts

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(zhihu): harden search pagination (jackwener#1615)

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(browser): recover stale page identity on goto retry + classify -32000 "Cannot find default execution context" as retryable (jackwener#1645)

* fix(browser): recover from stale page identity on goto retry (#5)

When a chrome-backed adapter pre-navigates after its cached `_page`
targetId has been invalidated (tab closed externally, identity evicted),
the extension throws `Page not found: <id> — stale page identity` and
the failure cascades — every subsequent persistent-site session call in
the same process keeps re-sending the same dead targetId.

Observed in a downstream parallel multi-platform recall: a single dead page handle
got reused across 4+ calls (twitter thread / twitter search / reddit search)
because there was no detection or recovery. The same hash appeared in
adapter pre-navigations to youtube, twitter, reddit, xhs back-to-back in
seconds, suggesting the cached `_page` was shared via persistent site
session leases (`site:youtube` etc) and never cleared after the first
"stale page identity" response.

Page.goto() now catches that specific error, drops `_page`, and retries
once without the stale id. The retry navigates via session-lease
resolution in the extension (resolveTab → preferredTabId / new owned tab),
which already handles tab eviction correctly. No effect on the happy path.
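
The recovery can be sketched with a stripped-down page wrapper. Everything here is hypothetical apart from the `_page` field and the "stale page identity" message: `send(targetId, url)` stands in for the extension round trip and resolves to the targetId it navigated.

```javascript
// Hypothetical minimal wrapper around a cached page identity.
class PageSketch {
  constructor(send) {
    this._send = send;
    this._page = undefined; // cached targetId, if any
  }

  async goto(url) {
    try {
      this._page = await this._send(this._page, url);
    } catch (err) {
      const stale = /stale page identity/.test(String(err && err.message));
      // Retry only when there is a cached id to drop; otherwise rethrow.
      if (!stale || this._page === undefined) throw err;
      this._page = undefined;
      this._page = await this._send(undefined, url); // single retry, no stale id
    }
  }
}
```

The `this._page === undefined` guard is what keeps a fresh page from retrying forever, and matching on the specific message leaves unrelated extension errors untouched.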

Three regression tests in src/browser/page.test.ts cover:
- recovery: stale id dropped, retry succeeds with new identity
- no-cache safety: fresh page with no _page → error propagates unchanged
  (nothing to drop, retrying would loop)
- error scoping: unrelated extension errors (e.g. disconnected) still
  surface immediately — no implicit retry

* fix(errors): classify -32000 "Cannot find default execution context" as retryable (#6)

classifyBrowserError previously only matched CDP -32000 errors when the
message contained "target" (e.g., "target closed"). It missed
"Cannot find default execution context", a CDP protocol error that also
indicates the inspected target went away — observed in a downstream parallel
adapter recall against youtube channels.

Widening the secondary check to `/target|context/i` lets the existing
target-navigation retry path (200ms delay + re-attach) recover instead of
surfacing the error as non-retryable.
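
As a rough sketch of the widened check (the function name and the `{ code, message }` error shape are assumptions; only the regex and the -32000 code come from the description above):

```javascript
// A CDP -32000 error counts as retryable when its message mentions
// either "target" or "context", case-insensitively.
function isRetryableCdpError(err) {
  return err.code === -32000 && /target|context/i.test(err.message);
}
```

Both "target closed" and "Cannot find default execution context" pass this check, while a -32000 error with an unrelated message does not.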

* fix(browser): tighten stale page recovery notes

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix: keep media filenames in output directory (jackwener#1642)

* fix: keep media filenames in output directory

* fix(download): sanitize media filename segments

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* feat(12306): add full read adapter (stations / trains / train / price / me / passengers / orders) (jackwener#1637)

* feat(12306): add stations / trains / train read commands (no login required)

Adds a first-pass 12306 (中国铁路) adapter for the public anonymous
query endpoints. Closes the no-login slice of jackwener#1589. The
authenticated `me / passengers / orders` commands the issue
proposes are explicitly left as a follow-up.

Commands:
- 12306 stations <keyword>             search station bundle
- 12306 trains <from> <to> --date YYYY-MM-DD  availability between stations
- 12306 train <train-no> --from <s> --to <s> --date  stop list

All three use Strategy.PUBLIC + browser: false, anonymous, no cookie
storage, no CAPTCHA bypass. Sensitive behaviors the issue rules out
(ticket sniping, order submission, payment, anti-abuse circumvention,
password storage) are not implemented.

Notes worth flagging for review:

- 12306 rejects anonymous query endpoints with HTTP 302 to
  /mormhweb/logFiles/error.html. The adapter first hits
  /otn/leftTicket/init to mint JSESSIONID / route / BIGipServerotn
  cookies, then attaches them to subsequent queries. No CAPTCHA path.

- 12306 rotates the train-query endpoint name (queryO / queryZ /
  queryA / queryG) every few weeks. When the wrong name is hit the
  server returns `{c_url: "leftTicket/queryX", status: false}`
  pointing to the current correct name. The adapter walks a list of
  known names, captures the rotation hint, and retries; the runtime
  list is also mutated so subsequent calls in the same process skip
  the warm-up round trip.

- The `|`-separated train wire format includes a booking-handshake
  `secret` field at position 0. Since this PR is read-only and the
  issue explicitly rules out booking, that field is parsed but not
  surfaced in the returned row, and a unit test asserts it cannot
  leak via the public adapter contract.

- Station resolution accepts Chinese name (`上海虹桥`), telecode
  (`AOH`), full pinyin (`shanghaihongqiao`), or short alias (`shhq`).
  Anything else raises ArgumentError with a hint.

- `limit` arguments use a tight validator that throws ArgumentError
  on non-integer / out-of-range input rather than silently clamping,
  matching the typed-error pattern used in jackwener#1397 (grok) and jackwener#1370
  (coupang).
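
The endpoint-rotation note above can be sketched as follows. This is a simplified illustration, not the adapter's code: `fetchJson(name)` is an injected stand-in that requests `leftTicket/<name>` and resolves to the parsed body, and treating any body whose `status` is not `false` as success is an assumption.

```javascript
// Runtime list of known endpoint names; reordered in place on success.
const knownEndpoints = ['queryO', 'queryZ', 'queryA', 'queryG'];

async function queryWithRotation(fetchJson, endpoints = knownEndpoints) {
  const tried = new Set();
  const queue = [...endpoints];
  while (queue.length > 0) {
    const name = queue.shift();
    if (tried.has(name)) continue;
    tried.add(name);
    const body = await fetchJson(name);
    if (body && body.status !== false) {
      // Promote the working name so later calls skip the warm-up round trip.
      const index = endpoints.indexOf(name);
      if (index !== -1) endpoints.splice(index, 1);
      endpoints.unshift(name);
      return body;
    }
    // Wrong name: follow the hint, e.g. { c_url: "leftTicket/queryZ", status: false }.
    const hinted = typeof body?.c_url === 'string' ? body.c_url.split('/').pop() : null;
    if (hinted && !tried.has(hinted)) queue.unshift(hinted);
  }
  throw new Error('no working leftTicket endpoint found');
}
```

Following the `c_url` hint before the rest of the list means a rotation usually costs one extra request, and a hinted name that is not yet in the list still gets tried and remembered.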

Live verified anonymously against kyfw.12306.cn:
- `12306 stations 上海 --limit 5` returns 5 stations including
  上海 (SHH) / 上海南 (SNH) / 上海虹桥 (AOH).
- `12306 trains 北京 上海 --date 2026-05-22 --limit 1` returns
  G547 06:18 -> 12:11 with first / second / business / no-seat
  availability columns populated.
- `12306 train 24000000G10L --from 北京南 --to 上海虹桥 --date 2026-05-22`
  returns the 7-stop G1 route from 北京南 through 沧州西 / 德州东 /
  曲阜东 / 南京南 / 苏州北 to 上海虹桥, with arrival / departure /
  stopover times.

Tests: 18 unit tests covering parseStationBundle, resolveStation
(including ambiguous / case-insensitive cases), validateDate,
buildCookieHeader, parseTrainRecord (including a regression test
asserting the `secret` field cannot leak into the row).
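
For reviewers unfamiliar with the session-minting step, a cookie-header builder of the kind tested above might look like the sketch below. The real `buildCookieHeader` signature is not shown in this PR text, so the input shape (an array of raw `Set-Cookie` values) and the `Sketch` suffix are assumptions.

```javascript
// Assumed shape: turn raw Set-Cookie header values into one Cookie header,
// keeping each cookie's name=value pair and dropping its attributes.
function buildCookieHeaderSketch(setCookieValues) {
  const jar = new Map();
  for (const raw of setCookieValues) {
    const pair = raw.split(';')[0].trim();
    const eq = pair.indexOf('=');
    if (eq <= 0) continue; // skip malformed entries
    jar.set(pair.slice(0, eq), pair.slice(eq + 1)); // later values win
  }
  return [...jar].map(([name, value]) => `${name}=${value}`).join('; ');
}
```

Nothing is written to disk in this shape: the header string exists only in memory for the requests that follow the init call.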

Deliberately deferred to a follow-up: `12306 price`. The
queryTicketPrice endpoint needs train_no + per-stop station_no +
per-train seat-type letters, so an ergonomic `12306 price <code>`
would cascade three API calls (trains -> stops -> price) per
invocation. Wanted to keep this PR's blast radius small. If the
maintainer prefers a Phase 1 that includes price even with the
cascading-call cost, happy to add it.

* feat(12306): add authenticated me / passengers / orders commands + price read command

Completes the jackwener#1589 12306 (中国铁路) adapter on top of the
stations / trains / train slice landed in the prior commit of this
branch. The full command set is now:

  Anonymous (no login):
    12306 stations  search station bundle by Chinese / telecode / pinyin
    12306 trains    list trains between two stations on a date
    12306 train     list stops of one train
    12306 price     ticket prices for one train segment + date

  Authenticated (cookie session):
    12306 me        account summary (sensitive fields masked by default)
    12306 passengers  saved-passenger list (sensitive fields masked)
    12306 orders    in-progress orders (not yet ridden / refunded)

Notes worth flagging for review:

- 12306 sets the auth cookie `tk` and the session cookie `JSESSIONID`
  with `Path=/otn`. CDP `Network.getCookies` filters by URL path, so
  `page.getCookies({ url: 'https://kyfw.12306.cn' })` returns 7
  cookies without `tk` / `JSESSIONID`, even on a freshly-navigated
  logged-in tab. Switched the login check to read `document.cookie`
  via `page.evaluate`, which the current navigated page exposes
  regardless of cookie path. Centralized as `require12306Login` in
  utils.js so all three authenticated commands share the same check.

- All authenticated commands mask sensitive fields by default:
  - `me`: real name (Chinese mask), email, mobile (12306 already
    masks server-side), birth date (year only).
  - `passengers`: name + birth year by default; 12306 already masks
    ID number and mobile server-side and this adapter never decodes
    those.
  - Both expose `--include-sensitive` to opt back into the unmasked
    fields the user is entitled to see on their own account.

- `orders` returns the `queryMyOrderNoComplete` slice (orders that
  have not yet been ridden / refunded / completed). The historical
  `queryMyOrderApi` endpoint requires extra page-state handshakes
  that proved fragile when probed; left as a follow-up so this
  command can ship reliably for the immediate "what's still on my
  account" use case.

- `price` cascades three anonymous API calls per invocation:
  init -> queryByTrainNo (to resolve segment station_no within the
  train route) -> queryTicketPrice. 12306 returns prices keyed by
  one-or-two-letter seat codes (`A9` 商务座 / `M` 一等座 /
  `O` 二等座 / `WZ` 无座 / etc.) and additionally doubles some up
  as bare numeric codes (e.g. `"9": "21580"` mirrors
  `"A9": "¥2158.0"`); the bare-numeric duplicates are filtered out
  so the row set is one-per-seat-class.

- Strictly read-only queries; no CAPTCHA / slider / SMS bypass, no
  credential storage, no ticket sniping, no order submission, no
  payment - per the issue's Non-goals list.
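
The `document.cookie` login check from the first note can be sketched as below. This is not the `require12306Login` implementation: the helper names are hypothetical, requiring both `tk` and `JSESSIONID` is an assumption, and `page.evaluate` is assumed to accept an expression string and resolve to its value.

```javascript
// Hypothetical pure helper: decide login state from a document.cookie string.
function hasLoginCookies(cookieString) {
  const names = new Set(
    cookieString.split(';').map((part) => part.trim().split('=')[0]).filter(Boolean),
  );
  return names.has('tk') && names.has('JSESSIONID');
}

// Sketch of the check against the currently navigated page.
async function requireLoginSketch(page) {
  const cookieString = await page.evaluate('document.cookie');
  if (!hasLoginCookies(String(cookieString ?? ''))) {
    throw new Error('12306 login required'); // plain Error as a stand-in
  }
}
```

Comparing whole cookie names rather than searching the raw string avoids a false positive from a cookie whose name merely ends in `tk`.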

Live verified anonymously and authenticated against kyfw.12306.cn,
sleeping 15-25 seconds between hits to stay under 12306's anti-abuse
throttle:

  - 12306 me: account summary returned with real_name / email /
    mobile / birth date all masked at the adapter level, on top of
    12306's own server-side mobile mask.
  - 12306 passengers: every saved passenger returned with name
    masked to `<surname>*<...>` and 12306-side ID/mobile masks
    preserved verbatim.
  - 12306 orders: empty for this test account (no in-progress
    orders), correct EmptyResultError surface.
  - 12306 price G1 北京南 -> 上海虹桥 2026-05-22: returns
    商务座 ¥2158 / 特等座 ¥1163 / 一等座 ¥1035 / 二等座 ¥626 /
    无座 ¥626, sorted desc.

Tests: 23 unit tests (5 new beyond the prior commit's 18) cover
the mask helpers (email / mobile / Chinese name) plus the
parsePriceData filter that drops the bare-numeric duplicates and
sorts by descending price.
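
A rough sketch of the filter-and-sort behaviour those tests describe. It is not the real `parsePriceData`: the function name, the output row shape, and the rule that only `¥`-prefixed string values count as prices are assumptions.

```javascript
// Assumed input: the price object from queryTicketPrice, e.g.
// { A9: '¥2158.0', '9': '21580', M: '¥1035.0', O: '¥626.0', WZ: '¥626.0' }.
function parsePriceSketch(data) {
  const rows = [];
  for (const [code, raw] of Object.entries(data)) {
    if (/^\d+$/.test(code)) continue; // bare-numeric duplicate of a lettered code
    if (typeof raw !== 'string' || !raw.startsWith('¥')) continue; // not a price value
    const price = Number(raw.slice(1));
    if (Number.isFinite(price)) rows.push({ seat: code, price });
  }
  return rows.sort((a, b) => b.price - a.price); // descending by price
}
```

Dropping the bare-numeric keys first is what keeps the result at one row per seat class.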

* fix(12306): harden browser auth boundaries

* fix(12306): tighten API drift boundaries

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

---------

Co-authored-by: Benjamin Liu <benjaminliu.eecs@gmail.com>
Co-authored-by: jackwener <jakevingoo@gmail.com>
Co-authored-by: Kary <karyhe1019@gmail.com>
Co-authored-by: hanzi <96609857+hanzili@users.noreply.github.com>
Co-authored-by: Jun <44310040+jun0315@users.noreply.github.com>
Co-authored-by: lenovobenben <lenovobenben@gmail.com>
Co-authored-by: 陈家名 <chenjiaming@kezaihui.com>
Co-authored-by: ml-scout <ml-scout@anthropic.com>