feat(12306): add full read adapter (stations / trains / train / price / me / passengers / orders) #1637
Merged
9f00a7b to 6f0867e
6f0867e to 16ed5da
…quired)

Adds a first-pass 12306 (中国铁路) adapter for the public anonymous query endpoints. Closes the no-login slice of jackwener#1589. The authenticated `me / passengers / orders` commands the issue proposes are explicitly left as a follow-up.

Commands:
- 12306 stations <keyword>                             search station bundle
- 12306 trains <from> <to> --date YYYY-MM-DD           availability between stations
- 12306 train <train-no> --from <s> --to <s> --date    stop list

All three use Strategy.PUBLIC + browser: false, anonymous, no cookie storage, no CAPTCHA bypass. Sensitive behaviors the issue rules out (ticket sniping, order submission, payment, anti-abuse circumvention, password storage) are not implemented.

Notes worth flagging for review:
- 12306 rejects anonymous query endpoints with HTTP 302 to /mormhweb/logFiles/error.html. The adapter first hits /otn/leftTicket/init to mint JSESSIONID / route / BIGipServerotn cookies, then attaches them to subsequent queries. No CAPTCHA path.
- 12306 rotates the train-query endpoint name (queryO / queryZ / queryA / queryG) every few weeks. When the wrong name is hit the server returns `{c_url: "leftTicket/queryX", status: false}` pointing to the current correct name. The adapter walks a list of known names, captures the rotation hint, and retries; the runtime list is also mutated so subsequent calls in the same process skip the warm-up round trip.
- The `|`-separated train wire format includes a booking-handshake `secret` field at position 0. Since this PR is read-only and the issue explicitly rules out booking, that field is parsed but not surfaced in the returned row, and a unit test asserts it cannot leak via the public adapter contract.
- Station resolution accepts Chinese name (`上海虹桥`), telecode (`AOH`), full pinyin (`shanghaihongqiao`), or short alias (`shhq`). Anything else raises ArgumentError with a hint.
- `limit` arguments use a tight validator that throws ArgumentError on non-integer / out-of-range input rather than silently clamping, matching the typed-error pattern used in jackwener#1397 (grok) and jackwener#1370 (coupang).

Live verified anonymously against kyfw.12306.cn:
- `12306 stations 上海 --limit 5` returns 5 stations including 上海 (SHH) / 上海南 (SNH) / 上海虹桥 (AOH).
- `12306 trains 北京 上海 --date 2026-05-22 --limit 1` returns G547 06:18 -> 12:11 with first / second / business / no-seat availability columns populated.
- `12306 train 24000000G10L --from 北京南 --to 上海虹桥 --date 2026-05-22` returns the 7-stop G1 route from 北京南 through 沧州西 / 德州东 / 曲阜东 / 南京南 / 苏州北 to 上海虹桥, with arrival / departure / stopover times.

Tests: 18 unit tests covering parseStationBundle, resolveStation (including ambiguous / case-insensitive cases), validateDate, buildCookieHeader, parseTrainRecord (including a regression test asserting the `secret` field cannot leak into the row).

Deliberately deferred to a follow-up: `12306 price`. The queryTicketPrice endpoint needs train_no + per-stop station_no + per-train seat-type letters, so an ergonomic `12306 price <code>` would cascade three API calls (trains -> stops -> price) per invocation. Wanted to keep this PR's blast radius small. If the maintainer prefers a Phase 1 that includes price even with the cascading-call cost, happy to add it.
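The endpoint-rotation note above can be sketched roughly as follows. This is a minimal illustration, not the adapter's code: `rotationHint`, `promote`, `queryWithRotation`, and the synchronous `send` callback are hypothetical names, and only the endpoint names and the `{c_url, status}` hint shape come from the description. The real adapter presumably performs asynchronous HTTP calls.

```javascript
// Known endpoint names quoted in the PR description.
const knownEndpoints = ['queryO', 'queryZ', 'queryA', 'queryG'];

// Extract the endpoint name from a rotation hint such as
// { c_url: 'leftTicket/queryZ', status: false }. Returns null for anything else.
function rotationHint(body) {
  if (!body || body.status !== false || typeof body.c_url !== 'string') return null;
  const match = /^leftTicket\/(query[A-Za-z0-9]+)$/.exec(body.c_url);
  return match ? match[1] : null;
}

// Move (or add) the working endpoint to the front of the runtime list so
// later calls in the same process try it first.
function promote(endpoints, name) {
  const index = endpoints.indexOf(name);
  if (index !== -1) endpoints.splice(index, 1);
  endpoints.unshift(name);
}

// Walk the list. `send(name)` is a caller-supplied stand-in for the HTTP call
// (assumption) and returns the parsed JSON body. Any body that is not a
// rotation hint is treated as the real answer in this sketch.
function queryWithRotation(endpoints, send) {
  const tried = new Set();
  const queue = [...endpoints];
  while (queue.length > 0) {
    const name = queue.shift();
    if (tried.has(name)) continue;
    tried.add(name);
    const body = send(name);
    const hinted = rotationHint(body);
    if (hinted === null) {
      promote(endpoints, name);
      return body;
    }
    // Follow the server's hint before walking the rest of the list.
    if (!tried.has(hinted)) queue.unshift(hinted);
  }
  throw new Error('no working train-query endpoint found');
}
```

With a fake `send` that only accepts `queryG`, the first call costs two round trips (`queryO`, then the hinted `queryG`), and the mutated list then starts with `queryG`.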
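The four accepted station input forms could be resolved along these lines. This is a sketch under stated assumptions: the `ArgumentError` class is a local stand-in for the repo's typed error, the station table is a tiny illustrative subset (the names and telecodes come from the live-verify output above, while the pinyin and alias values for 上海 and 上海南 are guesses), and the matching precedence and the throw-on-ambiguous behavior are assumptions.

```javascript
// Local stand-in for the repo's typed error (assumption).
class ArgumentError extends Error {}

// Illustrative subset only; the real bundle is parsed from 12306.
const STATIONS = [
  { name: '上海', telecode: 'SHH', pinyin: 'shanghai', alias: 'sh' },
  { name: '上海南', telecode: 'SNH', pinyin: 'shanghainan', alias: 'shn' },
  { name: '上海虹桥', telecode: 'AOH', pinyin: 'shanghaihongqiao', alias: 'shhq' },
];

function resolveStation(input, stations = STATIONS) {
  const raw = String(input ?? '').trim();
  if (raw === '') throw new ArgumentError('station is required');
  const lower = raw.toLowerCase();
  // Try each input form in turn: Chinese name, telecode, full pinyin, alias.
  const matchers = [
    (s) => s.name === raw,
    (s) => s.telecode.toLowerCase() === lower,
    (s) => s.pinyin === lower,
    (s) => s.alias === lower,
  ];
  for (const matches of matchers) {
    const hits = stations.filter(matches);
    if (hits.length === 1) return hits[0];
    if (hits.length > 1) {
      const names = hits.map((s) => s.name).join(', ');
      throw new ArgumentError(`"${raw}" is ambiguous: ${names}`);
    }
  }
  throw new ArgumentError(
    `unknown station "${raw}"; try a Chinese name, telecode, pinyin, or short alias`
  );
}
```

Lower-casing the input before the telecode, pinyin, and alias comparisons is one way to get the case-insensitive behavior the unit tests mention.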
…ce read commands

Completes the jackwener#1589 12306 (中国铁路) adapter on top of the stations / trains / train slice landed in the prior commit of this branch. The full command set is now:

Anonymous (no login):
  12306 stations      search station bundle by Chinese / telecode / pinyin
  12306 trains        list trains between two stations on a date
  12306 train         list stops of one train
  12306 price         ticket prices for one train segment + date

Authenticated (cookie session):
  12306 me            account summary (sensitive fields masked by default)
  12306 passengers    saved-passenger list (sensitive fields masked)
  12306 orders        in-progress orders (not yet ridden / refunded)

Notes worth flagging for review:
- 12306 sets the auth cookie `tk` and the session cookie `JSESSIONID` with `Path=/otn`. CDP `Network.getCookies` filters by URL path, so `page.getCookies({ url: 'https://kyfw.12306.cn' })` returns 7 cookies without `tk` / `JSESSIONID`, even on a freshly-navigated logged-in tab. Switched the login check to read `document.cookie` via `page.evaluate`, which the current navigated page exposes regardless of cookie path. Centralized as `require12306Login` in utils.js so all three authenticated commands share the same check.
- All authenticated commands mask sensitive fields by default:
  - `me`: real name (Chinese mask), email, mobile (12306 already masks server-side), birth date (year only).
  - `passengers`: name + birth year by default; 12306 already masks ID number and mobile server-side and this adapter never decodes those.
  - Both expose `--include-sensitive` to opt back into the unmasked fields the user is entitled to see on their own account.
- `orders` returns the `queryMyOrderNoComplete` slice (orders that have not yet been ridden / refunded / completed). The historical `queryMyOrderApi` endpoint requires extra page-state handshakes that proved fragile when probed; left as a follow-up so this command can ship reliably for the immediate "what's still on my account" use case.
- `price` cascades three anonymous API calls per invocation: init -> queryByTrainNo (to resolve segment station_no within the train route) -> queryTicketPrice. 12306 returns prices keyed by one-or-two-character seat codes (`A9` 商务座, business class / `M` 一等座, first class / `O` 二等座, second class / `WZ` 无座, no seat / etc.) and additionally doubles some up as bare numeric codes (e.g. `"9": "21580"` mirrors `"A9": "¥2158.0"`); the bare-numeric duplicates are filtered out so the row set is one-per-seat-class.
- Strictly anonymous queries; no CAPTCHA / slider / SMS bypass, no credential storage, no ticket sniping, no order submission, no payment - per the issue's Non-goals list.

Live verified anonymously and authenticated against kyfw.12306.cn, sleeping 15-25 seconds between hits to keep 12306's anti-abuse throttle gentle:
- 12306 me: account summary returned with real_name / email / mobile / birth date all masked at the adapter level, on top of 12306's own server-side mobile mask.
- 12306 passengers: every saved passenger returned with name masked to `<surname>*<...>` and 12306-side ID/mobile masks preserved verbatim.
- 12306 orders: empty for this test account (no in-progress orders), correct EmptyResultError surface.
- 12306 price G1 北京南 -> 上海虹桥 2026-05-22: returns 商务座 (business class) ¥2158 / 特等座 (premier class) ¥1163 / 一等座 (first class) ¥1035 / 二等座 (second class) ¥626 / 无座 (no seat) ¥626, sorted desc.

Tests: 23 unit tests (5 new beyond the prior commit's 18) cover the mask helpers (email / mobile / Chinese name) plus the parsePriceData filter that drops the bare-numeric duplicates and sorts by descending price.
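For readers unfamiliar with the masking described above, here is one plausible shape for the helpers. The function names and the exact mask formats (how many characters stay visible) are assumptions; the description only states that name, email, and mobile are masked and that birth date is reduced to the year.

```javascript
// Keep the first character (the surname for most Chinese names), star the rest.
function maskChineseName(name) {
  const chars = Array.from(String(name ?? ''));
  if (chars.length === 0) return '';
  if (chars.length === 1) return '*';
  return chars[0] + '*'.repeat(chars.length - 1);
}

// Keep the first character of the local part and the whole domain.
function maskEmail(email) {
  const text = String(email ?? '');
  const at = text.lastIndexOf('@');
  if (at <= 0) return '*'.repeat(text.length);
  const local = text.slice(0, at);
  const visible = local.length > 1 ? local[0] : '';
  return visible + '*'.repeat(local.length - visible.length) + text.slice(at);
}

// Keep the first three and last four digits. A value that 12306 already
// masked server-side in this same shape passes through unchanged.
function maskMobile(mobile) {
  const text = String(mobile ?? '');
  if (text.length < 8) return '*'.repeat(text.length);
  return text.slice(0, 3) + '*'.repeat(text.length - 7) + text.slice(-4);
}

// Reduce a YYYY-MM-DD style date to its year; empty string when unparseable.
function birthYearOnly(date) {
  const match = /^(\d{4})/.exec(String(date ?? ''));
  return match ? match[1] : '';
}
```

An `--include-sensitive` style flag would then simply skip these helpers for the fields the user asked to see.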
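The bare-numeric filter and descending sort for prices might look like the sketch below. It reuses the name `parsePriceData` from the test list, but the body, the row shape, and the `SEAT_NAMES` table are assumptions; only the four seat codes and the `"9"` / `"A9"` duplication example come from the description.

```javascript
// Seat codes quoted in the PR description; other codes fall back to ''.
const SEAT_NAMES = { A9: '商务座', M: '一等座', O: '二等座', WZ: '无座' };

function parsePriceData(data) {
  const rows = [];
  for (const [code, raw] of Object.entries(data ?? {})) {
    // Bare-numeric keys such as "9" duplicate a lettered seat code; skip them.
    if (/^\d+$/.test(code)) continue;
    // Lettered entries look like "¥2158.0"; strip the currency sign.
    const price = Number.parseFloat(String(raw).replace(/^¥/, ''));
    if (!Number.isFinite(price)) continue;
    rows.push({ code, seat: SEAT_NAMES[code] ?? '', price });
  }
  // Highest price first; Array.prototype.sort is stable, so ties keep
  // their original relative order.
  return rows.sort((a, b) => b.price - a.price);
}
```

Filtering on the key shape rather than on the value avoids having to interpret the differently scaled numeric duplicates (`"21580"` versus `"¥2158.0"`).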
16ed5da to f39d347
huanghe added a commit to huanghe/OpenCLI that referenced this pull request on May 18, 2026
* fix(electron-apps): move codex CDP port off 9222 to avoid browser-bridge collision (jackwener#1630)

* fix(electron-apps): move codex CDP port off 9222 to avoid browser-bridge collision

`src/electron-apps.ts` had `codex: { port: 9222 }`, but `9222` is the default Chrome DevTools port that opencli's own browser-bridge Chrome binds whenever `opencli doctor` is OK. On every normal opencli install the bridge owns 9222 first, so Codex Desktop can never bind it, and `opencli codex status` (plus every other codex command) fails with:

    App launched but CDP not available on port 9222 after 15s

`~/.opencli/apps.yaml` is documented as "additive only, does not override builtins", so users have no supported way to relocate the port from the user side.

Reported in jackwener#1626 with full repro (Codex Desktop + active opencli browser-bridge Chrome) and root-cause pointer at `dist/src/electron-apps.js:13`.

Every other electron app in the builtin registry already uses a distinct port in the 9224-9236 band (cursor 9226, doubao-app 9225, chatwise 9228, discord-app 9232, antigravity 9234, chatgpt-app 9236); codex was the only one that collided with the browser bridge. Move codex to 9238 (the next free slot in that band, also the value the reporter recommended).

Update the test that asserts the port and the two docs references that mention codex=9222. The pitfall entry in `docs/advanced/electron.md` is also annotated to explicitly call out 9222 as the bridge's port to avoid future collisions.

Closes jackwener#1626.

Verified live: `opencli codex status -v` now emits `[verbose] [launcher] Probing CDP on port 9238...` (was 9222 before the fix), confirming the code path picks up the new port. Full end-to-end with a real Codex Desktop install is left to the reporter and reviewer; the change here is a single-value config update plus docs/tests sync.

Unit tests: 7 / 7 in `src/electron-apps.test.ts` pass (the codex-port assertion updated to 9238). Both audit gates pass.
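A small guard like the following would catch this class of collision. It is only a sketch: `findPortProblems` and the flat `builtinPorts` object are hypothetical and do not reflect the actual shape of the registry in `src/electron-apps.ts`; the port numbers are the ones listed in the commit message.

```javascript
// Port the browser bridge binds, per the commit message.
const BRIDGE_PORT = 9222;

// Builtin CDP ports as listed in the commit message (codex after the fix).
const builtinPorts = {
  codex: 9238,
  cursor: 9226,
  'doubao-app': 9225,
  chatwise: 9228,
  'discord-app': 9232,
  antigravity: 9234,
  'chatgpt-app': 9236,
};

// Report apps that use the reserved bridge port or share a port.
function findPortProblems(ports, reserved = BRIDGE_PORT) {
  const problems = [];
  const owners = new Map();
  for (const [app, port] of Object.entries(ports)) {
    if (port === reserved) problems.push(`${app} uses reserved port ${reserved}`);
    if (owners.has(port)) {
      problems.push(`${app} and ${owners.get(port)} share port ${port}`);
    } else {
      owners.set(port, app);
    }
  }
  return problems;
}
```

Run against the pre-fix value (`codex: 9222`), the check reports exactly the collision described in jackwener#1626.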
* docs(electron): sync codex CDP port guidance

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(adapters): drop silent-sentinel row fallbacks across 6 read commands (jackwener#1631)

* fix(adapters): drop silent-sentinel row fallbacks across 6 read commands

Continues the audit-baseline cleanup started in jackwener#1611 (lesswrong) and the direction set by jackwener#1599 / jackwener#1603 / jackwener#1604. Replaces the `silent-sentinel` row-data fallbacks (`'Unknown'` / `'-'` / `'unknown'` that mask missing fields) with the empty-string signal so agents can tell "field really has the value Unknown" apart from "upstream returned no value".

Touched 6 read adapters, 10 baseline entries:
- wikipedia/trending: title, description
- 36kr/article: author, date, body
- xiaoyuzhou/download: podcast
- xiaoyuzhou/transcript: podcast
- zhihu/collection: dedup key + type field (the empty prefix still produces a unique-per-content dedup key, just without the `unknown:` noise)
- zhihu/download: author

Intentionally skipped (line-by-line audited):
- v2ex/me.js: `'Unknown'` is an in-band control-flow sentinel. Line 35 initialises `let username = 'Unknown';`, line 41 uses `if (username === 'Unknown')` to trigger the profileEl fallback selector, line 75 uses the same check to raise the auth error. Empty would silently bypass both checks and return a row with an empty username as if auth succeeded.
- v2ex/daily.js: `'未知'` ("unknown") is user-facing 签到 (check-in) success text in the rendered status message, not a row field. Empty would render a broken sentence.
- weibo/comments.js, weibo/feed.js: the sentinel sits inside an in-IIFE error-message string composition (`'API error: ' + (data.msg || 'unknown')`), not in a returned row. Empty would silently truncate diagnostic output. Both stay on baseline.

Verified live: `opencli wikipedia trending --limit 3` and `opencli 36kr hot --limit 2` both return populated rows; the empty-string signal only kicks in when the upstream value is actually missing.
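The zhihu/collection remark about the dedup key can be illustrated with a short sketch. `dedupKey` and `dedupe` are hypothetical helpers; the `content.id` / `content.type` field names follow the test description in this PR, and throwing on a missing id is an assumption.

```javascript
// Build the dedup key with an empty prefix when the type is missing,
// instead of the old fabricated 'unknown' sentinel.
function dedupKey(item) {
  const id = item?.content?.id;
  if (id === undefined || id === null || id === '') {
    throw new Error('collection item has no content.id');
  }
  const type = item.content.type || '';
  return `${type}:${id}`;
}

// Keep the first item seen for each key.
function dedupe(items) {
  const seen = new Set();
  return items.filter((item) => {
    const key = dedupKey(item);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

An item without a type yields a key such as `:42`, which is still unique per content id within the untyped group.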
* test(adapters): add empty-signal coverage for the cluster-2 sentinel swap

Per owner's pattern in 7164615 (douyin/user-videos.test.js + jike/read.test.js + weread/search-regression.test.js), pairs the silent-sentinel value swap in this PR with focused unit tests that mock the upstream to return null / missing fields and assert the row surfaces an empty-string signal instead of the old fabricated 'Unknown' / '-' / 'unknown' sentinel.

Coverage:
- clis/wikipedia/trending.test.js (new): mocks wikiFetch to return three articles - one with both title + description populated, one with no title and no description, one with title only. Asserts the missing fields render as '' (was '-' before this PR).
- clis/36kr/article.test.js (new): mocks page.evaluate to return a scrape where title is present but author / date / body are empty. Asserts those three fields render as '' in the row pair output (was '-' before this PR). Also covers the NOT_FOUND and INVALID_ARGUMENT error paths that already existed.
- clis/zhihu/collection.test.js (+1 case): mocks the zhihu collection API to return an item with content.id but no content.type. Asserts type renders as '' (was 'unknown' before this PR); the new dedup key prefix is :id rather than unknown:id, semantically identical for dedup purposes.

The other three files in this PR (xiaoyuzhou/download, xiaoyuzhou/transcript, zhihu/download) use the same `|| 'unknown'` -> `|| ''` value swap with no downstream sentinel consumer. They are covered by the same JS language semantics the three tests above demonstrate.
* fix(adapters): fail typed on missing row identity

* fix(adapters): tighten sentinel row identity guards

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(weibo/publish): replace brittle CSS-module hash with placeholder selector (jackwener#1625)

* fix(weibo/publish): replace brittle CSS-module hash with placeholder selector

`clis/weibo/publish.js` matched the compose textarea via `textarea._input_13iqr_8`, where `_input_13iqr_8` is the Vite CSS-module hash Weibo rebuilds on every frontend deploy. The hash drifted (current build emits `_input_1f5hn_8`), so step 4 of the publish flow throws "Weibo compose editor did not appear" before anything else can run. Reported in jackwener#1602.

Replace the single hashed selector with a placeholder-text-based chain that survives Weibo's CSS-module rebuilds:

    textarea[placeholder*="有什么新鲜事"]
    textarea[placeholder*="新鲜事"]
    textarea._input_13iqr_8   // legacy hash kept last for older variants

Two visible textareas can match on the home feed (the always-rendered "home-strip" prompt + the post-click modal compose). Pick the LAST visible candidate: the modal opens on top and is appended to DOM later, so the last-visible textarea is the modal. Both the editor-visibility poll (Step 4) and the text-insertion step (Step 6) use the same chain.

Also drops `evaluateWithArgs` from Step 8 success polling. The IIFE there does not reference any outer args, but `evaluateWithArgs` injects its `const`-bound parameter names into the page context, and re-running on each iteration of the success-poll loop threw `Identifier 'maxIterations' has already been declared` after the first iteration. This was masked previously because Step 4 always failed first; with the selector fixed, the latent Step 8 bug surfaces. Switched to plain `page.evaluate` to avoid re-declaring per loop.

Closes jackwener#1602.
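The "last visible candidate" rule could be expressed as below. This is a sketch, not the code in `clis/weibo/publish.js`: `findComposeTextarea` is a hypothetical helper, trying the selectors one at a time in priority order is an assumption, and the visibility predicate is injected so the logic can run outside a browser (in a page, something like `(el) => el.offsetParent !== null` is a common heuristic).

```javascript
// Selector chain quoted in the commit message, most stable first.
const COMPOSE_SELECTORS = [
  'textarea[placeholder*="有什么新鲜事"]',
  'textarea[placeholder*="新鲜事"]',
  'textarea._input_13iqr_8', // legacy hash kept last for older variants
];

// `root` is anything with querySelectorAll (document in a real page).
// Returns the LAST visible match of the first selector that matches anything
// visible, since the modal textarea is appended to the DOM after the
// always-rendered home-strip one.
function findComposeTextarea(root, isVisible) {
  for (const selector of COMPOSE_SELECTORS) {
    const visible = Array.from(root.querySelectorAll(selector)).filter(isVisible);
    if (visible.length > 0) return visible[visible.length - 1];
  }
  return null;
}
```

Using the same helper for both the visibility poll and the text-insertion step keeps the two steps pointed at the same element.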
Verified live on macOS / opencli built locally / extension v1.0.15, weibo cookie session:
- `opencli weibo publish "明洞那家店真不错"` returned `status: success, message: 发布成功, text: 明洞那家店真不错`
- Confirmed via `/ajax/statuses/mymblog`: the post landed at `idstr=5299403716821218`, `mblogid=QFHWzsCvE`, text matches what was typed (proves selector chain picks the right textarea and the text insertion path works end-to-end)
- Cleaned up: deleted via the same `/ajax/statuses/destroy` path that PR jackwener#1620 exposes as `weibo delete`

Unit tests: 8 / 8 in `clis/weibo/publish.test.js` pass (mocks updated to reflect the new `evaluate`-vs-`evaluateWithArgs` split for Step 8 and the longer poll window).

* test(weibo): lock publish placeholder selector path

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* feat(xiaohongshu): add delete-note command to remove published notes (jackwener#1624)

* fix(xiaohongshu/publish): invoke shadow-DOM publish handler directly

XHS creator center now wraps the publish/save-draft button in an `<xhs-publish-btn>` web component backed by a CLOSED shadow root. Calling `.click()` on the host element does not dispatch into the internal handler, and CDP coordinate clicks cannot penetrate the shadow boundary. The previous text-match `button.click()` loop hit the host element, returned `ok`, and yet the note silently stayed on the publish page as a draft, so the adapter reported the soft `⚠️ 操作完成,请在浏览器中确认` ("operation finished, please confirm in the browser") status while nothing was actually posted.

Invoke the publish/save method directly on the `<xhs-publish-btn>` host (`_onPublish` / `_onSave` and a few candidate names XHS has shipped historically). Fall back to the legacy `<button>`/`[role="button"]` text-match click for older creator-center variants that still expose plain buttons.

Patch shape suggested by the OpenCLI autofix report in jackwener#1606 from @chcc-funny (who verified an end-to-end real publish locally).

Closes jackwener#1606.
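The direct-invoke-then-fallback idea might be sketched like this. Only `_onPublish` and `_onSave` are named in the commit message; the `invokePublishHandler` helper, the `via` values, and the injected `clickButtonByText` fallback are assumptions made so the logic can run without a DOM.

```javascript
// Candidate handler names on the <xhs-publish-btn> host. The commit message
// mentions further historical names without listing them.
const CANDIDATE_METHODS = {
  publish: ['_onPublish'],
  draft: ['_onSave'],
};

// Try the host's own handler first; otherwise fall back to a caller-supplied
// text-match click (hypothetical callback that returns true on success).
function invokePublishHandler(host, mode, buttonText, clickButtonByText) {
  for (const name of CANDIDATE_METHODS[mode] ?? []) {
    if (host && typeof host[name] === 'function') {
      host[name]();
      return { ok: true, via: 'method', name };
    }
  }
  if (typeof clickButtonByText === 'function' && clickButtonByText(buttonText)) {
    return { ok: true, via: 'button', text: buttonText };
  }
  return { ok: false };
}
```

Returning which path was taken (`via`) makes it possible for tests to assert that the shadow-DOM handler, not the legacy click, was used.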
Verified live on macOS / opencli v1.7.22 / extension v1.0.15, with creator center logged in:
- `opencli xiaohongshu publish ... --draft` -> `✅ 暂存成功` ("draft saved"), creator home shows "草稿箱中有未发布的作品" ("the drafts box contains unpublished work")
- `opencli xiaohongshu publish ...` (real publish) -> `✅ 发布成功` ("published"), note appeared on the account feed (visible from mobile app); test note deleted after verification

Unit tests: 12 / 12 in `clis/xiaohongshu/publish.test.js` pass (mocks updated to reflect the new `{ ok, via, name|text }` invoke result shape).

* feat(xiaohongshu): add delete-note command to remove published notes

Adds `opencli xiaohongshu delete-note <note-id>` so the workflow that creates a note can also remove one without leaving the CLI, mirroring `weibo delete` (jackwener#1619 / jackwener#1620). The creator-center HTTP delete API requires the `X-S-Common` signature header that `publish.js` deliberately avoids, so this follows the same UI automation route.

Flow:
1. Navigate to creator note-manager
2. Switch to "已发布" (Published) tab (delete entry only appears there; "审核中" (Under review) and "未通过" (Rejected) rows have no web delete action, mobile app only)
3. Locate the `.note` row whose `data-impression` JSON contains the target noteId (exact JSON-parsed match, not substring, so values that happen to share the noteId prefix in other fields cannot match the wrong row)
4. Click the inline `<span class="control data-del">` action
5. Click "确定" (Confirm) in the `.d-modal-footer` confirmation modal
6. Poll for the row disappearing (iteration-bounded so tests with mocked `page.wait` exhaust the loop quickly)

Typed errors:
- /login redirect after navigation: AuthRequiredError
- 已发布 tab not found / not clickable: CommandExecutionError (UI drift)
- target noteId not present in the rendered list: EmptyResultError with a hint about review-state limitation
- row found but no delete action visible: CommandExecutionError
- confirmation modal missing / no 确定 button: CommandExecutionError
- row still visible after the configured poll window: CommandExecutionError

Closes jackwener#1623.
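Step 3's "exact JSON-parsed match, not substring" can be illustrated as follows. The real structure of the `data-impression` payload is not given in this PR, so the sketch walks the parsed JSON looking for a property named `noteId`; that key name and the `impressionMatches` helper are assumptions.

```javascript
// Return true only when the parsed data-impression JSON contains a
// `noteId` property (hypothetical key name) strictly equal to the target.
function impressionMatches(rawAttr, noteId) {
  let parsed;
  try {
    parsed = JSON.parse(rawAttr);
  } catch {
    return false; // malformed attribute never matches
  }
  const stack = [parsed];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === null || typeof node !== 'object') continue;
    for (const [key, value] of Object.entries(node)) {
      if (key === 'noteId' && value === noteId) return true;
      if (value !== null && typeof value === 'object') stack.push(value);
    }
  }
  return false;
}
```

A plain `rawAttr.includes(noteId)` would also match a row whose id merely starts with the target id, which is the wrong-row risk the flow description calls out.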
Verified live: published a test note, deleted via this adapter, follow-up `xiaohongshu creator-notes` confirms it is gone.

Unit tests: 8 / 8 cover happy path, empty-id ArgumentError, login redirect AuthRequiredError, tab-not-found CommandExecutionError, row-not-found EmptyResultError, no-delete-action / no-modal / unverified-delete CommandExecutionError paths.

Built on top of jackwener#1613 (xiaohongshu publish shadow-DOM fix) so the live verify could exercise publish-then-delete end to end. Will rebase onto main once jackwener#1613 lands.

* fix(xhs): make delete-note fail closed

* fix(xiaohongshu): harden delete-note boundary

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* feat(weibo): add delete command to remove user's own posts (jackwener#1620)

* feat(weibo): add delete command to remove user's own posts

Adds `opencli weibo delete <id>` so the same workflow that creates a post can also remove one without leaving the CLI. The id positional accepts either the numeric `idstr` (e.g. `5299336218674412`) or the base62 `mblogid` (e.g. `QFGbHAoBS`) found in any weibo URL or in the output of `weibo me` / `weibo feed` / `weibo post`.

Implementation lives in a single `page.evaluate` IIFE so cookies + the XSRF-TOKEN double-submit token stay first-party:
1. Resolve mblogid / idstr via `GET /ajax/statuses/show?id=<input>`, which returns the canonical `idstr`. Empty result -> 404 path.
2. Read the `XSRF-TOKEN` cookie via `document.cookie`.
3. `POST /ajax/statuses/destroy` with `id=<idstr>` body and the `X-Xsrf-Token` header.
4. Return `[{ status: 'deleted', id, mblogid }]`.

Typed errors:
- 401 / 403 from either show or destroy -> `AuthRequiredError`
- `show` returning no `idstr` -> `EmptyResultError`
- Non-2xx HTTP on either call -> `CommandExecutionError` with status
- API response `ok !== 1` -> `CommandExecutionError` with the API msg

Closes jackwener#1619.
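The cookie read in step 2 and the typed-error mapping above can be sketched without a browser. The three error classes are local stand-ins for the repo's typed errors, and `readCookie`, `checkShow`, and `checkDestroy` are hypothetical helper names; the status and field rules follow the list in the commit message.

```javascript
// Local stand-ins for the repo's typed errors (assumption).
class AuthRequiredError extends Error {}
class EmptyResultError extends Error {}
class CommandExecutionError extends Error {}

// Pull one cookie value out of a document.cookie style string.
function readCookie(cookieString, name) {
  for (const part of String(cookieString ?? '').split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) return part.slice(eq + 1).trim();
  }
  return null;
}

// Shared HTTP status handling for both calls.
function checkHttp(step, status) {
  if (status === 401 || status === 403) {
    throw new AuthRequiredError(`${step}: login required (HTTP ${status})`);
  }
  if (status < 200 || status >= 300) {
    throw new CommandExecutionError(`${step}: HTTP ${status}`);
  }
}

// Validate the `show` response and return the canonical idstr.
function checkShow(status, body) {
  checkHttp('show', status);
  if (!body || !body.idstr) throw new EmptyResultError('post not found');
  return String(body.idstr);
}

// Validate the `destroy` response.
function checkDestroy(status, body) {
  checkHttp('destroy', status);
  if (!body || body.ok !== 1) {
    throw new CommandExecutionError(body?.msg || 'destroy: API reported failure');
  }
}
```

The token returned by `readCookie(document.cookie, 'XSRF-TOKEN')` would be sent back in the `X-Xsrf-Token` header of the destroy request.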
Verified live on macOS / opencli v1.7.22, weibo cookie session:
- Deleted the lingering test post from jackwener#1602 verification (idstr=5299336218674412, mblogid=QFGbHAoBS): `weibo delete QFGbHAoBS` returned `[{ status: 'deleted', id: '5299336218674412', mblogid: 'QFGbHAoBS' }]`
- `weibo me` shows `statuses: 3` (was 4 before the delete)
- `weibo post QFGbHAoBS` now throws "Post not found"

Unit tests: 8 / 8 in `clis/weibo/delete.test.js` (happy path, empty-id, auth, not-found, show-http, destroy-http, api-msg, envelope unwrap). Full weibo suite: 38 / 38 pass.

* fix(weibo): require delete postcondition evidence

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(lesswrong): drop "Unknown" silent sentinel in author column (jackwener#1611)

* fix(lesswrong): drop "Unknown" silent sentinel in author column

Twelve lesswrong commands had `author: item.user?.displayName ?? 'Unknown'` which masks the missing-author signal: an agent reading the result row cannot distinguish "post has no associated user" from "author is literally named Unknown". The repo's typed-error lint flags this pattern (silent-sentinel rule, see scripts/check-typed-error-lint.mjs:323).

Replace `?? 'Unknown'` with `?? ''` so the missing-author case stays visible as an empty string. Consistent with `clis/lesswrong/_helpers.js:68` which was already using the empty-signal form.

Shrinks scripts/typed-error-lint-baseline.json from 173 to 161 entries.

Follows the same direction as jackwener#1603 (fix(adapters): surface silent empty fallbacks).

Verified live: `opencli lesswrong frontpage --limit 2 -f json` returns real posts with non-empty author values; empty-author rows would now show `"author": ""` instead of fabricating `"Unknown"`.
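A minimal before/after of the sentinel swap, for illustration only. The `author` expressions are the ones quoted in the commit message; the `toRowBefore` / `toRowAfter` wrappers and the `title` field are hypothetical.

```javascript
// Old behavior: a missing author is indistinguishable from a user
// whose display name really is "Unknown".
function toRowBefore(item) {
  return { title: item.title ?? '', author: item.user?.displayName ?? 'Unknown' };
}

// New behavior: a missing author stays visible as an empty string.
function toRowAfter(item) {
  return { title: item.title ?? '', author: item.user?.displayName ?? '' };
}

const deleted = { title: 'Post A', user: null };
const literal = { title: 'Post B', user: { displayName: 'Unknown' } };

// Before: both rows carry the same author value.
console.log(toRowBefore(deleted).author === toRowBefore(literal).author); // true
// After: the two cases can be told apart.
console.log(toRowAfter(deleted).author === toRowAfter(literal).author); // false
```

Because `??` only replaces `null` and `undefined`, a display name that is itself an empty string also surfaces as `''`.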
* test(lesswrong): add empty-signal coverage for the author sentinel swap

Per owner's pattern in 7164615 (douyin/user-videos.test.js + jike/read.test.js + weread/search-regression.test.js), pairs the silent-sentinel value swap in this PR with a focused unit test that mocks the upstream LessWrong GraphQL response to return posts where `user` is null or `user.displayName` is missing, and asserts the row surfaces `author: ''` instead of the old fabricated `'Unknown'`.

`clis/lesswrong/frontpage.test.js` is representative for the twelve identical `author: item.user?.displayName ?? ''` swaps across comments / curated / frontpage / new / read / sequences / shortform / tag / top / top-month / top-week / top-year, all of which share the exact same expression with no downstream sentinel consumer.

The empty-signal path is exercised live too: a deleted-account or permission-restricted user shows up in the GraphQL response with `user: null`, surfaces as `author: ''` post this PR (was 'Unknown' before).

* feat(twitter): rewrite download profile path on GraphQL UserMedia with cursor pagination (jackwener#1636)

* fix(twitter): harden profile media download

* fix(twitter): fail closed on repeated media cursor

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* build: restore +x on dist/src/main.js after tsc rebuild (jackwener#1644)

clean-dist deletes dist/ and tsc --build re-emits files without preserving the executable bit on the bin entry. Symlinked global install then hits EACCES on spawn until manually chmod'd.

Chain a chmodSync into the existing prebuild-manifest hook so any future rebuild self-heals. node -e instead of bare `chmod +x` to keep the script portable (npm runs on Windows via Git Bash where chmod is a no-op, but fs.chmodSync still silently no-ops there too, so no extra branching is needed).
Co-authored-by: Kary <karyhe1019@gmail.com>

* feat(weread-official): add official gateway CLI

Add the WeRead official Agent Gateway as an in-tree pure HTTP adapter with 8 commands, typed errors, tests, and docs.

* feat(linkedin): consolidate messaging and Sales Navigator commands (jackwener#1647)

* fix(linkedin): harden sales navigator commands

* fix(linkedin): harden salesnav message boundaries

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* feat(xianyu): add inbox, messages, and reply commands (jackwener#1639)

* fix: tighten internal callback types

* feat(xianyu): add private message commands

* fix(xianyu): harden IM command contracts

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(zhihu): harden search pagination (jackwener#1615)

Co-authored-by: jackwener <jakevingoo@gmail.com>

* fix(browser): goto 重试时回收陈旧 page identity + 把 -32000 "Cannot find default execution context" 归类为可重试 (jackwener#1645)

* fix(browser): recover from stale page identity on goto retry (#5)

When a chrome-backed adapter pre-navigates after its cached `_page` targetId has been invalidated (tab closed externally, identity evicted), the extension throws `Page not found: <id> — stale page identity` and the failure cascades: every subsequent persistent-site session call in the same process keeps re-sending the same dead targetId.

Observed in a downstream parallel multi-platform recall: a single dead page handle got reused across 4+ calls (twitter thread / twitter search / reddit search) because there was no detection or recovery. The same hash appeared in adapter pre-navigations to youtube, twitter, reddit, xhs back-to-back in seconds, suggesting the cached `_page` was shared via persistent site session leases (`site:youtube` etc) and never cleared after the first "stale page identity" response.

Page.goto() now catches that specific error, drops `_page`, and retries once without the stale id.
The retry navigates via session-lease resolution in the extension (resolveTab → preferredTabId / new owned tab), which already handles tab eviction correctly. No effect on the happy path. Three regression tests in src/browser/page.test.ts cover: - recovery: stale id dropped, retry succeeds with new identity - no-cache safety: fresh page with no _page → error propagates unchanged (nothing to drop, retrying would loop) - error scoping: unrelated extension errors (e.g. disconnected) still surface immediately — no implicit retry * fix(errors): classify -32000 "Cannot find default execution context" as retryable (#6) classifyBrowserError previously only matched CDP -32000 errors when the message contained "target" (e.g., "target closed"). It missed "Cannot find default execution context", a CDP protocol error that also indicates the inspected target went away — observed in a downstream parallel adapter recall against youtube channels. Widening the secondary check to `/target|context/i` lets the existing target-navigation retry path (200ms delay + re-attach) recover instead of surfacing the error as non-retryable. * fix(browser): tighten stale page recovery notes --------- Co-authored-by: jackwener <jakevingoo@gmail.com> * fix: keep media filenames in output directory (jackwener#1642) * fix: keep media filenames in output directory * fix(download): sanitize media filename segments --------- Co-authored-by: jackwener <jakevingoo@gmail.com> * feat(12306): add full read adapter (stations / trains / train / price / me / passengers / orders) (jackwener#1637) * feat(12306): add stations / trains / train read commands (no login required) Adds a first-pass 12306 (中国铁路) adapter for the public anonymous query endpoints. Closes the no-login slice of jackwener#1589. The authenticated `me / passengers / orders` commands the issue proposes are explicitly left as a follow-up. 
Commands: - 12306 stations <keyword> search station bundle - 12306 trains <from> <to> --date YYYY-MM-DD availability between stations - 12306 train <train-no> --from <s> --to <s> --date stop list All three use Strategy.PUBLIC + browser: false, anonymous, no cookie storage, no CAPTCHA bypass. Sensitive behaviors the issue rules out (ticket sniping, order submission, payment, anti-abuse circumvention, password storage) are not implemented. Notes worth flagging for review: - 12306 rejects anonymous query endpoints with HTTP 302 to /mormhweb/logFiles/error.html. The adapter first hits /otn/leftTicket/init to mint JSESSIONID / route / BIGipServerotn cookies, then attaches them to subsequent queries. No CAPTCHA path. - 12306 rotates the train-query endpoint name (queryO / queryZ / queryA / queryG) every few weeks. When the wrong name is hit the server returns `{c_url: "leftTicket/queryX", status: false}` pointing to the current correct name. The adapter walks a list of known names, captures the rotation hint, and retries; the runtime list is also mutated so subsequent calls in the same process skip the warm-up round trip. - The `|`-separated train wire format includes a booking-handshake `secret` field at position 0. Since this PR is read-only and the issue explicitly rules out booking, that field is parsed but not surfaced in the returned row, and a unit test asserts it cannot leak via the public adapter contract. - Station resolution accepts Chinese name (`上海虹桥`), telecode (`AOH`), full pinyin (`shanghaihongqiao`), or short alias (`shhq`). Anything else raises ArgumentError with a hint. - `limit` arguments use a tight validator that throws ArgumentError on non-integer / out-of-range input rather than silently clamping, matching the typed-error pattern used in jackwener#1397 (grok) and jackwener#1370 (coupang). Live verified anonymously against kyfw.12306.cn: - `12306 stations 上海 --limit 5` returns 5 stations including 上海 (SHH) / 上海南 (SNH) / 上海虹桥 (AOH). 
- `12306 trains 北京 上海 --date 2026-05-22 --limit 1` returns G547 06:18 -> 12:11 with first / second / business / no-seat availability columns populated. - `12306 train 24000000G10L --from 北京南 --to 上海虹桥 --date 2026-05-22` returns the 7-stop G1 route from 北京南 through 沧州西 / 德州东 / 曲阜东 / 南京南 / 苏州北 to 上海虹桥, with arrival / departure / stopover times. Tests: 18 unit tests covering parseStationBundle, resolveStation (including ambiguous / case-insensitive cases), validateDate, buildCookieHeader, parseTrainRecord (including a regression test asserting the `secret` field cannot leak into the row). Deliberately deferred to a follow-up: `12306 price`. The queryTicketPrice endpoint needs train_no + per-stop station_no + per-train seat-type letters, so an ergonomic `12306 price <code>` would cascade three API calls (trains -> stops -> price) per invocation. Wanted to keep this PR's blast radius small. If the maintainer prefers a Phase 1 that includes price even with the cascading-call cost, happy to add it. * feat(12306): add me / passengers / orders / price authenticated + price read commands Completes the jackwener#1589 12306 (中国铁路) adapter on top of the stations / trains / train slice landed in the prior commit of this branch. The full command set is now: Anonymous (no login): 12306 stations search station bundle by Chinese / telecode / pinyin 12306 trains list trains between two stations on a date 12306 train list stops of one train 12306 price ticket prices for one train segment + date Authenticated (cookie session): 12306 me account summary (sensitive fields masked by default) 12306 passengers saved-passenger list (sensitive fields masked) 12306 orders in-progress orders (not yet ridden / refunded) Notes worth flagging for review: - 12306 sets the auth cookie `tk` and the session cookie `JSESSIONID` with `Path=/otn`. 
CDP `Network.getCookies` filters by URL path, so `page.getCookies({ url: 'https://kyfw.12306.cn' })` returns 7 cookies without `tk` / `JSESSIONID`, even on a freshly-navigated logged-in tab. Switched the login check to read `document.cookie` via `page.evaluate`, which the current navigated page exposes regardless of cookie path. Centralized as `require12306Login` in utils.js so all three authenticated commands share the same check. - All authenticated commands mask sensitive fields by default: - `me`: real name (Chinese mask), email, mobile (12306 already masks server-side), birth date (year only). - `passengers`: name + birth year by default; 12306 already masks ID number and mobile server-side and this adapter never decodes those. - Both expose `--include-sensitive` to opt back into the unmasked fields the user is entitled to see on their own account. - `orders` returns the `queryMyOrderNoComplete` slice (orders that have not yet been ridden / refunded / completed). The historical `queryMyOrderApi` endpoint requires extra page-state handshakes that proved fragile when probed; left as a follow-up so this command can ship reliably for the immediate "what's still on my account" use case. - `price` cascades three anonymous API calls per invocation: init -> queryByTrainNo (to resolve segment station_no within the train route) -> queryTicketPrice. 12306 returns prices keyed by one-or-two-letter seat codes (`A9` 商务座 / `M` 一等座 / `O` 二等座 / `WZ` 无座 / etc.) and additionally doubles some up as bare numeric codes (e.g. `"9": "21580"` mirrors `"A9": "¥2158.0"`); the bare-numeric duplicates are filtered out so the row set is one-per-seat-class. - Strictly anonymous queries; no CAPTCHA / slider / SMS bypass, no credential storage, no ticket sniping, no order submission, no payment - per the issue's Non-goals list. 
Live verified anonymously and authenticated against kyfw.12306.cn, sleeping 15-25 seconds between hits to keep 12306's anti-abuse throttle gentle:
- 12306 me: account summary returned with real_name / email / mobile / birth date all masked at the adapter level, on top of 12306's own server-side mobile mask.
- 12306 passengers: every saved passenger returned with name masked to `<surname>*<...>` and 12306-side ID/mobile masks preserved verbatim.
- 12306 orders: empty for this test account (no in-progress orders), correct EmptyResultError surface.
- 12306 price G1 北京南 -> 上海虹桥 2026-05-22: returns 商务座 ¥2158 / 特等座 ¥1163 / 一等座 ¥1035 / 二等座 ¥626 / 无座 ¥626, sorted desc.

Tests: 23 unit tests (5 new beyond the prior commit's 18) cover the mask helpers (email / mobile / Chinese name) plus the parsePriceData filter that drops the bare-numeric duplicates and sorts by descending price.

* fix(12306): harden browser auth boundaries

* fix(12306): tighten API drift boundaries

---------

Co-authored-by: jackwener <jakevingoo@gmail.com>

---------

Co-authored-by: Benjamin Liu <benjaminliu.eecs@gmail.com>
Co-authored-by: jackwener <jakevingoo@gmail.com>
Co-authored-by: Kary <karyhe1019@gmail.com>
Co-authored-by: hanzi <96609857+hanzili@users.noreply.github.com>
Co-authored-by: Jun <44310040+jun0315@users.noreply.github.com>
Co-authored-by: lenovobenben <lenovobenben@gmail.com>
Co-authored-by: 陈家名 <chenjiaming@kezaihui.com>
Co-authored-by: ml-scout <ml-scout@anthropic.com>
Description
Closes #1589 with the full 12306 (中国铁路) adapter the issue proposes. Seven commands across two strategies:
| Strategy | Command |
| --- | --- |
| `PUBLIC` (no login) | `12306 stations <keyword>` |
| `PUBLIC` | `12306 trains <from> <to> --date YYYY-MM-DD` |
| `PUBLIC` | `12306 train <train-no> --from --to --date` |
| `PUBLIC` | `12306 price <train-no> --from --to --date` |
| `COOKIE` (login) | `12306 me` |
| `COOKIE` | `12306 passengers` |
| `COOKIE` | `12306 orders` |

No CAPTCHA / slider / SMS bypass, no credential storage, no ticket sniping, no order submission, no payment - per the issue's Non-goals list.
Type of Change
Checklist
Implementation notes worth flagging
Anonymous session minting
12306 rejects bare anonymous query endpoints with `HTTP 302 -> /mormhweb/logFiles/error.html`. The adapter first hits `/otn/leftTicket/init` to mint `JSESSIONID` / `route` / `BIGipServerotn` cookies, then attaches them to subsequent queries. No CAPTCHA / slider path is invoked. Cookies are not persisted to disk; they live only for the lifetime of the command invocation.

Endpoint name rotation (`queryO` / `queryZ` / `queryA` / `queryG` / ...)
12306 rotates the train-query endpoint name every few weeks. When the wrong name is hit, the server returns `{"c_url": "leftTicket/queryG", "c_name": "CLeftTicketUrl", "status": false}`, pointing to the current correct name. `trains` walks a list of known names, captures the rotation hint, retries against the suggested name, and mutates the runtime list so subsequent calls in the same process skip the warm-up round trip.

Login detection via `document.cookie`, not `page.getCookies({url})`
12306 sets the auth cookie `tk` and the session cookie `JSESSIONID` with `Path=/otn`. CDP `Network.getCookies` filters by URL path: even with the page navigated to `https://kyfw.12306.cn/otn/view/index.html` and the user fully logged in, `page.getCookies({ url: 'https://kyfw.12306.cn' })` returns 7 cookies without `tk` / `JSESSIONID`, because that URL has path `/`, not `/otn`. Switched the login check to read `document.cookie` via `page.evaluate`, which exposes every cookie visible to the navigated `/otn` page, including the `Path=/otn` ones. Centralized as `require12306Login` in `utils.js` so all three authenticated commands share the same check (and so future maintainers know not to "fix" it back to `page.getCookies`).

Booking-handshake `secret` field is stripped
The `|`-separated train wire record carries a base64 `secret` token at position 0 used to construct booking handshakes. Since this PR is read-only and the issue rules out booking, `parseTrainRecord` parses but does not surface that field, and a regression unit test asserts it cannot leak via the adapter contract.

Sensitive-field masking on authenticated commands
- `me`: real name (Chinese mask, e.g. `张*`), email (e.g. `xxx@xxx.xxx`), mobile (12306 already masks server-side and the adapter preserves that, e.g. `xxx-xxx-xxxx`), birth date (year only).
- `passengers`: passenger name (Chinese mask) + birth year (year only); 12306 already masks ID number (e.g. `xxxxxxxxxxxxxxxxxxx`) and mobile server-side and this adapter never decodes those.
- Both expose `--include-sensitive` to opt back into the unmasked fields the user is entitled to see on their own account.

`orders` scope
Returns the `queryMyOrderNoComplete` slice (orders that have not yet been ridden / refunded / completed). The historical `queryMyOrderApi` endpoint requires extra page-state handshakes that proved fragile when probed: the server returned 302s into the error landing page under repeated anonymous calls. Left as a follow-up so this command can ship reliably for the immediate "what's still on my account" use case.

`price` cascading
Cascades three anonymous API calls per invocation: `init` -> `queryByTrainNo` (to resolve segment station_no within the train route; the price endpoint addresses stops by station_no, not telecode) -> `queryTicketPrice`. 12306 returns prices keyed by one-or-two-letter seat codes (`A9` 商务座 / `M` 一等座 / `O` 二等座 / `WZ` 无座 / etc.) and additionally doubles some up as bare numeric codes (e.g. `"9": "21580"` mirrors `"A9": "¥2158.0"`); the bare-numeric duplicates are filtered out so the row set is one-per-seat-class.

Station resolution
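As a rough illustration of the exact-match-first lookup this section describes, here is a minimal sketch. It is not the PR's `resolveStation`: the name `resolveStationSketch`, the stand-in `ArgumentError` class, the hint wording, and the decision to return the first hit without any ambiguity handling are all assumptions. The sample records reuse rows from the `12306 stations 上海` output in this PR.

```javascript
// Stand-in for the project's typed error (assumption: the real class is
// defined elsewhere and may take different constructor arguments).
class ArgumentError extends Error {
  constructor(message, hint) {
    super(message);
    this.name = 'ArgumentError';
    this.hint = hint;
  }
}

// Sample records in the shape printed by `12306 stations`.
const SAMPLE_STATIONS = [
  { name: '上海南', code: 'SNH', pinyin: 'shanghainan', abbr: 'shn' },
  { name: '上海', code: 'SHH', pinyin: 'shanghai', abbr: 'sh' },
  { name: '上海虹桥', code: 'AOH', pinyin: 'shanghaihongqiao', abbr: 'shhq' },
];

function resolveStationSketch(input, stations) {
  const raw = String(input ?? '').trim();
  if (!raw) {
    throw new ArgumentError('station is required', 'pass a name, telecode, pinyin, or alias');
  }
  const lower = raw.toLowerCase();
  // Whole-string comparisons only, tried in a fixed order, so a short name
  // such as 上海 can never resolve to 上海南 through a substring match.
  const matchers = [
    (s) => s.name === raw,
    (s) => s.code.toLowerCase() === lower,
    (s) => s.pinyin === lower,
    (s) => s.abbr === lower,
  ];
  for (const matches of matchers) {
    const hit = stations.find(matches);
    if (hit) return hit; // ambiguity handling is deliberately left out here
  }
  throw new ArgumentError(`unknown station: ${raw}`, 'run `12306 stations <keyword>` to look it up');
}

console.log(resolveStationSketch('上海', SAMPLE_STATIONS).code);
```

Trying all stations against one matcher before moving to the next keeps the priority global: a name match anywhere in the list wins over an alias match on an earlier record.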
`resolveStation` matches Chinese name (`上海虹桥`), telecode (`AOH`), full pinyin (`shanghaihongqiao`), and short alias (`shhq`). Anything else raises `ArgumentError` with a hint. Match order is exact-name first to prevent `北京` matching `北京北` by substring.

Strict limit validation
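A minimal sketch of the throw-instead-of-clamp validation this section describes. The helper name `parseLimit`, the stand-in `ArgumentError` class, and the bounds and default (1 to 100, default 20) are hypothetical; the PR does not state the real range.

```javascript
// Stand-in for the project's typed error (assumption: the real class is
// defined elsewhere in the codebase).
class ArgumentError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ArgumentError';
  }
}

// Hypothetical bounds and default, chosen only for this example.
function parseLimit(value, { min = 1, max = 100, fallback = 20 } = {}) {
  // A missing flag falls back to the default rather than failing.
  if (value === undefined || value === null || value === '') return fallback;
  const n = Number(value);
  if (!Number.isInteger(n)) {
    throw new ArgumentError(`--limit must be an integer, got ${JSON.stringify(value)}`);
  }
  // Out-of-range input is rejected outright instead of being clamped.
  if (n < min || n > max) {
    throw new ArgumentError(`--limit must be between ${min} and ${max}, got ${n}`);
  }
  return n;
}

console.log(parseLimit('5'));
```

Rejecting bad input loudly means a typo such as `--limit 1o` fails immediately instead of silently returning a default-sized result set.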
`limit` arguments use a small inline validator that throws `ArgumentError` on non-integer / out-of-range input rather than silently clamping. Mirrors the typed-error pattern the owner used in #1397 (grok) and #1370 (coupang).

Screenshots / Output
Live verified anonymously and authenticated against `kyfw.12306.cn`, sleeping 15-25 seconds between hits to stay gentle on 12306's anti-abuse throttle. All commands were run via `node ./dist/src/main.js` from this branch's worktree so the local `clis/12306/` adapters in this PR are loaded.

Anonymous commands (real API output)
```
$ node ./dist/src/main.js 12306 stations 上海 --limit 5 -f json
[
  { "name": "练塘", "code": "LTU", "pinyin": "liantang", "abbr": "lt", "city": "上海" },
  { "name": "上海", "code": "SHH", "pinyin": "shanghai", "abbr": "sh", "city": "上海" },
  { "name": "上海南", "code": "SNH", "pinyin": "shanghainan", "abbr": "shn", "city": "上海" },
  { "name": "上海虹桥", "code": "AOH", "pinyin": "shanghaihongqiao", "abbr": "shhq", "city": "上海" },
  { "name": "上海西", "code": "SXH", "pinyin": "shanghaixi", "abbr": "shx", "city": "上海" }
]

$ node ./dist/src/main.js 12306 trains 北京 上海 --date 2026-05-22 --limit 1 -f json
[
  {
    "train_no": "240000G54700",
    "code": "G547",
    "from_station": "北京南",
    "to_station": "上海虹桥",
    "from_code": "VNP",
    "to_code": "AOH",
    "start_time": "06:18",
    "arrive_time": "12:11",
    "duration": "05:53",
    "available": true,
    "business_seat": "无",
    "first_seat": "有",
    "second_seat": "有",
    "soft_sleeper": "",
    "hard_sleeper": "",
    "hard_seat": "",
    "no_seat": "无"
  }
]

$ node ./dist/src/main.js 12306 train 24000000G10L --from 北京南 --to 上海虹桥 --date 2026-05-22 -f json
[
  { "station_no": "01", "station_name": "北京南", "arrive_time": "", "start_time": "06:30", "stopover_time": "" },
  { "station_no": "02", "station_name": "沧州西", "arrive_time": "07:18", "start_time": "07:20", "stopover_time": "2分钟" },
  { "station_no": "03", "station_name": "德州东", "arrive_time": "07:45", "start_time": "07:47", "stopover_time": "2分钟" },
  { "station_no": "04", "station_name": "曲阜东", "arrive_time": "08:34", "start_time": "08:36", "stopover_time": "2分钟" },
  { "station_no": "05", "station_name": "南京南", "arrive_time": "10:13", "start_time": "10:15", "stopover_time": "2分钟" },
  { "station_no": "06", "station_name": "苏州北", "arrive_time": "10:59", "start_time": "11:01", "stopover_time": "2分钟" },
  { "station_no": "07", "station_name": "上海虹桥", "arrive_time": "11:24", "start_time": "11:24", "stopover_time": "" }
]

$ node ./dist/src/main.js 12306 price 24000000G10L --from 北京南 --to 上海虹桥 --date 2026-05-22 -f json
[
  { "seat_code": "A9", "seat_name": "商务座", "price": "2158.0", "currency": "CNY" },
  { "seat_code": "P", "seat_name": "特等座", "price": "1163.0", "currency": "CNY" },
  { "seat_code": "M", "seat_name": "一等座", "price": "1035.0", "currency": "CNY" },
  { "seat_code": "WZ", "seat_name": "无座", "price": "626.0", "currency": "CNY" },
  { "seat_code": "O", "seat_name": "二等座", "price": "626.0", "currency": "CNY" }
]
```

Authenticated commands (shape-only; real values redacted for this PR body)
The three authenticated commands were exercised live on a Mac mini against a logged-in 12306 account. Output shape below uses synthetic placeholder values; the real PII (12306 username, family name, partial ID prefix, partial mobile, email) is not posted here. The actual run was verified locally and the masking logic was tested against the real response (see "Sensitive-field masking" above).
```
$ node ./dist/src/main.js 12306 me -f json
[
  {
    "username": "<account>",
    "real_name": "<masked>",
    "email": "<x***x>@<domain>",
    "mobile": "<3-digits>****<4-digits>",
    "birth_date": "<YYYY>",
    "sex": "<男|女>",
    "country": "CHN",
    "user_type": "成人",
    "member": false,
    "active": <bool>
  }
]

$ node ./dist/src/main.js 12306 passengers -f json
[
  {
    "name": "<surname>*<...>",
    "sex": "<男|女>",
    "born_year": "<YYYY>",
    "id_type": "居民身份证",
    "id_no": "<4-digits>***********<3-digits>",
    "mobile": "<3-digits>****<4-digits>",
    "passenger_type": "成人",
    "country": "CHN"
  },
  ...
]
```

`12306 orders` was also exercised live; the account currently has no in-progress orders, so the command correctly surfaces `EMPTY_RESULT: No in-progress 12306 orders on this account` rather than fabricating an empty array.

If you want a non-redacted copy of the authenticated-command output to confirm masking behaviour, ping me on the PR and I'll DM the raw output.
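For readers who want to see how the masked shapes above could be produced, here is a minimal sketch of the three mask helpers plus a year-only birth date. The function names mirror those listed in the test plan, but the bodies are assumptions rather than the PR's code. In particular, the choices to star every character after the first in a name, to fully star local parts shorter than three characters, and to star unrecognised phone formats entirely are illustrative guesses.

```javascript
// Keep the first character and star the rest; a lone character is starred.
function maskChineseName(name) {
  const chars = Array.from(String(name ?? '')); // code-point aware split
  if (chars.length <= 1) return '*'.repeat(chars.length);
  return chars[0] + '*'.repeat(chars.length - 1);
}

// Keep first and last character of the local part when it has 3+ characters.
function maskEmail(email) {
  const text = String(email ?? '');
  const at = text.lastIndexOf('@');
  if (at <= 0) return text ? '***' : ''; // no usable local part
  const local = text.slice(0, at);
  const domain = text.slice(at); // includes the '@'
  const masked = local.length >= 3 ? `${local[0]}***${local[local.length - 1]}` : '***';
  return masked + domain;
}

// Preserve a value that 12306 has already masked; otherwise hide the middle.
function maskMobile(mobile) {
  const text = String(mobile ?? '');
  if (text.includes('*')) return text; // already masked upstream
  if (/^\d{11}$/.test(text)) return `${text.slice(0, 3)}****${text.slice(7)}`;
  return '*'.repeat(text.length); // unknown format: hide everything
}

// Reduce a date string such as "1990-01-02" to its leading four-digit year.
function birthYearOnly(date) {
  const match = /^(\d{4})/.exec(String(date ?? ''));
  return match ? match[1] : '';
}

console.log(maskChineseName('张三'), maskEmail('alice@example.com'), maskMobile('13812345678'));
```

Returning an already-masked mobile unchanged keeps the helper idempotent, which matters when the upstream response is masked on some accounts and not on others.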
Test plan
- `npm run build`: manifest compiles, 828 entries, `git diff cli-manifest.json` contains only new 12306 entries (no absolute filesystem paths)
- `npm run check:typed-error-lint`: passes (current=173, baseline=173, new=0, resolved=0)
- `npm run check:silent-column-drop`: passes (current=102, baseline=102, new=0, resolved=0)
- `npx vitest run --project adapter clis/12306/utils.test.js`: 23 tests passing across
  - `parseStationBundle`: structured-record extraction + skip records without telecode
  - `resolveStation`: name / telecode / pinyin / short-alias matches + empty / unknown / unknown-but-telecode-shaped failure modes
  - `validateDate`: format check + impossible calendar dates
  - `buildCookieHeader`: join Set-Cookie lines
  - `parseTrainRecord`: full-shape extraction, secret-field non-leak regression, telecode fallback when bundle is missing, short-record null
  - `maskEmail` (3+ char local-part + edge cases), `maskMobile` (preserves 12306's own mask), `maskChineseName` (1 / 2 / 3+ char names)
  - `parsePriceData`: bare-numeric-duplicate filter, sort by descending price, unknown letter codes fall back to letter as name
- `stations` / `trains` / `train` / `price` against `kyfw.12306.cn` from a Mac mini session with 15-25 s sleeps between hits
- `me` / `passengers` / `orders` after the user logged in to 12306 in the OpenCLI bridge Chrome on Mac mini; cookies surfaced via `document.cookie` correctly (see `require12306Login` note above)
- `--include-sensitive` opts in to the unmasked-by-12306 fields.

Out of scope
- `queryMyOrderApi` (the historical-orders endpoint): needs extra page-state handshakes; left as a follow-up so the immediate "in-progress orders" path can ship without flakiness.