fix(i18n): clean _protected wrong-forms that corrupt prose across all 12 dicts by heznpc · Pull Request #218 · heznpc/skillBridge

heznpc · 2026-06-16T23:05:43Z

PR 2 of the multi-agent re-review fix campaign (#217 was PR 1, the restore engine).

A 12-language re-review (225 findings) + a harmonization pass under one rule — "can this wrong-form ever appear in CORRECT target-language prose?" — cleaned every _protected block.

Why

restoreProtectedTerms rewrites each wrong-form → correct term. PR 1 anchored Latin/Cyrillic matching so a wrong-form no longer corrupts a longer word — but a wrong-form that is itself a real word, a given name, or the dict's own native translation is still destructive, and CJK forms still use literal replaceAll. This removes those data-level offenders.
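
To make the failure mode concrete, here is a minimal sketch of the two matching strategies described above. It is not the project's `restoreProtectedTerms`: `restoreSketch`, `isLatinOrCyrillic`, and the map shape (English term as key, array of wrong-forms as value) are assumptions made for illustration.

```javascript
// Illustrative only: a simplified stand-in for restoreProtectedTerms, not the project's code.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Assumption: a form is "anchorable" when it consists only of Latin/Cyrillic letters and spaces.
const isLatinOrCyrillic = (s) =>
  /^[\p{Script=Latin}\p{Script=Cyrillic}\p{M}\s]+$/u.test(s);

function restoreSketch(text, protectedMap) {
  let out = text;
  for (const [correct, wrongForms] of Object.entries(protectedMap)) {
    for (const wrong of wrongForms) {
      if (wrong === correct) continue; // self-referential key: nothing to rewrite
      if (isLatinOrCyrillic(wrong)) {
        // Anchored: the wrong-form must not touch a letter on either side,
        // so it cannot match inside a longer word.
        const re = new RegExp(
          `(?<![\\p{L}\\p{M}])${escapeRegExp(wrong)}(?![\\p{L}\\p{M}])`,
          "gu"
        );
        out = out.replace(re, () => correct);
      } else {
        // CJK: no letter boundaries to anchor on, so this is a literal replace.
        out = out.replaceAll(wrong, () => correct);
      }
    }
  }
  return out;
}

// Anchoring protects the longer word but still restores the standalone form.
const anchored = restoreSketch("Клодин и Клод", { Claude: ["Клод"] });
// Anchoring cannot help when the wrong-form is itself a real given name.
const givenName = restoreSketch("Mi amigo Claudio usa Claude.", { Claude: ["Claudio"] });
// A common CJK word listed as a wrong-form rewrites ordinary prose.
const cjk = restoreSketch("提升你的技能", { Skills: ["技能"] });

console.log(anchored);  // "Клодин и Claude"
console.log(givenName); // "Mi amigo Claude usa Claude."
console.log(cjk);       // "提升你的Skills"
```

The second and third results are the data-level problem: no matching rule can tell a friend named Claudio or the ordinary word 技能 apart from a mistranslated brand, so the entry itself has to go.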

What changed (50 entries dropped · 23 arrays neutralized · 12 dicts)

Real-word / given-name collisions → self-ref keep-English brand key: es/it Claudio, pt Cláudio (man's name); fr Anthropique, it/es/pt Antropico/Antrópico, de Anthropisch (real adjective); fr cotravail, es cotrabajo, pt cotrabalho; vi Mã Claude.
Generic concepts the dicts render natively → entry dropped (native rendering stands, CJK prose no longer rewritten): slash command, subagent, hook/hooks, lowercase skill/skills, native plugin/plugins, Dispatch, Enterprise — i.e. wrong-forms 技能 / 插件 / 外掛程式 / 钩子 / 挂钩 / スキル / プラグイン / フック / 후크 / субагент, each a common standalone word.
Skills (product) → self-ref keep-English key in every CJK locale, dropping real-word wrong-forms 技能 / スキル / 스킬.
Safe phonetic brand transliterations KEPT as restores (never real prose; fix GT→English brand): ja クロード, zh 克洛德/克劳德/克勞德, ru Клод, ko 클로드, de Koarbeit, es Código Claude.

All brand/product KEYS preserved for the Gemini keep-English path.
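
The three dispositions (neutralize, drop, keep) can be sketched as plain data transforms. The block below is hypothetical: the `_protected` shape, the `[key]: [key]` form of a self-referential entry, and the `neutralize` / `drop` helpers are assumptions for illustration, not the dictionaries' actual contents or tooling.

```javascript
// Hypothetical zh-CN _protected block before cleanup (key = English term, value = wrong-forms).
const before = {
  Claude: ["克洛德", "克劳德"], // phonetic transliterations: never ordinary prose, so kept
  Skills: ["技能"],             // product name, but 技能 is a common standalone word
  skill: ["技能"],              // generic concept the dictionary renders natively
  plugin: ["插件"],             // generic concept the dictionary renders natively
};

// Neutralize: keep the key for the keep-English path, but stop rewriting prose.
const neutralize = (block, key) => ({ ...block, [key]: [key] });

// Drop: remove the entry entirely so the native rendering stands.
const drop = (block, key) => {
  const { [key]: _removed, ...rest } = block;
  return rest;
};

let after = neutralize(before, "Skills");
after = drop(after, "skill");
after = drop(after, "plugin");

console.log(Object.keys(after)); // [ 'Claude', 'Skills' ]
console.log(after.Skills);       // [ 'Skills' ]
```

Under these assumptions the brand key `Skills` survives for the keep-English prompt, while nothing in the block can rewrite 技能 or 插件 any more.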

Verify

9 gates green (incl. PR 1's glossary substring build-guard) · 553 unit · e2e 20/20.

Defers to PR 3: Managed Agents canonicalization (Claude Managed Agents) + residual keep-English-coverage harmonization (e.g. ko Plugin self-ref vs ja/zh dropped).

🤖 Generated with Claude Code

… 12 dicts

PR 2 of the multi-agent re-review fix campaign (PR 1 was #217, the engine). The re-review (225 findings) plus a 12-language harmonization pass under a single "can this wrong-form appear in correct prose?" rule cleaned every _protected block.

restoreProtectedTerms rewrites each wrong-form -> correct term. PR 1 anchored Latin/Cyrillic matching so a wrong-form no longer corrupts a LONGER word, but a wrong-form that is itself a real word / given name / the dict's own native translation is still destructive (and CJK forms still use literal replaceAll). This PR removes those data-level offenders, keeping only safe forms:

- Real-word / given-name collisions neutralized to self-referential keep-English brand keys: es/it "Claudio", pt "Cláudio" (man's name); fr "Anthropique", it/es/pt "Antropico"/"Antrópico", de "Anthropisch" (real adjective); fr "cotravail", es "cotrabajo", pt "cotrabalho" (real/coined words); vi "Mã Claude".
- Generic concepts the dictionaries render natively are dropped entirely so the native rendering stands and CJK prose stops being rewritten: slash command, subagent, hook/hooks, lowercase skill/skills, native plugin/plugins, Dispatch, Enterprise. These are the wrong-forms 技能 / 插件 / 外掛程式 / 钩子 / 挂钩 / スキル / プラグイン / フック / 후크 / субагент / etc., every one a common standalone word.
- "Skills" (the product) becomes a self-ref keep-English key in every CJK locale, dropping the real-word wrong-forms 技能 / スキル / 스킬.
- SAFE phonetic brand transliterations are KEPT as restores (they never occur as real prose and fix GT's transliteration back to the English brand): ja クロード, zh 克洛德/克劳德/克勞德, ru Клод, ko 클로드, de "Koarbeit", es "Código Claude", etc.

Net: 50 generic entries dropped, 23 brand wrong-form arrays neutralized, across the 12 source dictionaries (+ regenerated plugin data). All brand/product KEYS are preserved for the Gemini keep-English path.

9 gates green (incl. the PR 1 glossary build-guard) · 553 unit · e2e 20/20.

Defers to PR 3: Managed Agents canonicalization ("Claude Managed Agents") and residual keep-English-coverage harmonization (e.g. ko Plugin self-ref vs others).

…s (12 dicts) (#219)

PR 3 of the multi-agent re-review fix campaign (PR 1 #217 engine, PR 2 #218 data). Scope per owner decision: objective content DEFECTS + brand/product English-retention only. Stylistic word-choice (how to render "AI Fluency", subagent synonyms, generic rendering-consistency) is deferred to native review (#202).

108 value-only edits across the 12 source dictionaries, produced and then adversarially verified by an independent per-language pass (46 out-of-scope or incorrect proposals were rejected; every applied edit's prior value matched the file exactly; no keys added/removed, so key-parity holds):

- Mistranslations (meaning was wrong): zh-CN Delegation 授权 "authorize" -> 委托 "delegate" and Diligence 审核 "review" -> 勤奋 "diligence" (the 4D competency names); zh-TW 審核 -> 勤勉; ja "steerable" 操舵可能 (nautical) -> 制御しやすい; ru "Headless mode" 自律 -> без интерфейса; it "Prompts" (verb-read) -> Prompt, trigger "grilletto" (gun trigger) -> attivazione, Sign In/Up swap.
- Garbled strings (residue of the old unanchored protected-terms replaceAll): it "affidskill" -> affidabilità, "scalskill" -> scalabilità, "Aghook eventi" -> "Eventi degli hook".
- Untranslated fragments: Bedrock/Vertex catalog connectives ("Claude with ..." -> con/avec/com), "Powered by", English lead-ins left mid-sentence (ko/ja).
- Brand-policy: product names restored to English in prose: Skills / Agent Skills (the largest group: 技能/skill/Fähigkeiten/Habilidades/agent skills -> Skills/Agent Skills across agentSkills + claude101 + catalog), Claude Code (it "Codice Claude" -> Claude Code), Model Context Protocol (it). This pairs with PR 2, which removed the now-unsafe auto-restore of 技能/スキル/Skills; the curated values now carry the English brand directly.
- German grammar: separable verb "Diese Lektion hervorhebt" -> "hebt ... hervor", genitive "Claude's" -> "Claudes", malformed compounds "KI-Fluencysplan" -> "KI-Fluency-Plan", "Lektion Rückblick" -> "Lektionsrückblick".

9 gates green · 553 unit · e2e 20/20 (one tight-timeout PDF-popup test flaked twice under local load; passed in isolation and on a determinism re-run. It is fixture-based and cannot be affected by dictionary content).

…llback (#224)

DEFAULT_PROTECTED_TERMS is the Gemini "keep-English" fallback used by getKeepEnglishTerms() only when a locale has no _protected keys. It still listed skill, Subagent, Enterprise, Personal, Plugin, and Dispatch: generic concept words that PR #218 deliberately REMOVED from the per-locale _protected blocks because they are translated natively per locale (concept-vs-product-name policy, docs/TRANSLATION_RULES.md §1). Keeping them in the fallback told Gemini to keep ordinary words in English, the opposite of the shipped policy.

Reduced to brand/product/file-format proper nouns only: API, SDK, Claude, Anthropic, Claude Code, Cowork, Computer Use, SKILL.md, frontmatter.

Low-impact (the fallback only fires for a locale with an empty _protected), but it removes a policy inconsistency a future contributor/LLM pass could be misled by. Surfaced by the doc fact-check in #223.

555 unit (constants assertions unchanged) · gates green · e2e 20/20.
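
A minimal sketch of the fallback behaviour described for #224, assuming the locale dictionary is a plain object with an optional `_protected` map. The term list is the reduced one quoted above; `getKeepEnglishTermsSketch` and its signature are hypothetical and may differ from the real `getKeepEnglishTerms()`.

```javascript
// Reduced fallback list as quoted in the #224 commit message.
const DEFAULT_PROTECTED_TERMS = [
  "API", "SDK", "Claude", "Anthropic", "Claude Code",
  "Cowork", "Computer Use", "SKILL.md", "frontmatter",
];

// Hypothetical signature: returns the terms the model is asked to leave in English.
function getKeepEnglishTermsSketch(localeDict) {
  const keys = Object.keys(localeDict?._protected ?? {});
  // The fallback fires only when the locale has no _protected keys at all.
  return keys.length > 0 ? keys : DEFAULT_PROTECTED_TERMS;
}

const fromLocale = getKeepEnglishTermsSketch({ _protected: { Claude: ["클로드"], Skills: ["Skills"] } });
const fromFallback = getKeepEnglishTermsSketch({ _protected: {} });

console.log(fromLocale);                     // [ 'Claude', 'Skills' ]
console.log(fromFallback.includes("skill")); // false
```

With generic words gone from the fallback, a locale with an empty `_protected` block gets the same concept-vs-product-name treatment as every other locale.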

… write paths (#231)

The GT queue applies restoreProtectedTerms() deterministically, but two other translation write paths bypassed it and relied only on the prompt's "keep English" instruction (probabilistic): the Gemini inline-HTML block translator and the code-comment translator. When the model ignored the instruction, a brand/API term (e.g. "Claude" → "클로드") was written into the lesson DOM untouched.

Apply the same deterministic safety net to both paths:

- gemini-block.js: restoreProtectedTerms() on the model reply before the DOM write.
- code-comments.js: restore the translated comment before escaping/splicing, and build the protected-terms map up front so the standalone code-comment path has it.

This is now safe to apply broadly because the engine was hardened in #217/#218 (substring guard + Latin/Cyrillic boundary + CJK interpunct guard + real-word data cleanup). restoreProtectedTerms no longer corrupts prose, so extending it to more write paths carries no regression risk. restoreProtectedTerms is also a no-op when the map is unbuilt, and protected-terms.js loads before both modules.

Tests genuinely catch the regression (verified by stash-rebuild-rerun on both):

- gemini-block.test.js: a unit test feeding "클로드" asserts "<strong>Claude</strong>" is written and the restored text is cached. Fails without the fix.
- code-comments e2e: the GT stub now returns "클로드 프롬프트 예시"; the existing assertion expects "# Claude 프롬프트 예시", so it only passes if restoration fires. Fails ("# 클로드 …") without the fix.

lint · format · 556 unit · gemini-block 26 · e2e 21/21.
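
The shape of the #231 fix, restoring before the write rather than trusting the prompt, might look roughly like the sketch below. Everything here is a stand-in: `restoreStub` is a simplified literal replacement rather than the real `restoreProtectedTerms`, and `writeTranslatedBlock`, its parameters, and the `Map` cache are hypothetical names chosen so the example runs without a DOM.

```javascript
// Stand-in for restoreProtectedTerms: literal replacement, no-op when the map is unbuilt.
function restoreStub(text, protectedMap) {
  if (!protectedMap) return text;
  let out = text;
  for (const [correct, wrongForms] of Object.entries(protectedMap)) {
    for (const wrong of wrongForms) out = out.replaceAll(wrong, () => correct);
  }
  return out;
}

// Hypothetical write path: restore the model reply, then cache and write the restored text.
function writeTranslatedBlock(reply, protectedMap, cache, cacheKey, write) {
  const restored = restoreStub(reply, protectedMap);
  cache.set(cacheKey, restored); // cache the restored text, not the raw reply
  write(restored);               // in the real module this would be the DOM write
  return restored;
}

const cache = new Map();
const written = [];
writeTranslatedBlock(
  "<strong>클로드</strong>",
  { Claude: ["클로드"] },
  cache,
  "block-1",
  (html) => written.push(html)
);

console.log(written[0]);           // "<strong>Claude</strong>"
console.log(cache.get("block-1")); // "<strong>Claude</strong>"
```

Caching the restored string rather than the raw reply mirrors what the unit test in the commit message asserts, so a cache hit cannot reintroduce the transliterated brand.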

heznpc enabled auto-merge (squash) June 16, 2026 23:05

heznpc merged commit f37b326 into main Jun 16, 2026
9 checks passed

heznpc deleted the fix/protected-data-cleanup branch June 16, 2026 23:06

heznpc mentioned this pull request Jun 16, 2026

fix(i18n): fix content mistranslations, garble, and brand-policy leaks (12 dicts) #219

Merged

heznpc mentioned this pull request Jun 17, 2026

fix(i18n): drop generic concept words from DEFAULT_PROTECTED_TERMS fallback #224

Merged

heznpc mentioned this pull request Jun 17, 2026

fix(i18n): restore protected terms on the Gemini-block + code-comment write paths #231

Merged
