Fix MarkdownV2 escape handling and align tags with spec by zeynalnia · Pull Request #829 · gram-js/gramjs

zeynalnia · 2026-04-19T09:13:54Z

Closes #830.

Summary

The `MarkdownV2Parser.parse` regex chain departed from the Telegram MarkdownV2
spec in several places. Most importantly, backslash escapes were ignored
entirely, so input like `1\.5` produced literal `1\.5` and `\*not bold\*` was
still parsed as bold. The italic delimiter was `-` (non-spec) instead of `_`,
blockquote syntax was unsupported, and HTML special characters in plain text
confused the downstream HTML parser.

This PR rewrites the markdown→HTML transform inside the existing
markdown → HTML → HTMLParser pipeline, splits it into two reusable exported
functions, and fills the missing tag/attribute coverage on the HTML side.

Changes

`gramjs/extensions/markdownv2.ts`

Rewritten as a 6-stage pipeline:

1. Extract protected regions (pre / code / link / custom-emoji) into
   `\u0000{n}\u0000` placeholders. Inside these, only the spec's local escape
   rules apply (`\\` and `` \` `` for pre/code; `\\` and `\)` for link URL).
2. Mask remaining `\X` escapes with `\u0001{n}\u0001` placeholders so the
   span-markup regexes can't consume escaped delimiters.
3. HTML-escape `&` and `<` in user content (not `>`, since blockquote
   detection still needs it; not `"`, which is harmless in text).
4. Run span-markup regexes: underline `__` is resolved before italic `_` per
   spec greediness, and italic now uses the spec-mandated `_`.
5. Line-level blockquote pass; a final line ending in `||` marks the quote as
   expandable (`<blockquote expandable>` → `MessageEntityBlockquote.collapsed = true`).
6. Unmask escapes (re-escaping `<` and `&` as entities so HTML stays valid),
   then restore protected regions.

The result is then handed off to `HTMLParser.parse`.
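The placeholder idea behind these stages can be sketched in a few lines. This is an illustrative reduction and not the code in this PR: the names `sketchMarkdownV2ToHtml`, `wrapBlockquotes` and `escapeHtmlText` are hypothetical, only inline code, bold, underline, italic and blockquotes are handled, and the extract and mask stages are merged into a single left-to-right scan to keep the example short.

```typescript
// Simplified sketch, NOT the PR's implementation. All names are hypothetical.
const REGION_MARK = "\u0000"; // wraps indexes of protected regions
const ESCAPE_MARK = "\u0001"; // wraps indexes of backslash-escaped characters

function escapeHtmlText(text: string): string {
    // Only & and < are escaped; > must survive for blockquote detection.
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

function wrapBlockquotes(text: string): string {
    const out: string[] = [];
    let group: string[] = [];
    const flush = (): void => {
        if (group.length === 0) return;
        let body = group.join("\n");
        // A quote whose final line ends in || is treated as expandable.
        const expandable = body.slice(-2) === "||";
        if (expandable) body = body.slice(0, -2);
        out.push("<blockquote" + (expandable ? " expandable" : "") + ">" + body + "</blockquote>");
        group = [];
    };
    for (const line of text.split("\n")) {
        if (line.charAt(0) === ">") {
            group.push(line.slice(1)); // consecutive ">" lines form one quote
        } else {
            flush();
            out.push(line);
        }
    }
    flush();
    return out.join("\n");
}

function sketchMarkdownV2ToHtml(input: string): string {
    const regions: string[] = [];
    const escapes: string[] = [];

    // Sanitize: raw delimiters in the input could otherwise forge a placeholder.
    let text = input.replace(/[\u0000\u0001]/g, "");

    // Extract + mask (merged here): hide \X escapes and inline code from later regexes.
    text = text.replace(
        /\\([\s\S])|`((?:\\[\s\S]|[^`\\])*)`/g,
        (_match: string, escaped: string | undefined, code: string | undefined): string => {
            if (escaped !== undefined) {
                escapes.push(escaped);
                return ESCAPE_MARK + (escapes.length - 1) + ESCAPE_MARK;
            }
            // Inside code, only \\ and \` count as escapes.
            const literal = (code || "").replace(/\\([\\`])/g, "$1");
            regions.push("<code>" + escapeHtmlText(literal) + "</code>");
            return REGION_MARK + (regions.length - 1) + REGION_MARK;
        }
    );

    // HTML-escape what is left of the user text.
    text = escapeHtmlText(text);

    // Span markup: underline (__) is resolved before italic (_).
    text = text
        .replace(/\*([^*]+)\*/g, "<b>$1</b>")
        .replace(/__([^_]+)__/g, "<u>$1</u>")
        .replace(/_([^_]+)_/g, "<i>$1</i>");

    // Line-level blockquote pass; an escaped ">" is still masked, so it is ignored.
    text = wrapBlockquotes(text);

    // Unmask escapes (as entities where needed), then restore protected regions.
    text = text.replace(/\u0001(\d+)\u0001/g, (_m: string, i: string): string =>
        escapeHtmlText(escapes[Number(i)] || ""));
    text = text.replace(/\u0000(\d+)\u0000/g, (_m: string, i: string): string =>
        regions[Number(i)] || "");
    return text;
}
```

Under these assumptions, `1\.5` comes out as `1.5`, `\*not bold\*` stays literal, and a line starting with `\>` is not treated as a quote, because the escaped characters are invisible to the markup and blockquote passes until the final unmask.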

Two new public exports replace the all-in-one class:

- `markdownV2ToHtml(message: string): string`: markdown → Telegram HTML.
- `htmlToMarkdownV2(html: string): string`: the inverse, accepting every
  spec-listed tag form (`<strong>` / `<em>` / `<ins>` / `<strike>` /
  `<del>` / `<tg-spoiler>` / `<span class="tg-spoiler">`) and decoding the
  four named entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`).

MarkdownV2Parser.parse / .unparse are now thin wrappers over the two.
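
As a rough illustration of the inverse direction, the sketch below maps the spec-listed tag aliases to MarkdownV2 delimiters and decodes the four named entities. It is an assumption-laden toy rather than the exported `htmlToMarkdownV2`: the names `sketchHtmlToMarkdownV2`, `TAG_TO_DELIMITER` and `decodeNamedEntities` are hypothetical, it ignores links, code and blockquotes, and it shares the known limitation of not escaping special characters in plain text.

```typescript
// Sketch only: MarkdownV2 delimiter for each Telegram HTML tag alias.
const TAG_TO_DELIMITER: { [tag: string]: string } = {
    b: "*", strong: "*",
    i: "_", em: "_",
    u: "__", ins: "__",
    s: "~", strike: "~", del: "~",
    "tg-spoiler": "||",
};

function decodeNamedEntities(text: string): string {
    return text
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&quot;/g, '"')
        .replace(/&amp;/g, "&"); // last, so "&amp;lt;" decodes to "&lt;" and not "<"
}

function sketchHtmlToMarkdownV2(html: string): string {
    // Fold the span form of a spoiler into the dedicated tag first.
    const normalized = html.replace(
        /<span class="tg-spoiler">([\s\S]*?)<\/span>/g,
        "<tg-spoiler>$1</tg-spoiler>"
    );
    // Swap known opening and closing tags for their delimiter; keep unknown tags.
    const delimited = normalized.replace(
        /<\/?([a-z-]+)>/g,
        (tag: string, name: string): string => {
            const delimiter: unknown = TAG_TO_DELIMITER[name];
            return typeof delimiter === "string" ? delimiter : tag;
        }
    );
    return decodeNamedEntities(delimited);
}
```

Decoding `&amp;` last is the one ordering detail worth copying: decoding it first would turn an already-escaped `&amp;lt;` into a literal `<`.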

Also: input is sanitized for raw \u0000/\u0001 (used internally as
placeholder delimiters); the pre-language identifier regex was widened to
accept c++, c#, etc.; the custom-emoji URL match accepts extra query
parameters.
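
The three smaller changes can be pictured as follows. The patterns and helper names here (`stripPlaceholderDelimiters`, `PRE_LANGUAGE_LINE`, `splitPreBody`, `customEmojiId`) are hypothetical stand-ins, not the regexes used in the PR, and stripping is only one possible way to sanitize the control characters.

```typescript
// Hypothetical helpers illustrating the behaviours described above.

// Remove raw placeholder delimiters so user input cannot imitate a placeholder.
function stripPlaceholderDelimiters(input: string): string {
    return input.replace(/[\u0000\u0001]/g, "");
}

// Assumed widened language pattern: letters, digits, and _ + # - (covers c++, c#).
const PRE_LANGUAGE_LINE = /^[A-Za-z0-9_+#-]+$/;

// Split the body of a pre block into an optional language line and the code.
function splitPreBody(body: string): { language: string; code: string } {
    const newlineAt = body.indexOf("\n");
    if (newlineAt === -1) {
        return { language: "", code: body };
    }
    const firstLine = body.slice(0, newlineAt);
    if (PRE_LANGUAGE_LINE.test(firstLine)) {
        return { language: firstLine, code: body.slice(newlineAt + 1) };
    }
    // A first line that is not an identifier is treated as part of the code.
    return { language: "", code: body };
}

// Accept tg://emoji?id=<digits> with extra query parameters before or after id.
function customEmojiId(url: string): string | undefined {
    const match = /^tg:\/\/emoji\?(?:[^#]*&)?id=(\d+)(?:&|$)/.exec(url);
    return match ? match[1] : undefined;
}
```

The emoji id is kept as a string in this sketch because such ids can exceed the range a JavaScript `number` represents exactly.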

`gramjs/extensions/html.ts`

- `onopentag` now recognizes the spec-alternative tags missing from
  `HTMLParser.parse`: `<tg-spoiler>`, `<span class="tg-spoiler">`, `<ins>`
  (underline), `<strike>` (strikethrough).
- `unparse` now emits `<tg-spoiler>` instead of the library-internal
  `<spoiler>` so its output is valid Telegram HTML, and emits
  `<blockquote expandable>` when `collapsed === true` so the flag survives
  round-trips.
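
A minimal sketch of the tag handling on both sides, assuming an open-tag callback that receives a tag name and an attribute map. `EntityKind`, `entityKindForTag`, `isCollapsedBlockquote` and `openTagForEntity` are hypothetical names, and the tags chosen for bold, italic, underline and strikethrough on output are this sketch's choice, not necessarily what `unparse` emits.

```typescript
// Hypothetical types and helpers; the real code works with MessageEntity classes.
type EntityKind = "bold" | "italic" | "underline" | "strike" | "spoiler" | "blockquote";
type Attribs = { [name: string]: string };

// Parse side: map every spec-listed alias to one entity kind.
function entityKindForTag(tagName: string, attribs: Attribs): EntityKind | undefined {
    switch (tagName) {
        case "b":
        case "strong":
            return "bold";
        case "i":
        case "em":
            return "italic";
        case "u":
        case "ins":
            return "underline";
        case "s":
        case "strike":
        case "del":
            return "strike";
        case "tg-spoiler":
            return "spoiler";
        case "span":
            // Only the spoiler class turns a span into an entity.
            return attribs["class"] === "tg-spoiler" ? "spoiler" : undefined;
        case "blockquote":
            return "blockquote";
        default:
            return undefined;
    }
}

// A bare `expandable` attribute marks the quote as collapsed.
function isCollapsedBlockquote(attribs: Attribs): boolean {
    return Object.prototype.hasOwnProperty.call(attribs, "expandable");
}

// Unparse side: one canonical opening tag per kind.
const OPEN_TAG: { [K in EntityKind]: string } = {
    bold: "<b>",
    italic: "<i>",
    underline: "<u>",
    strike: "<s>",
    spoiler: "<tg-spoiler>",
    blockquote: "<blockquote>",
};

function openTagForEntity(kind: EntityKind, collapsed: boolean = false): string {
    if (kind === "blockquote" && collapsed) {
        return "<blockquote expandable>"; // keeps the flag across round-trips
    }
    return OPEN_TAG[kind];
}
```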

Tests

`__tests__/extensions/MarkdownV2.spec.ts` was rewritten with 103 new specs
across 14 describe blocks, and the original Markdown and HTML suites still pass:

- Span basics + nesting + multi-span + spanning newlines
- Inline code + escapes + literal markup chars + HTML chars
- Pre + language detection + multi-line + escapes + non-identifier first line
- Inline link + URL escapes + label markup + mention + malformed
- Custom emoji + non-emoji bang-link fallback + label markup
- Blockquote single / multi / mid-line literal `>` / escaped `>` / span markup
  inside / two separate groups
- Expandable blockquote single / multi / interaction with spoiler
- Backslash escapes for every spec-listed special char + double backslash +
  trailing lone `\` + non-special chars + protected-region escape semantics +
  literal `<` / `&` round-trip
- HTML chars in plain text rendered as text (not interpreted as HTML)
- Edge cases: empty input, lone delimiters, malformed code/pre/link, literal
  control chars in input
- Direct tests of `markdownV2ToHtml` and `htmlToMarkdownV2`
- Round-trip tests: parse → unparse → parse preserves entity types and
  offsets for every supported entity, including the `collapsed` flag on
  expandable blockquotes

Known limitation

htmlToMarkdownV2 does not yet emit MarkdownV2 backslash-escapes for special
characters in surrounding plain text (only inside protected regions). So
round-tripping HTML whose plain-text portions contain literal *_~|... is
not guaranteed. This is documented in the function's JSDoc and is a logical
follow-up.
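
One possible shape for that follow-up, offered only as a sketch: escape every character the Telegram MarkdownV2 spec lists as special, plus the backslash itself, before plain text is emitted. The helper name `escapeMarkdownV2Text` is hypothetical and nothing like it is part of this PR.

```typescript
// Hypothetical follow-up helper, not part of this PR.
// Special characters per the Telegram MarkdownV2 spec, plus the backslash:
// _ * [ ] ( ) ~ ` > # + - = | { } . ! \
const MARKDOWN_V2_SPECIAL = /[_*[\]()~`>#+\-=|{}.!\\]/g;

function escapeMarkdownV2Text(text: string): string {
    // "$&" is the matched character; the leading backslash is inserted literally.
    return text.replace(MARKDOWN_V2_SPECIAL, "\\$&");
}
```

Such a helper would have to run only on plain-text segments, since inside pre/code and link URLs the spec applies narrower escape rules.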

Test plan

- `npx jest`: all 136 tests pass (4 pre-existing skips), no regressions
  in the HTML / Markdown / crypto suites.
- Manual verification against a live Telegram chat with bold + italic +
  backslash-escaped chars + blockquote + expandable blockquote + custom
  emoji.

Fix MarkdownV2 escape handling and add spec-compliant tag support.

26f30ad

zeynalnia wants to merge 1 commit into gram-js:master from zeynalnia:fix/markdownv2-escape-handling
