feat!: Replace regex parser with flexmark-java for CommonMark/GFM #69
joewiz wants to merge 2 commits into eXist-db:master
Force-pushed from 14731fa to 7fa7c41
        uses: actions/setup-node@v6
        with:
          node-version: ${{ matrix.node-version }}
          node-version: 22

      # Run XQSuite tests
      - name: Run XQSuite tests
        run: npx mocha test/xqs/*.js --exit
0 tests are executed on CI.
Good catch. I'll work on this.
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <project.build.source>8</project.build.source>
I'd be OK with dropping Java 8 compatibility and making this Java 21 / eXist 7 only.
  <project.build.source>8</project.build.source>
  <project.build.target>8</project.build.target>

  <exist.java-api.version>6.4.1</exist.java-api.version>
This is a ghost setting inherited from monex; let's not repeat it.
Can we have one setting for both the minimum processor version and compatibility?
You're right, I modeled this on monex, since that came to mind first, but I will switch my modeling to semver.xq, which came to mind second. I don't know what project represents best practice. If you have a suggestion, I'll take it.
<build>
  <plugins>
    <!-- Create uber-jar with flexmark dependencies shaded in -->
Personally, I'm not a fan of uber jars.
/**
 * Build a Parser configured from an XQuery options map.
 *
 * Supported options:
These need to be in the README, and in the xqDoc function documentation.
The README has it in https://github.com/joewiz/exist-markdown/blob/ca1e9b0c56b71d7d84730097fd46f18b9bd25dcb/README.md#parser-options. For embedding xqdoc in Java modules, could you point me to a good example of this?
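For readers following the thread, the options under discussion (profile, extensions, hard-wraps) reach the Java side as a map. Below is a minimal Java sketch of reading such a map with defaults. It is an illustration only, not the PR's code: the class ParserOptionsSketch, its Settings holder, and the assumed hard-wraps default of false are hypothetical. The github default profile and the four default extensions are taken from the PR's commit message.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names and structure are NOT taken from the PR's source.
final class ParserOptionsSketch {

    /** Plain holder for the three documented options. */
    static final class Settings {
        final String profile;
        final Set<String> extensions;
        final boolean hardWraps;

        Settings(final String profile, final Set<String> extensions, final boolean hardWraps) {
            this.profile = profile;
            this.extensions = extensions;
            this.hardWraps = hardWraps;
        }
    }

    // Per the commit message, these four extensions are all enabled by default.
    static final List<String> DEFAULT_EXTENSIONS =
            List.of("tables", "strikethrough", "tasklist", "autolink");

    static Settings fromMap(final Map<String, ?> options) {
        // "github" is the documented default profile.
        final Object profile = options.get("profile");
        final String profileName = profile == null ? "github" : profile.toString();

        // An explicit "extensions" entry replaces the default set.
        final Object ext = options.get("extensions");
        final Set<String> extensions = new LinkedHashSet<>();
        if (ext instanceof Iterable) {
            for (final Object name : (Iterable<?>) ext) {
                extensions.add(String.valueOf(name));
            }
        } else {
            extensions.addAll(DEFAULT_EXTENSIONS);
        }

        // Assumption: hard-wraps is off unless explicitly set to true.
        final boolean hardWraps = Boolean.TRUE.equals(options.get("hard-wraps"));
        return new Settings(profileName, extensions, hardWraps);
    }

    public static void main(final String[] args) {
        final Settings s = fromMap(Map.of("profile", "commonmark"));
        System.out.println(s.profile + " " + s.extensions + " hard-wraps=" + s.hardWraps);
    }
}
```

Keeping the defaults in one place like this would also give the README and the function documentation a single source to describe.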
}

static void buildXml(final Node node, final MemTreeBuilder builder) {
    if (node instanceof Heading) {
I would use stronger assertions. assertTrue makes debugging hard when a test fails, because it hides the actual output.
@joewiz looking good. The XQSuite execution on CI still looks odd.
@duncdrum No, you're right. Looking into it...
Force-pushed from bf61655 to dbdd858
Replace the pure-XQuery regex-based markdown parser with a Java extension module using flexmark-java 0.64.8 for full CommonMark and GitHub Flavored Markdown compliance.

BREAKING CHANGES:
- md:parse() now returns md:* XML elements instead of HTML
- Custom config maps replaced by XQuery typeswitch on md:parse() output
- Requires eXist-db 7.0.0+

New API:
- md:parse($markdown) → document-node() with md:* elements
- md:parse($markdown, $options) → with parser profile/extension options
- md:to-html($input) → HTML nodes (accepts string or md:* nodes)
- md:to-html($input, $options) → with parser options
- md:serialize($nodes) → markdown string (round-trip)

Parser options map supports:
- "profile": commonmark, github (default), kramdown, markdown, etc.
- "extensions": tables, strikethrough, tasklist, autolink (all default)
- "hard-wraps": boolean

Type constants resolved via reflection (TypeCompat) to ensure binary compatibility across different eXist-db 7.0.0-SNAPSHOT builds.

Fixes eXist-db#4 (XQuery code blocks mangled), eXist-db#6 (first paragraph missing), eXist-db#18 (curly braces mangled in code blocks). Partially addresses eXist-db#17 (inline HTML mark element). Follows CommonMark spec for eXist-db#19 (HTML blocks). Supersedes PR eXist-db#14.

58 XQSuite tests with %test:assertXPath assertions covering installation, parsing, HTML rendering, serialization, structural round-trips, and parser options.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
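The commit message above says type constants are resolved via reflection (TypeCompat) for binary compatibility. The following is only a sketch of that general technique, not the PR's TypeCompat class. The motivation assumed here is that javac copies compile-time constant values into the calling class, so a module compiled against one build can carry stale numbers if a later build renumbers them; a reflective lookup reads the value at run time instead. The class name ConstantLookupSketch, the method staticInt, and the fallback parameter are hypothetical, and java.lang.Integer stands in for whichever eXist-db class holds the real constants.

```java
import java.lang.reflect.Field;

// Hypothetical sketch of run-time constant lookup; not the PR's TypeCompat class.
final class ConstantLookupSketch {

    /**
     * Read a public static int field by name at run time, returning the
     * fallback when the class or field is missing or not an int.
     */
    static int staticInt(final String className, final String fieldName, final int fallback) {
        try {
            final Field field = Class.forName(className).getField(fieldName);
            return field.getInt(null); // null receiver: the field is expected to be static
        } catch (final ReflectiveOperationException | RuntimeException e) {
            // Missing class or field, wrong field type, or a non-static field.
            return fallback;
        }
    }

    public static void main(final String[] args) {
        // java.lang.Integer is only a stand-in target for the demonstration.
        System.out.println(staticInt("java.lang.Integer", "MAX_VALUE", -1));     // prints 2147483647
        System.out.println(staticInt("java.lang.Integer", "NO_SUCH_FIELD", -1)); // prints -1
    }
}
```

The trade-off is that a renamed or removed constant is only detected at run time, so a fallback or an explicit error path is needed.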
- Add pom.xml using exist-apps-parent and kuberam-expath-plugin for EXPath/XAR packaging with Java module registration
- Add xar-assembly.xml with individual flexmark dependency sets and Java module component registration
- Target eXist-db 7.0.0-SNAPSHOT / Java 21
- CI workflow: Maven build, docker pull for fresh image, readiness check before XQSuite tests, run xqSuite.js via node (not mocha)
- Preserve package name URI (http://exist-db.org/apps/markdown) and abbrev ("markdown") for upgrade continuity with 2.x releases
- Update package.json metadata to match v3.0.0
- Use repo.exist-db.org repositories for releases and snapshots
- Supersedes Dependabot PRs eXist-db#22, eXist-db#24, eXist-db#30 (Gulp no longer used)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from dbdd858 to 6ddf940
Summary
- md:parse() returns md:* XML elements (not HTML), md:to-html() renders to HTML (from string or parsed nodes), md:serialize() round-trips back to markdown
- Parser profile/extension options, e.g. md:parse($md, map { "profile": "commonmark" })
- […] @language attribute, critical for the forthcoming eXist-db Sandbox app's interactive xquery code editors
- Build uses exist-apps-parent and kuberam-expath-plugin
Breaking changes
- md:parse() now returns md:* XML elements instead of HTML; use md:to-html() for HTML output
- Custom config maps are replaced by transforming md:parse() output with XQuery typeswitch or XSLT (see README for a complete TEI example)
New API
Supported profiles: commonmark, github (default), kramdown, markdown, pegdown, fixed-indent, multi-markdown.
Migration from 2.x
| 2.x | 3.0.0 |
| --- | --- |
| markdown:parse($md) → HTML | md:to-html($md) → HTML |
| markdown:parse($md, $config) → custom output | md:parse($md) → XML, then transform with XQuery/XSLT |
For most users, replacing markdown:parse(...) with md:to-html(...) is sufficient. For custom output formats (e.g., TEI), see the README for a complete typeswitch example.
Issues fixed
#4 — Problems parsing XQuery code blocks
Curly braces and other XQuery syntax in fenced code blocks were mangled by the regex parser's label handler. flexmark treats code block content as opaque text:
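To make the failure mode concrete, here is a toy Java illustration. It is not the old XQuery parser's regex and not how flexmark is implemented. rewriteBraces applies a brace-rewriting pattern to the whole input and so also rewrites braces inside fenced code, while fenceAware leaves fence lines and everything between them untouched. The class name, both method names, and the pattern are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only; the pattern is a stand-in, not the old parser's real regex.
final class LabelHandlerSketch {

    // Three backticks, assembled so this listing contains no literal fence.
    static final String FENCE = "`" + "`" + "`";

    private static final Pattern BRACES = Pattern.compile("\\{([^}]*)\\}");

    /** Naive handler: rewrites every {...} group, including those inside code. */
    static String rewriteBraces(final String text) {
        final Matcher m = BRACES.matcher(text);
        final StringBuffer out = new StringBuffer();
        while (m.find()) {
            final String span = "<span itemprop=\"" + m.group(1) + "\">";
            // quoteReplacement keeps '$' in the captured text from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(span));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Fence-aware handler: fence lines and the lines between them pass through unchanged. */
    static String fenceAware(final String markdown) {
        final StringBuilder out = new StringBuilder();
        boolean inFence = false;
        for (final String line : markdown.split("\n", -1)) {
            final boolean isFence = line.startsWith(FENCE);
            if (isFence) {
                inFence = !inFence;
            }
            out.append(isFence || inFence ? line : rewriteBraces(line)).append('\n');
        }
        out.setLength(out.length() - 1); // drop the newline appended after the last line
        return out.toString();
    }

    public static void main(final String[] args) {
        final String md = "Note {class: intro}\n" + FENCE + "xquery\n<li>{$i * 2}</li>\n" + FENCE;
        System.out.println(rewriteBraces(md)); // braces inside the fence are rewritten too
        System.out.println(fenceAware(md));    // code content is left as written
    }
}
```

A full parser avoids the problem structurally: block structure is determined first, and inline rules never run on code block content.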
#18 — Curly braces in fenced code blocks are mangled
Same root cause as #4. The regex parser's {label: value} syntax handler was applied inside code blocks, replacing {$i * 2} with <span itemprop="$i * 2">. Now fixed.
#6 — First paragraph missing
The regex parser dropped the first paragraph when input contained only a single block element. Now fixed:
→ 3 (was 0 with the old parser)
#17 — Parsing of mark element in inline HTML
Partially addressed. The <mark> element is no longer dropped from output. In md:parse(), inline HTML is captured in md:html-inline elements that preserve the original tags. However, md:to-html() escapes inline HTML per XML serialization rules rather than passing it through as raw HTML, a known limitation of XML-based output.
#19 — Markdown interleaved in HTML blocks is mangled
Now follows CommonMark spec behavior. As noted in the issue, CommonMark itself does not support markdown interleaved inside HTML blocks; the spec treats HTML blocks as opaque. The old parser attempted this but produced mangled output (extra <body/> elements, ejected paragraphs). The new parser produces correct CommonMark-compliant output.
#14 (PR) — Code blocks properly identified
Superseded. The regex fix is no longer relevant since the entire parser is replaced.
#22, #24, #30 (PRs) — Dependabot Gulp bumps
Superseded. Gulp is no longer used in the build system.
Test plan
- […] (joewiz/exist, next branch)
- md:parse(md:serialize(md:parse($input))) preserves structure
- Test installation on eXist-db 6.2.0
🤖 Generated with Claude Code