
xemantic/markanywhere


markanywhere

Stream Markdown or Markup document formats as interchangeable hierarchical streams of events


TL;DR

# Very important expression of machine cognition

Hi, this is your LLM speaking.

<thinking>
OK, maybe I am too informal. **I will change the tone**.
</thinking>

Dear user of this system ...

So: Markdown, but sometimes with Markup inside, and streamed. How do we tackle this?

Elaborate rationale

We use language to convey meaning, and we use text to express language. The document, whether scroll, codex, or book, established a paradigm for how text is preserved as a packaged unit. Documents also introduced formatting: visual and structural conventions that signal the intent behind particular fragments of text within a larger context.

When we built machines to process text, we formalized this into "document formats". These formats naturally inherited the hierarchical structure of books (parts, chapters, sections, paragraphs), and the software we built assumed that documents exist as complete artifacts to be parsed, transformed, and rendered.

But something new has emerged. We started texting each other, and text became a stream of information: received, comprehended, and often discarded in the moment of reception. This is also the communication paradigm between humans and LLMs. The text is not a document to be opened and read; it is an unfolding stream with alternating modalities, comprehended while it is being generated.

Structured documents are not the right abstraction here. What we need instead is an ontology of expressive meaning as a stream of events: each event signals either an incremental fragment of text or a transition between modalities of linguistic expression (from prose to code, from paragraph to heading, from plain text to emphasis).

markanywhere inverts the traditional document processing flow. Rather than consuming complete documents and producing structure, it consumes streaming tokens and emits semantic events in real time. These events can then be transformed, also as a stream, into various output formats: HTML, Markdown, XML, or whatever the receiving context requires.
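To make the inversion concrete, here is a minimal sketch of the consuming side, assuming a drastically simplified input: only bare `<name>` and `</name>` tags are recognized (no attributes, no Markdown). The `Event` and `StreamingTagTokenizer` types are hypothetical, invented for this illustration and not taken from markanywhere. What it shows is that a tag split across chunk boundaries, as routinely happens with LLM token streams, is buffered until complete, while plain text is emitted as soon as each chunk arrives.

```kotlin
// Hypothetical event types for this sketch only; NOT markanywhere's SemanticEvent.
sealed interface Event {
    data class Text(val text: String) : Event
    data class Mark(val name: String) : Event
    data class Unmark(val name: String) : Event
}

/**
 * Incrementally turns streamed chunks into events.
 * A tag split across chunks is buffered until its closing '>' arrives.
 * Assumption: only bare `<name>` / `</name>` tags; no attributes, no Markdown.
 */
class StreamingTagTokenizer(private val emit: (Event) -> Unit) {
    private val text = StringBuilder()
    private var tag: StringBuilder? = null // non-null while inside "<..."

    fun feed(chunk: String) {
        for (c in chunk) {
            val t = tag
            if (t == null) {
                if (c == '<') {
                    flushText()          // text before a tag is complete
                    tag = StringBuilder()
                } else {
                    text.append(c)
                }
            } else if (c == '>') {
                val raw = t.toString()
                emit(if (raw.startsWith("/")) Event.Unmark(raw.substring(1)) else Event.Mark(raw))
                tag = null
            } else {
                t.append(c)
            }
        }
        // Emit the text received so far, so consumers see it immediately.
        flushText()
    }

    fun finish() {
        tag?.let { text.append('<').append(it) } // unterminated tag: treat as text
        tag = null
        flushText()
    }

    private fun flushText() {
        if (text.isNotEmpty()) {
            emit(Event.Text(text.toString()))
            text.setLength(0)
        }
    }
}

fun main() {
    val events = mutableListOf<Event>()
    val tokenizer = StreamingTagTokenizer { events += it }
    // Both tags are deliberately split across chunk boundaries.
    listOf("Hi.\n<think", "ing>\nOK</thin", "king>\nDear user").forEach(tokenizer::feed)
    tokenizer.finish()
    events.forEach(::println)
}
```

Flushing text at the end of every chunk is a design choice in this sketch: it produces more, smaller `Text` events in exchange for lower latency, which suits text that is comprehended while it is being generated.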

The ontology of a meaningful stream of text

A SemanticEvent can be one of:

  • Text: a chunk of characters
  • Mark: opens a mark (e.g. an <em> tag), with optional attributes
  • Unmark: closes a previously opened mark (e.g. </div>)

See the SemanticEvent definition.
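As an illustration of transforming such events as a stream, the following sketch models the three event kinds with a hypothetical `Event` sealed interface (it is not markanywhere's actual `SemanticEvent` definition) and lazily maps them to HTML fragments via a hypothetical `toHtml()` extension. A stack of open marks captures the hierarchical nature of the stream: every Unmark must close the innermost open Mark.

```kotlin
// Hypothetical model mirroring the three event kinds listed above.
// It is NOT markanywhere's actual SemanticEvent definition.
sealed interface Event {
    data class Text(val text: String) : Event
    data class Mark(val name: String, val attributes: Map<String, String> = emptyMap()) : Event
    data class Unmark(val name: String) : Event
}

private fun escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;")

/**
 * Lazily maps an event stream to HTML fragments, one fragment per event,
 * so output can be written while input is still arriving.
 * A stack of open marks makes a mismatched Unmark fail fast.
 */
fun Sequence<Event>.toHtml(): Sequence<String> = sequence {
    val open = ArrayDeque<String>()
    for (event in this@toHtml) {
        when (event) {
            is Event.Text -> yield(escape(event.text))
            is Event.Mark -> {
                open.addLast(event.name)
                val attrs = event.attributes.entries
                    .joinToString("") { (k, v) -> " $k=\"${escape(v)}\"" }
                yield("<${event.name}$attrs>")
            }
            is Event.Unmark -> {
                val expected = open.removeLastOrNull()
                require(expected == event.name) { "Unmark(${event.name}) does not close $expected" }
                yield("</${event.name}>")
            }
        }
    }
    require(open.isEmpty()) { "Unclosed marks: $open" }
}

fun main() {
    val events = sequenceOf<Event>(
        Event.Mark("p"),
        Event.Text("Hi, "),
        Event.Mark("em", mapOf("class" to "tone")),
        Event.Text("<LLM>"),
        Event.Unmark("em"),
        Event.Unmark("p"),
    )
    // prints <p>Hi, <em class="tone">&lt;LLM&gt;</em></p>
    println(events.toHtml().joinToString(""))
}
```

Because `toHtml()` returns a lazy `Sequence`, each fragment is produced only when the consumer asks for it. A renderer for another output format could follow the same shape with a different mapping per event.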

Usage

In build.gradle.kts add:

dependencies {
    implementation("com.xemantic.markanywhere:markanywhere:0.1.3")
}
