Conversation
- Explains tokenize -> filter List[Lexeme] -> parse composition pattern
- Comment-stripping example with Stage1 lexer and SumParser
- Re-lexing values example with flatMap expansion pattern
- Documents that Lexeme constructor is private[alpaca]
- Cross-links to between-stages.html, lexer.html, parser.html

- Complete CalcLexer + CalcParser with operator precedence via before/after DSL
- Rule[Double] type with n.value extractor pattern (decision [13-01])
- Full resolutions set covering +, -, *, / with correct precedence hierarchy
- Key points section warns against alwaysBefore/alwaysAfter (decision [10-01])
- Cross-links to conflict-resolution.html, parser.html, lexer.html

- IndentCtx case class with currentIndent and prevIndent fields
- IndentLexer with \n( *) pattern and body-condition workaround for guards
- INDENT/DEDENT token emission based on indentation level change
- IndentParser example reading INDENT/DEDENT tokens
- Cross-links to lexer-context.html, lexer-error-recovery.html, lexer.html

- Three sections: ShadowException (compile-time), RuntimeException (runtime lex), T | Null (parser failure)
- Clarifies ShadowException is compile-time only -- cannot be caught with try/catch
- Guards-not-supported workaround pattern included
- Notes GH #21 (no custom error handler) and GH #51/#65 (no structured parser errors)
- Cross-links to lexer-error-recovery.html, lexer-context.html, parser.html
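The tokenize -> filter `List[Lexeme]` -> parse composition summarized above can be sketched without the real Alpaca API. Since `Lexeme`'s actual constructor is `private[alpaca]`, the `Lexeme` case class below, along with `tokenize`, `stripComments`, and `sum`, are hypothetical stand-ins for illustration only:

```scala
// Hypothetical stand-in for Alpaca's Lexeme (whose real constructor is private[alpaca]).
case class Lexeme(token: String, text: String)

// Pass 1: a toy tokenizer recognizing numbers, '+', and line comments.
def tokenize(input: String): List[Lexeme] =
  val pattern = "(\\d+|\\+|//[^\\n]*|\\s+)".r
  pattern.findAllIn(input).toList.collect {
    case s if s.head.isDigit     => Lexeme("NUMBER", s)
    case "+"                     => Lexeme("PLUS", "+")
    case s if s.startsWith("//") => Lexeme("COMMENT", s)
  } // whitespace matches fall through the partial function and are dropped

// Between stages: drop comment lexemes before handing the stream to the parser.
def stripComments(lexemes: List[Lexeme]): List[Lexeme] =
  lexemes.filterNot(_.token == "COMMENT")

// Pass 2: a toy "SumParser" that folds the remaining NUMBER lexemes.
def sum(lexemes: List[Lexeme]): Int =
  lexemes.collect { case Lexeme("NUMBER", s) => s.toInt }.sum
```

The point of the pattern is that the between-stages step operates on a plain `List[Lexeme]`, so ordinary collection operations (`filterNot` here, `flatMap` for the re-lexing expansion) compose freely between lexer and parser.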
Pull request overview
Adds a new “Compiler Theory” documentation section plus several cookbook-style guides, expanding the Alpaca docs with conceptual explanations (LR parsing, CFGs, lexer theory) and practical patterns (indentation, multi-pass, error handling).
Changes:
- Add a "Compiler Theory" docs subsection and multiple new tutorial pages under `docs/_docs/theory/`.
- Add new cookbook pages under `docs/_docs/cookbook/` (whitespace-sensitive lexing, multi-pass, expression evaluator, error messages).
- Update existing reference docs to fix the `NUMBER` value type to `Double`, correct backticked token escaping examples, and add links to Debug Settings.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| docs/sidebar.yml | Adds “Compiler Theory” navigation entries. |
| docs/_docs/theory/pipeline.md | New tutorial page describing Alpaca’s compile-time/runtime pipeline. |
| docs/_docs/theory/tokens.md | New tutorial page defining tokens/lexemes and Alpaca’s lexeme representation. |
| docs/_docs/theory/lexer-fa.md | New tutorial page explaining regex → NFA/DFA concepts and Alpaca’s lexer approach. |
| docs/_docs/theory/cfg.md | New tutorial page introducing CFGs via the calculator grammar. |
| docs/_docs/theory/why-lr.md | New tutorial page motivating LR parsing and LR(1) vs LALR(1). |
| docs/_docs/theory/shift-reduce.md | New tutorial page walking through the shift/reduce loop with a trace. |
| docs/_docs/theory/conflicts.md | New tutorial page explaining shift/reduce and reduce/reduce conflicts and resolution. |
| docs/_docs/theory/semantic-actions.md | New tutorial page explaining syntax-directed translation / semantic actions in Alpaca. |
| docs/_docs/theory/full-example.md | New end-to-end calculator example assembling lexer, parser, and resolutions. |
| docs/_docs/parser.md | Updates examples/types to Double and adds a Debug Settings cross-link. |
| docs/_docs/lexer.md | Adds a Debug Settings cross-link. |
| docs/_docs/extractors.md | Fixes examples to Double and corrects escaping in backticked token accessor examples. |
| docs/_docs/cookbook/whitespace-sensitive.md | New cookbook page showing indentation-tracking via LexerCtx. |
| docs/_docs/cookbook/multi-pass.md | New cookbook page describing multi-pass token-stream transformations. |
| docs/_docs/cookbook/expression-evaluator.md | New cookbook page showing operator precedence via before/after. |
| docs/_docs/cookbook/error-messages.md | New cookbook page explaining compile-time vs runtime errors and parse failure behavior. |
> Context-free grammars are the backbone of syntactic analysis. A grammar defines a language by specifying how symbols can be combined and rewritten — "context-free" means each rule applies regardless of surrounding context. If the lexer is the vocabulary of a language, the grammar is its syntax.
>
> ## What is a Context-Free Grammar?

This page is missing an H1 title (e.g., `# Context-Free Grammars`) at the top. Other pages in docs/_docs/ follow this convention and it typically drives the rendered page title.
> The preceding theory pages have built up each component of the compiler pipeline: tokens and lexical analysis, context-free grammars, LR parsing mechanics, conflict resolution, and semantic actions. This page assembles all the pieces into a working arithmetic calculator — the same grammar used throughout the tutorial, now fully resolved and evaluating. Follow the steps below from grammar definition to the evaluated result `7.0`.
>
> ## Step 1: The Lexer

This page is missing an H1 title (e.g., `# Full Calculator Example`) at the top. Most other docs pages start with a `# ...` heading, and missing it can affect page titles and navigation.
> This means Alpaca's lexer runs with the same O(n) guarantee as a hand-built DFA: one pass
> through the input, no backtracking.

The doc claims the Java-regex-based lexer has the same O(n) / no-backtracking guarantee as a DFA. Since tokenization is implemented via `java.util.regex.Pattern` / `Matcher.lookingAt()`, worst-case runtime can still be superlinear for certain regexes due to backtracking. Consider softening this to a "typically linear" statement or documenting that performance depends on regex patterns.

```suggestion
This means Alpaca's lexer is designed to run in a single left-to-right pass over the input and
typically exhibits close to linear-time behavior on well-behaved token regexes. However, since
it relies on Java's backtracking regex engine, worst-case performance still depends on the
specific patterns you write.
```
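The "single left-to-right pass" framing can be illustrated with a minimal sketch of the general `lookingAt`-based scanning technique (not Alpaca's actual implementation; the token names and patterns here are invented): the scanner's offset only moves forward, so the pass over the input is linear, while any backtracking is confined to the regex engine's attempt to match one token.

```scala
import java.util.regex.Pattern

// One pattern per token class, tried in order at the current offset.
val tokenPatterns = List(
  "NUMBER" -> Pattern.compile("\\d+"),
  "PLUS"   -> Pattern.compile("\\+"),
  "WS"     -> Pattern.compile("\\s+")
)

def scan(input: String): List[(String, String)] =
  var offset = 0
  val out = List.newBuilder[(String, String)]
  while offset < input.length do
    val hit = tokenPatterns.iterator
      .map { case (name, p) => (name, p.matcher(input.substring(offset))) }
      .find { case (_, m) => m.lookingAt() } // anchored match at the current offset
    hit match
      case Some((name, m)) =>
        out += name -> m.group()
        offset += m.end() // advance past the matched token; the offset never moves back
      case None =>
        throw new RuntimeException(s"no token matches at offset $offset")
  out.result().filterNot(_._1 == "WS") // skip whitespace lexemes
```

The scanner itself never revisits consumed input; whether the whole run stays linear depends on how each individual token regex behaves on its worst-case prefix.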
> "minus" { case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b },
> "times" { case (Expr(a), CalcLexer.TIMES(_), Expr(b)) => a * b },
> "div" { case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b },
> { case (CalcLexer.`\(`(_), Expr(e), CalcLexer.`\)`(_)) => e },

This production uses backticked token accessors (CalcLexer.`\(` / CalcLexer.`\)`), but the lexer in this same example defines `LPAREN`/`RPAREN` tokens (and, if you do intend to name the tokens `"\("` / `"\)"`, the backslashes need to be doubled as shown in extractors.md). Consider changing this to `CalcLexer.LPAREN` / `CalcLexer.RPAREN` (recommended) or updating the token naming/escaping consistently.

```suggestion
{ case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e },
```
> Not every parsing strategy handles every grammar. Top-down parsers are intuitive but stumble on the natural structure of arithmetic expressions. LR parsing was developed specifically to handle the grammars that arise in practice — including left-recursive grammars like the one that drives CalcParser.
>
> ## Two Approaches to Parsing

This page is missing an H1 title (e.g., `# Why LR Parsing?`) at the top. All existing reference pages in docs/_docs/ start with a `# ...` heading, and missing it can affect page titles/TOC generation.
> A grammar is ambiguous if a string can be parsed in more than one way. In LR parsing, ambiguity manifests as a conflict: the parse table has two valid entries for the same (state, symbol) pair, and the parser cannot proceed deterministically.
>
> ## What is a Parse Table Conflict?

This page is missing an H1 title (e.g., `# Conflicts and Disambiguation`) at the top. Existing docs pages in this repo start with an H1, and missing it can affect page titles/TOC.
> - Next: [Tokens & Lexemes](tokens.md) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
> - [The Lexer: Regex to Finite Automata](lexer-fa.md) — how regular expressions define token classes and how Alpaca compiles them
>
> For the full API, see the reference pages:
>
> - See [Lexer](../lexer.md) for how `CalcLexer` is defined.
> - See [Parser](../parser.md) for how `CalcParser` is defined and how grammar rules produce a typed result.

These cross-links use .md URLs (e.g., ../lexer.md, ../parser.md), but the rest of the docs consistently link to generated .html pages (e.g., parser.html, lexer.html). To avoid broken links in the rendered site, consider switching theory-page links to .html (including links between theory pages).

```suggestion
- Next: [Tokens & Lexemes](tokens.html) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
- [The Lexer: Regex to Finite Automata](lexer-fa.html) — how regular expressions define token classes and how Alpaca compiles them

For the full API, see the reference pages:

- See [Lexer](../lexer.html) for how `CalcLexer` is defined.
- See [Parser](../parser.html) for how `CalcParser` is defined and how grammar rules produce a typed result.
```
> - title: Semantic Actions
>   page: theory/semantic-actions.md
> - title: Full Calculator Example
>   page: theory/full-example.md

The PR adds new cookbook pages under docs/_docs/cookbook/, but docs/sidebar.yml doesn’t add any navigation entries for them, so they’ll be hard to discover in the docs site. Consider adding a "Cookbook" subsection with links to the new pages (whitespace-sensitive, multi-pass, expression-evaluator, error-messages).

```suggestion
      page: theory/full-example.md
- title: Cookbook
  subsection:
    - title: Whitespace-Sensitive Languages
      page: cookbook/whitespace-sensitive.md
    - title: Multi-Pass Parsing
      page: cookbook/multi-pass.md
    - title: Expression Evaluator
      page: cookbook/expression-evaluator.md
    - title: Better Error Messages
      page: cookbook/error-messages.md
```
> The shift-reduce loop is the heart of LR parsing. Every LR parser — regardless of whether it uses LR(0), LALR(1), or full LR(1) lookahead — executes the same fundamental loop: shift the next token onto a stack, or reduce the top of the stack to a non-terminal. This page traces that loop step by step for a concrete input.
>
> ## The Parse Stack

This page is missing an H1 title (e.g., `# Shift-Reduce Parsing`) at the top. Other docs pages in this repo use an H1 as the first line, which is typically used for the rendered page title.
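The shift/reduce loop quoted above can be made concrete with a hand-rolled toy parser for the single left-recursive rule E -> E '+' n | n (a sketch of the mechanism only, not Alpaca's generated LR tables): a shift pushes the next token onto the stack, and a reduce pops a handle and pushes the computed value in its place.

```scala
// A stack symbol is either an unconsumed token or an already-reduced E with its value.
enum Sym:
  case Tok(text: String)
  case E(value: Int)
import Sym.*

// Shift-reduce loop for the toy grammar  E -> E '+' n | n.
// A reduce step pops the handle and pushes the semantic value immediately,
// so no parse tree is ever built.
def parse(tokens: List[String]): Int =
  var stack = List.empty[Sym]
  var input = tokens
  def isDone: Boolean = stack match
    case List(E(_)) => input.isEmpty
    case _          => false
  while !isDone do
    stack match
      // reduce  E '+' n  ->  E : the semantic action (addition) runs right here
      case Tok(n) :: Tok("+") :: E(a) :: rest if n.forall(_.isDigit) =>
        stack = E(a + n.toInt) :: rest
      // reduce  n -> E
      case Tok(n) :: rest if n.forall(_.isDigit) =>
        stack = E(n.toInt) :: rest
      case _ =>
        input match
          case t :: ts =>
            stack = Tok(t) :: stack // shift the next token
            input = ts
          case Nil =>
            throw new RuntimeException(s"parse error, stack: $stack")
  stack match
    case List(E(v)) => v
    case _          => throw new RuntimeException("unreachable")
```

Because reduce replaces the handle with a value rather than a tree node, the left-recursive rule is handled naturally and the stack stays shallow.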
> This is why `CalcParser.parse(lexemes)` returns a named tuple `(ctx: Ctx, result: Double | Null)` — not a tree. The semantic actions produce the final value during the parse itself.
>
> Decision confirmed: "Parse tree never exposed in Alpaca — semantic actions evaluated immediately during LR(1) reduce; parse() returns typed value directly." (STATE.md)

(STATE.md) is referenced here, but there is no STATE.md in the repository, so this will be a dead reference for readers. Consider removing the reference or replacing it with a link to the actual source (e.g., src/alpaca/internal/parser/Parser.scala) or an existing docs page.

```suggestion
Decision confirmed: "Parse tree never exposed in Alpaca — semantic actions evaluated immediately during LR(1) reduce; parse() returns typed value directly."
```
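One consequence of the `Double | Null` result quoted above is that callers must narrow the union before using the value. A minimal sketch of that calling pattern, where `fakeParse` is a hypothetical stand-in for a real `CalcParser.parse` call (the real one also returns a `ctx` alongside the result):

```scala
// Hypothetical stand-in for CalcParser.parse: the evaluated value, or null on failure.
def fakeParse(input: String): Double | Null =
  input.toDoubleOption match
    case Some(d) => d
    case None    => null

// The caller narrows the Double | Null union before using the result.
def describe(input: String): String =
  fakeParse(input) match
    case null      => s"parse failed for '$input'"
    case d: Double => s"result: $d"
```

Matching on `null` first keeps the failure path explicit, which matters while GH #51/#65 (no structured parser errors) remain open.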
Summary
Add 4 cookbook/how-to pages with practical, copy-paste-ready examples for common Alpaca use cases.
Pages added
🤖 Generated with Claude Code