Conversation
- Explains tokenize -> filter List[Lexeme] -> parse composition pattern
- Comment-stripping example with Stage1 lexer and SumParser
- Re-lexing values example with flatMap expansion pattern
- Documents that Lexeme constructor is private[alpaca]
- Cross-links to between-stages.html, lexer.html, parser.html

- Complete CalcLexer + CalcParser with operator precedence via before/after DSL
- Rule[Double] type with n.value extractor pattern (decision [13-01])
- Full resolutions set covering +, -, *, / with correct precedence hierarchy
- Key points section warns against alwaysBefore/alwaysAfter (decision [10-01])
- Cross-links to conflict-resolution.html, parser.html, lexer.html

- IndentCtx case class with currentIndent and prevIndent fields
- IndentLexer with \n( *) pattern and body-condition workaround for guards
- INDENT/DEDENT token emission based on indentation level change
- IndentParser example reading INDENT/DEDENT tokens
- Cross-links to lexer-context.html, lexer-error-recovery.html, lexer.html

- Three sections: ShadowException (compile-time), RuntimeException (runtime lex), T | Null (parser failure)
- Clarifies ShadowException is compile-time only -- cannot be caught with try/catch
- Guards-not-supported workaround pattern included
- Notes GH #21 (no custom error handler) and GH #51/#65 (no structured parser errors)
- Cross-links to lexer-error-recovery.html, lexer-context.html, parser.html
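The tokenize -> filter `List[Lexeme]` -> parse composition summarized above can be sketched without the real Alpaca API. Since `Lexeme`'s actual constructor is `private[alpaca]`, the `Lexeme` case class below, along with `tokenize`, `stripComments`, and `sum`, are hypothetical stand-ins for illustration only:

```scala
// Hypothetical stand-in for Alpaca's Lexeme (whose real constructor is private[alpaca]).
case class Lexeme(token: String, text: String)

// Pass 1: a toy tokenizer recognizing numbers, '+', and line comments.
def tokenize(input: String): List[Lexeme] =
  val pattern = "(\\d+|\\+|//[^\\n]*|\\s+)".r
  pattern.findAllIn(input).toList.collect {
    case s if s.head.isDigit     => Lexeme("NUMBER", s)
    case "+"                     => Lexeme("PLUS", "+")
    case s if s.startsWith("//") => Lexeme("COMMENT", s)
  } // whitespace matches fall through the partial function and are dropped

// Between stages: drop comment lexemes before handing the stream to the parser.
def stripComments(lexemes: List[Lexeme]): List[Lexeme] =
  lexemes.filterNot(_.token == "COMMENT")

// Pass 2: a toy "SumParser" that folds the remaining NUMBER lexemes.
def sum(lexemes: List[Lexeme]): Int =
  lexemes.collect { case Lexeme("NUMBER", s) => s.toInt }.sum
```

The point of the pattern is that the between-stages step operates on a plain `List[Lexeme]`, so ordinary collection operations (`filterNot` here, `flatMap` for the re-lexing expansion) compose freely between lexer and parser.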
Pull request overview
Adds a new “Compiler Theory” documentation section plus several cookbook-style guides, expanding the Alpaca docs with conceptual explanations (LR parsing, CFGs, lexer theory) and practical patterns (indentation, multi-pass, error handling).
Changes:
- Add a "Compiler Theory" docs subsection and multiple new tutorial pages under `docs/_docs/theory/`.
- Add new cookbook pages under `docs/_docs/cookbook/` (whitespace-sensitive lexing, multi-pass, expression evaluator, error messages).
- Update existing reference docs to fix the `NUMBER` value type to `Double`, correct backticked token escaping examples, and add links to Debug Settings.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| docs/sidebar.yml | Adds “Compiler Theory” navigation entries. |
| docs/_docs/theory/pipeline.md | New tutorial page describing Alpaca’s compile-time/runtime pipeline. |
| docs/_docs/theory/tokens.md | New tutorial page defining tokens/lexemes and Alpaca’s lexeme representation. |
| docs/_docs/theory/lexer-fa.md | New tutorial page explaining regex → NFA/DFA concepts and Alpaca’s lexer approach. |
| docs/_docs/theory/cfg.md | New tutorial page introducing CFGs via the calculator grammar. |
| docs/_docs/theory/why-lr.md | New tutorial page motivating LR parsing and LR(1) vs LALR(1). |
| docs/_docs/theory/shift-reduce.md | New tutorial page walking through the shift/reduce loop with a trace. |
| docs/_docs/theory/conflicts.md | New tutorial page explaining shift/reduce and reduce/reduce conflicts and resolution. |
| docs/_docs/theory/semantic-actions.md | New tutorial page explaining syntax-directed translation / semantic actions in Alpaca. |
| docs/_docs/theory/full-example.md | New end-to-end calculator example assembling lexer, parser, and resolutions. |
| docs/_docs/parser.md | Updates examples/types to Double and adds a Debug Settings cross-link. |
| docs/_docs/lexer.md | Adds a Debug Settings cross-link. |
| docs/_docs/extractors.md | Fixes examples to Double and corrects escaping in backticked token accessor examples. |
| docs/_docs/cookbook/whitespace-sensitive.md | New cookbook page showing indentation-tracking via LexerCtx. |
| docs/_docs/cookbook/multi-pass.md | New cookbook page describing multi-pass token-stream transformations. |
| docs/_docs/cookbook/expression-evaluator.md | New cookbook page showing operator precedence via before/after. |
| docs/_docs/cookbook/error-messages.md | New cookbook page explaining compile-time vs runtime errors and parse failure behavior. |
> Context-free grammars are the backbone of syntactic analysis. A grammar defines a language by specifying how symbols can be combined and rewritten — "context-free" means each rule applies regardless of surrounding context. If the lexer is the vocabulary of a language, the grammar is its syntax.
>
> ## What is a Context-Free Grammar?

This page is missing an H1 title (e.g., `# Context-Free Grammars`) at the top. Other pages in docs/_docs/ follow this convention and it typically drives the rendered page title.
> The preceding theory pages have built up each component of the compiler pipeline: tokens and lexical analysis, context-free grammars, LR parsing mechanics, conflict resolution, and semantic actions. This page assembles all the pieces into a working arithmetic calculator — the same grammar used throughout the tutorial, now fully resolved and evaluating. Follow the steps below from grammar definition to the evaluated result `7.0`.
>
> ## Step 1: The Lexer

This page is missing an H1 title (e.g., `# Full Calculator Example`) at the top. Most other docs pages start with a `# ...` heading, and missing it can affect page titles and navigation.
> This means Alpaca's lexer runs with the same O(n) guarantee as a hand-built DFA: one pass
> through the input, no backtracking.

The doc claims the Java-regex-based lexer has the same O(n) / no-backtracking guarantee as a DFA. Since tokenization is implemented via `java.util.regex.Pattern` / `Matcher.lookingAt()`, worst-case runtime can still be superlinear for certain regexes due to backtracking. Consider softening this to a "typically linear" statement or documenting that performance depends on regex patterns.

```suggestion
This means Alpaca's lexer is designed to run in a single left-to-right pass over the input and
typically exhibits close to linear-time behavior on well-behaved token regexes. However, since
it relies on Java's backtracking regex engine, worst-case performance still depends on the
specific patterns you write.
```
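The "single left-to-right pass" framing can be illustrated with a minimal sketch of the general `lookingAt`-based scanning technique (not Alpaca's actual implementation; the token names and patterns here are invented): the scanner's offset only moves forward, so the pass over the input is linear, while any backtracking is confined to the regex engine's attempt to match one token.

```scala
import java.util.regex.Pattern

// One pattern per token class, tried in order at the current offset.
val tokenPatterns = List(
  "NUMBER" -> Pattern.compile("\\d+"),
  "PLUS"   -> Pattern.compile("\\+"),
  "WS"     -> Pattern.compile("\\s+")
)

def scan(input: String): List[(String, String)] =
  var offset = 0
  val out = List.newBuilder[(String, String)]
  while offset < input.length do
    val hit = tokenPatterns.iterator
      .map { case (name, p) => (name, p.matcher(input.substring(offset))) }
      .find { case (_, m) => m.lookingAt() } // anchored match at the current offset
    hit match
      case Some((name, m)) =>
        out += name -> m.group()
        offset += m.end() // advance past the matched token; the offset never moves back
      case None =>
        throw new RuntimeException(s"no token matches at offset $offset")
  out.result().filterNot(_._1 == "WS") // skip whitespace lexemes
```

The scanner itself never revisits consumed input; whether the whole run stays linear depends on how each individual token regex behaves on its worst-case prefix.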
> "minus" { case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b },
> "times" { case (Expr(a), CalcLexer.TIMES(_), Expr(b)) => a * b },
> "div" { case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b },
> { case (CalcLexer.`\(`(_), Expr(e), CalcLexer.`\)`(_)) => e },

This production uses backticked token accessors (CalcLexer.`\(` / CalcLexer.`\)`), but the lexer in this same example defines `LPAREN`/`RPAREN` tokens (and, if you do intend to name the tokens `"\("` / `"\)"`, the backslashes need to be doubled as shown in extractors.md). Consider changing this to `CalcLexer.LPAREN` / `CalcLexer.RPAREN` (recommended) or updating the token naming/escaping consistently.

```suggestion
{ case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e },
```
> Not every parsing strategy handles every grammar. Top-down parsers are intuitive but stumble on the natural structure of arithmetic expressions. LR parsing was developed specifically to handle the grammars that arise in practice — including left-recursive grammars like the one that drives CalcParser.
>
> ## Two Approaches to Parsing

This page is missing an H1 title (e.g., `# Why LR Parsing?`) at the top. All existing reference pages in docs/_docs/ start with a `# ...` heading, and missing it can affect page titles/TOC generation.
> A grammar is ambiguous if a string can be parsed in more than one way. In LR parsing, ambiguity manifests as a conflict: the parse table has two valid entries for the same (state, symbol) pair, and the parser cannot proceed deterministically.
>
> ## What is a Parse Table Conflict?

This page is missing an H1 title (e.g., `# Conflicts and Disambiguation`) at the top. Existing docs pages in this repo start with an H1, and missing it can affect page titles/TOC.
> - Next: [Tokens & Lexemes](tokens.md) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
> - [The Lexer: Regex to Finite Automata](lexer-fa.md) — how regular expressions define token classes and how Alpaca compiles them
>
> For the full API, see the reference pages:
>
> - See [Lexer](../lexer.md) for how `CalcLexer` is defined.
> - See [Parser](../parser.md) for how `CalcParser` is defined and how grammar rules produce a typed result.

These cross-links use .md URLs (e.g., ../lexer.md, ../parser.md), but the rest of the docs consistently link to generated .html pages (e.g., parser.html, lexer.html). To avoid broken links in the rendered site, consider switching theory-page links to .html (including links between theory pages).

```suggestion
- Next: [Tokens & Lexemes](tokens.html) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
- [The Lexer: Regex to Finite Automata](lexer-fa.html) — how regular expressions define token classes and how Alpaca compiles them

For the full API, see the reference pages:

- See [Lexer](../lexer.html) for how `CalcLexer` is defined.
- See [Parser](../parser.html) for how `CalcParser` is defined and how grammar rules produce a typed result.
```
> - title: Semantic Actions
>   page: theory/semantic-actions.md
> - title: Full Calculator Example
>   page: theory/full-example.md

The PR adds new cookbook pages under docs/_docs/cookbook/, but docs/sidebar.yml doesn’t add any navigation entries for them, so they’ll be hard to discover in the docs site. Consider adding a "Cookbook" subsection with links to the new pages (whitespace-sensitive, multi-pass, expression-evaluator, error-messages).

```suggestion
      page: theory/full-example.md
- title: Cookbook
  subsection:
    - title: Whitespace-Sensitive Languages
      page: cookbook/whitespace-sensitive.md
    - title: Multi-Pass Parsing
      page: cookbook/multi-pass.md
    - title: Expression Evaluator
      page: cookbook/expression-evaluator.md
    - title: Better Error Messages
      page: cookbook/error-messages.md
```
> The shift-reduce loop is the heart of LR parsing. Every LR parser — regardless of whether it uses LR(0), LALR(1), or full LR(1) lookahead — executes the same fundamental loop: shift the next token onto a stack, or reduce the top of the stack to a non-terminal. This page traces that loop step by step for a concrete input.
>
> ## The Parse Stack

This page is missing an H1 title (e.g., `# Shift-Reduce Parsing`) at the top. Other docs pages in this repo use an H1 as the first line, which is typically used for the rendered page title.
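The shift/reduce loop quoted above can be made concrete with a hand-rolled toy parser for the single left-recursive rule E -> E '+' n | n (a sketch of the mechanism only, not Alpaca's generated LR tables): a shift pushes the next token onto the stack, and a reduce pops a handle and pushes the computed value in its place.

```scala
// A stack symbol is either an unconsumed token or an already-reduced E with its value.
enum Sym:
  case Tok(text: String)
  case E(value: Int)
import Sym.*

// Shift-reduce loop for the toy grammar  E -> E '+' n | n.
// A reduce step pops the handle and pushes the semantic value immediately,
// so no parse tree is ever built.
def parse(tokens: List[String]): Int =
  var stack = List.empty[Sym]
  var input = tokens
  def isDone: Boolean = stack match
    case List(E(_)) => input.isEmpty
    case _          => false
  while !isDone do
    stack match
      // reduce  E '+' n  ->  E : the semantic action (addition) runs right here
      case Tok(n) :: Tok("+") :: E(a) :: rest if n.forall(_.isDigit) =>
        stack = E(a + n.toInt) :: rest
      // reduce  n -> E
      case Tok(n) :: rest if n.forall(_.isDigit) =>
        stack = E(n.toInt) :: rest
      case _ =>
        input match
          case t :: ts =>
            stack = Tok(t) :: stack // shift the next token
            input = ts
          case Nil =>
            throw new RuntimeException(s"parse error, stack: $stack")
  stack match
    case List(E(v)) => v
    case _          => throw new RuntimeException("unreachable")
```

Because reduce replaces the handle with a value rather than a tree node, the left-recursive rule is handled naturally and the stack stays shallow.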
> This is why `CalcParser.parse(lexemes)` returns a named tuple `(ctx: Ctx, result: Double | Null)` — not a tree. The semantic actions produce the final value during the parse itself.
>
> Decision confirmed: "Parse tree never exposed in Alpaca — semantic actions evaluated immediately during LR(1) reduce; parse() returns typed value directly." (STATE.md)

(STATE.md) is referenced here, but there is no STATE.md in the repository, so this will be a dead reference for readers. Consider removing the reference or replacing it with a link to the actual source (e.g., src/alpaca/internal/parser/Parser.scala) or an existing docs page.

```suggestion
Decision confirmed: "Parse tree never exposed in Alpaca — semantic actions evaluated immediately during LR(1) reduce; parse() returns typed value directly."
```
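One consequence of the `Double | Null` result quoted above is that callers must narrow the union before using the value. A minimal sketch of that calling pattern, where `fakeParse` is a hypothetical stand-in for a real `CalcParser.parse` call (the real one also returns a `ctx` alongside the result):

```scala
// Hypothetical stand-in for CalcParser.parse: the evaluated value, or null on failure.
def fakeParse(input: String): Double | Null =
  input.toDoubleOption match
    case Some(d) => d
    case None    => null

// The caller narrows the Double | Null union before using the result.
def describe(input: String): String =
  fakeParse(input) match
    case null      => s"parse failed for '$input'"
    case d: Double => s"result: $d"
```

Matching on `null` first keeps the failure path explicit, which matters while GH #51/#65 (no structured parser errors) remain open.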
Summary
Add 4 cookbook/how-to pages with practical, copy-paste-ready examples for common Alpaca use cases.
Pages added
🤖 Generated with Claude Code