# Context-Free Grammars
Context-free grammars are the backbone of syntactic analysis. A grammar defines a language by specifying how symbols can be combined and rewritten — "context-free" means each rule applies regardless of surrounding context. If the lexer is the vocabulary of a language, the grammar is its syntax.

## What is a Context-Free Grammar?

A grammar consists of a set of non-terminal symbols (grammar variables that can be expanded), a set of terminal symbols (the tokens the lexer produces), a set of production rules (rewrite rules), and a start symbol. A derivation starts from the start symbol and repeatedly replaces non-terminals with production right-hand sides until only terminals remain. The language of a grammar G is the set of all terminal strings reachable from the start symbol.

> **Definition — Context-Free Grammar:**
> A CFG is a 4-tuple G = (V, Σ, R, S) where:
> - V is a finite set of non-terminal symbols (grammar variables)
> - Σ is a finite set of terminal symbols (tokens), V ∩ Σ = ∅
> - R ⊆ V × (V ∪ Σ)* is a finite set of production rules
> - S ∈ V is the start symbol
>
> A production rule A → α means the non-terminal A can be replaced by the symbol string α.
> A grammar generates the language L(G) = { w ∈ Σ* | S ⇒* w } — all terminal strings
> derivable from S in zero or more steps.

## BNF Notation

Production rules are written in Backus-Naur Form (BNF): `A → α` means A can be rewritten as α. The vertical bar `|` separates alternatives, so `A → α | β` is shorthand for two rules. Non-terminals are written in CamelCase; terminals are UPPERCASE (matching Alpaca's token name conventions).

EBNF (Extended BNF) adds optional elements `[...]`, repetition `{...}`, and grouping `(...)`. These shorthands can always be translated into plain BNF, but are useful for compact notation. This page uses BNF throughout for clarity; Alpaca's DSL maps directly to BNF productions.
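
For example, a hypothetical EBNF rule `List → Item { COMMA Item }` unfolds into plain BNF with a helper non-terminal and recursion (ε denotes the empty string):

```
List → Item Rest
Rest → COMMA Item Rest
     | ε
```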

## The Calculator Grammar

The calculator grammar is the running example for the entire Compiler Theory Tutorial. It defines arithmetic expressions with four operators and parentheses:

```
Expr → Expr PLUS Expr
| Expr MINUS Expr
| Expr TIMES Expr
| Expr DIVIDE Expr
| LPAREN Expr RPAREN
| NUMBER

root → Expr
```

Identifying the 4-tuple components:

- V = {Expr, root} — two non-terminals
- Σ = {NUMBER, PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN} — seven terminal symbols, produced by CalcLexer
- R = the 7 production rules above
- S = root — the start symbol
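
As a worked illustration, the calculator grammar's 4-tuple can be written down as plain data. The types below are hypothetical helpers for this page, not part of Alpaca's API:

```scala
// The 4-tuple G = (V, Σ, R, S) modeled as plain data (hypothetical, not Alpaca's API)
enum Sym:
  case NT(name: String)  // non-terminal, an element of V
  case T(name: String)   // terminal, an element of Σ

final case class Production(lhs: Sym.NT, rhs: List[Sym])

final case class Cfg(
  v: Set[Sym.NT],      // V  — non-terminals
  sigma: Set[Sym.T],   // Σ  — terminals
  r: List[Production], // R  — production rules
  s: Sym.NT,           // S  — start symbol
)

import Sym.*
val expr = NT("Expr")
val root = NT("root")
val calc = Cfg(
  v = Set(expr, root),
  sigma = Set("NUMBER", "PLUS", "MINUS", "TIMES", "DIVIDE", "LPAREN", "RPAREN").map(T.apply),
  r = List(
    Production(expr, List(expr, T("PLUS"), expr)),
    Production(expr, List(expr, T("MINUS"), expr)),
    Production(expr, List(expr, T("TIMES"), expr)),
    Production(expr, List(expr, T("DIVIDE"), expr)),
    Production(expr, List(T("LPAREN"), expr, T("RPAREN"))),
    Production(expr, List(T("NUMBER"))),
    Production(root, List(expr)),
  ),
  s = root,
)
```

Counting the pieces recovers the summary above: two non-terminals, seven terminals, seven rules.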

Note: this grammar is **ambiguous** — the expression `1 + 2 * 3` can be parsed in two ways depending on which `Expr` is expanded first. We will see how Alpaca resolves ambiguities on the [Conflict Resolution](../conflict-resolution.md) page.
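
Concretely, the two parses group the operands differently and would evaluate to different results:

```
(1 + 2) * 3 = 9.0    root production: Expr → Expr TIMES Expr
1 + (2 * 3) = 7.0    root production: Expr → Expr PLUS Expr
```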

## Derivation

A *derivation* is a sequence of rewriting steps from the start symbol to a terminal string. Each step replaces the leftmost non-terminal with one of its production alternatives (leftmost derivation).

Leftmost derivation for `1 + 2`:

```
root ⇒ Expr
⇒ Expr PLUS Expr (apply: Expr → Expr PLUS Expr)
⇒ NUMBER PLUS Expr (apply: Expr → NUMBER, leftmost)
⇒ NUMBER PLUS NUMBER (apply: Expr → NUMBER, leftmost)
```

The first step applies `root → Expr`; the second expands the leftmost `Expr` using the `Expr PLUS Expr` production; the third and fourth substitute the literal `NUMBER` terminal for each remaining `Expr`.
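
The rewriting steps above can be mimicked in a few lines of self-contained Scala (illustrative only, not Alpaca machinery): a sentential form is a list of symbol names, and each step replaces the leftmost occurrence of a non-terminal with a production right-hand side.

```scala
// A sentential form mixes non-terminals ("Expr") and terminals ("NUMBER", "PLUS").
type Sentential = List[String]

// Replace the leftmost occurrence of non-terminal `nt` with `rhs`.
def rewriteLeftmost(form: Sentential, nt: String, rhs: List[String]): Sentential =
  form.indexOf(nt) match
    case -1 => form
    case i  => form.take(i) ::: rhs ::: form.drop(i + 1)

// The leftmost derivation of "1 + 2", step by step:
val s1 = rewriteLeftmost(List("root"), "root", List("Expr"))             // Expr
val s2 = rewriteLeftmost(s1, "Expr", List("Expr", "PLUS", "Expr"))       // Expr PLUS Expr
val s3 = rewriteLeftmost(s2, "Expr", List("NUMBER"))                     // NUMBER PLUS Expr
val s4 = rewriteLeftmost(s3, "Expr", List("NUMBER"))                     // NUMBER PLUS NUMBER
```

The final form contains only terminals, so the derivation is complete.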

## Parse Trees

A parse tree captures the grammatical structure of a derivation as a tree. Each internal node is a non-terminal; each leaf is a terminal. The parse tree for `1 + 2`:

```
root
|
Expr
/ | \
Expr PLUS Expr
| |
NUMBER NUMBER
(1.0) (2.0)
```

Note: In Alpaca, parse trees are a conceptual model only; the runtime LR parser does not construct or retain an explicit parse-tree object. During shift-reduce parsing it immediately evaluates your semantic actions (the `=>` expressions in `rule` definitions) as each production is reduced, and the `Parser` macro's job is to analyze those rules at compile time and generate the LR parse tables. What `parse()` returns is the typed result — a `Double` in the calculator case — not an intermediate tree structure. (See [The Compilation Pipeline](pipeline.md) for the full picture.)

## Alpaca DSL Mapping

The calculator grammar maps directly to an Alpaca `Parser` definition. Each production rule becomes a case clause in a `rule(...)` call: the pattern on the left of `=>` matches the grammatical structure, and the expression on the right computes the result.

```scala sc:nocompile
import alpaca.*

object CalcParser extends Parser:
val Expr: Rule[Double] = rule(
{ case (Expr(a), CalcLexer.PLUS(_), Expr(b)) => a + b },
{ case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b },
{ case (Expr(a), CalcLexer.TIMES(_), Expr(b)) => a * b },
{ case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b },
{ case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e },
{ case CalcLexer.NUMBER(n) => n.value },
)
val root: Rule[Double] = rule:
case Expr(v) => v
```

Each `case` clause corresponds to one production rule. `Expr(a)` matches a reduced `Expr` non-terminal with value `a`. `CalcLexer.PLUS(_)` matches the PLUS terminal (the `_` discards the lexeme value since PLUS carries `Unit`). `CalcLexer.NUMBER(n)` matches a NUMBER terminal; `n.value` accesses the `Double` extracted by the lexer. The grammar's non-terminals (`Expr`, `root`) become `Rule[Double]` values; the type parameter is the result type of each reduction.

> **Compile-time processing:** When you define `object CalcParser extends Parser`, the Alpaca macro reads every `rule` declaration and constructs the LR(1) parse table at compile time.

## Cross-links

- See [Tokens and Lexemes](tokens.md) for how the terminal symbols (NUMBER, PLUS, etc.) are produced by the lexer.
- Next: [Why LR?](why-lr.md) — why LR parsing was chosen over top-down alternatives.
- See [Parser](../parser.md) for the complete `rule` DSL reference and all extractor forms.
- See [Conflict Resolution](../conflict-resolution.md) for how Alpaca resolves ambiguity in the calculator grammar.
# The Lexer: Regex to Finite Automata

## What Does a Lexer Do?

A lexer reads a character stream from left to right and emits a token stream. At each position,
it tries the token class patterns in their specified order and picks the first one whose regex
matches a prefix of the remaining input — patterns are tried in order; first match wins. When no
pattern matches the current position, the lexer throws an error. The result is a flat list of
lexemes that the parser consumes next.

## Regular Languages

> **Definition — Regular language:**
> A language L ⊆ Σ* is *regular* if it is recognized by a finite automaton (FA). Equivalently,
> L can be described by a regular expression over alphabet Σ.
> Each token class defines a regular language: `NUMBER` defines the set
> { "0", "1", ..., "3.14", "100", ... }.

Regex notation is a concise way to specify regular languages, and that is why regex is the right
tool for token class definitions: token shapes can be recognized with a fixed, finite amount of
memory, which is exactly the class of languages finite automata capture. More complex patterns
such as balanced parentheses require a more powerful formalism (context-free grammars, which the
parser handles), but for token recognition, regular expressions are expressive enough while
staying simple and fast.

## NFA and DFA: The Conceptual Picture

Any regular expression can be translated into a finite automaton that accepts the same strings.
The standard construction proceeds in two steps.

**Step 1 — NFA (nondeterministic finite automaton).** A regex is converted into an NFA via
Thompson's construction. An NFA can have multiple possible transitions from a state on the same
input, or transitions on the empty string. For simple patterns this is easy to visualize. The
`PLUS` token pattern `\+` produces a two-state NFA:

| State | Input `+` | Accept? |
|-------|-----------|---------|
| q₀ | q₁ | No |
| q₁ | — | Yes |

The machine starts at q₀, consumes a `+`, and moves to q₁ — an accepting state. Any other
input from q₀ leads nowhere, meaning the string does not match.

**Step 2 — DFA (deterministic finite automaton).** An NFA is then converted to a DFA. A DFA
has exactly one transition per (state, input-character) pair, with no ambiguity. This matters
for performance: a DFA can be executed in O(n) time by reading the input left to right, one
character at a time, following the single applicable transition at each step. A DFA is therefore
the right runtime data structure for a lexer — no backtracking, no branching.

> **Definition — Deterministic Finite Automaton (DFA):**
> A DFA is a 5-tuple (Q, Σ, δ, q₀, F) where:
> - Q is a finite set of states
> - Σ is the input alphabet (here: Unicode characters)
> - δ : Q × Σ → Q is the transition function
> - q₀ ∈ Q is the start state
> - F ⊆ Q is the set of accepting states
>
> A DFA accepts a string w if δ*(q₀, w) ∈ F, where δ* is the iterated transition function.
> In Alpaca's combined lexer DFA, each accepting state also carries a *token label* indicating
> which token class was matched.
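
The iterated transition function δ* is just a fold over the input string. A minimal, self-contained sketch (not Alpaca's internals), using the two-state PLUS automaton from the table above:

```scala
// A DFA as a transition map; a missing entry means "no transition" (reject).
final case class Dfa(
  delta: Map[(Int, Char), Int], // δ : Q × Σ → Q (partial)
  start: Int,                   // q₀
  accepting: Set[Int],          // F
)

// δ*(q₀, w) ∈ F, computed by folding the transition function over the characters of w.
def accepts(dfa: Dfa, w: String): Boolean =
  w.foldLeft(Option(dfa.start)) { (state, c) =>
    state.flatMap(q => dfa.delta.get((q, c)))
  }.exists(dfa.accepting)

// The PLUS automaton: q₀ --'+'--> q₁, with q₁ accepting.
val plusDfa = Dfa(Map((0, '+') -> 1), start = 0, accepting = Set(1))
```

Running it: `accepts(plusDfa, "+")` holds, while the empty string and `"++"` are rejected, matching the table.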

## Combining Token Patterns into One Automaton

To lex a language with multiple token classes, the standard approach builds one combined DFA. In
theory: construct an NFA for each token pattern, connect them all to a new start state with
epsilon transitions, then convert the combined NFA to a single DFA.

Alpaca follows the same principle but implements it using Java's regex engine (a backtracking
matcher built on the same automata theory, rather than a precompiled DFA):

- All token patterns are combined into a single Java regex alternation at compile time:

```
// Conceptual: how Alpaca combines patterns internally
(?<NUMBER>[0-9]+(\.[0-9]+)?)|(?<PLUS>\+)|(?<MINUS>-)|(?<TIMES>\*)|...
```

- `java.util.regex.Pattern.compile(...)` is called inside the `lexerImpl` macro at compile
time. An invalid regex pattern therefore causes a compile error, not a runtime crash.
- At runtime, `Tokenization.tokenize()` uses `matcher.lookingAt()` on the combined pattern at
the current input position. It then checks which named group matched using
`matcher.start(i)` to determine the token class.
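
The same lookingAt-plus-named-groups scheme can be re-created in a few lines of plain Scala with `java.util.regex` (an illustrative re-implementation, not Alpaca's actual code):

```scala
import java.util.regex.Pattern

// A shrunken combined alternation with named groups, tried in order.
val tokenNames = List("NUMBER", "PLUS", "MINUS", "TIMES", "DIVIDE")
val combined = Pattern.compile(
  "(?<NUMBER>[0-9]+(\\.[0-9]+)?)|(?<PLUS>\\+)|(?<MINUS>-)|(?<TIMES>\\*)|(?<DIVIDE>/)"
)

def scan(input: String): List[(String, String)] =
  val m   = combined.matcher(input)
  val out = List.newBuilder[(String, String)]
  var pos = 0
  while pos < input.length do
    m.region(pos, input.length)                 // anchor matching at the current offset
    if !m.lookingAt() then
      throw IllegalArgumentException(s"no token at offset $pos")
    // start(name) is -1 for groups that did not participate in the match.
    val name = tokenNames.find(n => m.start(n) >= 0).get
    out += ((name, m.group(name)))
    pos = m.end()
  out.result()
```

For example, `scan("3.14+2*5")` yields NUMBER, PLUS, NUMBER, TIMES, NUMBER with the matched text attached to each.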

In practice, this means Alpaca's lexer uses a single pre-compiled combined regex and scans
through the input from left to right, matching at each position with `lookingAt()` and using
the named capturing groups to determine the token class; the exact performance and any
backtracking behavior are determined by the Java regex engine and the specific token patterns.

## Shadowing Detection

A practical issue with ordered alternation is *shadowing*: pattern A shadows pattern B if every
string matched by B is also matched by A (that is, L(B) ⊆ L(A), meaning every string in B's
language is also in A's language), and A appears before B in the lexer definition. If this
occurs, B will never match — it is dead code.

Alpaca's `RegexChecker` uses the `dregex` library (a Scala/JVM library for decidable regex
operations) to check at compile time whether any pattern's language is a subset of an earlier
pattern's language. If shadowing is detected, the macro throws a `ShadowException` with a
compile error pointing to the offending patterns.

**Example:** If you wrote the integer pattern `"[0-9]+"` before the decimal pattern
`"[0-9]+(\\.[0-9]+)?"`, the integer pattern would shadow the decimal one — every decimal like
`"3.14"` is also matched by `"[0-9]+"` up to the decimal point, but more critically the integer
pattern can match the prefix `"3"` and would consume it first. The `dregex` check catches this
ordering mistake at compile time rather than silently producing wrong output at runtime.
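
The runtime consequence of the wrong ordering is easy to reproduce with a plain ordered alternation in `java.util.regex` (illustrative only; this is not how Alpaca's checker works):

```scala
import java.util.regex.Pattern

// Wrong order: the integer branch comes first, so it wins on the prefix "3"
// and the DECIMAL branch is never consulted for "3.14".
val wrongOrder = Pattern.compile("(?<INT>[0-9]+)|(?<DECIMAL>[0-9]+(\\.[0-9]+)?)")
val m = wrongOrder.matcher("3.14")
val matched = if m.lookingAt() then m.group() else ""   // only "3" is consumed
val decimalFired = m.start("DECIMAL") >= 0              // false: branch never matched
```

A scanner built on this pattern would emit `NUMBER("3")` and then choke on the leftover `".14"`.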

In `CalcLexer`, the decimal pattern `"[0-9]+(\\.[0-9]+)?"` is listed first, before any simpler
integer-only pattern, so no shadowing occurs.

> **Compile-time processing:** The `lexer` macro validates all regex patterns, combines them into a single alternation pattern, and checks for shadowing using `dregex` — all at compile time. If a regex is invalid or one pattern shadows another, you get a compile error. At runtime, the generated `Tokenization` object runs the pre-compiled combined regex against your input string.

## Cross-links

- See [Lexer](../lexer.md) for the complete `lexer` DSL reference.
- See [Tokens and Lexemes](tokens.md) for what the lexer produces — the lexeme stream.
- Next: [Context-Free Grammars](cfg.md) for how token streams are parsed.
# The Compilation Pipeline

Source text is just a string. A compiler pipeline is a sequence of transformations that turns that string into something structured and meaningful. Each stage takes the output of the previous one, narrowing the representation from raw text to a typed result.

Understanding the pipeline gives you a mental model that applies to every Alpaca program you write — not just calculator expressions, but any language you define with the library.

## The Four Stages

Most compilers share the same four-stage structure:

1. **Source text** — the raw input string, e.g., `"3 + 4 * 2"`
2. **Lexical analysis** — groups characters into tokens: `NUMBER(3.0)`, `PLUS`, `NUMBER(4.0)`, `TIMES`, `NUMBER(2.0)`
3. **Syntactic analysis** — recognizes the grammatical structure of the token stream (conceptually, a parse tree or concrete syntax tree)
4. **Semantic analysis / evaluation** — extracts meaning from the tree, producing a typed result (in a calculator: `Double`)

Some compilers add a fifth stage — code generation — that emits machine code or bytecode. Alpaca stops at stage 4: its pipeline produces a typed Scala value, not machine code.

## Alpaca's Pipeline

With Alpaca, running the full pipeline takes two calls:

```scala sc:nocompile
// Full pipeline: source text → typed result
val (_, lexemes) = CalcLexer.tokenize("3 + 4 * 2")
// lexemes: List[Lexeme] — NUMBER(3.0), PLUS, NUMBER(4.0), TIMES, NUMBER(2.0)

val (_, result) = CalcParser.parse(lexemes)
// result: Double | Null = 11.0
```

`CalcLexer.tokenize` handles stages 1–2: it takes the source string and produces a `List[Lexeme]`. `CalcParser.parse` handles stages 3–4: it consumes those lexemes using the generated LR(1) parse table and your semantic actions to compute the typed result, without constructing an explicit parse tree data structure.

Both `CalcLexer` and `CalcParser` are objects generated by Alpaca's macros. Their definitions live in separate files (see the cross-links at the bottom of this page).

## Compile-time vs Runtime Boundary

Alpaca draws a sharp line between what happens at compile time and what happens at runtime. This is the most important thing to understand about the library.

> **Compile-time processing:** When you write a `lexer` definition, the Scala 3 macro validates your regex patterns, checks for shadowing, and generates the `Tokenization` object. When you write a `Parser` definition, the macro reads your grammar, builds the LR(1) parse table, and detects any shift/reduce conflicts — all at compile time. At runtime, `tokenize(input)` and `parse(lexemes)` execute the pre-generated code.

In concrete terms:

**Compile time:**
- The `lexer` macro validates regex patterns, detects shadowing (where one pattern makes another unreachable), and emits a `Tokenization` object
- The `Parser` macro reads every `Rule` declaration, constructs the LR(1) parse table, and reports any shift/reduce or reduce/reduce conflicts as compile errors

**Runtime:**
- `tokenize(input)` executes the pre-generated code and returns `List[Lexeme]`
- `parse(lexemes)` executes the pre-built parse table and returns the typed result

The consequence: if your regex is invalid, or your grammar is ambiguous, you get a compile error — not a runtime crash. The pipeline is safe by construction before it ever runs on real input.

Alpaca covers stages 1–4 of the classical pipeline. The fifth "code generation" stage is not part of the library — your Scala semantic actions in the parser rules produce the final typed value directly.

## Formal Definition

> **Definition — Compilation pipeline:**
> A compiler pipeline is a composition of transformations fₙ ∘ ... ∘ f₂ ∘ f₁ where each fᵢ consumes the output of fᵢ₋₁ and produces a more structured representation.
> Alpaca's pipeline: `parse ∘ tokenize : String → R` where R is the root non-terminal's result type.

For the calculator example, `R` is `Double`. For a JSON parser, `R` might be `Any` or a custom AST type. The pipeline shape is always the same; only the result type changes.
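
The composition shape can be sketched in a few lines. The stages below are stand-ins (whitespace splitting and summation, not Alpaca's real signatures), shown only to make the `parse ∘ tokenize` shape concrete:

```scala
// Stand-in stages: a "lexer" that splits on whitespace and a "parser" that sums numbers.
val tokenize: String => List[String]   = _.trim.split("\\s+").toList
val evaluate: List[String] => Double   = _.flatMap(_.toDoubleOption).sum

// The pipeline is just function composition; its type is String => R with R = Double.
val pipeline: String => Double = tokenize.andThen(evaluate)
```

Swapping in a different `evaluate` changes `R` without changing the pipeline shape, which is exactly the point of the definition above.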

The parser internally appends a special `Lexeme.EOF` marker to the lexeme list before running the shift/reduce loop. This is an implementation detail — you do not need to add it yourself.

## Mapping the Stages to Alpaca Types

Each pipeline stage corresponds to a concrete Alpaca type:

| Stage | Input | Output | Alpaca Type |
|-------|-------|--------|-------------|
| Source text | — | `String` | `String` (plain Scala) |
| Lexical analysis | `String` | token stream | `List[Lexeme]` |
| Syntactic analysis | `List[Lexeme]` | reductions (via LR(1) stack) | LR(1) stack + reductions (internal) |
| Semantic analysis | reduction values | typed result | `R \| Null` (your root type) |

Alpaca never constructs or returns an explicit parse tree object. Instead, it maintains an LR(1) stack and applies your semantic actions (the `=>` expressions in `rule` definitions) at each reduction, so what you get back from `parse` is the final typed value, not an intermediate tree.

## What Comes Next

The rest of the Compiler Theory Tutorial builds on this mental model:

- Next: [Tokens & Lexemes](tokens.md) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
- [The Lexer: Regex to Finite Automata](lexer-fa.md) — how regular expressions define token classes and how Alpaca compiles them

For the full API, see the reference pages:

- See [Lexer](lexer.md) for how `CalcLexer` is defined.
- See [Parser](parser.md) for how `CalcParser` is defined and how grammar rules produce a typed result.