Add grammar theory pages (CFG, why-LR, shift-reduce) #262
halotukozak wants to merge 3 commits into theory-foundation from …
…finition
- Top-down vs bottom-up parsing approaches
- Left recursion infinite-loop trace showing LL failure
- LR family comparison table: LR(0), SLR(1), LALR(1), LR(1) with Alpaca marked as LR(1)
- Why LR(1) vs LALR(1) section grounded in Item.scala/ParseTable.scala source
- LR(1) item formal definition using [A → α • β, a] dot notation with examples
- O(n) parsing paragraph
- Compile-time callout in established blockquote format
- Cross-links to cfg.md, shift-reduce.md, ../conflict-resolution.md, ../parser.md
…rmal configuration
- Parse stack explanation: (stateIndex, node) pairs from Parser.scala
- Parse tables section: parse table + action table with separation of concerns
- Simplified 3-production grammar block for trace clarity
- 8-row parse trace table for '1 + 2' with Stack | Remaining input | Action columns
- Annotation notes for steps 1, 2, 6, 7, 8
- Disclaimer that state numbers are illustrative for simplified grammar
- 3 LR(1) item examples with dot notation from Item.scala
- LR parse configuration formal definition in blockquote format
- Connection to Alpaca runtime loop() function prose reference
- O(n) loop termination paragraph
- Compile-time callout in established blockquote format
- Cross-links to why-lr.md, cfg.md, ../conflict-resolution.md, ../parser.md, pipeline.md
- Formal CFG 4-tuple definition (V, Σ, R, S) in blockquote format
- 7-production CalcParser BNF grammar (6 Expr productions + root)
- Leftmost derivation for 1 + 2 with ⇒ steps
- ASCII parse tree for 1 + 2
- CalcParser Alpaca DSL block annotated with sc:nocompile
- Compile-time callout in established blockquote format
- Cross-links to tokens.md, why-lr.md, ../parser.md, ../conflict-resolution.md
Pull request overview
This PR adds three comprehensive theory documentation pages explaining the fundamentals of parsing with context-free grammars, LR parsing, and the shift-reduce algorithm. These pages form part of the v1.1 Compiler Theory Tutorial (Phase 9: Grammar Theory).
Changes:
- Added formal CFG definition with calculator grammar example, derivation trace, and mapping to Alpaca DSL
- Added explanation of why LR parsing handles left-recursive grammars better than LL, with LR family comparison table and LR(1) item definition
- Added shift-reduce parsing explanation with detailed 8-step parse trace and connection to Alpaca's runtime implementation
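The left-recursion point in the second item can be illustrated with a small sketch. It is not taken from why-lr.md: the grammar `E -> E '+' n | n`, the function names, and the depth limit are all assumptions chosen for illustration. A naive top-down rule calls itself before consuming any input, so it never makes progress, whereas the loop-based rewrite `E -> n ('+' n)*` does.

```python
def parse_left_recursive(tokens, pos=0, depth=0, limit=25):
    """Naive top-down rule for the hypothetical grammar E -> E '+' n | n."""
    if depth >= limit:
        # Guard added for the demo only; without it the recursion never ends.
        raise RecursionError(f"{depth} nested calls, still at token {pos}")
    # The first alternative starts with E, so the rule re-enters itself
    # at the same input position without consuming a token.
    return parse_left_recursive(tokens, pos, depth + 1, limit)


def parse_iterative(tokens):
    """Same language as E -> n ('+' n)*, building a left-associative tree."""
    if not tokens or tokens[0] == "+":
        raise SyntaxError("expected a number")
    tree, pos = tokens[0], 1
    while pos < len(tokens):
        if tokens[pos] != "+" or pos + 1 >= len(tokens) or tokens[pos + 1] == "+":
            raise SyntaxError(f"unexpected input at token {pos}")
        tree = (tree, "+", tokens[pos + 1])  # fold to the left
        pos += 2
    return tree


if __name__ == "__main__":
    print(parse_iterative(["1", "+", "2", "+", "3"]))
    try:
        parse_left_recursive(["1", "+", "2"])
    except RecursionError as err:
        print("left-recursive rule made no progress:", err)
```

An LR parser avoids the rewrite altogether, because it only commits to `E -> E '+' n` after the whole right-hand side is already on the stack.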
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/_docs/theory/cfg.md | Introduces context-free grammars with formal 4-tuple definition, BNF notation, calculator grammar example, leftmost derivation trace, and DSL mapping |
| docs/_docs/theory/why-lr.md | Explains LL vs LR parsing, left-recursion problems, LR family comparison, and why Alpaca uses full LR(1) with source code references |
| docs/_docs/theory/shift-reduce.md | Details the shift-reduce loop with parse stack structure, 8-step trace table, LR(1) item lookahead mechanics, and runtime connection |
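For readers of this thread who want a feel for the shift-reduce loop described in shift-reduce.md, here is a minimal table-driven sketch. It is not Alpaca's `loop()` and does not reproduce the page's 8-row trace: the two-production grammar, the hand-built `ACTION`/`GOTO` tables, and the state numbers are all assumptions made for illustration.

```python
# Hypothetical grammar (an assumption, not the one in shift-reduce.md):
#   1: E -> E + n
#   2: E -> n
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}  # number -> (lhs, length of rhs)

# Hand-built tables for the grammar above; state numbers are illustrative.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def parse(tokens):
    """Run the shift-reduce loop and return the list of actions taken."""
    stack = [0]                      # stack of state numbers
    tokens = list(tokens) + ["$"]    # append end-of-input marker
    pos, trace = 0, []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        if (state, lookahead) not in ACTION:
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")
        kind, arg = ACTION[(state, lookahead)]
        if kind == "shift":
            stack.append(arg)        # push the new state, consume the token
            pos += 1
            trace.append(f"shift {lookahead}")
        elif kind == "reduce":
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[-rhs_len:]     # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            trace.append(f"reduce {arg}")
        else:
            trace.append("accept")
            return trace


if __name__ == "__main__":
    for step in parse(["n", "+", "n"]):
        print(step)
```

Every iteration either consumes a token or replaces stack entries with a single goto state, which is the intuition behind the O(n) termination paragraph mentioned in the commit message.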
|-----------|-------------------|-------------|-------|
| LR(0) | None (reduce always) | Smallest | Too weak for most real grammars |
| SLR(1) | FOLLOW sets (global per non-terminal) | Same as LR(0) | Better, still limited |
| LALR(1) | Per-state lookahead (merged item-set cores) | Same as LR(0)/SLR | Most common in practice (yacc, Bison, ANTLR) |
ANTLR uses LL(*) parsing (adaptive ALL(*) in ANTLR 4), not LALR(1). ANTLR is a top-down parser generator with dynamic lookahead, while LALR(1) is a bottom-up parsing technique. ANTLR should be removed from the "Most common in practice" note in the LALR(1) row.
Suggested change:
- | LALR(1) | Per-state lookahead (merged item-set cores) | Same as LR(0)/SLR | Most common in practice (yacc, Bison, ANTLR) |
+ | LALR(1) | Per-state lookahead (merged item-set cores) | Same as LR(0)/SLR | Most common in practice (yacc, Bison) |
Summary
- …`1 + 2` with ⇒ steps, ASCII parse tree, Alpaca DSL mapping with `sc:nocompile` CalcParser block, ambiguity discussion → conflict-resolution.md
- …`ParseTable.scala` docstring + `Item.scala` per-item lookahead), formal LR(1) item definition `[A → α • β, a]`
- …`(stateIndex, node)` pairs, 8-step parse trace for `1 + 2`, formal LR parse configuration definition, connection to Alpaca's `loop()` in `Parser.scala`

Part of the v1.1 Compiler Theory Tutorial milestone, Phase 9: Grammar Theory (TH-04, TH-05, TH-06).
Test plan
- `./mill docJar` passes (all examples compile)
- `theory/cfg.md` contains formal 4-tuple definition and leftmost derivation
- `theory/why-lr.md` says "full LR(1)", with zero occurrences of "LALR" for Alpaca
- `theory/shift-reduce.md` contains 8-row parse trace table with Stack | Input | Action columns
- …`> **Compile-time processing:**` callout and cross-links

🤖 Generated with Claude Code