
Tech debt cleanup#287

Open
halotukozak wants to merge 11 commits into theory-application from
tech-debt-cleanup

Conversation


@halotukozak halotukozak commented Mar 3, 2026

Summary

Add theory application pages and fix documentation tech debt accumulated across earlier phases.

Theory application pages

  • theory/conflicts.md — Parse table conflicts: formal conflict definition, shift/reduce and reduce/reduce examples with Alpaca error messages
  • theory/semantic-actions.md — Semantic actions: how Alpaca's production macro attaches user code to grammar rules, AST construction patterns
  • theory/full-example.md — Capstone narrative: complete walkthrough from grammar definition through lexing, parsing, and AST construction
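The pipeline these pages walk through (grammar → lexing → parsing → typed result) can be sketched in a few dozen lines. The following is a hypothetical Python illustration of the same calculator pipeline, not Alpaca code; token names mirror the docs, and NUMBER yields a float to match the Double-based examples:

```python
import re

# Illustrative stand-in for the CalcLexer/CalcParser pipeline (not Alpaca).
# All token patterns combined into one named-group alternation, as the
# theory pages describe.
TOKEN_RE = re.compile(
    r"(?P<NUMBER>[0-9]+(\.[0-9]+)?)|(?P<PLUS>\+)|(?P<TIMES>\*)"
    r"|(?P<LPAREN>\()|(?P<RPAREN>\))|(?P<WS>\s+)"
)

def tokenize(src):
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)   # anchored match at current position
        if not m:
            raise SyntaxError(f"bad char at {pos}")
        if m.lastgroup != "WS":        # skip whitespace
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out

def parse(tokens):
    # Precedence climbing: TIMES binds tighter than PLUS.
    PREC = {"PLUS": 1, "TIMES": 2}
    def atom(i):
        kind, text = tokens[i]
        if kind == "NUMBER":
            return float(text), i + 1  # NUMBER yields a float ("Double")
        if kind == "LPAREN":
            val, i = expr(i + 1, 1)
            return val, i + 1          # consume RPAREN
        raise SyntaxError(f"unexpected {kind}")
    def expr(i, min_prec):
        val, i = atom(i)
        while i < len(tokens) and tokens[i][0] in PREC and PREC[tokens[i][0]] >= min_prec:
            op = tokens[i][0]
            rhs, i = expr(i + 1, PREC[op] + 1)
            val = val + rhs if op == "PLUS" else val * rhs
        return val, i
    val, _ = expr(0, 1)
    return val

print(parse(tokenize("1 + 2 * 3")))    # 7.0
print(parse(tokenize("(1 + 2) * 3")))  # 9.0
```

This is a sketch only: Alpaca does the regex compilation and table construction at compile time via macros, whereas everything here happens at runtime.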

Documentation infrastructure

  • Added "Compiler Theory" nested section to sidebar navigation
  • Fixed 6 broken cross-link paths in theory pages

Tech debt fixes

  • Fixed alwaysBefore/alwaysAfter -> before/after correction note in reduce/reduce section
  • Added navigation bullets ("Next: ...") to theory pages for reading flow
  • Corrected Int -> Double type comments in parser.md
  • Added debug-settings cross-links to lexer.md and parser.md
  • Fixed conflicts.md link text and backtick notation in extractors.md

🤖 Generated with Claude Code

- Formal definition block for Parse Table Conflict (state/symbol pair collision)
- Shift/reduce conflict section with CalcParser 1+2+3 example and real Alpaca error message
- alwaysBefore/alwaysAfter discrepancy note immediately after error block
- Reduce/reduce conflict section with Integer/Float example and error message
- LR(1) lookahead disambiguation section
- Resolution by priority section with minimal sc:nocompile example (production.plus only)
- Compile-time detection section with standard blockquote callout
- Cross-links to cfg.md, shift-reduce.md, ../conflict-resolution.md, semantic-actions.md, full-example.md
- Six-step narrative from bare grammar to working calculator (7.0)
- CalcLexer definition, bare CalcParser with ShiftReduceConflict error
- Resolved CalcParser with all 6 resolutions using production.div (not production.divide)
- Pipeline evaluation: 1+2*3=7.0, (1+2)*3=9.0 with null-check note
- Semantic action trace for 1+2*3 showing 2*3 reduces before 1+...
- Formal definition block, compile-time callout blockquote
- Theory-to-code mapping table with cross-links to all theory pages
- Formal definition block for Semantic Action (S-attributed scheme)
- Syntax-directed translation section with synthesized attribute explanation
- Extractor pattern section with complete 7-production CalcParser action table
- No-parse-tree section grounded in Parser.scala loop() implementation
- Typed results section explaining Rule[Double] compile-time type checking
- Compile-time processing callout
- Cross-links to shift-reduce.md, conflicts.md, ../extractors.md, ../parser.md, full-example.md
- No Rule[Int], no n.value.toDouble, no inherited attribute, no L-attributed
- Add 'Compiler Theory' subsection with 9 theory pages in pipeline order
- Pages use theory/pagename.md format resolving to docs/_docs/theory/
- Order: pipeline, tokens, lexer-fa, cfg, why-lr, shift-reduce, conflicts, semantic-actions, full-example
- pipeline.md: tokens.md and lexer-fa.md sibling links no longer use theory/ prefix
- pipeline.md: lexer.md and parser.md reference doc links now use ../ prefix
- tokens.md: cfg.md sibling link no longer uses theory/ prefix
- lexer-fa.md: cfg.md sibling link no longer uses theory/ prefix
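The semantic-action trace mentioned above (2*3 reducing before 1+...) can be reproduced with a tiny operator-precedence shift-reduce loop. This Python sketch is illustrative only and uses none of Alpaca's API; it records each reduction so the order is visible:

```python
# Minimal operator-precedence shift-reduce loop (illustrative, not Alpaca).
# For 1+2*3, the * reduction fires before the + reduction.
PREC = {"+": 1, "*": 2}

def shift_reduce(tokens):
    values, ops, trace = [], [], []
    def reduce_top():
        op = ops.pop()
        b, a = values.pop(), values.pop()
        result = a + b if op == "+" else a * b
        trace.append(f"{a}{op}{b}={result}")   # record the reduction
        values.append(result)
    for tok in tokens:
        if isinstance(tok, float):
            values.append(tok)                 # shift a number
        else:
            while ops and PREC[ops[-1]] >= PREC[tok]:
                reduce_top()                   # reduce higher-precedence ops first
            ops.append(tok)                    # shift the operator
    while ops:
        reduce_top()
    return values[0], trace

result, trace = shift_reduce([1.0, "+", 2.0, "*", 3.0])
print(result)  # 7.0
print(trace)   # ['2.0*3.0=6.0', '1.0+6.0=7.0']
```

The trace shows the same evaluation order the full-example page narrates: the inner product reduces first, then the outer sum.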
…duce section

- Inserted identical correction blockquote after the reduce/reduce compiler output block
- Readers who encounter only the RR error message now learn that alwaysBefore/alwaysAfter do not exist in Alpaca API
- Correct methods are before/after per conflict-resolution.md
…y pages

- semantic-actions.md: replace backtick code span with functional [Parser](../parser.md) hyperlink
- shift-reduce.md: add Next: [Conflicts and Disambiguation](conflicts.md) bullet to Cross-links
- tokens.md: add Next: [The Lexer: Regex to Finite Automata](lexer-fa.md) bullet to Cross-links
- Line 102: n.value: Int -> n.value: Double (CalcLexer.NUMBER yields Double)
- Line 117: where an Int -> where a Double (matching type)
- Line 245: Rule[Int] -> Rule[Double] in conflict-resolution example
- Append See [Debug Settings](debug-settings.html) paragraph at end of lexer.md
- Append See [Debug Settings](debug-settings.html) paragraph at end of parser.md
- Fix [cfg.md](cfg.md) to [Context-Free Grammars](cfg.md) on line 24 (TD-05)
- Add Next: prefix to Semantic Actions bullet in conflicts.md Cross-links (TD-04)
- Add Next: prefix to Full Example bullet in semantic-actions.md Cross-links (TD-04)
- Line 22: n.value: Int -> n.value: Double (CalcLexer.NUMBER yields Double)
- Line 33: where an Int -> where a Double (matching type)
- Line 62: Rule[Int] -> Rule[Double] (CalcLexer.NUMBER binding)
- Line 67: v: Int -> v: Double (matching type annotation in comment)
- Lines 28-29: single-backslash backtick names (\+, \(, \)) -> double-backslash
  (\\+, \\(, \\)) to match parser.md and lexer.md Naming Table style
@halotukozak halotukozak self-assigned this Mar 3, 2026
Copilot AI review requested due to automatic review settings March 3, 2026 19:54
@halotukozak halotukozak changed the base branch from master to theory-application March 3, 2026 20:04

Copilot AI left a comment


Pull request overview

This PR expands the documentation set by introducing a new “Compiler Theory” tutorial section and updating existing reference docs to align examples with the calculator’s Double-based NUMBER token.

Changes:

  • Adds a new “Compiler Theory” section to the docs sidebar with nine new tutorial pages under docs/_docs/theory/.
  • Updates parser/extractor docs examples to reflect NUMBER producing Double (not Int) and adds a cross-link to Debug Settings.
  • Adds a Debug Settings cross-link to the lexer documentation.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| docs/sidebar.yml | Adds “Compiler Theory” section and links to new theory pages. |
| docs/_docs/theory/pipeline.md | Introduces the compile pipeline overview and cross-links into the theory section. |
| docs/_docs/theory/tokens.md | Defines tokens/lexemes and shows CalcLexer examples. |
| docs/_docs/theory/lexer-fa.md | Explains regex → automata concepts and how Alpaca tokenization is implemented. |
| docs/_docs/theory/cfg.md | Introduces context-free grammars with the calculator grammar example. |
| docs/_docs/theory/why-lr.md | Explains LR vs LL and motivates LR(1) for Alpaca. |
| docs/_docs/theory/shift-reduce.md | Walks through a shift/reduce trace and maps it to runtime behavior. |
| docs/_docs/theory/conflicts.md | Explains shift/reduce + reduce/reduce conflicts and resolution. |
| docs/_docs/theory/semantic-actions.md | Explains semantic actions and typed results in Alpaca. |
| docs/_docs/theory/full-example.md | Assembles the full calculator lexer+parser+resolutions example. |
| docs/_docs/parser.md | Updates examples/comments to Double and adds a Debug Settings link. |
| docs/_docs/lexer.md | Adds a Debug Settings link. |
| docs/_docs/extractors.md | Updates examples/comments to Double and aligns special-token accessor examples. |


Comment on lines +1 to +3
A parser recognizes whether an input string belongs to a grammar — it accepts or rejects. But most programs need to *compute* something from the input, not just verify it. Semantic actions bridge structure and computation: they attach a computation to each production rule, so that the parser produces a typed value as a direct outcome of parsing, rather than a parse tree.

## Syntax-Directed Translation

Copilot AI Mar 3, 2026


This new page doesn’t start with an H1 heading (# ...). Existing docs pages in this repo start with an H1 (e.g., docs/_docs/parser.md:1). Add a top-level heading like # Semantic Actions so the page has a visible title.

Comment on lines +1 to +3
A grammar is ambiguous if a string can be parsed in more than one way. In LR parsing, ambiguity manifests as a conflict: the parse table has two valid entries for the same (state, symbol) pair, and the parser cannot proceed deterministically.

## What is a Parse Table Conflict?

Copilot AI Mar 3, 2026


This new page doesn’t start with an H1 heading (# ...). Existing docs pages in this repo consistently start with an H1 (e.g., docs/_docs/parser.md:1). Add a top-level heading like # Conflicts and Disambiguation to ensure the page has a proper title.

Comment on lines 1 to 3

Copilot AI Mar 3, 2026


This new page doesn’t start with an H1 heading (# ...). Most docs pages in this repo do (e.g., docs/_docs/parser.md:1). Add a top-level heading like # Context-Free Grammars so the page renders with a clear title.


This is why `CalcParser.parse(lexemes)` returns a named tuple `(ctx: Ctx, result: Double | Null)` — not a tree. The semantic actions produce the final value during the parse itself.

Decision confirmed: "Parse tree never exposed in Alpaca — semantic actions evaluated immediately during LR(1) reduce; parse() returns typed value directly." (STATE.md)

Copilot AI Mar 3, 2026


This sentence references STATE.md, but there is no STATE.md file in the repository. Consider removing the reference or replacing it with a link to the relevant source/doc page (e.g., src/alpaca/internal/parser/Parser.scala or docs/_docs/parser.md) so readers can actually follow it.

Suggested change
Decision confirmed: "Parse tree never exposed in Alpaca — semantic actions evaluated immediately during LR(1) reduce; parse() returns typed value directly." (STATE.md)
Decision confirmed: "Parse tree never exposed in Alpaca — semantic actions evaluated immediately during LR(1) reduce; parse() returns typed value directly."

Comment on lines +82 to +88
- Next: [Tokens & Lexemes](tokens.md) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
- [The Lexer: Regex to Finite Automata](lexer-fa.md) — how regular expressions define token classes and how Alpaca compiles them

For the full API, see the reference pages:

- See [Lexer](lexer.md) for how `CalcLexer` is defined.
- See [Parser](parser.md) for how `CalcParser` is defined and how grammar rules produce a typed result.
- See [Lexer](../lexer.md) for how `CalcLexer` is defined.
- See [Parser](../parser.md) for how `CalcParser` is defined and how grammar rules produce a typed result.

Copilot AI Mar 3, 2026


These cross-links (and other new theory pages) use .md extensions (e.g., tokens.md, ../lexer.md). The rest of the docs consistently link to the rendered .html pages (e.g., docs/_docs/parser.md links to lexer.html). If the site is published as HTML, these .md links will 404. Consider switching the links here (and across the theory section) to .html for consistency with the existing documentation.

Suggested change
- Next: [Tokens & Lexemes](tokens.md) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
- [The Lexer: Regex to Finite Automata](lexer-fa.md) — how regular expressions define token classes and how Alpaca compiles them
For the full API, see the reference pages:
- See [Lexer](lexer.md) for how `CalcLexer` is defined.
- See [Parser](parser.md) for how `CalcParser` is defined and how grammar rules produce a typed result.
- See [Lexer](../lexer.md) for how `CalcLexer` is defined.
- See [Parser](../parser.md) for how `CalcParser` is defined and how grammar rules produce a typed result.
- Next: [Tokens & Lexemes](tokens.html) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
- [The Lexer: Regex to Finite Automata](lexer-fa.html) — how regular expressions define token classes and how Alpaca compiles them
For the full API, see the reference pages:
- See [Lexer](../lexer.html) for how `CalcLexer` is defined.
- See [Parser](../parser.html) for how `CalcParser` is defined and how grammar rules produce a typed result.

Comment on lines 66 to 83
Alpaca follows the same principle but implements it using Java's regex engine, which is itself
backed by NFA/DFA machinery:

- All token patterns are combined into a single Java regex alternation at compile time:

```
// Conceptual: how Alpaca combines patterns internally
(?<NUMBER>[0-9]+(\.[0-9]+)?)|(?<PLUS>\+)|(?<MINUS>-)|(?<TIMES>\*)|...
```

- `java.util.regex.Pattern.compile(...)` is called inside the `lexerImpl` macro at compile
time. An invalid regex pattern therefore causes a compile error, not a runtime crash.
- At runtime, `Tokenization.tokenize()` uses `matcher.lookingAt()` on the combined pattern at
the current input position. It then checks which named group matched using
`matcher.start(i)` to determine the token class.

This means Alpaca's lexer runs with the same O(n) guarantee as a hand-built DFA: one pass
through the input, no backtracking.

Copilot AI Mar 3, 2026


This section claims Alpaca’s lexer has the same O(n) guarantee as a hand-built DFA and that there is “no backtracking”. However the implementation uses java.util.regex.Pattern/Matcher.lookingAt() on a combined alternation (see src/alpaca/internal/lexer/Lexer.scala and Tokenization.scala:57-70), and Java’s regex engine is generally backtracking-based and not guaranteed linear-time for all patterns. Suggest softening/rephrasing this (e.g., describe the intended single-pass behavior, and note that worst-case performance depends on the regex engine/patterns), rather than stating a hard O(n) guarantee.
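The reviewer's point is that both match semantics and worst-case performance depend on the regex engine. One concrete, engine-dependent behavior worth noting for a combined-alternation lexer is alternative ordering: with leftmost-alternative semantics (used by java.util.regex and, in this hypothetical Python sketch, by the `re` module), a keyword pattern must precede the identifier pattern or it never wins:

```python
import re

# Illustrative only: with leftmost-alternative matching, pattern order in a
# combined lexer alternation decides which token class a lexeme gets.
good = re.compile(r"(?P<IF>if\b)|(?P<IDENT>[a-z]+)")
bad = re.compile(r"(?P<IDENT>[a-z]+)|(?P<IF>if\b)")

print(good.match("if").lastgroup)  # IF    — keyword alternative tried first
print(bad.match("if").lastgroup)   # IDENT — identifier swallows the keyword
```

This does not settle the O(n) question — that depends on the patterns and the engine's backtracking behavior — but it shows why "same as a hand-built DFA" is too strong a claim for alternation-based lexing.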

Comment on lines 46 to 66
In Alpaca, each matched token is represented as a `Lexeme[Name, Value]`. A lexeme carries four
pieces of information:

- `name` — the token class name string, e.g., `"NUMBER"` or `"PLUS"`
- `value` — the extracted value with its Scala type, e.g., `3.14: Double` for NUMBER, `(): Unit`
for PLUS
- `position` — the character offset at the end of the match
- `line` — the line number at the end of the match

The tokenization output for a simple expression illustrates this:

```scala sc:nocompile
val (_, lexemes) = CalcLexer.tokenize("3 + 4 * 2")
// lexemes: List[Lexeme] =
// NUMBER(3.0), PLUS, NUMBER(4.0), TIMES, NUMBER(2.0)
//
// Each Lexeme carries:
// .name — token class name (e.g., "NUMBER")
// .value — extracted value (e.g., 3.0: Double)
// .position — character offset at end of match
// .line — line number at end of match

Copilot AI Mar 3, 2026


position is described here as a “character offset at the end of the match”, but the default lexer context tracks position as the 1-based column within the current line and resets it on \n (see src/alpaca/internal/lexer/PositionTracking.scala:8-32). Consider updating the wording (and the later .position bullet in the code comment) to avoid implying it’s a global offset across the entire input.
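The 1-based, line-relative column tracking the comment describes (reset on `\n`) can be sketched as follows. This is a plain Python illustration of the behavior attributed to PositionTracking.scala, not the actual implementation:

```python
# Sketch of 1-based (line, column) tracking that resets the column on '\n' —
# the behavior the review comment attributes to the default lexer context.
def positions(text):
    line, col, out = 1, 1, []
    for ch in text:
        out.append((ch, line, col))
        if ch == "\n":
            line, col = line + 1, 1  # newline: advance line, reset column to 1
        else:
            col += 1
    return out

print(positions("ab\ncd"))
# [('a', 1, 1), ('b', 1, 2), ('\n', 1, 3), ('c', 2, 1), ('d', 2, 2)]
```

Under this scheme, `position` is a column within the current line, not a global character offset — which is exactly the wording fix the comment requests.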


Copilot AI Mar 3, 2026


In the LR family table, ANTLR is listed as an example of an LALR(1) parser generator. ANTLR (v4) is generally an LL(*) / adaptive predictive parser rather than LALR(1), so this example is likely misleading. Consider replacing ANTLR with an actual LALR(1) tool (e.g., yacc/Bison only, or another LALR(1) generator) or rewording the examples.

Suggested change
| LALR(1) | Per-state lookahead (merged item-set cores) | Same as LR(0)/SLR | Most common in practice (yacc, Bison) |

Comment on lines +23 to +42
- title: Compiler Theory
subsection:
- title: The Compilation Pipeline
page: theory/pipeline.md
- title: Tokens and Lexemes
page: theory/tokens.md
- title: The Lexer — Regex to Finite Automata
page: theory/lexer-fa.md
- title: Context-Free Grammars
page: theory/cfg.md
- title: Why LR Parsing?
page: theory/why-lr.md
- title: Shift-Reduce Parsing
page: theory/shift-reduce.md
- title: Conflicts and Disambiguation
page: theory/conflicts.md
- title: Semantic Actions
page: theory/semantic-actions.md
- title: Full Calculator Example
page: theory/full-example.md

Copilot AI Mar 3, 2026


The PR title is "Tech debt cleanup", but this change adds a substantial new "Compiler Theory" documentation section (new pages + sidebar navigation). Consider updating the PR title/description to reflect the actual scope so reviewers/readers can find it later.

Comment on lines 1 to 3
The shift-reduce loop is the heart of LR parsing. Every LR parser — regardless of whether it uses LR(0), LALR(1), or full LR(1) lookahead — executes the same fundamental loop: shift the next token onto a stack, or reduce the top of the stack to a non-terminal. This page traces that loop step by step for a concrete input.

## The Parse Stack

Copilot AI Mar 3, 2026


This new page doesn’t start with an H1 heading (# ...). Existing docs pages consistently begin with an H1 (e.g., docs/_docs/parser.md:1). Add a top-level heading like # Shift-Reduce Parsing to avoid rendering a page with no title.

@github-actions github-actions bot added documentation Improvements or additions to documentation thesis labels Mar 3, 2026

Labels

documentation (Improvements or additions to documentation), refactoring, thesis
