
Complete v1.2: Cookbook pages and tech debt cleanup#266

Open
halotukozak wants to merge 16 commits into grammar-theory from cookbook-sidebar-integration

Conversation

@halotukozak
Owner

Summary

  • Add 4 cookbook/how-to pages: expression evaluator, error messages, multi-pass processing, whitespace-sensitive syntax
  • Fix 5 tech debt items: type annotation errors (Int→Double), debug-settings cross-links, backtick notation consistency, Next: navigation bullets, link text style
  • Add "Cookbook" nested section to sidebar navigation
  • Build verified: ./mill docJar 63/63 SUCCESS

What's included

Cookbook pages (Phase 14)

  • docs/_docs/cookbook/expression-evaluator.md — Full CalcParser with operator precedence via before/after DSL
  • docs/_docs/cookbook/error-messages.md — ShadowException, RuntimeException, and null parse result handling
  • docs/_docs/cookbook/multi-pass.md — Chaining lexer/parser passes, filtering lexeme streams
  • docs/_docs/cookbook/whitespace-sensitive.md — LexerCtx-based INDENT/DEDENT indentation tracking

Tech debt fixes (Phase 13)

  • Fixed parser.md and extractors.md type comments (Int → Double for CalcLexer.NUMBER)
  • Added debug-settings.md cross-links from lexer.md and parser.md
  • Fixed extractors.md backtick notation inconsistency
  • Added Next: prefix bullets to conflicts.md and semantic-actions.md
  • Fixed conflicts.md link text from [cfg.md](cfg.md) to [Context-Free Grammars](cfg.md)

Sidebar integration (Phase 15)

  • Added "Cookbook" nested section to sidebar.yml with all 4 how-to pages

Test plan

  • ./mill docJar passes (63/63 SUCCESS)
  • All cookbook pages use sc:nocompile annotation (11 code blocks)
  • All cookbook pages have import alpaca.*
  • Sidebar lists all 4 cookbook pages in correct order
  • Cross-links from cookbook pages to reference docs use correct relative paths

🤖 Generated with Claude Code

halotukozak and others added 16 commits February 21, 2026 09:27
- Formal definition block for Parse Table Conflict (state/symbol pair collision)
- Shift/reduce conflict section with CalcParser 1+2+3 example and real Alpaca error message
- alwaysBefore/alwaysAfter discrepancy note immediately after error block
- Reduce/reduce conflict section with Integer/Float example and error message
- LR(1) lookahead disambiguation section
- Resolution by priority section with minimal sc:nocompile example (production.plus only)
- Compile-time detection section with standard blockquote callout
- Cross-links to cfg.md, shift-reduce.md, ../conflict-resolution.md, semantic-actions.md, full-example.md
- Six-step narrative from bare grammar to working calculator (7.0)
- CalcLexer definition, bare CalcParser with ShiftReduceConflict error
- Resolved CalcParser with all 6 resolutions using production.div (not production.divide)
- Pipeline evaluation: 1+2*3=7.0, (1+2)*3=9.0 with null-check note
- Semantic action trace for 1+2*3 showing 2*3 reduces before 1+...
- Formal definition block, compile-time callout blockquote
- Theory-to-code mapping table with cross-links to all theory pages
- Formal definition block for Semantic Action (S-attributed scheme)
- Syntax-directed translation section with synthesized attribute explanation
- Extractor pattern section with complete 7-production CalcParser action table
- No-parse-tree section grounded in Parser.scala loop() implementation
- Typed results section explaining Rule[Double] compile-time type checking
- Compile-time processing callout
- Cross-links to shift-reduce.md, conflicts.md, ../extractors.md, ../parser.md, full-example.md
- No Rule[Int], no n.value.toDouble, no inherited attribute, no L-attributed
- Add 'Compiler Theory' subsection with 9 theory pages in pipeline order
- Pages use theory/pagename.md format resolving to docs/_docs/theory/
- Order: pipeline, tokens, lexer-fa, cfg, why-lr, shift-reduce, conflicts, semantic-actions, full-example
- pipeline.md: tokens.md and lexer-fa.md sibling links no longer use theory/ prefix
- pipeline.md: lexer.md and parser.md reference doc links now use ../ prefix
- tokens.md: cfg.md sibling link no longer uses theory/ prefix
- lexer-fa.md: cfg.md sibling link no longer uses theory/ prefix
…duce section

- Inserted identical correction blockquote after the reduce/reduce compiler output block
- Readers who encounter only the RR error message now learn that alwaysBefore/alwaysAfter do not exist in Alpaca API
- Correct methods are before/after per conflict-resolution.md
…y pages

- semantic-actions.md: replace backtick code span with functional [Parser](../parser.md) hyperlink
- shift-reduce.md: add Next: [Conflicts and Disambiguation](conflicts.md) bullet to Cross-links
- tokens.md: add Next: [The Lexer: Regex to Finite Automata](lexer-fa.md) bullet to Cross-links
- Line 102: n.value: Int -> n.value: Double (CalcLexer.NUMBER yields Double)
- Line 117: where an Int -> where a Double (matching type)
- Line 245: Rule[Int] -> Rule[Double] in conflict-resolution example
- Append See [Debug Settings](debug-settings.html) paragraph at end of lexer.md
- Append See [Debug Settings](debug-settings.html) paragraph at end of parser.md
- Fix [cfg.md](cfg.md) to [Context-Free Grammars](cfg.md) on line 24 (TD-05)
- Add Next: prefix to Semantic Actions bullet in conflicts.md Cross-links (TD-04)
- Add Next: prefix to Full Example bullet in semantic-actions.md Cross-links (TD-04)
- Line 22: n.value: Int -> n.value: Double (CalcLexer.NUMBER yields Double)
- Line 33: where an Int -> where a Double (matching type)
- Line 62: Rule[Int] -> Rule[Double] (CalcLexer.NUMBER binding)
- Line 67: v: Int -> v: Double (matching type annotation in comment)
- Lines 28-29: single-backslash backtick names (\+, \(, \)) -> double-backslash
  (\\+, \\(, \\)) to match parser.md and lexer.md Naming Table style
- Explains tokenize -> filter List[Lexeme] -> parse composition pattern
- Comment-stripping example with Stage1 lexer and SumParser
- Re-lexing values example with flatMap expansion pattern
- Documents that Lexeme constructor is private[alpaca]
- Cross-links to between-stages.html, lexer.html, parser.html
- Complete CalcLexer + CalcParser with operator precedence via before/after DSL
- Rule[Double] type with n.value extractor pattern (decision [13-01])
- Full resolutions set covering +, -, *, / with correct precedence hierarchy
- Key points section warns against alwaysBefore/alwaysAfter (decision [10-01])
- Cross-links to conflict-resolution.html, parser.html, lexer.html
- IndentCtx case class with currentIndent and prevIndent fields
- IndentLexer with \n( *) pattern and body-condition workaround for guards
- INDENT/DEDENT token emission based on indentation level change
- IndentParser example reading INDENT/DEDENT tokens
- Cross-links to lexer-context.html, lexer-error-recovery.html, lexer.html
- Three sections: ShadowException (compile-time), RuntimeException (runtime lex), T | Null (parser failure)
- Clarifies ShadowException is compile-time only -- cannot be caught with try/catch
- Guards-not-supported workaround pattern included
- Notes GH #21 (no custom error handler) and GH #51/#65 (no structured parser errors)
- Cross-links to lexer-error-recovery.html, lexer-context.html, parser.html
Adds a "Cookbook" nested section to sidebar.yml listing all 4 how-to
pages: expression-evaluator, error-messages, multi-pass, whitespace-sensitive.
Build verified: ./mill docJar 63/63 SUCCESS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
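The commit notes above repeatedly cite the calculator results 1+2*3 = 7.0 and (1+2)*3 = 9.0. As a sanity check on that precedence behavior, here is a hypothetical standalone sketch in plain Scala (the names `tokenize` and `eval` are invented here; this is a precedence-layered evaluator, not Alpaca's before/after DSL or its LR machinery):

```scala
// Hypothetical standalone sketch (plain Scala, NOT the Alpaca DSL): a
// precedence-layered evaluator reproducing the documented results
// 1+2*3 = 7.0 and (1+2)*3 = 9.0.

def tokenize(src: String): List[String] =
  raw"\d+(\.\d+)?|[+\-*/()]".r.findAllIn(src.replaceAll("\\s", "")).toList

def eval(src: String): Double =
  var ts = tokenize(src)
  def peek: Option[String] = ts.headOption
  def next(): String = { val t = ts.head; ts = ts.tail; t }

  def atom(): Double = next() match
    case "(" => val v = expr(); next(); v // consume the closing ")"
    case n   => n.toDouble

  def term(): Double = // * and / bind tighter: evaluated before + and -
    var v = atom()
    while peek.exists(t => t == "*" || t == "/") do
      if next() == "*" then v *= atom() else v /= atom()
    v

  def expr(): Double =
    var v = term()
    while peek.exists(t => t == "+" || t == "-") do
      if next() == "+" then v += term() else v -= term()
    v

  expr()
```

The layering (term inside expr) plays the same role that before/after priorities play in the docs' CalcParser: multiplication reduces before addition.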
Copilot AI review requested due to automatic review settings February 22, 2026 09:39
@github-actions github-actions bot added labels: build, documentation (Improvements or additions to documentation), refactoring — Feb 22, 2026
@github-actions

File Coverage
All files 17%
alpaca/lexer.scala 0%
alpaca/lexer.scala 0%
alpaca/parser.scala 0%
alpaca/lexer.scala 93%
alpaca/internal/Showable.scala 12%
alpaca/internal/DebugSettings.scala 11%
alpaca/internal/NEL.scala 40%
alpaca/internal/internal.scala 0%
alpaca/internal/logger.scala 0%
alpaca/internal/internal.scala 0%
alpaca/internal/Csv.scala 0%
alpaca/internal/internal.scala 0%
alpaca/internal/Empty.scala 0%
alpaca/internal/debugUtils.scala 0%
alpaca/internal/internal.scala 0%
alpaca/internal/internal.scala 0%
alpaca/internal/logger.scala 0%
alpaca/internal/DebugPosition.scala 0%
alpaca/internal/DebugSettings.scala 0%
alpaca/internal/internal.scala 0%
alpaca/internal/internal.scala 20%
alpaca/internal/Showable.scala 70%
alpaca/internal/logger.scala 0%
alpaca/internal/errors.scala 0%
alpaca/internal/logger.scala 17%
alpaca/internal/AlpacaException.scala 0%
alpaca/internal/Showable.scala 18%
alpaca/internal/NEL.scala 0%
alpaca/internal/logger.scala 0%
alpaca/internal/quotes.scala 0%
alpaca/internal/logger.scala 0%
alpaca/internal/logger.scala 0%
alpaca/internal/ValidName.scala 0%
alpaca/internal/logger.scala 0%
alpaca/internal/Default.scala 0%
alpaca/internal/lexer/LazyReader.scala 95%
alpaca/internal/lexer/Lexer.scala 0%
alpaca/internal/lexer/Tokenization.scala 91%
alpaca/internal/lexer/Token.scala 0%
alpaca/internal/lexer/LineTracking.scala 50%
alpaca/internal/lexer/PositionTracking.scala 58%
alpaca/internal/lexer/BetweenStages.scala 0%
alpaca/internal/lexer/Lexeme.scala 83%
alpaca/internal/lexer/CompileNameAndPattern.scala 0%
alpaca/internal/lexer/Token.scala 54%
alpaca/internal/parser/Symbol.scala 0%
alpaca/internal/parser/Item.scala 0%
alpaca/internal/parser/ConflictException.scala 0%
alpaca/internal/parser/createTables.scala 0%
alpaca/internal/parser/ConflictException.scala 0%
alpaca/internal/parser/ParseAction.scala 0%
alpaca/internal/parser/State.scala 0%
alpaca/internal/parser/ConflictException.scala 0%
alpaca/internal/parser/Parser.scala 75%
alpaca/internal/parser/ParseTable.scala 0%
alpaca/internal/parser/ParserExtractors.scala 0%
alpaca/internal/parser/ParseAction.scala 62%
alpaca/internal/parser/ParseTable.scala 4%
alpaca/internal/parser/ParserExtractors.scala 0%
alpaca/internal/parser/Symbol.scala 0%
alpaca/internal/parser/Production.scala 0%
alpaca/internal/parser/Symbol.scala 0%
alpaca/internal/parser/ConflictResolution.scala 0%
alpaca/internal/parser/FirstSet.scala 77%
alpaca/internal/parser/Symbol.scala 0%
alpaca/internal/parser/Symbol.scala 3%
alpaca/internal/parser/ParserExtractors.scala 22%
alpaca/internal/parser/ConflictResolution.scala 0%
alpaca/internal/parser/Production.scala 0%
alpaca/internal/parser/ParseAction.scala 0%
alpaca/internal/parser/ConflictResolution.scala 0%

Minimum allowed coverage is 0%

Generated by 🐒 cobertura-action against c706729

Contributor

Copilot AI left a comment


Pull request overview

Adds a new “Cookbook” section and expands the documentation set with both cookbook/how-to guides and a multi-page compiler-theory tutorial, alongside several consistency/tech-debt cleanups across existing reference docs and navigation.

Changes:

  • Added 4 cookbook pages (expression evaluator, error messages, multi-pass processing, whitespace-sensitive lexing) and integrated them into the sidebar.
  • Added/expanded compiler theory tutorial pages (pipeline, tokens/lexemes, lexer FA, CFGs, LR motivation, shift-reduce, conflicts, semantic actions, full example).
  • Refined reference docs with cross-links and additional guidance (e.g., debug timeout section, expanded docs link list).

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
docs/sidebar.yml Adds nested “Cookbook” section and links theory/tutorial pages into nav.
docs/_docs/getting-started.md Adds a richer “Documentation” link list with direct page links.
docs/_docs/lexer.md Reference doc for lexer DSL, regex patterns, tokens, naming rules, context intro.
docs/_docs/lexer-context.md Documents LexerCtx contract, tracking traits, snapshots, and hooks.
docs/_docs/lexer-error-recovery.md Documents compile-time lexer errors and runtime failure behavior.
docs/_docs/between-stages.md Explains Lexeme/snapshot contract and BetweenStages behavior.
docs/_docs/parser.md Reference doc for parser DSL, terminals/non-terminals, EBNF operators, conflicts.
docs/_docs/parser-context.md Documents ParserCtx usage and context threading through reductions.
docs/_docs/extractors.md Reference doc for terminal/non-terminal/EBNF extractors and lexeme fields.
docs/_docs/conflict-resolution.md Reference/tutorial for conflict messages and before/after resolution DSL.
docs/_docs/debug-settings.md Adds compile-time debug settings explanation and timeout troubleshooting.
docs/_docs/theory/pipeline.md Introduces pipeline model and compile-time vs runtime boundary.
docs/_docs/theory/tokens.md Defines tokens/lexemes and maps them to Alpaca types and examples.
docs/_docs/theory/lexer-fa.md Explains regex → FA concepts and how Alpaca combines token patterns.
docs/_docs/theory/cfg.md Introduces CFGs, derivations, parse trees, and DSL mapping.
docs/_docs/theory/why-lr.md Motivates LR parsing vs LL and explains LR family choice.
docs/_docs/theory/shift-reduce.md Step-by-step shift/reduce trace with LR(1) lookahead discussion.
docs/_docs/theory/conflicts.md Explains shift/reduce and reduce/reduce conflicts and resolution.
docs/_docs/theory/semantic-actions.md Explains semantic actions, extractor patterns, and typed results.
docs/_docs/theory/full-example.md Assembles full calculator with conflict resolution and run-through.
docs/_docs/cookbook/expression-evaluator.md How-to for precedence/associativity with before/after in a calculator.
docs/_docs/cookbook/error-messages.md How-to for understanding compile-time vs lex-time vs parse-time failures.
docs/_docs/cookbook/multi-pass.md How-to for composing multiple passes via lexeme list transformations.
docs/_docs/cookbook/whitespace-sensitive.md How-to for indentation tracking and emitting INDENT/DEDENT tokens.


// result: Double | Null = 11.0
```

`CalcLexer.tokenize` handles stages 1–2: it takes the source string and produces a `List[Lexeme]`. `CalcParser.parse` handles stages 3–4: it takes those lexemes, builds the parse tree internally, and returns the typed result.

Copilot AI Feb 22, 2026


The description says CalcParser.parse “builds the parse tree internally”, but later pages explicitly state that Alpaca does not materialize a parse tree object and instead applies semantic actions immediately during reductions. Consider rephrasing to avoid implying a tree is constructed (e.g., “recognizes structure via the LR stack and evaluates semantic actions during reductions”).

Suggested change
`CalcLexer.tokenize` handles stages 1–2: it takes the source string and produces a `List[Lexeme]`. `CalcParser.parse` handles stages 3–4: it takes those lexemes, builds the parse tree internally, and returns the typed result.
`CalcLexer.tokenize` handles stages 1–2: it takes the source string and produces a `List[Lexeme]`. `CalcParser.parse` handles stages 3–4: it takes those lexemes, recognizes the grammatical structure using the LR parse table, applies your semantic actions during reductions, and returns the typed result.

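The wording this comment suggests — structure recognized during parsing, semantic actions applied during reductions, no tree object materialized — can be illustrated with a hypothetical plain-Scala sketch (the names `Lexeme`, `tokenize`, and `parse` are invented here, not Alpaca's API):

```scala
// Hypothetical sketch (plain Scala, not Alpaca's API): the "parse" stage
// returns a typed Double | Null and never allocates a tree node — each
// reduction runs its semantic action immediately.
case class Lexeme(token: String, value: Double)

def tokenize(src: String): List[Lexeme] =
  raw"\d+(\.\d+)?|\+".r.findAllIn(src).toList.map {
    case "+" => Lexeme("PLUS", 0.0)
    case n   => Lexeme("NUMBER", n.toDouble)
  }

// Grammar: Expr -> Expr PLUS NUMBER | NUMBER. The fold stands in for the
// repeated reduce steps; the action `acc + b` fires during each "reduction".
def parse(lexemes: List[Lexeme]): Double | Null =
  lexemes match
    case Lexeme("NUMBER", first) :: rest =>
      rest.grouped(2).foldLeft[Double | Null](first) {
        case (acc: Double, List(Lexeme("PLUS", _), Lexeme("NUMBER", b))) => acc + b
        case _ => null // malformed input: null result, mirroring T | Null
      }
    case _ => null
```

The intermediate results are plain `Double` values, never tree nodes — the same property the suggested rewording attributes to Alpaca's LR loop.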
Comment on lines +5 to +8
A lexer reads a character stream from left to right and emits a token stream. Each scan step
finds the longest prefix of the remaining input that matches one of the token class patterns —
this is the *maximal munch* rule. When no pattern matches the current position, the lexer throws
an error. The result is a flat list of lexemes that the parser consumes next.

Copilot AI Feb 22, 2026


This section describes tokenization as “find[ing] the longest prefix … that matches one of the token class patterns (maximal munch)”, but Alpaca’s lexer semantics are documented elsewhere as ordered patterns where the first matching pattern wins. As implemented (combined alternation regex + lookingAt), the match choice is priority-based by pattern order, not a global longest-match across all token patterns. Consider adjusting this wording to match the actual “ordered rules / first match wins” behavior.

Suggested change
A lexer reads a character stream from left to right and emits a token stream. Each scan step
finds the longest prefix of the remaining input that matches one of the token class patterns —
this is the *maximal munch* rule. When no pattern matches the current position, the lexer throws
an error. The result is a flat list of lexemes that the parser consumes next.
A lexer reads a character stream from left to right and emits a token stream. At each scan step,
it tries the token class patterns in a fixed order and selects the first pattern whose regex
matches a prefix of the remaining input. If no pattern matches the current position, the lexer
throws an error. The result is a flat list of lexemes that the parser consumes next.

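The ordered, first-match-wins behavior this comment describes (a combined alternation regex plus `Matcher.lookingAt`) can be demonstrated directly with `java.util.regex` — a hypothetical sketch independent of Alpaca, with invented rule names:

```scala
// Hypothetical sketch, independent of Alpaca: ordered token patterns combined
// into one alternation, matched with Matcher.lookingAt. The FIRST alternative
// that can match at the current position wins — order, not length, decides.
import java.util.regex.Pattern

val ordered  = List("EQEQ" -> "==", "ASSIGN" -> "=", "NUM" -> "\\d+")
val combined = Pattern.compile(ordered.map { case (_, p) => s"($p)" }.mkString("|"))

def firstToken(input: String): Option[(String, String)] =
  val m = combined.matcher(input)
  if m.lookingAt() then
    // The group that participated in the match identifies the winning rule.
    val idx = (1 to ordered.length).find(i => m.group(i) != null).get
    Some(ordered(idx - 1)._1 -> m.group(idx))
  else None
```

Note the order-dependence: if "ASSIGN" were listed before "EQEQ", the input "==" would lex as two ASSIGN tokens — which is why this is priority-based matching rather than global maximal munch.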
"minus" { case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b },
"times" { case (Expr(a), CalcLexer.TIMES(_), Expr(b)) => a * b },
"div" { case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b },
{ case (CalcLexer.`\(`(_), Expr(e), CalcLexer.`\)`(_)) => e },

Copilot AI Feb 22, 2026


In the parser example, the token accessors for parentheses are documented elsewhere as CalcLexer.`\\(` / CalcLexer.`\\)`, but here they appear with only a single backslash (`\(` / `\)`). That token name won't match the lexer definition above (`case "\\(" => Token["LPAREN"]`, etc.) and conflicts with the accessor form documented elsewhere. Update the snippet to use the correct backticked accessor names for LPAREN/RPAREN (or use CalcLexer.LPAREN/CalcLexer.RPAREN consistently).

Suggested change
{ case (CalcLexer.`\(`(_), Expr(e), CalcLexer.`\)`(_)) => e },
{ case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e },

Comment on lines +33 to +35
case "\\n( *)" =>
val newIndent = ctx.text.toString.count(_ == ' ')
val prev = ctx.prevIndent

Copilot AI Feb 22, 2026


ctx.text is documented elsewhere as the remaining input before the match, not the matched substring (see lexer-context/between-stages docs and Tokenization.tokenize implementation). Counting spaces via ctx.text.toString.count(_ == ' ') will therefore count spaces beyond the newline/indent segment and compute the wrong indent. Bind the match with @ (e.g., case m @ "\\n( *)" =>) and count spaces in m, or otherwise derive the indent from just the matched text.

Comment on lines +46 to +48
The `\\n( *)` pattern matches a newline followed by zero or more spaces.
`ctx.text` contains the full match text at the time the rule body runs; counting spaces in it gives the new indentation level.
`Token["INDENT"](newIndent)` and `Token["DEDENT"](newIndent)` carry the new depth as their value, which the parser can read.

Copilot AI Feb 22, 2026


The explanation says ctx.text contains the full match text in the lexer rule body, but LexerCtx.text is described as the remaining input before each match (with lexeme snapshots overriding text later). This mismatch is likely to confuse readers and also contradicts the code sample’s intent. Consider rewording to clarify that you should use a bound match string (via @) for the matched indentation segment, while ctx.text is the remaining input.

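The fix these two comments suggest — derive the indentation level from the matched `\n( *)` segment rather than from the remaining input — can be sketched standalone (hypothetical `Tok`/`lex` names in plain Scala; this is not Alpaca's LexerCtx mechanism):

```scala
// Hypothetical standalone sketch of INDENT/DEDENT emission, counting spaces in
// the *matched* "\n( *)" segment only — never in the remaining input.
enum Tok:
  case INDENT(depth: Int)
  case DEDENT(depth: Int)
  case WORD(text: String)

def lex(src: String): List[Tok] =
  val pattern = raw"(\n *)|(\w+)".r
  var indent = 0
  pattern.findAllMatchIn(src).toList.flatMap { m =>
    Option(m.group(1)) match
      case Some(matched) => // the "\n( *)" branch
        val newIndent = matched.count(_ == ' ') // spaces in the match ONLY
        val tok =
          if newIndent > indent then Some(Tok.INDENT(newIndent))
          else if newIndent < indent then Some(Tok.DEDENT(newIndent))
          else None
        indent = newIndent
        tok
      case None => Some(Tok.WORD(m.group(2)))
  }
```

Binding the matched text (here via the regex `Match`; in a rule body, via an `@`-bound pattern as the comment suggests) is what keeps the space count scoped to the indentation segment.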
object SumParser extends Parser:
val Sum: Rule[Int] = rule(
{ case (Sum(a), Stage1.PLUS(_), Sum(b)) => a + b },
{ case Stage1.NUM(n) => n.value.asInstanceOf[Int] },

Copilot AI Feb 22, 2026


Stage1.NUM(n) binds n as a lexeme whose .value should already be Int (from Token["NUM"](num.toInt)), so n.value.asInstanceOf[Int] is misleading and unnecessary. Prefer n.value to keep the example type-safe and consistent with the rest of the docs.

Suggested change
{ case Stage1.NUM(n) => n.value.asInstanceOf[Int] },
{ case Stage1.NUM(n) => n.value },

Comment on lines +37 to +50
names: mutable.Map[String, Int] = mutable.Map.empty,
) extends ParserCtx derives Copyable

object CalcParser extends Parser[CalcContext]:
val Expr: Rule[Int] = rule(
{ case CalcLexer.NUMBER(n) => n.value },
{ case CalcLexer.ID(id) => ctx.names.getOrElse(id.value, 0) },
)
val Statement: Rule[Unit | Int] = rule(
{ case (CalcLexer.ID(id), CalcLexer.ASSIGN(_), Expr(expr)) =>
ctx.names(id.value) = expr },
{ case Expr(expr) => expr },
)
val root: Rule[Unit | Int] = rule:

Copilot AI Feb 22, 2026


This stateful parser example declares val Expr: Rule[Int] but returns n.value from CalcLexer.NUMBER(n). In the docs’ running CalcLexer, NUMBER is defined as num.toDouble (i.e., n.value: Double), so this example’s types don’t line up and would not typecheck if compiled. Either change the rule/result types to Double (and update names/defaults accordingly) or introduce a separate lexer example where NUMBER is an Int.

Suggested change
names: mutable.Map[String, Int] = mutable.Map.empty,
) extends ParserCtx derives Copyable
object CalcParser extends Parser[CalcContext]:
val Expr: Rule[Int] = rule(
{ case CalcLexer.NUMBER(n) => n.value },
{ case CalcLexer.ID(id) => ctx.names.getOrElse(id.value, 0) },
)
val Statement: Rule[Unit | Int] = rule(
{ case (CalcLexer.ID(id), CalcLexer.ASSIGN(_), Expr(expr)) =>
ctx.names(id.value) = expr },
{ case Expr(expr) => expr },
)
val root: Rule[Unit | Int] = rule:
names: mutable.Map[String, Double] = mutable.Map.empty,
) extends ParserCtx derives Copyable
object CalcParser extends Parser[CalcContext]:
val Expr: Rule[Double] = rule(
{ case CalcLexer.NUMBER(n) => n.value },
{ case CalcLexer.ID(id) => ctx.names.getOrElse(id.value, 0.0) },
)
val Statement: Rule[Unit | Double] = rule(
{ case (CalcLexer.ID(id), CalcLexer.ASSIGN(_), Expr(expr)) =>
ctx.names(id.value) = expr },
{ case Expr(expr) => expr },
)
val root: Rule[Unit | Double] = rule:

Comment on lines +16 to +20
object CalcParser extends Parser: // uses ParserCtx.Empty by default
val Expr: Rule[Int] = rule(
{ case (Expr(a), CalcLexer.PLUS(_), Expr(b)) => a + b },
{ case CalcLexer.NUMBER(n) => n.value },
)

Copilot AI Feb 22, 2026


These examples use Rule[Int] with CalcLexer.NUMBER(n) => n.value, but elsewhere in the docs the running CalcLexer defines NUMBER as Double (num.toDouble). As written, the types are inconsistent and the snippets would not typecheck if compiled. Consider switching these examples to Double throughout or explicitly defining a separate lexer whose NUMBER token carries Int.

Comment on lines +82 to +83
This means Alpaca's lexer runs with the same O(n) guarantee as a hand-built DFA: one pass
through the input, no backtracking.

Copilot AI Feb 22, 2026


The claim that using Java’s regex engine implies “the same O(n) guarantee as a hand-built DFA” / “no backtracking” is not accurate in general: java.util.regex is a backtracking engine and can exhibit superlinear behavior for some patterns. Since Alpaca relies on Pattern + Matcher.lookingAt, it’s safer to avoid stating a strict O(n) guarantee unless the implementation enforces RE2-style constraints or similar. Consider rephrasing to a weaker guarantee (e.g., typically linear for these token patterns) or documenting the limitation.

Suggested change
This means Alpaca's lexer runs with the same O(n) guarantee as a hand-built DFA: one pass
through the input, no backtracking.
In practice, this means Alpaca's lexer behaves much like a hand-built DFA: it makes a single
pass over the input using a combined pattern and, for typical token patterns, runs in time
that is effectively linear in the input size. However, because it relies on Java's backtracking
`java.util.regex` engine, it does not provide a formal worst-case O(n) guarantee for arbitrary
regular expressions.

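The single-pass shape under discussion (one combined pattern, a `lookingAt` attempt at the current offset, then advance past the match) can be sketched as follows — hedged as the comment advises, since `java.util.regex` may backtrack within each attempt, so "one pass" holds per token but is not a formal worst-case O(n) bound. The combined pattern here is an invented example:

```scala
// Hypothetical sketch: one combined pattern, Matcher.lookingAt anchored at the
// current offset via region(), advancing past each match. The outer loop never
// re-scans consumed input, but the regex engine may backtrack internally.
import java.util.regex.Pattern

def tokens(input: String): List[String] =
  val m = Pattern.compile("\\d+|[a-z]+|\\s+").matcher(input)
  var pos = 0
  val out = List.newBuilder[String]
  while pos < input.length do
    m.region(pos, input.length)   // anchor the next attempt at `pos`
    if m.lookingAt() then
      out += m.group()
      pos = m.end()               // advance past the match; never re-scan
    else
      sys.error(s"no pattern matches at offset $pos")
  out.result()
```

For simple token patterns like these, each attempt is effectively linear; pathological patterns (e.g. nested quantifiers) would void that, which is the limitation the suggested rewording documents.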
Comment on lines +57 to +60
object CalcParser extends Parser[CalcContext]:
val Expr: Rule[Int] = rule(
{ case CalcLexer.NUMBER(n) => n.value },
{ case CalcLexer.ID(id) =>

Copilot AI Feb 22, 2026


Same issue as earlier in this file: val Expr: Rule[Int] returns n.value from CalcLexer.NUMBER, but the running CalcLexer in the docs yields Double. Keeping NUMBER consistently Double across docs would avoid confusing readers and prevent copy/paste type errors.

@github-actions

📊 Test Compilation Benchmark

Branch Average Time
Base (master) 43.437s
Current (cookbook-sidebar-integration) 43.288s

Result: Current branch differs by 0.149s (0.34%) — effectively unchanged ℹ️

@halotukozak halotukozak changed the base branch from master to grammar-theory February 23, 2026 23:37