Documentation: conflict resolution and contextual parsing guides #293
halotukozak wants to merge 4 commits into `master`.
Conversation
Pull request overview
Adds new end-user documentation pages for Alpaca’s Lexer and Parser APIs, plus dedicated guides covering conflict resolution and contextual parsing (including mention of the BetweenStages hook).
Changes:
- Add Lexer API documentation with examples and context usage
- Add Parser API documentation including EBNF operators and conflict resolution
- Add guides for conflict resolution and contextual parsing concepts
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| docs/_docs/parser.md | New Parser API documentation, grammar DSL overview, and conflict-resolution section |
| docs/_docs/lexer.md | New Lexer API documentation, tokenization, and context examples |
| docs/_docs/guides/contextual-parsing.md | New contextual parsing guide, including lexer/parser context and BetweenStages discussion |
| docs/_docs/guides/conflict-resolution.md | New conflict resolution guide with precedence/associativity examples and debugging link |
```scala
case class BraceCtx(
  var text: CharSequence = "",
  stack: mutable.Stack[String] = mutable.Stack()
```
This snippet won’t compile as written: it imports `scala.collection.mutable.Stack` but then uses `mutable.Stack[...]`/`mutable.Stack()` without importing `scala.collection.mutable` or aliasing it. Use `Stack[...]`/`Stack()` consistently, or import/qualify `scala.collection.mutable`.
Suggested change:
```diff
- stack: mutable.Stack[String] = mutable.Stack()
+ stack: Stack[String] = Stack()
```
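Either consistent style compiles; a tiny standalone illustration of the two import options (independent of the `BraceCtx` snippet above):

```scala
// Option 1: import the enclosing package and qualify uses
import scala.collection.mutable
val s1 = mutable.Stack("a")

// Option 2: import the class itself and use it unqualified
import scala.collection.mutable.Stack
val s2 = Stack("a")
```

Mixing the two — importing only the class but writing `mutable.Stack` — is what breaks the original snippet.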
```scala
      Token["("]
  case "\\)" =>
    if ctx.stack.isEmpty || ctx.stack.pop() != "paren" then
      throw RuntimeException("Mismatched parenthesis")
```
Strictly speaking, Scala 3’s universal apply methods (creator applications) allow `throw RuntimeException("...")` to compile, but `throw new RuntimeException("Mismatched parenthesis")` works across Scala versions and is the conventional style for Java exception types — consider using the `new` form here (same for similar examples).
Suggested change:
```diff
- throw RuntimeException("Mismatched parenthesis")
+ throw new RuntimeException("Mismatched parenthesis")
```
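The stack-based matching idea the doc example is built around can be shown in a self-contained form, independent of Alpaca's `Token`/context types (a minimal sketch, not the library's code):

```scala
import scala.collection.mutable.Stack

// Push on an opener, pop on a closer; mismatch or leftovers mean unbalanced.
def balanced(input: String): Boolean =
  val stack = Stack.empty[Char]
  input.forall {
    case '(' => stack.push('('); true
    case ')' => stack.nonEmpty && stack.pop() == '('
    case _   => true // non-bracket characters are ignored
  } && stack.isEmpty
```

The doc's `BraceCtx` follows the same pattern, except the stack lives in the lexer context and mismatches raise an exception instead of returning `false`.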
> ### Automatic Updates
> By default, Alpaca uses `BetweenStages` to automatically update the `text` field in your context (to advance past the matched string).
> If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increments `line` and `position` counters.
Grammar: in “hooks also increments `line` and `position` counters” the verb should be plural (“increment”) to agree with the plural subject “hooks”.
Suggested change:
```diff
- If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increments `line` and `position` counters.
+ If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increment `line` and `position` counters.
```
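For intuition, what `LineTracking`/`PositionTracking`-style updates amount to can be sketched in plain Scala — after each match, advance the counters past the matched text (the `Pos`/`advance` names are illustrative, not Alpaca's API):

```scala
// Mutable line/column counters, 1-based as is conventional for source positions.
final case class Pos(var line: Int = 1, var position: Int = 1)

// Advance the counters over the matched string: newlines bump `line`
// and reset `position`; every other character advances `position`.
def advance(pos: Pos, matched: String): Unit =
  for ch <- matched do
    if ch == '\n' then
      pos.line += 1
      pos.position = 1
    else pos.position += 1

val p = Pos()
advance(p, "ab\ncd")
// p.line == 2, p.position == 3
```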
> 2. Computes the First and Follow sets.
> 3. Generates the LR(1) (or LALR) transition and action tables.
“Internal Working” claims the parser computes Follow sets and may generate LR(1) “(or LALR)” tables. The current implementation appears LR(1)-only and uses FIRST sets (no Follow-set implementation found under `src/alpaca/internal/parser`). Please align this section with what the macro actually generates, or add references to the relevant code paths if these features exist elsewhere.
Suggested change:
```diff
- 2. Computes the First and Follow sets.
- 3. Generates the LR(1) (or LALR) transition and action tables.
+ 2. Computes the FIRST sets for the grammar's nonterminals.
+ 3. Generates the LR(1) transition and action tables.
```
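For background on the corrected wording: FIRST sets are computed by a standard fixed-point iteration over the grammar. A minimal, library-independent sketch (not Alpaca's implementation):

```scala
object FirstSets:
  type Sym = String
  val Eps = "ε" // stands for the empty production

  /** Fixed-point computation of FIRST sets for each nonterminal.
    * Productions are symbol lists; an empty list is an ε-production. */
  def first(grammar: Map[Sym, List[List[Sym]]]): Map[Sym, Set[Sym]] =
    val nts = grammar.keySet
    var sets: Map[Sym, Set[Sym]] = nts.map(nt => nt -> Set.empty[Sym]).toMap
    var changed = true
    while changed do
      changed = false
      for (nt, prods) <- grammar; prod <- prods do
        var acc = Set.empty[Sym]
        var nullable = true // whether the prefix scanned so far can derive ε
        for sym <- prod if nullable do
          if nts(sym) then
            acc ++= sets(sym) - Eps
            nullable = sets(sym)(Eps)
          else
            acc += sym // terminal: contributes itself and stops the scan
            nullable = false
        if nullable then acc += Eps // the whole production can derive ε
        val updated = sets(nt) ++ acc
        if updated != sets(nt) then
          sets = sets.updated(nt, updated)
          changed = true
    sets

// Example grammar: E → T E' ; E' → "+" T E' | ε ; T → "id"
val demoGrammar = Map(
  "E"  -> List(List("T", "E'")),
  "E'" -> List(List("+", "T", "E'"), Nil),
  "T"  -> List(List("id"))
)
val firsts = FirstSets.first(demoGrammar)
// firsts("E") == Set("id"); firsts("E'") == Set("+", "ε"); firsts("T") == Set("id")
```

Follow sets would require a second, separate fixed point seeded from these FIRST sets — which is exactly the piece the review could not find in the implementation.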
> Most contextual logic in Alpaca happens at the lexer level.
> Since the lexer tokenizes the entire input before the parser starts, the lexer context is the primary place to track state that affects tokenization.
> ### Exaple: Brace Matching & Nesting
Heading typo: "Exaple" should be "Example".
Suggested change:
```diff
- ### Exaple: Brace Matching & Nesting
+ ### Example: Brace Matching & Nesting
```
```scala
// id is a Lexeme, which has a .line property (and others) from the LexerCtx
println(s"Matched ID at line ${id.line}")
// fields contains all members of your LexerCtx
println(s"All context fields: ${id.fields}")
```
`Lexeme.fields` is `private[alpaca]` in the implementation, so `id.fields` won’t be accessible to library users. Consider removing this line and instead document that lexeme context is accessed via dynamic selectors (e.g., `id.line`, `id.position`, or other case-class fields present in the lexer context).
Suggested change:
```diff
- // id is a Lexeme, which has a .line property (and others) from the LexerCtx
- println(s"Matched ID at line ${id.line}")
- // fields contains all members of your LexerCtx
- println(s"All context fields: ${id.fields}")
+ // id is a Lexeme; you can access members of your LexerCtx via dynamic selectors
+ println(s"Matched ID at line ${id.line}")
+ // e.g. if your LexerCtx defines `position`, you can access it as `id.position`
```
> If you need complex logic to run after every match regardless of which token was matched, you can provide a custom `given` instance of `BetweenStages`.
```scala
trait CustomTrait extends LexerCtx:
  var indentLevel: Int

case class CustomCtx(var text: CharSequence = "", var indentLevel: Int = 0) extends CustomTrait derives BetweenStages

given BetweenStages[CustomTrait] = new:
  def apply(token: Token[?, MyCtx, ?], matcher: Matcher, ctx: MyCtx): Unit =
    // Custom logic to update indentLevel based on the matched token
    token match
      case Token["INDENT"](_) => ctx.indentLevel += 1
      case Token["DEDENT"](_) => ctx.indentLevel -= 1
      case _ => ()
```
The `BetweenStages` customization example appears unworkable for library users as written:
- `BetweenStages` is declared `private[alpaca]` (`src/alpaca/internal/lexer/BetweenStages.scala`), so user code can’t name it to write `given BetweenStages[...]` or `derives BetweenStages`.
- The sample also references `MyCtx` (undefined) and mixes `CustomTrait`/`MyCtx` types in the `apply` signature.

Either adjust this section to describe the behavior without showing user-defined instances, or make `BetweenStages` a public, supported extension point and update the example to compile.
Suggested change:
```diff
- If you need complex logic to run after every match regardless of which token was matched, you can provide a custom `given` instance of `BetweenStages`.
- ```scala
- trait CustomTrait extends LexerCtx:
-   var indentLevel: Int
- case class CustomCtx(var text: CharSequence = "", var indentLevel: Int = 0) extends CustomTrait derives BetweenStages
- given BetweenStages[CustomTrait] = new:
-   def apply(token: Token[?, MyCtx, ?], matcher: Matcher, ctx: MyCtx): Unit =
-     // Custom logic to update indentLevel based on the matched token
-     token match
-       case Token["INDENT"](_) => ctx.indentLevel += 1
-       case Token["DEDENT"](_) => ctx.indentLevel -= 1
-       case _ => ()
- ```
+ The `BetweenStages` hook is currently an internal mechanism used by Alpaca to perform these automatic updates; it is not exposed as a stable public extension point. To implement custom behavior that should run after particular matches, encode that logic directly in your token definitions and `LexerCtx` (for example, by updating fields in your context inside token matchers or using additional stateful rules), rather than trying to provide your own `BetweenStages` instance.
```
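For illustration of the underlying idea — a hook invoked after every match that mutates a user-supplied context — here is a plain-Scala sketch, deliberately independent of Alpaca (`AfterMatch` and `IndentCtx` are hypothetical names, not library API):

```scala
// A single-abstract-method trait: callback run after each token match.
trait AfterMatch[C]:
  def apply(tokenName: String, ctx: C): Unit

// Context carrying the state the hook maintains.
final case class IndentCtx(var indentLevel: Int = 0)

// A hook instance that tracks indentation depth from INDENT/DEDENT tokens.
val indentHook: AfterMatch[IndentCtx] = (tokenName, ctx) =>
  tokenName match
    case "INDENT" => ctx.indentLevel += 1
    case "DEDENT" => ctx.indentLevel -= 1
    case _        => ()

val ctx = IndentCtx()
List("INDENT", "INDENT", "DEDENT", "ID").foreach(indentHook(_, ctx))
// ctx.indentLevel == 1
```

This is the shape the doc's `given BetweenStages[...]` example was reaching for; the review's point is that Alpaca does not currently expose such a type publicly, so the equivalent state updates belong in the token definitions themselves.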
```scala
case "\n" =>
  ctx.line += 1
  ctx.position = 1
  Token.Ignored
case "." =>
  ctx.position += 1
```
This example manually mutates `ctx.line`/`ctx.position` while also using `LexerCtx.Default`, which already mixes in `LineTracking` and `PositionTracking` (and those are applied via `BetweenStages`). As a result, line/position updates will be double-counted.
Consider either (1) removing the manual increments and just returning `Token.Ignored`/`Token[...]`, or (2) switching the example to a context that does not extend the tracking traits (e.g., `LexerCtx.Empty`) if you want to demonstrate manual tracking.
Suggested change:
```diff
- case "\n" =>
-   ctx.line += 1
-   ctx.position = 1
-   Token.Ignored
- case "." =>
-   ctx.position += 1
+ // `LexerCtx.Default` automatically updates `line` and `position`
+ case "\n" =>
+   Token.Ignored
+ case "." =>
```
> - **Symbols**: The building blocks of your grammar. They can be:
>   - **Terminals**: Tokens from your lexer (e.g., `MyLexer.PLUS`).
>   - **Non-Terminals**: Other rules (e.g., `Expr[Double]`).
This non-terminal example is misleading: `Expr[Double]` isn’t a valid way to reference a rule in Alpaca. Non-terminals are `Rule[T]` values (e.g., `val Expr: Rule[Double] = ...`) and are referenced in productions via the extractor form `Expr(e)`.
Suggested change:
```diff
- - **Non-Terminals**: Other rules (e.g., `Expr[Double]`).
+ - **Non-Terminals**: Other rules (e.g., `Expr: Rule[Double]`).
```
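For readers unfamiliar with the extractor form mentioned in the comment, here is a plain-Scala sketch of how a `Rule`-like value can be referenced via `unapply` in a pattern. This is illustrative only — the `Rule` class below is hypothetical and not Alpaca's actual implementation:

```scala
// A rule-like value: matching `Expr(e)` binds the payload of a node
// tagged with this rule's name. `Expr[Double]` (a type application)
// would not refer to such a value at all.
final class Rule[T](val name: String):
  def unapply(node: (String, T)): Option[T] =
    if node._1 == name then Some(node._2) else None

val Expr = new Rule[Double]("Expr")

val value = ("Expr", 1.5) match
  case Expr(e) => e   // extractor form: binds the Double carried by the node
  case _       => 0.0
// value == 1.5
```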
Summary
- `BetweenStages` hook in contextual parsing guide

🤖 Generated with Claude Code