Documentation: conflict resolution and contextual parsing guides#293

Open
halotukozak wants to merge 4 commits into master from
conflict-resulution-docs

Conversation

@halotukozak
Owner

Summary

  • Add comprehensive documentation for Lexer and Parser APIs
  • Add detailed guide on conflict resolution in Alpaca parsers
  • Add guide on contextual parsing in Alpaca lexers and parsers
  • Document BetweenStages hook in contextual parsing guide

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 4, 2026 14:48
Contributor

Copilot AI left a comment


Pull request overview

Adds new end-user documentation pages for Alpaca’s Lexer and Parser APIs, plus dedicated guides covering conflict resolution and contextual parsing (including mention of the BetweenStages hook).

Changes:

  • Add Lexer API documentation with examples and context usage
  • Add Parser API documentation including EBNF operators and conflict resolution
  • Add guides for conflict resolution and contextual parsing concepts

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
docs/_docs/parser.md New Parser API documentation, grammar DSL overview, and conflict-resolution section
docs/_docs/lexer.md New Lexer API documentation, tokenization, and context examples
docs/_docs/guides/contextual-parsing.md New contextual parsing guide, including lexer/parser context and BetweenStages discussion
docs/_docs/guides/conflict-resolution.md New conflict resolution guide with precedence/associativity examples and debugging link
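The precedence/associativity declarations that the conflict-resolution guide covers have a simple runtime analogue: precedence climbing. The sketch below is plain Scala over single-character tokens, purely illustrative and unrelated to Alpaca's generated tables; the operator table and names are invented for the example.

```scala
// Precedence-climbing evaluator for single-digit arithmetic expressions.
// Illustrates how precedence levels and left associativity resolve the
// ambiguity that shift/reduce conflicts encode (not Alpaca's implementation).
object Prec:
  // higher number = binds tighter; all operators here are left-associative
  val prec = Map('+' -> 1, '-' -> 1, '*' -> 2, '/' -> 2)

  def eval(tokens: List[Char]): Int =
    def atom(ts: List[Char]): (Int, List[Char]) = ts match
      case d :: rest if d.isDigit => (d.asDigit, rest)
      case _ => sys.error("expected digit")

    def parse(ts: List[Char], minPrec: Int): (Int, List[Char]) =
      var (lhs, rest) = atom(ts)
      while rest.headOption.exists(op => prec.get(op).exists(_ >= minPrec)) do
        val op = rest.head
        // left associativity: the recursive call requires strictly higher
        // precedence, so `8-3-2` groups as `(8-3)-2`
        val (rhs, rest2) = parse(rest.tail, prec(op) + 1)
        lhs = op match
          case '+' => lhs + rhs
          case '-' => lhs - rhs
          case '*' => lhs * rhs
          case '/' => lhs / rhs
        rest = rest2
      (lhs, rest)

    parse(tokens, 0)._1
```

With this scheme `*` outranks `+`, so `1+2*3` evaluates to 7, and left associativity makes `8-3-2` evaluate to 3.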



case class BraceCtx(
var text: CharSequence = "",
stack: mutable.Stack[String] = mutable.Stack()

Copilot AI Mar 4, 2026


This snippet won’t compile as written: it imports scala.collection.mutable.Stack but then uses mutable.Stack[...]/mutable.Stack() without importing scala.collection.mutable or aliasing it. Use Stack[...]/Stack() consistently, or import/qualify scala.collection.mutable.

Suggested change
stack: mutable.Stack[String] = mutable.Stack()
stack: Stack[String] = Stack()

Token["("]
case "\\)" =>
if ctx.stack.isEmpty || ctx.stack.pop() != "paren" then
throw RuntimeException("Mismatched parenthesis")

Copilot AI Mar 4, 2026


In Scala this needs instantiation: throw RuntimeException("...") won’t compile unless there’s a custom RuntimeException object in scope. Use throw new RuntimeException("Mismatched parenthesis") (same for similar examples).

Suggested change
throw RuntimeException("Mismatched parenthesis")
throw new RuntimeException("Mismatched parenthesis")


### Automatic Updates
By default, Alpaca uses `BetweenStages` to automatically update the `text` field in your context (to advance past the matched string).
If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increments `line` and `position` counters.

Copilot AI Mar 4, 2026


Grammar: “hooks also increments line and position counters” should be singular (“increment”) to match the plural subject.

Suggested change
If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increments `line` and `position` counters.
If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increment `line` and `position` counters.

Comment on lines +132 to +133
2. Computes the First and Follow sets.
3. Generates the LR(1) (or LALR) transition and action tables.

Copilot AI Mar 4, 2026


“Internal Working” claims the parser computes Follow sets and may generate LR(1) “(or LALR)” tables. The current implementation appears LR(1)-only and uses FIRST sets (no Follow-set implementation found under src/alpaca/internal/parser). Please align this section with what the macro actually generates, or add references to the relevant code paths if these features exist elsewhere.

Suggested change
2. Computes the First and Follow sets.
3. Generates the LR(1) (or LALR) transition and action tables.
2. Computes the FIRST sets for the grammar's nonterminals.
3. Generates the LR(1) transition and action tables.

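The FIRST-set computation mentioned in the suggested change can be sketched with a standard fixed-point iteration. This is a toy grammar in plain Scala, not Alpaca's internal code; the grammar, symbol encoding, and use of `""` for epsilon are all assumptions made for the example.

```scala
// Toy FIRST-set computation (fixed-point iteration) for a small grammar.
// Illustrative only — not the algorithm under src/alpaca/internal/parser.
object FirstSets:
  type Symbol = String
  // Grammar: E -> T Ep ; Ep -> "+" T Ep | epsilon ; T -> "id"
  // Nil encodes an epsilon production; "" encodes epsilon in a FIRST set.
  val grammar: Map[Symbol, List[List[Symbol]]] = Map(
    "E"  -> List(List("T", "Ep")),
    "Ep" -> List(List("+", "T", "Ep"), Nil),
    "T"  -> List(List("id"))
  )

  def isNonterminal(s: Symbol): Boolean = grammar.contains(s)

  def compute(): Map[Symbol, Set[Symbol]] =
    var first = grammar.keys.map(_ -> Set.empty[Symbol]).toMap
    var changed = true
    while changed do
      changed = false
      for (nt, prods) <- grammar; prod <- prods do
        // walk a production's symbols for as long as they can derive epsilon
        var add = Set.empty[Symbol]
        var nullable = true
        val it = prod.iterator
        while nullable && it.hasNext do
          val s = it.next()
          if isNonterminal(s) then
            add ++= first(s) - ""
            nullable = first(s).contains("")
          else
            add += s
            nullable = false
        if nullable then add += "" // the whole production can be empty
        if !add.subsetOf(first(nt)) then
          first = first.updated(nt, first(nt) ++ add)
          changed = true
    first
```

For this grammar the iteration converges to FIRST(T) = {id}, FIRST(Ep) = {+, epsilon}, FIRST(E) = {id}.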
Most contextual logic in Alpaca happens at the lexer level.
Since the lexer tokenizes the entire input before the parser starts, the lexer context is the primary place to track state that affects tokenization.

### Exaple: Brace Matching & Nesting

Copilot AI Mar 4, 2026


Heading typo: "Exaple" should be "Example".

Suggested change
### Exaple: Brace Matching & Nesting
### Example: Brace Matching & Nesting

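The stack discipline behind the guide's brace-matching example can be shown in plain Scala, independent of Alpaca's `Token`/`LexerCtx` types — push on every opener, pop-and-compare on every closer. The function name and bracket table below are invented for this sketch.

```scala
import scala.collection.mutable

// Stack-based bracket matching: the same idea the lexer context in the
// guide's example implements (plain Scala sketch, no Alpaca types).
def balanced(input: String): Boolean =
  val stack = mutable.Stack.empty[Char]
  val pairs = Map(')' -> '(', ']' -> '[', '}' -> '{')
  input.forall {
    case c @ ('(' | '[' | '{') => stack.push(c); true  // opener: push
    case c if pairs.contains(c) =>                     // closer: must match top
      stack.nonEmpty && stack.pop() == pairs(c)
    case _ => true                                     // any other character
  } && stack.isEmpty                                   // nothing left open
```

`forall` short-circuits on the first mismatched closer, mirroring the guide's "throw on mismatch" behavior.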
Comment on lines +47 to +50
// id is a Lexeme, which has a .line property (and others) from the LexerCtx
println(s"Matched ID at line ${id.line}")
// fields contains all members of your LexerCtx
println(s"All context fields: ${id.fields}")

Copilot AI Mar 4, 2026


Lexeme.fields is private[alpaca] in the implementation, so id.fields won’t be accessible to library users. Consider removing this line and instead document that lexeme context is accessed via dynamic selectors (e.g., id.line, id.position, or other case-class fields present in the lexer context).

Suggested change
// id is a Lexeme, which has a .line property (and others) from the LexerCtx
println(s"Matched ID at line ${id.line}")
// fields contains all members of your LexerCtx
println(s"All context fields: ${id.fields}")
// id is a Lexeme; you can access members of your LexerCtx via dynamic selectors
println(s"Matched ID at line ${id.line}")
// e.g. if your LexerCtx defines `position`, you can access it as `id.position`

Comment on lines +104 to +121
If you need complex logic to run after every match regardless of which token was matched, you can provide a custom `given` instance of `BetweenStages`.

```scala

trait CustomTrait extends LexerCtx:
var indentLevel: Int

case class CustomCtx(var text: CharSequence = "", var indentLevel: Int = 0) extends CustomTrait derives BetweenStages

given BetweenStages[CustomTrait] = new:
def apply(token: Token[?, MyCtx, ?], matcher: Matcher, ctx: MyCtx): Unit =
// Custom logic to update indentLevel based on the matched token
token match
case Token["INDENT"](_) => ctx.indentLevel += 1
case Token["DEDENT"](_) => ctx.indentLevel -= 1
case _ => ()
```


Copilot AI Mar 4, 2026


The BetweenStages customization example appears unworkable for library users as written:

  • BetweenStages is declared private[alpaca] (src/alpaca/internal/lexer/BetweenStages.scala), so user code can’t name it to write given BetweenStages[...] or derives BetweenStages.
  • The sample also references MyCtx (undefined) and mixes CustomTrait/MyCtx types in the apply signature.
    Either adjust this section to describe the behavior without showing user-defined instances, or make BetweenStages a public, supported extension point and update the example to compile.
Suggested change
If you need complex logic to run after every match regardless of which token was matched, you can provide a custom `given` instance of `BetweenStages`.
```scala
trait CustomTrait extends LexerCtx:
var indentLevel: Int
case class CustomCtx(var text: CharSequence = "", var indentLevel: Int = 0) extends CustomTrait derives BetweenStages
given BetweenStages[CustomTrait] = new:
def apply(token: Token[?, MyCtx, ?], matcher: Matcher, ctx: MyCtx): Unit =
// Custom logic to update indentLevel based on the matched token
token match
case Token["INDENT"](_) => ctx.indentLevel += 1
case Token["DEDENT"](_) => ctx.indentLevel -= 1
case _ => ()
```
The `BetweenStages` hook is currently an internal mechanism used by Alpaca to perform these automatic updates; it is not exposed as a stable public extension point. To implement custom behavior that should run after particular matches, encode that logic directly in your token definitions and `LexerCtx` (for example, by updating fields in your context inside token matchers or using additional stateful rules), rather than trying to provide your own `BetweenStages` instance.

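Since `BetweenStages` itself is `private[alpaca]`, the general "run a hook after every match" pattern it implements can still be illustrated with plain Scala types. Everything below (`Ctx`, `AfterMatch`, `runHooks`, the hook names) is invented for the sketch and is not Alpaca API.

```scala
// The "hook after every match" pattern, sketched without Alpaca types.
final case class Ctx(var offset: Int = 0, var indentLevel: Int = 0)

// A hook receives the matched lexeme's text and the mutable context.
type AfterMatch = (String, Ctx) => Unit

// Default-style behavior: advance past the matched string.
val advanceOffset: AfterMatch = (lexeme, ctx) => ctx.offset += lexeme.length

// Custom behavior: track indentation depth from special lexemes.
val trackIndent: AfterMatch = (lexeme, ctx) =>
  lexeme match
    case "INDENT" => ctx.indentLevel += 1
    case "DEDENT" => ctx.indentLevel -= 1
    case _        => ()

// Run every hook after each matched lexeme, in order.
def runHooks(lexemes: List[String], ctx: Ctx, hooks: AfterMatch*): Ctx =
  for l <- lexemes; h <- hooks do h(l, ctx)
  ctx
```

This keeps the default update (offset advancing) and the custom update (indent tracking) composable, which is the role the review suggests documenting conceptually rather than as a user-facing extension point.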
Comment on lines +59 to +64
case "\n" =>
ctx.line += 1
ctx.position = 1
Token.Ignored
case "." =>
ctx.position += 1

Copilot AI Mar 4, 2026


This example manually mutates ctx.line/ctx.position while also using LexerCtx.Default, which already mixes in LineTracking and PositionTracking (and those are applied via BetweenStages). As a result, line/position updates will be double-counted.
Consider either (1) removing the manual increments and just returning Token.Ignored/Token[...], or (2) switching the example to a context that does not extend the tracking traits (e.g., LexerCtx.Empty) if you want to demonstrate manual tracking.

Suggested change
case "\n" =>
ctx.line += 1
ctx.position = 1
Token.Ignored
case "." =>
ctx.position += 1
// `LexerCtx.Default` automatically updates `line` and `position`
case "\n" =>
Token.Ignored
case "." =>

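The manual-tracking variant suggested in option (2) amounts to the following plain-Scala logic: reset the column and bump the line on `\n`, advance the column otherwise. `Pos` and `track` are names invented for this sketch, not Alpaca types.

```scala
// Manual line/position tracking, as one would write it in a context that
// does NOT mix in the automatic tracking traits (illustrative sketch).
final case class Pos(var line: Int = 1, var position: Int = 1)

def track(input: String): Pos =
  val pos = Pos()
  for ch <- input do
    if ch == '\n' then
      pos.line += 1     // newline: next line, column resets to 1
      pos.position = 1
    else
      pos.position += 1 // any other character advances the column
  pos
```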

- **Symbols**: The building blocks of your grammar. They can be:
- **Terminals**: Tokens from your lexer (e.g., `MyLexer.PLUS`).
- **Non-Terminals**: Other rules (e.g., `Expr[Double]`).

Copilot AI Mar 4, 2026


This non-terminal example is misleading: Expr[Double] isn’t a valid way to reference a rule in Alpaca. Non-terminals are Rule[T] values (e.g., val Expr: Rule[Double] = ...) and are referenced in productions via the extractor form Expr(e).

Suggested change
- **Non-Terminals**: Other rules (e.g., `Expr[Double]`).
- **Non-Terminals**: Other rules (e.g., `Expr: Rule[Double]`).
