
Documentation: extractors tutorial and API docs#299

Open
halotukozak wants to merge 5 commits into `master` from `extractor-docs`

Conversation

@halotukozak
Owner

Summary

  • Add tutorial on extractors in Alpaca documentation
  • Document BetweenStages hook in contextual parsing guide
  • Add guides on contextual parsing and conflict resolution
  • Add comprehensive Lexer and Parser API documentation

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 4, 2026 14:49
Contributor

Copilot AI left a comment


Pull request overview

This PR expands Alpaca’s documentation set by adding new tutorials/guides and standalone API docs for the Lexer and Parser, aiming to help users understand extractors, contextual parsing, conflict resolution, and core APIs.

Changes:

  • Added a tutorial explaining token/rule extractors and EBNF helpers (.List, .Option).
  • Added new guides for contextual parsing (including BetweenStages) and conflict resolution.
  • Added dedicated API documentation pages for the Lexer and Parser.
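
Since the tutorial centers on extractor patterns, a minimal, library-free illustration of the underlying Scala mechanism may help orient readers before they reach Alpaca's DSL: an extractor is simply an object whose `unapply` method the compiler calls during pattern matching. This sketch is plain Scala 3 and independent of Alpaca:

```scala
// Plain-Scala illustration of the extractor mechanism the tutorial builds on.
// NUM.unapply is invoked by the pattern `NUM(n)` during a match.
object NUM:
  def unapply(s: String): Option[Int] = s.toIntOption

@main def demo(): Unit =
  "42" match
    case NUM(n) => println(s"number: $n")   // prints "number: 42"
    case other  => println(s"not a number: $other")
```

Alpaca's token and rule extractors (e.g. `MyLexer.NUM(n)` in the snippets reviewed below) follow this same pattern-matching protocol, with the library generating the extractor objects.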

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| docs/_docs/tutorials/extractors.md | New tutorial describing extractor patterns for tokens/rules and EBNF helpers. |
| docs/_docs/lexer.md | New Lexer API documentation with examples for defining lexers, contexts, and tokenization. |
| docs/_docs/parser.md | New Parser API documentation including grammar rules, EBNF operators, conflict resolution, and parsing flow. |
| docs/_docs/guides/contextual-parsing.md | New guide describing lexer/parser context usage and the BetweenStages hook. |
| docs/_docs/guides/conflict-resolution.md | New guide explaining LR conflicts and Alpaca's before/after resolution DSL. |


Comment on lines +20 to +22
```scala
case MyLexer.NUM(n) => n.value // n is a Lexem object
```
The `Lexem` object contains:

Copilot AI Mar 4, 2026


In this example, n is a Lexeme, but the text calls it a Lexem. This typo is repeated in the following sentence and could confuse readers searching for the type in the API.


```scala
val Decl: Rule[Val] = rule:
case (MyLexer.VAL(_), MyLexer.ID(id), MyLexer.Type.Option(t) => ...
```

Copilot AI Mar 4, 2026


This .Option example snippet is syntactically invalid (missing closing ) / =>) and likely references the wrong symbol (MyLexer.Type doesn’t match the token/rule naming used elsewhere). Please fix the snippet so it compiles and demonstrates .Option on an actual symbol (e.g., a Rule’s .Option).

Suggested change

```diff
- case (MyLexer.VAL(_), MyLexer.ID(id), MyLexer.Type.Option(t) => ...
+ case (MyLexer.VAL(_), MyLexer.ID(id), Expr.Option(optExpr)) => ...
```

Most contextual logic in Alpaca happens at the lexer level.
Since the lexer tokenizes the entire input before the parser starts, the lexer context is the primary place to track state that affects tokenization.

### Exaple: Brace Matching & Nesting

Copilot AI Mar 4, 2026


Heading typo: “Exaple” → “Example”.

Suggested change

```diff
- ### Exaple: Brace Matching & Nesting
+ ### Example: Brace Matching & Nesting
```

```scala
    Token["("]
  case "\\)" =>
    if ctx.stack.isEmpty || ctx.stack.pop() != "paren" then
      throw RuntimeException("Mismatched parenthesis")
```

Copilot AI Mar 4, 2026


throw RuntimeException("Mismatched parenthesis") is not valid Scala; it should be throw new RuntimeException(...) (or a more specific exception). As written, the example won’t compile.

Suggested change

```diff
- throw RuntimeException("Mismatched parenthesis")
+ throw new RuntimeException("Mismatched parenthesis")
```

Comment on lines +55 to +64
The `lexer` block uses `LexerCtx.Default` by default, which tracks `line` and `position`.

```scala
val myLexer = lexer:
case "\n" =>
ctx.line += 1
ctx.position = 1
Token.Ignored
case "." =>
ctx.position += 1
```

Copilot AI Mar 4, 2026


The “Default Context” example manually increments ctx.line / ctx.position, but LexerCtx.Default already mixes in LineTracking and PositionTracking, whose BetweenStages hooks update these counters automatically after each match. As written, the example will double-increment on newlines / characters; either remove the manual updates or show this with a custom context that doesn’t include those tracking traits.

Suggested change

````diff
- The `lexer` block uses `LexerCtx.Default` by default, which tracks `line` and `position`.
+ The `lexer` block uses `LexerCtx.Default` by default, which tracks `line` and `position` automatically as input is consumed.
  ```scala
  val myLexer = lexer:
    case "\n" =>
-     ctx.line += 1
-     ctx.position = 1
+     // Newlines are ignored; line and position are updated automatically
      Token.Ignored
    case "." =>
-     ctx.position += 1
  ```
````

- **Symbols**: The building blocks of your grammar. They can be:
- **Terminals**: Tokens from your lexer (e.g., `MyLexer.PLUS`).
- **Non-Terminals**: Other rules (e.g., `Expr[Double]`).

Copilot AI Mar 4, 2026


In the “Symbols” section, Expr[Double] is not valid Scala syntax for referring to a rule; it reads like a type application. Consider changing this example to something like val Expr: Rule[Double] = ... (or just Expr) to avoid confusing readers.

Suggested change

```diff
- - **Non-Terminals**: Other rules (e.g., `Expr[Double]`).
+ - **Non-Terminals**: Other rules (e.g., `Expr`).
```

Comment on lines +23 to +25
- `value`: The extracted value (e.g., `Double`, `Int`, `String`).
- `name`: The name of the token.
- `fields`: A NamedTuple containing context information (like `line` and `position`).

Copilot AI Mar 4, 2026


The docs say fields is a NamedTuple, but Lexeme.fields is implemented as a Map[String, Any] (see src/alpaca/internal/lexer/Lexeme.scala). Please update this description to match the actual API (and consider noting that field access is via Selectable, e.g. n.line).

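The `Selectable`-style field access the reviewer mentions (`n.line`) can be illustrated with a small, self-contained Scala 3 sketch. This mimics the *described* shape of `Lexeme.fields` (a `Map[String, Any]` exposed through a structural type); it is not Alpaca's actual implementation:

```scala
// Map-backed structural access via Selectable, mimicking the reviewer's
// description of Lexeme.fields (not Alpaca's real code).
class Fields(underlying: Map[String, Any]) extends Selectable:
  def selectDynamic(name: String): Any = underlying(name)

type Positioned = Fields { val line: Any; val position: Any }

val n = Fields(Map("line" -> 3, "position" -> 7)).asInstanceOf[Positioned]
// Each access like n.line compiles to n.selectDynamic("line")
```

This is why `n.line` works without `line` being a declared member: the compiler rewrites structural selections on a `Selectable` into `selectDynamic` calls.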

```scala
case class BraceCtx(
  var text: CharSequence = "",
  stack: mutable.Stack[String] = mutable.Stack()
```

Copilot AI Mar 4, 2026


This snippet won’t compile as written: it imports scala.collection.mutable.Stack but then uses mutable.Stack without importing scala.collection.mutable (or aliasing it). Either use Stack[String] / Stack() in the case class, or import scala.collection.mutable and refer to mutable.Stack.

Suggested change

```diff
- stack: mutable.Stack[String] = mutable.Stack()
+ stack: Stack[String] = Stack()
```

Comment on lines +104 to +120
If you need complex logic to run after every match regardless of which token was matched, you can provide a custom `given` instance of `BetweenStages`.

```scala

trait CustomTrait extends LexerCtx:
  var indentLevel: Int

case class CustomCtx(var text: CharSequence = "", var indentLevel: Int = 0) extends CustomTrait derives BetweenStages

given BetweenStages[CustomTrait] = new:
  def apply(token: Token[?, MyCtx, ?], matcher: Matcher, ctx: MyCtx): Unit =
    // Custom logic to update indentLevel based on the matched token
    token match
      case Token["INDENT"](_) => ctx.indentLevel += 1
      case Token["DEDENT"](_) => ctx.indentLevel -= 1
      case _ => ()
```

Copilot AI Mar 4, 2026


The “Customizing BetweenStages” example is not currently usable from normal user code: BetweenStages is declared private[alpaca] (src/alpaca/internal/lexer/BetweenStages.scala), and derives BetweenStages won’t work (there’s no BetweenStages.derived). Additionally, the snippet references an undefined MyCtx and matches on Token["INDENT"], but Token[...] is compile-time-only DSL syntax, not a runtime pattern for BetweenStages. Please either rework this section to reflect the actual supported customization mechanism, or make BetweenStages a public, user-extensible API.

Suggested change

````diff
- If you need complex logic to run after every match regardless of which token was matched, you can provide a custom `given` instance of `BetweenStages`.
- ```scala
- trait CustomTrait extends LexerCtx:
-   var indentLevel: Int
-
- case class CustomCtx(var text: CharSequence = "", var indentLevel: Int = 0) extends CustomTrait derives BetweenStages
-
- given BetweenStages[CustomTrait] = new:
-   def apply(token: Token[?, MyCtx, ?], matcher: Matcher, ctx: MyCtx): Unit =
-     // Custom logic to update indentLevel based on the matched token
-     token match
-       case Token["INDENT"](_) => ctx.indentLevel += 1
-       case Token["DEDENT"](_) => ctx.indentLevel -= 1
-       case _ => ()
- ```
+ At the moment, `BetweenStages` is an internal implementation detail of Alpaca and is not intended to be customized directly from user code.
+ The library provides a default `BetweenStages` implementation that:
+ - advances the `text` field in your `LexerCtx` after each successful match, and
+ - updates any tracking fields (for example, line/column or position counters) when your context mixes in the appropriate traits.
+ If you need additional logic to run in response to specific tokens, you should implement it using the public API, for example by:
+ - mutating your `LexerCtx` inside lexer rule actions, or
+ - post-processing the produced `Lexeme`s (and their captured context) after lexing.
+ Future versions of Alpaca may expose `BetweenStages` as a public, user-extensible hook. When that happens, this guide will be updated with a concrete customization example that reflects the supported API.
````


### Automatic Updates
By default, Alpaca uses `BetweenStages` to automatically update the `text` field in your context (to advance past the matched string).
If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increments `line` and `position` counters.

Copilot AI Mar 4, 2026


Grammar: “hooks also increments” → “hooks also increment”.

Suggested change

```diff
- If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increments `line` and `position` counters.
+ If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increment `line` and `position` counters.
```
