Documentation: extractors tutorial and API docs#299
halotukozak wants to merge 5 commits into master from
Conversation
Pull request overview
This PR expands Alpaca’s documentation set by adding new tutorials/guides and standalone API docs for the Lexer and Parser, aiming to help users understand extractors, contextual parsing, conflict resolution, and core APIs.
Changes:
- Added a tutorial explaining token/rule extractors and EBNF helpers (`.List`, `.Option`).
- Added new guides for contextual parsing (including `BetweenStages`) and conflict resolution.
- Added dedicated API documentation pages for the Lexer and Parser.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/_docs/tutorials/extractors.md | New tutorial describing extractor patterns for tokens/rules and EBNF helpers. |
| docs/_docs/lexer.md | New Lexer API documentation with examples for defining lexers, contexts, and tokenization. |
| docs/_docs/parser.md | New Parser API documentation including grammar rules, EBNF operators, conflict resolution, and parsing flow. |
| docs/_docs/guides/contextual-parsing.md | New guide describing lexer/parser context usage and the BetweenStages hook. |
| docs/_docs/guides/conflict-resolution.md | New guide explaining LR conflicts and Alpaca’s before/after resolution DSL. |
| case MyLexer.NUM(n) => n.value // n is a Lexem object | ||
| ``` | ||
| The `Lexem` object contains: |
In this example, n is a Lexeme, but the text calls it a Lexem. This typo is repeated in the following sentence and could confuse readers searching for the type in the API.
| ```scala | ||
| val Decl: Rule[Val] = rule: | ||
| case (MyLexer.VAL(_), MyLexer.ID(id), MyLexer.Type.Option(t) => ... |
This `.Option` example snippet is syntactically invalid (the tuple pattern is missing its closing `)` before `=>`) and likely references the wrong symbol (`MyLexer.Type` doesn’t match the token/rule naming used elsewhere). Please fix the snippet so it compiles and demonstrates `.Option` on an actual symbol (e.g., a Rule’s `.Option`).
| case (MyLexer.VAL(_), MyLexer.ID(id), MyLexer.Type.Option(t) => ... | |
| case (MyLexer.VAL(_), MyLexer.ID(id), Expr.Option(optExpr)) => ... |
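For reference, the `.Option` style can be modeled outside Alpaca with a plain Scala extractor whose `unapply` always succeeds. The `OptionOf` name below is illustrative, not part of the library; it only sketches the shape an `.Option` helper on a symbol can desugar to:

```scala
// Illustrative sketch, not Alpaca's API: an irrefutable extractor that
// always matches and wraps its result in Some/None.
object OptionOf:
  def unapply[A](xs: List[A]): Some[Option[A]] = Some(xs.headOption)

val described = List(1, 2) match
  case OptionOf(Some(h)) => s"head = $h"
  case OptionOf(None)    => "empty"

println(described) // head = 1
```

Because the extractor returns `Some`, the match always succeeds and the caller only decides between the `Some`/`None` payloads, which is exactly what an optional grammar element needs.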
| Most contextual logic in Alpaca happens at the lexer level. | ||
| Since the lexer tokenizes the entire input before the parser starts, the lexer context is the primary place to track state that affects tokenization. | ||
| ### Exaple: Brace Matching & Nesting |
Heading typo: “Exaple” → “Example”.
| ### Exaple: Brace Matching & Nesting | |
| ### Example: Brace Matching & Nesting |
| Token["("] | ||
| case "\\)" => | ||
| if ctx.stack.isEmpty || ctx.stack.pop() != "paren" then | ||
| throw RuntimeException("Mismatched parenthesis") |
`throw RuntimeException("Mismatched parenthesis")` relies on Scala 3 creator applications, so it does compile under Scala 3; still, `throw new RuntimeException(...)` is the more conventional spelling, and a more specific exception type would serve readers better.
| throw RuntimeException("Mismatched parenthesis") | |
| throw new RuntimeException("Mismatched parenthesis") |
| The `lexer` block uses `LexerCtx.Default` by default, which tracks `line` and `position`. | ||
| ```scala | ||
| val myLexer = lexer: | ||
| case "\n" => | ||
| ctx.line += 1 | ||
| ctx.position = 1 | ||
| Token.Ignored | ||
| case "." => | ||
| ctx.position += 1 |
The “Default Context” example manually increments ctx.line / ctx.position, but LexerCtx.Default already mixes in LineTracking and PositionTracking, whose BetweenStages hooks update these counters automatically after each match. As written, the example will double-increment on newlines / characters; either remove the manual updates or show this with a custom context that doesn’t include those tracking traits.
| The `lexer` block uses `LexerCtx.Default` by default, which tracks `line` and `position`. | |
| ```scala | |
| val myLexer = lexer: | |
| case "\n" => | |
| ctx.line += 1 | |
| ctx.position = 1 | |
| Token.Ignored | |
| case "." => | |
| ctx.position += 1 | |
| The `lexer` block uses `LexerCtx.Default` by default, which tracks `line` and `position` automatically as input is consumed. | |
| ```scala | |
| val myLexer = lexer: | |
| case "\n" => | |
| // Newlines are ignored; line and position are updated automatically | |
| Token.Ignored | |
| case "." => |
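To see why the manual updates are a problem, here is a standalone Scala sketch with no Alpaca types: `Ctx`, `betweenStages`, and `onNewline` are stand-ins for the context, the framework hook, and a user rule action. When the hook already advances the counters, a manual update in the action counts the same newline twice:

```scala
// Stand-in for a lexer context with tracking fields.
class Ctx:
  var line = 1
  var position = 1

// Stand-in for the automatic hook that runs after every match.
def betweenStages(matched: String, ctx: Ctx): Unit =
  if matched == "\n" then { ctx.line += 1; ctx.position = 1 }
  else ctx.position += matched.length

// A user action that ALSO increments manually.
def onNewline(ctx: Ctx): Unit = ctx.line += 1

val ctx = Ctx()
onNewline(ctx)            // manual update in the rule action
betweenStages("\n", ctx)  // automatic update by the hook
println(ctx.line)         // 3, not 2: the newline was counted twice
```

Dropping the manual increment (or using a context without the tracking behavior) removes the double count.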
| - **Symbols**: The building blocks of your grammar. They can be: | ||
| - **Terminals**: Tokens from your lexer (e.g., `MyLexer.PLUS`). | ||
| - **Non-Terminals**: Other rules (e.g., `Expr[Double]`). |
In the “Symbols” section, Expr[Double] is not valid Scala syntax for referring to a rule; it reads like a type application. Consider changing this example to something like val Expr: Rule[Double] = ... (or just Expr) to avoid confusing readers.
| - **Non-Terminals**: Other rules (e.g., `Expr[Double]`). | |
| - **Non-Terminals**: Other rules (e.g., `Expr`). |
| - `value`: The extracted value (e.g., `Double`, `Int`, `String`). | ||
| - `name`: The name of the token. | ||
| - `fields`: A NamedTuple containing context information (like `line` and `position`). |
The docs say fields is a NamedTuple, but Lexeme.fields is implemented as a Map[String, Any] (see src/alpaca/internal/lexer/Lexeme.scala). Please update this description to match the actual API (and consider noting that field access is via Selectable, e.g. n.line).
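The `Selectable`-based access mentioned above can be sketched in plain Scala 3. The `Record` class below is illustrative, not the actual `Lexeme.fields` implementation; it only shows how a `Map`-backed structure still supports dot-style access like `n.line`:

```scala
// Illustrative sketch: a Map-backed structure with dot-style access
// (fields.line, fields.position) via scala.Selectable.
class Record(elems: (String, Any)*) extends Selectable:
  private val underlying = elems.toMap
  def selectDynamic(name: String): Any = underlying(name)

// The refinement type tells the compiler which fields exist and their types.
type Fields = Record { val line: Int; val position: Int }

val fields = Record("line" -> 3, "position" -> 14).asInstanceOf[Fields]
println(fields.line)     // 3
println(fields.position) // 14
```

The compiler rewrites `fields.line` into a `selectDynamic("line")` call and casts the result to the type declared in the refinement, so the runtime representation can remain a plain `Map[String, Any]`.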
| case class BraceCtx( | ||
| var text: CharSequence = "", | ||
| stack: mutable.Stack[String] = mutable.Stack() |
This snippet won’t compile as written: it imports scala.collection.mutable.Stack but then uses mutable.Stack without importing scala.collection.mutable (or aliasing it). Either use Stack[String] / Stack() in the case class, or import scala.collection.mutable and refer to mutable.Stack.
| stack: mutable.Stack[String] = mutable.Stack() | |
| stack: Stack[String] = Stack() |
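Either import style works on its own; the problem is mixing them. The sketch below (plain Scala, reusing the `BraceCtx` name from the snippet) shows the short-name style, with the qualified alternative noted in a comment:

```scala
// Style 1: import the class itself and use the short name.
import scala.collection.mutable.Stack

case class BraceCtx(
  var text: CharSequence = "",
  stack: Stack[String] = Stack()
)

// Style 2 (alternative): import the package and qualify every use:
//   import scala.collection.mutable
//   stack: mutable.Stack[String] = mutable.Stack()

val ctx = BraceCtx()
ctx.stack.push("paren")
println(ctx.stack.pop()) // paren
```

Whichever style the docs pick, the import line and the usages in the case class must agree.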
| If you need complex logic to run after every match regardless of which token was matched, you can provide a custom `given` instance of `BetweenStages`. | ||
| ```scala | ||
| trait CustomTrait extends LexerCtx: | ||
| var indentLevel: Int | ||
| case class CustomCtx(var text: CharSequence = "", var indentLevel: Int = 0) extends CustomTrait derives BetweenStages | ||
| given BetweenStages[CustomTrait] = new: | ||
| def apply(token: Token[?, MyCtx, ?], matcher: Matcher, ctx: MyCtx): Unit = | ||
| // Custom logic to update indentLevel based on the matched token | ||
| token match | ||
| case Token["INDENT"](_) => ctx.indentLevel += 1 | ||
| case Token["DEDENT"](_) => ctx.indentLevel -= 1 | ||
| case _ => () | ||
| ``` |
The “Customizing BetweenStages” example is not currently usable from normal user code: BetweenStages is declared private[alpaca] (src/alpaca/internal/lexer/BetweenStages.scala), and derives BetweenStages won’t work (there’s no BetweenStages.derived). Additionally, the snippet references an undefined MyCtx and matches on Token["INDENT"], but Token[...] is compile-time-only DSL syntax, not a runtime pattern for BetweenStages. Please either rework this section to reflect the actual supported customization mechanism, or make BetweenStages a public, user-extensible API.
| If you need complex logic to run after every match regardless of which token was matched, you can provide a custom `given` instance of `BetweenStages`. | |
| ```scala | |
| trait CustomTrait extends LexerCtx: | |
| var indentLevel: Int | |
| case class CustomCtx(var text: CharSequence = "", var indentLevel: Int = 0) extends CustomTrait derives BetweenStages | |
| given BetweenStages[CustomTrait] = new: | |
| def apply(token: Token[?, MyCtx, ?], matcher: Matcher, ctx: MyCtx): Unit = | |
| // Custom logic to update indentLevel based on the matched token | |
| token match | |
| case Token["INDENT"](_) => ctx.indentLevel += 1 | |
| case Token["DEDENT"](_) => ctx.indentLevel -= 1 | |
| case _ => () | |
| ``` | |
| At the moment, `BetweenStages` is an internal implementation detail of Alpaca and is not intended to be customized directly from user code. | |
| The library provides a default `BetweenStages` implementation that: | |
| - advances the `text` field in your `LexerCtx` after each successful match, and | |
| - updates any tracking fields (for example, line/column or position counters) when your context mixes in the appropriate traits. | |
| If you need additional logic to run in response to specific tokens, you should implement it using the public API, for example by: | |
| - mutating your `LexerCtx` inside lexer rule actions, or | |
| - post-processing the produced `Lexeme`s (and their captured context) after lexing. | |
| Future versions of Alpaca may expose `BetweenStages` as a public, user-extensible hook. When that happens, this guide will be updated with a concrete customization example that reflects the supported API. |
| ### Automatic Updates | ||
| By default, Alpaca uses `BetweenStages` to automatically update the `text` field in your context (to advance past the matched string). | ||
| If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increments `line` and `position` counters. |
Grammar: “hooks also increments” → “hooks also increment”.
| If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increments `line` and `position` counters. | |
| If your context extends `LineTracking` or `PositionTracking`, the defined hooks also increment `line` and `position` counters. |