Complete v1.2: Cookbook pages and tech debt cleanup #266

# Error Messages

Alpaca surfaces errors at three distinct points -- compile time, lex time, and parse time -- each with different behavior and handling strategies.

> **Compile-time processing:** The `lexer` block is a Scala 3 macro. `ShadowException`, invalid regex patterns, and unsupported guards are all detected at compile time and reported as compiler errors, not runtime exceptions. These errors cannot be caught with `try`/`catch` -- they prevent compilation entirely.

## Compile-Time Errors

Compile-time errors are emitted by the Alpaca macro when it processes your `lexer` or `parser` definition. They appear as ordinary compiler errors in your IDE or build output. Because they occur at compile time, there is no way to handle them at runtime -- you must fix the definition and recompile.

### ShadowException

A `ShadowException` occurs when an earlier pattern always matches everything a later pattern would match, making the later pattern unreachable. The macro performs pairwise regex inclusion checks and fails compilation if any pattern is shadowed.

```scala sc:nocompile
import alpaca.*

// This does NOT compile -- ShadowException
val BadLexer = lexer:
  case "[a-zA-Z_][a-zA-Z0-9_]*" => Token["IDENTIFIER"] // general pattern
  case "[a-zA-Z]+" => Token["ALPHABETIC"] // ERROR: shadowed by IDENTIFIER

// Fix: more-specific patterns before more-general ones
val GoodLexer = lexer:
  case "if" => Token["IF"] // keyword first
  case "[a-zA-Z_][a-zA-Z0-9_]*" => Token["IDENTIFIER"] // general pattern last
  case "\\s+" => Token.Ignored
```

The compile error reads: `Pattern [a-zA-Z]+ is shadowed by [a-zA-Z_][a-zA-Z0-9_]*`. The fix is always the same: move the more specific pattern before the more general one.

### Guards Are Not Supported

Scala pattern guards (`case "regex" if condition =>`) are not supported in lexer rule definitions. Using one produces a compile-time error:

```scala sc:nocompile
import alpaca.*

// WRONG -- compile error: "Guards are not supported yet"
case class MyCtx(var text: CharSequence = "", var flag: Boolean = false) extends LexerCtx

val GuardedLexer = lexer[MyCtx]:
  case "token" if ctx.flag => Token["A"]

// Fix: move the condition inside the rule body
val CorrectLexer = lexer[MyCtx]:
  case "token" =>
    if ctx.flag then Token["A"] else Token["B"]
```

## Runtime Lexer Errors

If `tokenize()` encounters a character that does not match any pattern, it throws a `RuntimeException` immediately. There is no skip-and-continue behavior -- lexing stops at the first unrecognized character and the exception propagates to the caller.

```scala sc:nocompile
import alpaca.*

val NumLexer = lexer:
  case num @ "[0-9]+" => Token["NUM"](num.toInt)
  case "\\s+" => Token.Ignored

try
  val (_, lexemes) = NumLexer.tokenize("42 abc")
catch
  case e: RuntimeException =>
    println(e.getMessage) // "Unexpected character: 'a'"
```

The exception message contains the unexpected character but not its position in the input. For position information, use a context that tracks position -- see [Lexer Context](../lexer-context.html).

There is no custom error handler API yet ([GitHub issue #21](https://github.com/bkozak-scancode/alpaca/issues/21) is open).

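Until a handler API exists, callers can wrap `tokenize()` themselves. The sketch below is illustrative only -- the wrapper name and message format are not Alpaca API:

```scala sc:nocompile
import alpaca.*

// Illustrative wrapper (not Alpaca API): re-throw the lexer's
// RuntimeException with the offending input attached, since the
// message itself carries only the unexpected character.
def tokenizeOrExplain(input: String) =
  try NumLexer.tokenize(input)
  catch
    case e: RuntimeException =>
      throw RuntimeException(s"${e.getMessage} while lexing: $input", e)
```
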
## Parser Failure

`parse()` returns `T | Null`. A `null` result means the input token sequence did not match the grammar. This is not an exception -- it is a normal return value. Always check for `null` before using the result.

```scala sc:nocompile
import alpaca.*

val (_, lexemes) = CalcLexer.tokenize("1 + + 2")
val (_, result) = CalcParser.parse(lexemes)

if result == null then
  println("Parse failed: input did not match the grammar")
else
  println(s"Result: $result")
```

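Because `null` is an ordinary value here, standard Scala helpers apply. `Option.apply` maps `null` to `None`; this is plain Scala, not anything Alpaca-specific:

```scala
// Pure Scala: Option.apply maps null to None, so a `T | Null` parse
// result becomes an Option you can handle safely.
val failed: String | Null = null
val ok: String | Null = "sum"

assert(Option(failed).isEmpty)
assert(Option(ok).contains("sum"))
```
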
There is no structured parser error reporting yet -- `null` is the only signal that parsing failed ([GitHub issue #51](https://github.com/bkozak-scancode/alpaca/issues/51) and [#65](https://github.com/bkozak-scancode/alpaca/issues/65) are open).

## See Also

- [Lexer Error Recovery](../lexer-error-recovery.html) -- full reference: `ShadowException`, runtime errors, pattern ordering
- [Lexer Context](../lexer-context.html) -- `PositionTracking` and `LineTracking` for position-aware error reporting
- [Parser](../parser.html) -- `parse()` return type, `T | Null` contract

# Expression Evaluator

Alpaca's `before`/`after` DSL resolves operator precedence conflicts in the LR parse table at compile time, letting you build a fully evaluated expression parser with correct precedence and associativity.

> **Compile-time processing:** When you declare `override val resolutions = Set(...)`, the Alpaca macro bakes your precedence rules directly into the LR(1) parse table during compilation. No precedence checks happen at runtime -- the parser executes deterministically from a pre-resolved table.

## The Problem

Arithmetic grammars are ambiguous without explicit precedence declarations. The expression `1 + 2 * 3` can parse as `(1 + 2) * 3 = 9` or `1 + (2 * 3) = 7`, and the LR algorithm cannot choose between them on its own. Alpaca reports these as shift/reduce conflicts at compile time and gives you the `before`/`after` DSL to resolve them by declaring which productions take priority.

## Define the Lexer

```scala sc:nocompile
import alpaca.*

val CalcLexer = lexer:
  case num @ "[0-9]+(\\.[0-9]+)?" => Token["NUMBER"](num.toDouble)
  case "\\+" => Token["PLUS"]
  case "-" => Token["MINUS"]
  case "\\*" => Token["TIMES"]
  case "/" => Token["DIVIDE"]
  case "\\(" => Token["LPAREN"]
  case "\\)" => Token["RPAREN"]
  case "\\s+" => Token.Ignored
```

The regex `[0-9]+(\.[0-9]+)?` matches both integers and decimals. `num.toDouble` converts the matched string to a `Double`, so `Token["NUMBER"]` carries a `Double` value -- this is what makes `Rule[Double]` the right type for the parser.

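The number pattern can be sanity-checked in plain Scala, independent of Alpaca:

```scala
// Plain Scala regex check of the NUMBER pattern used above.
val numPattern = "[0-9]+(\\.[0-9]+)?".r

assert(numPattern.matches("42"))
assert(numPattern.matches("3.14"))
assert(!numPattern.matches("abc"))
assert("3.14".toDouble == 3.14)
```
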
## Define the Parser

```scala sc:nocompile
import alpaca.*

object CalcParser extends Parser:
  val Expr: Rule[Double] = rule(
    "plus" { case (Expr(a), CalcLexer.PLUS(_), Expr(b)) => a + b },
    "minus" { case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b },
    "times" { case (Expr(a), CalcLexer.TIMES(_), Expr(b)) => a * b },
    "div" { case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b },
    { case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e },
    { case CalcLexer.NUMBER(n) => n.value },
  )

  val root: Rule[Double] = rule:
    case Expr(e) => e

  override val resolutions = Set(
    production.plus.before(CalcLexer.PLUS, CalcLexer.MINUS),
    production.plus.after(CalcLexer.TIMES, CalcLexer.DIVIDE),
    production.minus.before(CalcLexer.PLUS, CalcLexer.MINUS),
    production.minus.after(CalcLexer.TIMES, CalcLexer.DIVIDE),
    production.times.before(CalcLexer.TIMES, CalcLexer.DIVIDE, CalcLexer.PLUS, CalcLexer.MINUS),
    production.div.before(CalcLexer.TIMES, CalcLexer.DIVIDE, CalcLexer.PLUS, CalcLexer.MINUS),
  )
```

The parenthesized-expression rule matches on `CalcLexer.LPAREN` and `CalcLexer.RPAREN`, the token names declared in the lexer.

Reading `production.plus.before(CalcLexer.PLUS, CalcLexer.MINUS)`: when the parser has just recognized the `plus` production and the next token is `+` or `-`, prefer the reduction. This gives `+` left associativity and equal precedence with `-`.

Reading `production.plus.after(CalcLexer.TIMES, CalcLexer.DIVIDE)`: when the conflict is between reducing `plus` and shifting `*` or `/`, prefer shifting. This makes `*` and `/` bind tighter.

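Associativity falls out of the same rules. Repeated subtraction makes the grouping visible -- a sketch whose expected value assumes the resolutions above behave as described:

```scala sc:nocompile
import alpaca.*

// (10 - 3) - 2 = 5.0 under left associativity,
// not 10 - (3 - 2) = 9.0 under right associativity.
val (_, lexemes) = CalcLexer.tokenize("10 - 3 - 2")
val (_, result) = CalcParser.parse(lexemes)
// result: Double | Null -- expected 5.0 if `minus` reduces before another '-' is shifted
```
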
## Run It

```scala sc:nocompile
import alpaca.*

val (_, lexemes) = CalcLexer.tokenize("3 + 4 * 2")
val (_, result) = CalcParser.parse(lexemes)
// result: Double | Null -- 11.0 (not 14.0, because * binds tighter than +)
```

Always check for `null` before using the result -- `null` means the input did not match the grammar.

## Key Points

- `Rule[Double]` because `NUMBER` yields `Double` (`num.toDouble` in the lexer).
- `n.value` extracts the `Double` from the matched lexeme -- `n` is a `Lexeme`, not a `Double` directly.
- `resolutions` must be the **last `val`** in the parser object -- the macro reads top-to-bottom and must have seen all rule declarations before processing `resolutions`.
- Use `before`/`after` (not `alwaysBefore`/`alwaysAfter` -- the compiler error message suggests those names, but they do not exist in the API).
- `production` is a `@compileTimeOnly` construct: valid only inside the `resolutions` value.

## See Also

- [Conflict Resolution](../conflict-resolution.html) -- `before`/`after` DSL reference, `Production(symbols*)` selector, token-side resolution
- [Parser](../parser.html) -- rule syntax, `root` requirement, `Rule[T]` types
- [Lexer](../lexer.html) -- token definition, `Token["NAME"](value)` constructor

# Multi-Pass Processing

Alpaca has no dedicated multi-pass API; multi-pass is a composition pattern -- tokenize the input with a first lexer, transform the resulting `List[Lexeme]` in plain Scala, then parse or re-lex as needed.

> **Compile-time processing:** The lexer and parser macros are compiled independently; the `List[Lexeme]` boundary between them is an ordinary runtime value you can inspect and transform with any Scala collection operations.

## The Pattern

`tokenize()` returns a named tuple `(ctx, lexemes: List[Lexeme])`; `lexemes` is an ordinary `List` you can `filter`, `map`, or chain to a second stage. `parse()` accepts any `List[Lexeme]` directly -- the type refinement is widened at the call site, so filtered or re-ordered lists are compatible without any casting. Each `Lexeme` has a `name: String` field (the token name) and a `value: Any` field (the extracted value) that you can inspect during transformation.

Important constraint: the `Lexeme` constructor is private to the `alpaca` package, so you cannot create new `Lexeme` instances. Multi-pass works by transforming the list of existing lexemes -- filter, reorder, or re-lex string values with a second lexer call.

## Example: Comment Stripping

The most common multi-pass pattern: lex input that contains comments, strip the comment tokens from the list, then parse the clean token stream.

```scala sc:nocompile
import alpaca.*

// Stage 1: lex with comments
val Stage1 = lexer:
  case "#.*" => Token["COMMENT"]
  case num @ "[0-9]+" => Token["NUM"](num.toInt)
  case "\\+" => Token["PLUS"]
  case "\\s+" => Token.Ignored

object SumParser extends Parser:
  val Sum: Rule[Int] = rule(
    { case (Sum(a), Stage1.PLUS(_), Sum(b)) => a + b },
    { case Stage1.NUM(n) => n.value },
  )

  val root: Rule[Int] = rule:
    case Sum(s) => s
```

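Stage 2 filters on `Lexeme.name` and hands the cleaned list straight to `parse()` -- a sketch in which the input string and expected value are illustrative:

```scala sc:nocompile
import alpaca.*

// Stage 2: drop COMMENT lexemes, then parse the clean stream.
val (_, lexemes) = Stage1.tokenize("1 + 2 # trailing comment")
val cleaned = lexemes.filter(_.name != "COMMENT")
val (_, result) = SumParser.parse(cleaned)
// result: Int | Null -- 3 if the grammar matches as sketched
```
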
# Whitespace-Sensitive Lexing

Use a custom `LexerCtx` to track indentation depth and emit `INDENT` or `DEDENT` tokens when the indentation level changes between lines.

> **Compile-time processing:** The `lexer[MyCtx]` macro inspects `MyCtx` at compile time; it auto-composes `BetweenStages` hooks from parent traits, and the `ctx` value available in rule bodies is a compile-time alias that is replaced by field accesses in the generated code.

## The LexerCtx Contract

A valid custom context must satisfy three rules:

1. It must be a **case class** -- `LexerCtx` has a `this: Product =>` self-type; the auto-derivation machinery requires a `Product` instance, and regular classes do not satisfy it.
2. It must include **`var text: CharSequence = ""`** -- `LexerCtx` declares this field as abstract; omitting it produces a compile error.
3. **All fields must have default values** -- the `Empty[T]` derivation macro reads default parameter values from the companion object to construct the initial context.

Mutable state fields must be `var` so the lexer can assign to them directly.

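A minimal context satisfying all three rules might look like this (the extra `count` field is illustrative):

```scala sc:nocompile
import alpaca.*

// Case class, includes the required `var text`, and every field has a default.
case class CountCtx(
  var text: CharSequence = "",
  var count: Int = 0, // illustrative extra state
) extends LexerCtx
```
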
## Tracking Indentation

Define a context with `currentIndent` and `prevIndent` fields; when a newline followed by spaces is matched, count the spaces to determine the new indentation level and compare it against the previous level. Guards are not supported in lexer rules (`case "regex" if condition =>` is a compile error), so check the condition inside the rule body instead. Emit `Token["INDENT"]` when indentation increases, `Token["DEDENT"]` when it decreases, and `Token.Ignored` when it stays the same.

```scala sc:nocompile
import alpaca.*

case class IndentCtx(
  var text: CharSequence = "",
  var currentIndent: Int = 0,
  var prevIndent: Int = 0,
) extends LexerCtx

val IndentLexer = lexer[IndentCtx]:
  case "\\n( *)" =>
    val newIndent = ctx.text.toString.count(_ == ' ')
    val prev = ctx.prevIndent
    ctx.prevIndent = newIndent
    ctx.currentIndent = newIndent
    // Guards are not supported -- check condition in body
    if newIndent > prev then Token["INDENT"](newIndent)
    else if newIndent < prev then Token["DEDENT"](newIndent)
    else Token.Ignored
  case word @ "[a-z_][a-z0-9_]*" => Token["WORD"](word)
  case "\\s+" => Token.Ignored
```

The `\\n( *)` pattern matches a newline followed by zero or more spaces. `ctx.text` contains the full match text at the time the rule body runs, so counting its spaces gives the new indentation level. `Token["INDENT"](newIndent)` and `Token["DEDENT"](newIndent)` carry the new depth as their value, which the parser can read.

Because guards are not supported, the `if`/`else` lives inside the rule body rather than after the pattern.

## Reading INDENT and DEDENT in the Parser

The parser sees `INDENT` and `DEDENT` tokens in the lexeme list just like any other token. Use `IndentLexer.INDENT(n)` to extract the new depth from the lexeme value -- `n` is a `Lexeme`, and `n.value` is the `Int` depth passed to `Token["INDENT"](newIndent)`.

```scala sc:nocompile
import alpaca.*

object IndentParser extends Parser:
  val Block: Rule[List[String]] = rule(
    { case (IndentLexer.INDENT(_), Block(inner), IndentLexer.DEDENT(_)) => inner },
    { case IndentLexer.WORD(w) => List(w.value) },
  )

  val root: Rule[List[String]] = rule:
    case Block(b) => b
```

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the parser example, the token accessors for parentheses are written as
CalcLexer.(/ `CalcLexer.`\)but here they appear asCalcLexer.(with only a single backslash ((/)). That token name won’t match the lexer definition above (`case "\\(" => Token["LPAREN"]`, etc.) and conflicts with the accessor form documented elsewhere (`CalcLexer.`\\(). Update the snippet to use the correct backticked accessor names for LPAREN/RPAREN (or useCalcLexer.LPAREN/CalcLexer.RPARENconsistently).