96 changes: 96 additions & 0 deletions docs/_docs/theory/conflicts.md
@@ -0,0 +1,96 @@
A grammar is ambiguous if some string can be parsed in more than one way. In LR parsing, ambiguity surfaces as a conflict: the parse table has two valid entries for the same (state, symbol) pair, and the parser cannot proceed deterministically. (The converse does not hold: a conflict can also arise in an unambiguous grammar that is simply beyond the power of LR(1).)

## What is a Parse Table Conflict?

The LR(1) parse table maps (state, lookahead terminal) pairs to actions — either Shift (push the next token) or Reduce (pop a production's right-hand side and produce a non-terminal). A conflict exists when a single (state, terminal) pair has more than one valid action: the parse table has a collision.

> **Definition — Parse Table Conflict:**
> A conflict in parse state s exists when the parse table has more than one entry
> for the pair (s, t) for some lookahead terminal t ∈ Σ ∪ {$}.
> A shift/reduce conflict has one entry Shift(s') and one entry Reduce(A → α).
> A reduce/reduce conflict has two entries Reduce(A → α) and Reduce(B → β).
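
Conceptually, the action table is a partial function from (state, lookahead) pairs to single actions. A toy model in Scala (illustrative only, not Alpaca's internal representation):

```scala sc:nocompile
enum Action:
  case Shift(toState: Int)
  case Reduce(production: String)

// A deterministic parser needs exactly one action per key.
// A conflict means two candidate actions compete for the same
// (state, lookahead) key, so no such map can be built.
val actions: Map[(Int, String), Action] = Map(
  (0, "NUMBER") -> Action.Shift(3),
  (4, "PLUS") -> Action.Reduce("Expr -> Expr PLUS Expr"),
)
```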

## Shift/Reduce Conflicts

At some parse state, given lookahead token t, the parser could either shift t (push it and move to a new state) or reduce by some production A → α (pop the right-hand side and produce A). Both are valid actions for the same (state, t) pair — the parser cannot decide between them deterministically.

Why it happens: two or more LR(1) items in the same state propose incompatible actions for the same lookahead. The grammar allows the same prefix to continue in two different ways, and the LR automaton sees both paths simultaneously.

**Example: `1 + 2 + 3` in the calculator grammar.** After parsing `Expr PLUS Expr` with lookahead `PLUS`, the parser has two valid choices:

- **Reduce** `Expr → Expr PLUS Expr` — complete the first addition and produce a single `Expr`.
- **Shift** the second `PLUS` — keep accumulating, treating the input as `1 + (2 + 3)`.

Both are valid parse trees for `1 + 2 + 3` — the grammar (from [cfg.md](cfg.md)) is ambiguous for binary operator chains. Alpaca detects this conflict at compile time and reports:

```
Shift "PLUS ($plus)" vs Reduce Expr -> Expr PLUS ($plus) Expr
In situation like:
Expr PLUS ($plus) Expr PLUS ($plus) ...
Consider marking production Expr -> Expr PLUS ($plus) Expr to be alwaysBefore or alwaysAfter "PLUS ($plus)"
```

> **Note:** The error message says `alwaysBefore`/`alwaysAfter`. These method names do not exist in the Alpaca API. The correct methods are `before` and `after`. See [Conflict Resolution](../conflict-resolution.md) for full details on reading error messages.

## Reduce/Reduce Conflicts

A reduce/reduce conflict occurs when two different productions can reduce the same token sequence with the same lookahead. The parser has two Reduce entries for the same (state, t) pair and cannot decide which to apply.

**Example:** if a grammar has both `Integer → NUMBER` and `Float → NUMBER`, and the parser has `NUMBER` on the stack with lookahead `$`, it cannot determine which reduction to apply — both are valid. Alpaca reports:

```
Reduce Integer -> NUMBER vs Reduce Float -> NUMBER
In situation like:
NUMBER ...
Consider marking one of the productions to be alwaysBefore or alwaysAfter the other
```

Reduce/reduce conflicts are less common than shift/reduce conflicts. They typically indicate a grammar design issue — two rules competing for the same token sequence. The usual fix is to restructure the grammar so the two competing productions have distinct right-hand sides, or to use a different non-terminal.
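
As a hypothetical sketch in the `rule` DSL (the rule names and result types here are illustrative, not part of the tutorial grammar), the competing pair and one possible restructuring might look like:

```scala sc:nocompile
// Two rules competing for the same token sequence: reduce/reduce conflict.
val Integer: Rule[Long] = rule:
  case CalcLexer.NUMBER(n) => n.value.toLong
val Float: Rule[Double] = rule:
  case CalcLexer.NUMBER(n) => n.value

// One fix: a single rule, with the integer/float distinction made
// in the semantic action rather than in the grammar.
val Num: Rule[Double] = rule:
  case CalcLexer.NUMBER(n) => n.value
```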

## How LR(1) Lookahead Helps

LR(1) lookahead often disambiguates conflicts that earlier LR variants (LR(0), SLR) cannot resolve. Each item in the LR(1) item set carries its specific lookahead terminal, so the parser only fires a reduce when the actual next token matches that item's lookahead. This eliminates many spurious conflicts.

But for inherently ambiguous grammars — like the calculator's binary operator productions — LR(1) lookahead alone is not enough. The grammar has the same prefix structure regardless of which associativity is intended, so both shift and reduce appear valid to the automaton. Explicit resolution is required.
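A classic illustration of what the extra lookahead buys (the pointer-assignment grammar from the Dragon Book, unrelated to the calculator):

```
S → L = R | R
L → * R | id
R → L
```

This grammar is not SLR: in the state containing the items `S → L · = R` and `R → L ·`, SLR consults FOLLOW(R), which contains `=`, and reports a shift/reduce conflict on `=`. LR(1) records that the item `R → L ·` in that particular state carries lookahead `$` only, so shifting `=` is the unique action and the conflict disappears.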

For a detailed explanation of items and lookahead, see [Shift-Reduce Parsing](shift-reduce.md).

## Resolution by Priority

Resolving a conflict means declaring which action wins. For a shift/reduce conflict: should the reduction or the shift take priority?

Alpaca's `before`/`after` DSL lets you declare priorities directly in the parser definition:

- `production.name.before(tokens*)` — when the conflict is between reducing `name` and shifting one of those tokens, the reduction wins. Use this for left-associativity and higher-precedence reductions.
- `production.name.after(tokens*)` — prefer shifting those tokens over reducing this production. Use this when another operator should bind more tightly.

Priorities are transitive: Alpaca propagates the declared relationships through the priority graph via breadth-first search. If reducing `times` wins over shifting `PLUS`, and reducing `plus` wins over shifting `MINUS`, the implied orderings follow without being declared pairwise.

A minimal example — declaring left-associativity and precedence for the `plus` production only:

```scala sc:nocompile
import alpaca.*

override val resolutions = Set(
production.plus.before(CalcLexer.PLUS, CalcLexer.MINUS), // left-associative: reduce + before shifting + or -
production.plus.after(CalcLexer.TIMES, CalcLexer.DIVIDE), // lower precedence: shift * or / before reducing +
)
```

The complete CalcParser resolution set — including `minus`, `times`, and `div` — is shown in [Full Calculator Example](full-example.md). For the full DSL reference (the `Production(symbols*)` selector, token-side resolution, cycle detection, and the ordering constraint), see [Conflict Resolution](../conflict-resolution.md).

## Compile-Time Detection

Conflicts are detected at compile time when the LR(1) parse table is constructed by the `extends Parser` macro. A conflict causes a compile error (`ShiftReduceConflict` or `ReduceReduceConflict`) — no conflict checking happens at runtime.

When you add `override val resolutions = Set(...)`, the macro incorporates your priority declarations into the table construction and re-checks for consistency. A cycle in your declarations (`InconsistentConflictResolution`) is also reported at compile time.

> **Compile-time processing:** Alpaca builds the LR(1) parse table when you define `object MyParser extends Parser`. Any conflict — shift/reduce or reduce/reduce — is reported as a compile error immediately, before your code runs.

## Cross-links

- [Context-Free Grammars](cfg.md) — the calculator grammar that produces these conflicts
- [Shift-Reduce Parsing](shift-reduce.md) — the parse table mechanics behind conflicts
- [Conflict Resolution](../conflict-resolution.md) — the full DSL reference: Production(symbols*) selector, named productions, token-side resolution, cycle detection, ordering constraint
- [Semantic Actions](semantic-actions.md) — what happens when a conflict-free reduction fires
- [Full Calculator Example](full-example.md) — the full CalcParser with conflict resolution applied
185 changes: 185 additions & 0 deletions docs/_docs/theory/full-example.md
@@ -0,0 +1,185 @@
The preceding theory pages have built up each component of the compiler pipeline: tokens and lexical analysis, context-free grammars, LR parsing mechanics, conflict resolution, and semantic actions. This page assembles all the pieces into a working arithmetic calculator — the same grammar used throughout the tutorial, now fully resolved and evaluating. Follow the steps below from grammar definition to the evaluated result `7.0`.

## Step 1: The Lexer

CalcLexer tokenizes arithmetic expressions into the seven token classes introduced in [Tokens and Lexemes](tokens.md).

```scala sc:nocompile
import alpaca.*

val CalcLexer = lexer:
case num @ "[0-9]+(\\.[0-9]+)?" => Token["NUMBER"](num.toDouble)
case "\\+" => Token["PLUS"]
case "-" => Token["MINUS"]
case "\\*" => Token["TIMES"]
case "/" => Token["DIVIDE"]
case "\\(" => Token["LPAREN"]
case "\\)" => Token["RPAREN"]
case "\\s+" => Token.Ignored
```
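
As a quick check (a sketch; the exact lexeme representation described in the comment is illustrative), tokenizing a small input yields the lexeme stream the parser will consume:

```scala sc:nocompile
val (_, lexemes) = CalcLexer.tokenize("1 + 2")
// lexemes carries NUMBER(1.0), PLUS, NUMBER(2.0);
// the whitespace matches are dropped via Token.Ignored.
```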

## Step 2: The Grammar

The calculator grammar (from [Context-Free Grammars](cfg.md)) defines arithmetic expressions with four binary operators and parentheses:

```
Expr → Expr PLUS Expr
| Expr MINUS Expr
| Expr TIMES Expr
| Expr DIVIDE Expr
| LPAREN Expr RPAREN
| NUMBER

root → Expr
```

This grammar is ambiguous — the expression `1 + 2 * 3` can be parsed in two ways depending on which `Expr` is expanded first (see [Context-Free Grammars](cfg.md) for the parse tree, and [Conflicts & Disambiguation](conflicts.md) for the theory). Mapping it directly to Alpaca without conflict resolution causes a compile error.

## Step 3: The First Attempt — Compile Error

The bare CalcParser definition — grammar productions with semantic actions but no conflict resolution — triggers a compile error:

```scala sc:nocompile
import alpaca.*

object CalcParser extends Parser:
val Expr: Rule[Double] = rule(
"plus" { case (Expr(a), CalcLexer.PLUS(_), Expr(b)) => a + b },
"minus" { case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b },
"times" { case (Expr(a), CalcLexer.TIMES(_), Expr(b)) => a * b },
"div" { case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b },
{ case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e },
{ case CalcLexer.NUMBER(n) => n.value },
)
val root: Rule[Double] = rule:
case Expr(v) => v
// ↑ Compile error: ShiftReduceConflict
```

The compile error message:

```
Shift "PLUS ($plus)" vs Reduce Expr -> Expr PLUS ($plus) Expr
In situation like:
Expr PLUS ($plus) Expr PLUS ($plus) ...
Consider marking production Expr -> Expr PLUS ($plus) Expr to be alwaysBefore or alwaysAfter "PLUS ($plus)"
```

The parser does not know whether `1 + 2 + 3` should reduce `1 + 2` first (left-associative) or shift the second `+` first. This is a shift/reduce conflict — both actions are valid for the same parse state and lookahead. See [Conflicts & Disambiguation](conflicts.md) for the formal theory.

The error message says `alwaysBefore`/`alwaysAfter` — the correct API methods are `before` and `after` (see [Conflict Resolution](../conflict-resolution.md)).

## Step 4: Adding Conflict Resolution

Adding `override val resolutions` declares which action wins in each conflict state. The full resolution set for the calculator encodes standard BODMAS precedence (`*` and `/` before `+` and `-`) and left-associativity for all four operators:

```scala sc:nocompile
import alpaca.*

object CalcParser extends Parser:
val Expr: Rule[Double] = rule(
"plus" { case (Expr(a), CalcLexer.PLUS(_), Expr(b)) => a + b },
"minus" { case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b },
"times" { case (Expr(a), CalcLexer.TIMES(_), Expr(b)) => a * b },
"div" { case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b },
{ case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e },
{ case CalcLexer.NUMBER(n) => n.value },
)
val root: Rule[Double] = rule:
case Expr(v) => v

override val resolutions = Set(
// + and - are left-associative with equal precedence
production.plus.before(CalcLexer.PLUS, CalcLexer.MINUS),
production.plus.after(CalcLexer.TIMES, CalcLexer.DIVIDE),
production.minus.before(CalcLexer.PLUS, CalcLexer.MINUS),
production.minus.after(CalcLexer.TIMES, CalcLexer.DIVIDE),
// * and / are left-associative; bind tighter than + and -
production.times.before(CalcLexer.TIMES, CalcLexer.DIVIDE, CalcLexer.PLUS, CalcLexer.MINUS),
production.div.before(CalcLexer.TIMES, CalcLexer.DIVIDE, CalcLexer.PLUS, CalcLexer.MINUS),
)
```

Key decisions in the resolution set:

- `production.plus.before(PLUS, MINUS)` — after reducing `a + b`, do not shift another `+` or `-`. This gives `+` left-associativity: `1 + 2 + 3` = `(1 + 2) + 3`.
- `production.plus.after(TIMES, DIVIDE)` — prefer shifting `*` or `/` over reducing `+`. This gives `*`/`/` higher precedence: `1 + 2 * 3` shifts `*` before completing `1 + ...`.
- `production.times.before(TIMES, DIVIDE, PLUS, MINUS)` — after reducing `a * b`, do not shift any operator. `*` and `/` bind tightest.

For the full conflict resolution DSL — including `Production(symbols*)` selector, token-side resolution, cycle detection, and the ordering constraint — see [Conflict Resolution](../conflict-resolution.md).

## Step 5: Running the Calculator

With conflict resolution in place, the compiler builds the LR(1) parse table without errors. The parser is ready:

```scala sc:nocompile
val (_, lexemes) = CalcLexer.tokenize("1 + 2 * 3")
val (_, result) = CalcParser.parse(lexemes)
// result: Double | Null = 7.0 (not 9.0 — * binds tighter than +)

val (_, l2) = CalcLexer.tokenize("(1 + 2) * 3")
val (_, r2) = CalcParser.parse(l2)
// r2: Double | Null = 9.0 (parentheses override precedence)

// Always check for null before using result:
if result != null then println(result)
```

`1 + 2 * 3 = 7.0` (not 9.0) confirms that the `times`/`div` resolutions give `*` higher precedence than `+`. Parentheses `(1 + 2) * 3 = 9.0` override precedence as expected. Always check `result != null` before using the value — `null` indicates a parse failure (input not matched by the grammar); see [Parser](../parser.md).

## Step 6: Semantic Action Trace

To see how `1 + 2 * 3 = 7.0` is computed, trace the semantic actions fired during the parse:

```
Reduce NUMBER(1.0) → Expr(1.0) action: n.value = 1.0
Shift PLUS
Reduce NUMBER(2.0) → Expr(2.0) action: n.value = 2.0
Shift TIMES
Reduce NUMBER(3.0) → Expr(3.0) action: n.value = 3.0
Reduce Expr(2.0) TIMES Expr(3.0) → Expr(6.0) action: a * b = 6.0
Reduce Expr(1.0) PLUS Expr(6.0) → Expr(7.0) action: a + b = 7.0
Reduce Expr(7.0) → root                          result: 7.0
```

The `times` conflict resolution caused the parser to reduce `2 * 3` before completing `1 + ...`. Each reduce step calls the corresponding semantic action immediately — no parse tree object is ever constructed (see [Semantic Actions](semantic-actions.md)). The typed `Double` result propagates upward at each step.

## Formal Definition

> **Definition — Syntax-Directed Calculator:**
> A syntax-directed calculator is a grammar G = (V, Σ, R, S) together with
> a conflict resolution order ≺ on R and Σ, and semantic actions fᵣ for each r ∈ R.
> The parser reduces deterministically by the action preferred under ≺,
> and each fᵣ maps the Double values of the right-hand side symbols to a Double.
> The value of the start symbol is the arithmetic result.

## What Compile Time Does

> **Compile-time processing:** Every part of CalcParser shown above is processed at compile time. The `lexer` macro compiles the token patterns and generates the tokenizer. The `extends Parser` macro reads the `rule` declarations, builds the complete LR(1) parse table, incorporates the `resolutions` priority rules, and reports any conflicts immediately. At runtime, `tokenize()` and `parse()` execute the pre-built tables — no grammar analysis happens at runtime.

## Theory to Code

Each piece of the CalcParser traces back to a theory concept:

| What you wrote | Theory behind it |
|---|---|
| `val CalcLexer = lexer:` | Lexical analysis, regex → NFA → DFA — see [The Lexer: Regex to Finite Automata](lexer-fa.md) |
| BNF grammar in `rule(...)` | Context-free grammars — see [Context-Free Grammars](cfg.md) |
| `extends Parser` generates LR(1) table | LR parse table construction — see [Why LR?](why-lr.md) |
| Shift/reduce loop | LR parse mechanics — see [Shift-Reduce Parsing](shift-reduce.md) |
| `ShiftReduceConflict` compile error | Grammar ambiguity — see [Conflicts & Disambiguation](conflicts.md) |
| `override val resolutions = Set(...)` | Conflict resolution — see [Conflict Resolution](../conflict-resolution.md) |
| `case (Expr(a), ...) => a + b` | Semantic actions — see [Semantic Actions](semantic-actions.md) |
| `parse()` returns `7.0: Double` | Typed results via S-attributed translation |

## Cross-links

- [Tokens and Lexemes](tokens.md) — CalcLexer's token definitions
- [Context-Free Grammars](cfg.md) — the calculator grammar and parse trees
- [Why LR?](why-lr.md) — why LR(1) was chosen over LL alternatives
- [Shift-Reduce Parsing](shift-reduce.md) — the shift/reduce loop step by step
- [Conflicts & Disambiguation](conflicts.md) — why conflicts arise and the priority model
- [Semantic Actions](semantic-actions.md) — how typed values are computed during reduce
- [Conflict Resolution](../conflict-resolution.md) — the complete `before`/`after` DSL reference
- [Parser](../parser.md) — the `rule` DSL and `parse()` usage
- [Extractors](../extractors.md) — all extractor forms for terminals and non-terminals
2 changes: 1 addition & 1 deletion docs/_docs/theory/lexer-fa.md
@@ -109,4 +109,4 @@ integer-only pattern, so no shadowing occurs.

- See [Lexer](../lexer.md) for the complete `lexer` DSL reference.
- See [Tokens and Lexemes](tokens.md) for what the lexer produces — the lexeme stream.
- - Next: [Context-Free Grammars](theory/cfg.md) for how token streams are parsed.
+ - Next: [Context-Free Grammars](cfg.md) for how token streams are parsed.
8 changes: 4 additions & 4 deletions docs/_docs/theory/pipeline.md
@@ -79,10 +79,10 @@ The parse tree is never exposed directly — Alpaca builds it internally and imm

The rest of the Compiler Theory Tutorial builds on this mental model:

- - Next: [Tokens & Lexemes](theory/tokens.md) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
- - [The Lexer: Regex to Finite Automata](theory/lexer-fa.md) — how regular expressions define token classes and how Alpaca compiles them
+ - Next: [Tokens & Lexemes](tokens.md) — what the lexer produces: token classes, token instances, and how they are represented in Alpaca
+ - [The Lexer: Regex to Finite Automata](lexer-fa.md) — how regular expressions define token classes and how Alpaca compiles them

For the full API, see the reference pages:

- - See [Lexer](lexer.md) for how `CalcLexer` is defined.
- - See [Parser](parser.md) for how `CalcParser` is defined and how grammar rules produce a typed result.
+ - See [Lexer](../lexer.md) for how `CalcLexer` is defined.
+ - See [Parser](../parser.md) for how `CalcParser` is defined and how grammar rules produce a typed result.