Add theory application pages (conflicts, semantic actions, full example) #263
Open: halotukozak wants to merge 6 commits into grammar-theory from theory-conflicts-page
6 commits
a869eea feat(10-01): write theory/conflicts.md — parse table conflicts page (halotukozak)
a0c1845 feat(10-03): write theory/full-example.md capstone narrative page (halotukozak)
aa2ef2a feat(10-02): write theory/semantic-actions.md (halotukozak)
fd11eeb feat(11-01): add Compiler Theory nested section to sidebar.yml (halotukozak)
a472040 fix(11-02): fix 6 broken cross-link paths in theory pages (halotukozak)
5b1aedb Merge pull request #264 from halotukozak/theory-integration (halotukozak)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,96 @@ | ||
| A grammar is ambiguous if a string can be parsed in more than one way. In LR parsing, ambiguity manifests as a conflict: the parse table has two valid entries for the same (state, symbol) pair, and the parser cannot proceed deterministically. | ||
|
|
||
| ## What is a Parse Table Conflict? | ||
|
|
||
| The LR(1) parse table maps (state, lookahead terminal) pairs to actions: Shift (push the next token and move to a new state), Reduce (pop a production's right-hand side and push its left-hand non-terminal), Accept, or Error. A conflict exists when a single (state, terminal) pair has more than one valid action. | ||
|
|
||
| > **Definition — Parse Table Conflict:** | ||
| > A conflict in parse state s exists when the parse table has more than one entry | ||
| > for the pair (s, t) for some lookahead terminal t ∈ Σ ∪ {$}. | ||
| > A shift/reduce conflict has one entry Shift(s') and one entry Reduce(A → α). | ||
| > A reduce/reduce conflict has two entries Reduce(A → α) and Reduce(B → β) for two distinct productions. | ||
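
The definition can be checked mechanically. The following is a minimal sketch in plain Scala 3, not Alpaca code: `Action`, `Conflict`, `classify`, and `conflicts` are hypothetical names, and the table is modelled as a map from (state, terminal) to the set of actions proposed for that cell. A cell with two or more reductions is reported as reduce/reduce even if a shift is also present.

```scala
// Hypothetical model of parse-table entries; none of these names come from Alpaca.
enum Action:
  case Shift(target: Int)
  case Reduce(lhs: String, rhs: List[String])

enum Conflict:
  case ShiftReduce, ReduceReduce

import Action.*

// Classify the set of candidate actions proposed for one (state, terminal) cell.
def classify(cell: Set[Action]): Option[Conflict] =
  val reduces = cell.count(_.isInstanceOf[Reduce])
  val shifts  = cell.size - reduces
  if reduces >= 2 then Some(Conflict.ReduceReduce)
  else if reduces == 1 && shifts >= 1 then Some(Conflict.ShiftReduce)
  else None

// Collect every conflicting cell of a table keyed by (state, terminal).
def conflicts(table: Map[(Int, String), Set[Action]]): Map[(Int, String), Conflict] =
  table.toList.flatMap { case (key, cell) => classify(cell).map(c => key -> c) }.toMap

// State numbers are made up for the illustration.
val plusRule = Reduce("Expr", List("Expr", "PLUS", "Expr"))
val table: Map[(Int, String), Set[Action]] = Map(
  (7, "PLUS") -> Set(Shift(4), plusRule), // two actions proposed: a conflict
  (7, "$")    -> Set(plusRule),           // a single action: deterministic
)
val found = conflicts(table)
```

In this model only the `(7, "PLUS")` cell is flagged, which mirrors the `Expr PLUS Expr` situation discussed in the next section.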
|
|
||
| ## Shift/Reduce Conflicts | ||
|
|
||
| At some parse state, given lookahead token t, the parser could either shift t (push it and move to a new state) or reduce by some production A → α (pop the right-hand side and produce A). Both are valid actions for the same (state, t) pair — the parser cannot decide between them deterministically. | ||
|
|
||
| Why it happens: two or more LR(1) items in the same state propose incompatible actions for the same lookahead. The grammar allows the same prefix to continue in two different ways, and the LR automaton sees both paths simultaneously. | ||
|
|
||
| **Example: `1 + 2 + 3` in the calculator grammar.** After parsing `Expr PLUS Expr` with lookahead `PLUS`, the parser has two valid choices: | ||
|
|
||
| - **Reduce** `Expr → Expr PLUS Expr`: complete the first addition and produce a single `Expr`, treating the input as `(1 + 2) + 3`. | ||
| - **Shift** the second `PLUS`: keep accumulating, treating the input as `1 + (2 + 3)`. | ||
|
|
||
| Both choices lead to a valid parse tree for `1 + 2 + 3`, because the grammar (from [cfg.md](cfg.md)) is ambiguous for binary operator chains. Alpaca detects this conflict at compile time and reports: | ||
|
|
||
| ``` | ||
| Shift "PLUS ($plus)" vs Reduce Expr -> Expr PLUS ($plus) Expr | ||
| In situation like: | ||
| Expr PLUS ($plus) Expr PLUS ($plus) ... | ||
| Consider marking production Expr -> Expr PLUS ($plus) Expr to be alwaysBefore or alwaysAfter "PLUS ($plus)" | ||
| ``` | ||
|
|
||
| > **Note:** The error message says `alwaysBefore`/`alwaysAfter`. These method names do not exist in the Alpaca API. The correct methods are `before` and `after`. See [Conflict Resolution](../conflict-resolution.md) for full details on reading error messages. | ||
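
For `1 + 2 + 3` the two readings happen to give the same sum, so the choice can look harmless. With `MINUS` it is not. The sketch below uses a hypothetical mini-AST (`Tree`, `Num`, `Minus`, `eval` are illustration names only; Alpaca itself builds no tree) to compare the two parse trees of `1 - 2 - 3`.

```scala
// Hypothetical mini-AST, used only to compare the two possible parse trees.
enum Tree:
  case Num(value: Double)
  case Minus(left: Tree, right: Tree)

import Tree.*

// Evaluate a tree bottom-up.
def eval(tree: Tree): Double = tree match
  case Num(v)      => v
  case Minus(l, r) => eval(l) - eval(r)

// Reduce first: (1 - 2) - 3, the left-associative reading.
val reduceFirst = Minus(Minus(Num(1), Num(2)), Num(3))
// Shift first: 1 - (2 - 3), the right-associative reading.
val shiftFirst = Minus(Num(1), Minus(Num(2), Num(3)))

val leftValue  = eval(reduceFirst) // (1 - 2) - 3
val rightValue = eval(shiftFirst)  // 1 - (2 - 3)
```

The two values differ (`-4.0` versus `2.0`), which is why the conflict has to be resolved explicitly rather than by picking an arbitrary action.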
|
|
||
| ## Reduce/Reduce Conflicts | ||
|
|
||
| A reduce/reduce conflict occurs when two different productions can reduce the same token sequence with the same lookahead. The parser has two Reduce entries for the same (state, t) pair and cannot decide which to apply. | ||
|
|
||
| **Example:** If a grammar has both `Integer → NUMBER` and `Float → NUMBER`, and the parser has `NUMBER` on the stack with lookahead `$`, it cannot determine which reduction to apply because both are valid. Alpaca reports: | ||
|
|
||
| ``` | ||
| Reduce Integer -> Number vs Reduce Float -> Number | ||
| In situation like: | ||
| Number ... | ||
| Consider marking one of the productions to be alwaysBefore or alwaysAfter the other | ||
| ``` | ||
|
|
||
| Reduce/reduce conflicts are less common than shift/reduce conflicts. They typically indicate a grammar design issue — two rules competing for the same token sequence. The usual fix is to restructure the grammar so the two competing productions have distinct right-hand sides, or to use a different non-terminal. | ||
|
|
||
| ## How LR(1) Lookahead Helps | ||
|
|
||
| LR(1) lookahead often disambiguates conflicts that weaker LR variants (LR(0), SLR) cannot resolve. Each item in the LR(1) item set carries its specific lookahead terminal, so the parser only fires a reduce when the actual next token matches that item's lookahead. This eliminates many spurious conflicts. | ||
|
|
||
| But for inherently ambiguous grammars — like the calculator's binary operator productions — LR(1) lookahead alone is not enough. The grammar has the same prefix structure regardless of which associativity is intended, so both shift and reduce appear valid to the automaton. Explicit resolution is required. | ||
|
|
||
| For a detailed explanation of items and lookahead, see [Shift-Reduce Parsing](shift-reduce.md). | ||
|
|
||
| ## Resolution by Priority | ||
|
|
||
| Resolving a conflict means declaring which action wins. For a shift/reduce conflict: should the reduction or the shift take priority? | ||
|
|
||
| Alpaca's `before`/`after` DSL lets you declare priorities directly in the parser definition: | ||
|
|
||
| - `production.name.before(tokens*)` — when the conflict is between reducing `name` and shifting one of those tokens, the reduction wins. Use this for left-associativity and higher-precedence reductions. | ||
| - `production.name.after(tokens*)` — prefer shifting those tokens over reducing this production. Use this when another operator should bind more tightly. | ||
|
|
||
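| Priorities are transitive and are propagated through the priority graph by breadth-first search (BFS): if one action beats a second and the second beats a third, the first also beats the third. For example, if reducing `times` beats shifting `TIMES`, and shifting `TIMES` beats reducing `plus`, then reducing `times` also outranks reducing `plus` and everything that reducing `plus` beats. | ||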
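
To make the transitivity concrete, here is a minimal sketch of a priority graph in plain Scala 3. It is an assumption-laden model, not Alpaca's implementation: nodes are plain strings, an edge `a -> b` means "a wins over b", and `Priorities`, `dominated`, and `hasCycle` are hypothetical names. The edges are taken from the `before`/`after` declarations used for the calculator.

```scala
import scala.collection.immutable.Queue

// Hypothetical model: an edge a -> b means "a wins over b".
// Alpaca's internal representation may differ.
type Priorities = Map[String, Set[String]]

// Every node reachable from `start` by following "wins over" edges, found by BFS.
def dominated(graph: Priorities, start: String): Set[String] =
  @annotation.tailrec
  def loop(frontier: Queue[String], seen: Set[String]): Set[String] =
    frontier.dequeueOption match
      case None => seen
      case Some((node, rest)) =>
        val next = graph.getOrElse(node, Set.empty) -- seen
        loop(rest ++ next, seen ++ next)
  loop(Queue(start), Set.empty)

// A declaration set is inconsistent if some node ends up dominating itself.
def hasCycle(graph: Priorities): Boolean =
  graph.keys.exists(node => dominated(graph, node).contains(node))

val graph: Priorities = Map(
  "reduce times" -> Set("shift TIMES"),               // times.before(TIMES)
  "shift TIMES"  -> Set("reduce plus"),               // plus.after(TIMES)
  "reduce plus"  -> Set("shift PLUS", "shift MINUS"), // plus.before(PLUS, MINUS)
)
```

Under this model, `dominated(graph, "reduce times")` contains `"shift PLUS"` even though no edge states it directly, and adding an edge such as `"shift PLUS" -> Set("reduce times")` makes `hasCycle` return `true`, the kind of inconsistency that the page describes as being rejected at compile time.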
|
|
||
| A minimal example — declaring left-associativity and precedence for the `plus` production only: | ||
|
|
||
| ```scala sc:nocompile | ||
| import alpaca.* | ||
|
|
||
| // Fragment: belongs inside the body of `object CalcParser extends Parser:` | ||
| override val resolutions = Set( | ||
|   production.plus.before(CalcLexer.PLUS, CalcLexer.MINUS), // left-associative: reduce + before shifting + or - | ||
|   production.plus.after(CalcLexer.TIMES, CalcLexer.DIVIDE), // lower precedence: shift * or / before reducing + | ||
| ) | ||
| ``` | ||
|
|
||
| The complete CalcParser resolution set — including `minus`, `times`, and `div` — is shown on [Full Calculator Example](full-example.md). For the full DSL reference (Production(symbols*) selector, token-side resolution, cycle detection, ordering constraint), see [Conflict Resolution](../conflict-resolution.md). | ||
|
|
||
| ## Compile-Time Detection | ||
|
|
||
| Conflicts are detected at compile time when the LR(1) parse table is constructed by the `extends Parser` macro. A conflict causes a compile error (`ShiftReduceConflict` or `ReduceReduceConflict`) — no conflict checking happens at runtime. | ||
|
|
||
| When you add `override val resolutions = Set(...)`, the macro incorporates your priority declarations into the table construction and re-checks for consistency. A cycle in your declarations (`InconsistentConflictResolution`) is also reported at compile time. | ||
|
|
||
| > **Compile-time processing:** Alpaca builds the LR(1) parse table when you define `object MyParser extends Parser`. Any conflict, shift/reduce or reduce/reduce, is reported as a compile error immediately, before your code runs. | ||
|
|
||
| ## Cross-links | ||
|
|
||
| - [Context-Free Grammars](cfg.md) — the calculator grammar that produces these conflicts | ||
| - [Shift-Reduce Parsing](shift-reduce.md) — the parse table mechanics behind conflicts | ||
| - [Conflict Resolution](../conflict-resolution.md) — the full DSL reference: Production(symbols*) selector, named productions, token-side resolution, cycle detection, ordering constraint | ||
| - [Semantic Actions](semantic-actions.md) — what happens when a conflict-free reduction fires | ||
| - [Full Calculator Example](full-example.md) — the full CalcParser with conflict resolution applied | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,185 @@ | ||
| The preceding theory pages have built up each component of the compiler pipeline: tokens and lexical analysis, context-free grammars, LR parsing mechanics, conflict resolution, and semantic actions. This page assembles all the pieces into a working arithmetic calculator, using the same grammar as the rest of the tutorial, now fully resolved and evaluating. Follow the steps below from the lexer definition to the evaluated result `7.0`. | ||
|
|
||
| ## Step 1: The Lexer | ||
|
|
||
| CalcLexer tokenizes arithmetic expressions into the seven token classes introduced in [Tokens and Lexemes](tokens.md). | ||
|
|
||
| ```scala sc:nocompile | ||
| import alpaca.* | ||
|
|
||
| val CalcLexer = lexer: | ||
|   case num @ "[0-9]+(\\.[0-9]+)?" => Token["NUMBER"](num.toDouble) | ||
|   case "\\+" => Token["PLUS"] | ||
|   case "-" => Token["MINUS"] | ||
|   case "\\*" => Token["TIMES"] | ||
|   case "/" => Token["DIVIDE"] | ||
|   case "\\(" => Token["LPAREN"] | ||
|   case "\\)" => Token["RPAREN"] | ||
|   case "\\s+" => Token.Ignored | ||
| ``` | ||
|
|
||
| ## Step 2: The Grammar | ||
|
|
||
| The calculator grammar (from [Context-Free Grammars](cfg.md)) defines arithmetic expressions with four binary operators and parentheses: | ||
|
|
||
| ``` | ||
| Expr → Expr PLUS Expr | ||
| | Expr MINUS Expr | ||
| | Expr TIMES Expr | ||
| | Expr DIVIDE Expr | ||
| | LPAREN Expr RPAREN | ||
| | NUMBER | ||
|
|
||
| root → Expr | ||
| ``` | ||
|
|
||
| This grammar is ambiguous — the expression `1 + 2 * 3` can be parsed in two ways depending on which `Expr` is expanded first (see [Context-Free Grammars](cfg.md) for the parse tree, and [Conflicts & Disambiguation](conflicts.md) for the theory). Mapping it directly to Alpaca without conflict resolution causes a compile error. | ||
|
|
||
| ## Step 3: The First Attempt — Compile Error | ||
|
|
||
| The bare CalcParser definition — grammar productions with semantic actions but no conflict resolution — triggers a compile error: | ||
|
|
||
| ```scala sc:nocompile | ||
| import alpaca.* | ||
|
|
||
| object CalcParser extends Parser: | ||
|   val Expr: Rule[Double] = rule( | ||
|     "plus" { case (Expr(a), CalcLexer.PLUS(_), Expr(b)) => a + b }, | ||
|     "minus" { case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b }, | ||
|     "times" { case (Expr(a), CalcLexer.TIMES(_), Expr(b)) => a * b }, | ||
|     "div" { case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b }, | ||
|     { case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e }, | ||
|     { case CalcLexer.NUMBER(n) => n.value }, | ||
|   ) | ||
|   val root: Rule[Double] = rule: | ||
|     case Expr(v) => v | ||
| // ↑ Compile error: ShiftReduceConflict | ||
| ``` | ||
|
|
||
| The compile error message: | ||
|
|
||
| ``` | ||
| Shift "PLUS ($plus)" vs Reduce Expr -> Expr PLUS ($plus) Expr | ||
| In situation like: | ||
| Expr PLUS ($plus) Expr PLUS ($plus) ... | ||
| Consider marking production Expr -> Expr PLUS ($plus) Expr to be alwaysBefore or alwaysAfter "PLUS ($plus)" | ||
| ``` | ||
|
|
||
| The parser does not know whether `1 + 2 + 3` should reduce `1 + 2` first (left-associative) or shift the second `+` first. This is a shift/reduce conflict — both actions are valid for the same parse state and lookahead. See [Conflicts & Disambiguation](conflicts.md) for the formal theory. | ||
|
|
||
| The error message says `alwaysBefore`/`alwaysAfter` — the correct API methods are `before` and `after` (see [Conflict Resolution](../conflict-resolution.md)). | ||
|
|
||
| ## Step 4: Adding Conflict Resolution | ||
|
|
||
| Adding `override val resolutions` declares which action wins in each conflict state. The full resolution set for the calculator encodes standard BODMAS precedence (`*` and `/` before `+` and `-`) and left-associativity for all four operators: | ||
|
|
||
| ```scala sc:nocompile | ||
| import alpaca.* | ||
|
|
||
| object CalcParser extends Parser: | ||
|   val Expr: Rule[Double] = rule( | ||
|     "plus" { case (Expr(a), CalcLexer.PLUS(_), Expr(b)) => a + b }, | ||
|     "minus" { case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b }, | ||
|     "times" { case (Expr(a), CalcLexer.TIMES(_), Expr(b)) => a * b }, | ||
|     "div" { case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b }, | ||
|     { case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e }, | ||
|     { case CalcLexer.NUMBER(n) => n.value }, | ||
|   ) | ||
|   val root: Rule[Double] = rule: | ||
|     case Expr(v) => v | ||
|
|
||
|   override val resolutions = Set( | ||
|     // + and - are left-associative with equal precedence | ||
|     production.plus.before(CalcLexer.PLUS, CalcLexer.MINUS), | ||
|     production.plus.after(CalcLexer.TIMES, CalcLexer.DIVIDE), | ||
|     production.minus.before(CalcLexer.PLUS, CalcLexer.MINUS), | ||
|     production.minus.after(CalcLexer.TIMES, CalcLexer.DIVIDE), | ||
|     // * and / are left-associative; bind tighter than + and - | ||
|     production.times.before(CalcLexer.TIMES, CalcLexer.DIVIDE, CalcLexer.PLUS, CalcLexer.MINUS), | ||
|     production.div.before(CalcLexer.TIMES, CalcLexer.DIVIDE, CalcLexer.PLUS, CalcLexer.MINUS), | ||
|   ) | ||
| ``` | ||
|
|
||
| Key decisions in the resolution set: | ||
|
|
||
| - `production.plus.before(PLUS, MINUS)`: with `a + b` on the stack and `+` or `-` as lookahead, reduce instead of shifting. This gives `+` left-associativity: `1 + 2 + 3` = `(1 + 2) + 3`. | ||
| - `production.plus.after(TIMES, DIVIDE)`: prefer shifting `*` or `/` over reducing `+`. This gives `*` and `/` higher precedence: `1 + 2 * 3` shifts `*` before completing `1 + ...`. | ||
| - `production.times.before(TIMES, DIVIDE, PLUS, MINUS)`: with `a * b` on the stack, reduce instead of shifting any operator. `*` and `/` bind tightest. | ||
|
|
||
| For the full conflict resolution DSL — including `Production(symbols*)` selector, token-side resolution, cycle detection, and the ordering constraint — see [Conflict Resolution](../conflict-resolution.md). | ||
|
|
||
| ## Step 5: Running the Calculator | ||
|
|
||
| With conflict resolution in place, the compiler builds the LR(1) parse table without errors. The parser is ready: | ||
|
|
||
| ```scala sc:nocompile | ||
| val (_, lexemes) = CalcLexer.tokenize("1 + 2 * 3") | ||
| val (_, result) = CalcParser.parse(lexemes) | ||
| // result: Double | Null = 7.0 (not 9.0 — * binds tighter than +) | ||
|
|
||
| val (_, l2) = CalcLexer.tokenize("(1 + 2) * 3") | ||
| val (_, r2) = CalcParser.parse(l2) | ||
| // r2: Double | Null = 9.0 (parentheses override precedence) | ||
|
|
||
| // Always check for null before using result: | ||
| if result != null then println(result) | ||
| ``` | ||
|
|
||
| `1 + 2 * 3 = 7.0` (not 9.0) confirms that `*` binds tighter than `+`: with `1 + 2` on the stack and `*` as lookahead, `production.plus.after(CalcLexer.TIMES, CalcLexer.DIVIDE)` makes the parser shift `*` instead of reducing the addition. Parentheses override precedence as expected: `(1 + 2) * 3 = 9.0`. Always check `result != null` before using the value, because `null` indicates a parse failure (input not matched by the grammar); see [Parser](../parser.md). | ||
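
One way to make the null check harder to forget is to convert the nullable result into an `Option` at the call site. This is a small sketch in plain Scala 3; `toOption` is a hypothetical helper and not part of Alpaca, and the two sample values stand in for a successful and a failed parse.

```scala
// Sketch: wrap a nullable parse result in an Option.
// `toOption` is a hypothetical helper, not part of Alpaca.
def toOption(result: Double | Null): Option[Double] =
  result match
    case value: Double => Some(value) // parse succeeded
    case _             => None        // null: the input did not match the grammar

// Stand-ins for the second element of the tuple returned by `parse`.
val parsed: Double | Null = 7.0
val failed: Double | Null = null

val message = toOption(failed).fold("parse failed")(v => s"result = $v")
```

Callers then handle both outcomes through `fold`, `map`, or pattern matching instead of a bare `!= null` test.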
|
|
||
| ## Step 6: Semantic Action Trace | ||
|
|
||
| To see how `1 + 2 * 3 = 7.0` is computed, trace the semantic actions fired during the parse: | ||
|
|
||
| ``` | ||
| Reduce NUMBER(1.0) → Expr(1.0)                 action: n.value = 1.0 | ||
| Shift PLUS | ||
| Reduce NUMBER(2.0) → Expr(2.0)                 action: n.value = 2.0 | ||
| Shift TIMES | ||
| Reduce NUMBER(3.0) → Expr(3.0)                 action: n.value = 3.0 | ||
| Reduce Expr(2.0) TIMES Expr(3.0) → Expr(6.0)   action: a * b = 6.0 | ||
| Reduce Expr(1.0) PLUS Expr(6.0) → Expr(7.0)    action: a + b = 7.0 | ||
| Reduce Expr(7.0) → root(7.0)                   result: 7.0 | ||
| ``` | ||
|
|
||
| The `production.plus.after(CalcLexer.TIMES, CalcLexer.DIVIDE)` resolution made the parser shift `TIMES` and reduce `2 * 3` before completing `1 + ...`. Each reduce step calls the corresponding semantic action immediately, and no parse tree object is ever constructed (see [Semantic Actions](semantic-actions.md)). The typed `Double` result propagates upward at each step. | ||
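
The trace can be replayed with an explicit value stack to see why the order of reductions decides the result. This is an illustrative sketch in plain Scala 3 and not how Alpaca is implemented: `Step`, `PushNumber`, `Combine`, and `run` are hypothetical names, and shifts of operator tokens are omitted because they carry no value.

```scala
// Hypothetical replay of the trace with an explicit value stack.
enum Step:
  case PushNumber(value: Double)               // Reduce NUMBER -> Expr
  case Combine(op: (Double, Double) => Double) // Reduce Expr OP Expr -> Expr

import Step.*

// Fold the steps over a stack (head = top); each reduce runs its action at once.
def run(steps: List[Step]): List[Double] =
  steps.foldLeft(List.empty[Double]) {
    case (stack, PushNumber(v))        => v :: stack
    case (b :: a :: rest, Combine(op)) => op(a, b) :: rest
    case (stack, Combine(_))           => sys.error(s"stack underflow: $stack")
  }

// Order produced by the resolved parser for 1 + 2 * 3: multiply first.
val resolved = List[Step](
  PushNumber(1), PushNumber(2), PushNumber(3), Combine(_ * _), Combine(_ + _)
)
// Order that would result if the addition were reduced before shifting TIMES.
val plusFirst = List[Step](
  PushNumber(1), PushNumber(2), Combine(_ + _), PushNumber(3), Combine(_ * _)
)
```

`run(resolved)` leaves `7.0` on the stack, while `run(plusFirst)` leaves `9.0`, the value the resolution set is designed to rule out.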
|
|
||
| ## Formal Definition | ||
|
|
||
| > **Definition — Syntax-Directed Calculator:** | ||
| > A syntax-directed calculator is a grammar G = (V, Σ, R, S) together with | ||
| > a conflict resolution order ≺ on R and Σ, and semantic actions fᵣ for each r ∈ R. | ||
| > The parser resolves every conflict deterministically by the action preferred under ≺, | ||
| > and each fᵣ maps the Double values of the right-hand side symbols to a Double. | ||
| > The value of the start symbol is the arithmetic result. | ||
|
|
||
| ## What Compile Time Does | ||
|
|
||
| > **Compile-time processing:** Every part of CalcParser shown above is processed at compile time. The `lexer` macro compiles the token patterns and generates the tokenizer. The `extends Parser` macro reads the `rule` declarations, builds the complete LR(1) parse table, incorporates the `resolutions` priority rules, and reports any conflicts immediately. At runtime, `tokenize()` and `parse()` execute the pre-built tables — no grammar analysis happens at runtime. | ||
|
|
||
| ## Theory to Code | ||
|
|
||
| Each piece of the CalcParser traces back to a theory concept: | ||
|
|
||
| | What you wrote | Theory behind it | | ||
| |---|---| | ||
| | `val CalcLexer = lexer:` | Lexical analysis, regex → NFA → DFA — see [The Lexer: Regex to Finite Automata](lexer-fa.md) | | ||
| | BNF grammar in `rule(...)` | Context-free grammars — see [Context-Free Grammars](cfg.md) | | ||
| | `extends Parser` generates LR(1) table | LR parse table construction — see [Why LR?](why-lr.md) | | ||
| | Shift/reduce loop | LR parse mechanics — see [Shift-Reduce Parsing](shift-reduce.md) | | ||
| | `ShiftReduceConflict` compile error | Grammar ambiguity — see [Conflicts & Disambiguation](conflicts.md) | | ||
| | `override val resolutions = Set(...)` | Conflict resolution — see [Conflict Resolution](../conflict-resolution.md) | | ||
| | `case (Expr(a), ...) => a + b` | Semantic actions — see [Semantic Actions](semantic-actions.md) | | ||
| | `parse()` returns `7.0: Double` | Typed results via S-attributed translation | | ||
|
|
||
| ## Cross-links | ||
|
|
||
| - [Tokens and Lexemes](tokens.md) — CalcLexer's token definitions | ||
| - [Context-Free Grammars](cfg.md) — the calculator grammar and parse trees | ||
| - [Why LR?](why-lr.md) — why LR(1) was chosen over LL alternatives | ||
| - [Shift-Reduce Parsing](shift-reduce.md) — the shift/reduce loop step by step | ||
| - [Conflicts & Disambiguation](conflicts.md) — why conflicts arise and the priority model | ||
| - [Semantic Actions](semantic-actions.md) — how typed values are computed during reduce | ||
| - [Conflict Resolution](../conflict-resolution.md) — the complete `before`/`after` DSL reference | ||
| - [Parser](../parser.md) — the `rule` DSL and `parse()` usage | ||
| - [Extractors](../extractors.md) — all extractor forms for terminals and non-terminals |
The terminal symbol should be "NUMBER" (all caps), not "Number", to match the terminal naming convention used consistently throughout the documentation. The example productions should be "Integer → NUMBER" and "Float → NUMBER", and the error message should display "Reduce Integer -> NUMBER vs Reduce Float -> NUMBER".