Merge ignored-tokens-vals into token-definition for literals support #308

halotukozak wants to merge 1 commit into master
Conversation
`# Conflicts: # src/alpaca/internal/lexer/Lexer.scala`
Pull request overview
This PR introduces a new compile-time-only “token definition” type used by the lexer DSL (including a dedicated type for ignored tokens) and updates the lexer macro implementation to consume these definitions, with the stated goal of improving literals support.
Changes:
- Change the lexer DSL constructors (`Token.apply` / `Token.Ignored`) to return `TokenDefinition` / `IgnoredTokenDefinition` instead of the runtime `Token` types.
- Add top-level `LexerDefinition`, `TokenDefinition`, and `IgnoredTokenDefinition` types in `src/alpaca/lexer.scala`.
- Update the macro in `internal/lexer/Lexer.scala` to interpret `TokenDefinition` expressions and (attempt to) separate ignored-token field generation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/alpaca/lexer.scala` | Introduces `TokenDefinition` / `IgnoredTokenDefinition` opaque types and updates DSL constructor return types. |
| `src/alpaca/internal/lexer/Lexer.scala` | Updates macro pattern-matching/typing to work with `TokenDefinition` and adjusts generated members for ignored tokens. |
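To make the opaque-type design concrete, here is a hypothetical, heavily simplified sketch (all names and shapes are illustrative, not the PR's actual code) of how a DSL-facing `TokenDefinition` can mirror a runtime `Token` while exposing no runtime operations, so user code can only hand it to the lexer macro:

```scala
// Hypothetical sketch: an opaque "definition" type that erases to the
// runtime token but hides its API from DSL users.
object LexerDsl:
  final case class Token[Name <: String, Value](name: Name, value: Value)

  opaque type TokenDefinition[Name <: String, Value] = Token[Name, Value]

  object TokenDefinition:
    // In the real code only the macro would wrap/unwrap; shown here for clarity.
    def define[Name <: String, Value](t: Token[Name, Value]): TokenDefinition[Name, Value] = t
    extension [N <: String, V](td: TokenDefinition[N, V])
      def underlying: Token[N, V] = td

@main def demo(): Unit =
  import LexerDsl.*
  val td = TokenDefinition.define(Token("num", 42))
  assert(td.underlying.name == "num")
```

Outside `LexerDsl`, a `TokenDefinition` value has no usable members, which is what makes the `@compileTimeOnly` DSL constructors safe to expose.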
```diff
@@ -61,7 +61,7 @@ object Token:
   * @return a token definition
   */
  @compileTimeOnly("Should never be called outside the lexer definition")
- def apply[Name <: ValidName](using ctx: LexerCtx): Token[Name, ctx.type, String] = dummy
+ def apply[Name <: ValidName](using ctx: LexerCtx): TokenDefinition[Name, ctx.type, String] = dummy
```
`Token.apply[Name](using ctx)` is documented as “captures the matched string”, but the macro expansion in `internal/lexer/Lexer.scala` generates a `DefinedToken[..., Unit]` with remapping `_ => ()` for this form (and tests expect `Unit` for tokens like `dotAdd`). Either update this Scaladoc to reflect the actual `Unit` value semantics, or change the macro/token definition so this overload really yields `String`.
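A hypothetical, simplified sketch of the mismatch this comment describes (the `DefinedToken` shape here is illustrative, not the repository's actual class): the no-argument form is expanded with a remap that discards the matched text, while the behaviour the Scaladoc promises would keep it.

```scala
// Illustrative stand-in for the generated token: a name plus a remapping
// from the raw matched string to the token's value.
final case class DefinedToken[Value](name: String, remap: String => Value)

// What the macro currently generates for Token.apply[Name] (per the review):
val dotAdd = DefinedToken[Unit]("dotAdd", _ => ())

// What "captures the matched string" would instead imply:
val dotAddCapturing = DefinedToken[String]("dotAdd", raw => raw)

@main def remapDemo(): Unit =
  assert(dotAdd.remap(".+") == ())          // value discarded
  assert(dotAddCapturing.remap(".+") == ".+") // raw match kept
```

Either behaviour is defensible; the point is that the Scaladoc and the generated value type should agree.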
```diff
  List(TypeRepr.of[Tokenization[Ctx]]),
  cls =>
-   tokenSymbols(cls) ++ List(
+   definedTokenSymbols(cls) ++ ignoredTokenSymbols(cls) ++ List(
```
`definedTokenSymbols(cls)` is referenced when creating the anonymous `Tokenization` class members, but there is no `definedTokenSymbols` method/val in this file (the existing helper is `tokenSymbols`). As-is this won’t compile; either rename/introduce `definedTokenSymbols` (and update the downstream `tokenSymbols(cls)` usage accordingly) or revert the constructor member list to use the existing helper.
```diff
- definedTokenSymbols(cls) ++ ignoredTokenSymbols(cls) ++ List(
+ tokenSymbols(cls) ++ ignoredTokenSymbols(cls) ++ List(
```
```diff
  def tokensSymbol(cls: Symbol) = Symbol.newVal(
    parent = cls,
    name = "tokens",
-   tpe = TypeRepr.of[List[ThisToken]],
+   tpe = TypeRepr.of[List[Token[?, Ctx, ?]]],
    flags = Flags.Synthetic | Flags.Override,
    privateWithin = Symbol.noSymbol,
  )
```
The `tokensSymbol` field type was updated to `List[Token[?, Ctx, ?]]`, and the `ThisToken` alias was removed earlier, but the `tokens` val initializer still casts each field ref with `.asExprOf[ThisToken]` (see the `ThisToken` usage later in this file). This will fail to compile; update the cast to the new element type (or reintroduce a consistent alias) so the RHS matches the updated `tokens` type.
```scala
compileNameAndPattern[t](tree).map:
  case '{ type name <: ValidName; Token[name]($value: value)(using $ctx) }
      if Option(tree).collect { case term: Term if TypeRepr.of[value] =:= term.tpe => }.isDefined =>
```
The special-case branch for `Token[name](...)` returning `_.lastRawMatched` is guarded only by `TypeRepr.of[value] =:= term.tpe`. For patterns bound as `x @ ...`, `term.tpe` is typically `String`, so any String-typed value expression (even an unrelated `val y: String`) will match this branch and silently ignore the provided expression. Tighten the guard to ensure the argument actually refers to the case binder (e.g., compare symbols / require the argument is the same `Ident` as the bind); otherwise constant/external `String` values will be tokenized incorrectly.
```diff
- if Option(tree).collect { case term: Term if TypeRepr.of[value] =:= term.tpe => }.isDefined =>
+ if
+   (tree, value.asTerm) match
+     case (bind: Bind, id: Ident) => id.symbol == bind.symbol
+     case (id1: Ident, id2: Ident) => id2.symbol == id1.symbol
+     case _ => false
+ =>
```
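To see why the type-only guard is too permissive, here is a hypothetical miniature of the situation using plain case classes (not the real `quotes.reflect` API): a guard that checks only the argument's type accepts any String-typed identifier, while comparing the argument's symbol against the pattern binder's symbol accepts only the binder itself.

```scala
// Tiny stand-ins for the reflection types involved (illustrative only).
final case class Sym(name: String)
final case class Bind(sym: Sym)               // the `x @ ...` pattern binder
final case class Ident(sym: Sym, tpe: String) // an identifier reference

// Loose guard, analogous to `TypeRepr.of[value] =:= term.tpe`:
def looseGuard(arg: Ident): Boolean = arg.tpe == "String"

// Tight guard, analogous to the suggested symbol comparison:
def tightGuard(pattern: Bind, arg: Ident): Boolean = arg.sym == pattern.sym

@main def guardDemo(): Unit =
  val pattern = Bind(Sym("x"))
  assert(looseGuard(Ident(Sym("y"), "String")))            // unrelated val slips through
  assert(tightGuard(pattern, Ident(Sym("x"), "String")))   // only the binder passes
  assert(!tightGuard(pattern, Ident(Sym("y"), "String")))  // unrelated val rejected
```

The suggested fix above applies the same idea at the macro level: match on the actual `Bind`/`Ident` trees and compare their `symbol`s rather than their types.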