Merge ignored-tokens-vals into token-definition for literals support#308

Open
halotukozak wants to merge 1 commit into master from literals-support

Conversation

@halotukozak
Owner

Summary

  • Merge ignored-tokens-vals branch into token-definition for literals support

🤖 Generated with Claude Code

# Conflicts:
#	src/alpaca/internal/lexer/Lexer.scala
Copilot AI review requested due to automatic review settings March 4, 2026 14:50
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a new compile-time-only “token definition” type used by the lexer DSL (including a dedicated type for ignored tokens) and updates the lexer macro implementation to consume these definitions, with the stated goal of improving literals support.

Changes:

  • Change the lexer DSL constructors (Token.apply / Token.Ignored) to return TokenDefinition / IgnoredTokenDefinition instead of runtime Token types.
  • Add top-level LexerDefinition, TokenDefinition, and IgnoredTokenDefinition types in src/alpaca/lexer.scala.
  • Update the macro in internal/lexer/Lexer.scala to interpret TokenDefinition expressions and (attempt to) separate ignored-token field generation.
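A minimal sketch of the shape such compile-time-only definition types might take (the opaque-type encoding and member signatures below are assumptions based on the PR description, not the project's actual code; `ValidName` and `LexerCtx` are reused from the quoted diffs):

```scala
import scala.annotation.compileTimeOnly

// Hypothetical sketch: definition types that exist only during macro
// expansion. The lexer macro pattern-matches on calls producing these
// types and erases them; they are never instantiated at runtime.
opaque type TokenDefinition[Name <: ValidName, Ctx, Value] = Unit
opaque type IgnoredTokenDefinition[Name <: ValidName, Ctx] = Unit

type LexerDefinition[Ctx] =
  TokenDefinition[?, Ctx, ?] | IgnoredTokenDefinition[?, Ctx]

object Token:
  @compileTimeOnly("Should never be called outside the lexer definition")
  def apply[Name <: ValidName](using ctx: LexerCtx): TokenDefinition[Name, ctx.type, String] = ???

  @compileTimeOnly("Should never be called outside the lexer definition")
  def Ignored[Name <: ValidName](using ctx: LexerCtx): IgnoredTokenDefinition[Name, ctx.type] = ???
```

The `@compileTimeOnly` annotation turns any call surviving to typer into a compile error, which is what makes "compile-time-only" enforceable rather than merely conventional.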

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/alpaca/lexer.scala Introduces TokenDefinition / IgnoredTokenDefinition opaque types and updates DSL constructor return types.
src/alpaca/internal/lexer/Lexer.scala Updates macro pattern-matching/typing to work with TokenDefinition and adjusts generated members for ignored tokens.


Comment on lines 54 to 65
@@ -61,7 +61,7 @@ object Token:
* @return a token definition
*/
@compileTimeOnly("Should never be called outside the lexer definition")
-def apply[Name <: ValidName](using ctx: LexerCtx): Token[Name, ctx.type, String] = dummy
+def apply[Name <: ValidName](using ctx: LexerCtx): TokenDefinition[Name, ctx.type, String] = dummy


Copilot AI Mar 4, 2026


Token.apply[Name](using ctx) is documented as “captures the matched string”, but the macro expansion in internal/lexer/Lexer.scala generates a DefinedToken[..., Unit] with remapping _ => () for this form (and tests expect Unit for tokens like dotAdd). Either update this Scaladoc to reflect the actual Unit value semantics, or change the macro/token definition so this overload really yields String.
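The mismatch can be illustrated with a simplified model of the generated token, reduced to just its remapping function (the real `DefinedToken` has more members; this is an assumption for illustration):

```scala
// Simplified model: a token is characterized here only by how it remaps
// the raw matched string into its carried value.
final case class DefinedToken[Value](remap: String => Value)

// What the Scaladoc promises for Token.apply: the matched string is captured.
val documented: DefinedToken[String] = DefinedToken(identity)

// What the macro expansion actually generates: the match is discarded,
// so the token's value type is Unit.
val generated: DefinedToken[Unit] = DefinedToken(_ => ())
```

Either the documentation or the expansion has to move; the two value types above cannot both be what `Token.apply` means.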

List(TypeRepr.of[Tokenization[Ctx]]),
cls =>
-    tokenSymbols(cls) ++ List(
+    definedTokenSymbols(cls) ++ ignoredTokenSymbols(cls) ++ List(

Copilot AI Mar 4, 2026


definedTokenSymbols(cls) is referenced when creating the anonymous Tokenization class members, but there is no definedTokenSymbols method/val in this file (the existing helper is tokenSymbols). As-is this won’t compile; either rename/introduce definedTokenSymbols (and update the downstream tokenSymbols(cls) usage accordingly) or revert the constructor member list to use the existing helper.

Suggested change
definedTokenSymbols(cls) ++ ignoredTokenSymbols(cls) ++ List(
tokenSymbols(cls) ++ ignoredTokenSymbols(cls) ++ List(

Comment on lines 145 to 151
def tokensSymbol(cls: Symbol) = Symbol.newVal(
parent = cls,
name = "tokens",
-  tpe = TypeRepr.of[List[ThisToken]],
+  tpe = TypeRepr.of[List[Token[?, Ctx, ?]]],
flags = Flags.Synthetic | Flags.Override,
privateWithin = Symbol.noSymbol,
)

Copilot AI Mar 4, 2026


The tokensSymbol field type was updated to List[Token[?, Ctx, ?]], and the ThisToken alias was removed earlier, but the tokens val initializer still casts each field ref with .asExprOf[ThisToken] (see ThisToken usage later in this file). This will fail to compile; update the cast to the new element type (or reintroduce a consistent alias) so the RHS matches the updated tokens type.
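One way to keep the field's declared type and the initializer's cast in sync is to route both through a single alias, so they cannot drift apart again (a sketch against the Scala 3 Quotes reflection API; `TokenElem` is a hypothetical name, not from the PR):

```scala
// Hypothetical: one shared alias used by both the symbol's declared type
// and the cast on the right-hand side of the generated val.
type TokenElem = Token[?, Ctx, ?]

def tokensSymbol(cls: Symbol) = Symbol.newVal(
  parent = cls,
  name = "tokens",
  tpe = TypeRepr.of[List[TokenElem]],
  flags = Flags.Synthetic | Flags.Override,
  privateWithin = Symbol.noSymbol,
)
// ...and in the val's initializer:  fieldRef.asExprOf[TokenElem]
```

Whether the alias is kept (as `ThisToken` was) or both sites are updated literally, the point is that the type appears in exactly one place.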


compileNameAndPattern[t](tree).map:
case '{ type name <: ValidName; Token[name]($value: value)(using $ctx) }
if Option(tree).collect { case term: Term if TypeRepr.of[value] =:= term.tpe => }.isDefined =>

Copilot AI Mar 4, 2026


The special-case branch for Token[name](...) returning _.lastRawMatched is guarded only by TypeRepr.of[value] =:= term.tpe. For patterns bound as x @ ..., term.tpe is typically String, so any String-typed value expression (even an unrelated val y: String) will match this branch and silently ignore the provided expression. Tighten the guard to ensure the argument actually refers to the case binder (e.g., compare symbols / require the argument is the same Ident as the bind), otherwise constant/external String values will be tokenized incorrectly.

Suggested change
if Option(tree).collect { case term: Term if TypeRepr.of[value] =:= term.tpe => }.isDefined =>
if
(tree, value.asTerm) match
case (bind: Bind, id: Ident) => id.symbol == bind.symbol
case (id1: Ident, id2: Ident) => id2.symbol == id1.symbol
case _ => false
=>
