Conversation
…ne related usage
…oved clarity and efficiency
Pull request overview
This PR adds support for character literals in lexer definitions, allowing developers to write `case '+' => Token["PLUS"]` instead of requiring regex strings like `case "\\+" => Token["PLUS"]`. This makes lexer definitions more concise and readable for single-character tokens.
Key changes:
- Extended lexer pattern matching to accept both `String` and `Char` literals
- Modified `TokenInfo` to store patterns as `Seq[String | Char]` instead of a single `String`
- Added a `byLiteral` map to `Tokenization` for efficient character-based token lookup
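The point of a `byLiteral` map can be illustrated with a small standalone sketch. This is hypothetical code (the names `Tok`, `byLiteral`, `byRegex`, and `nextToken` are illustrative, not alpaca's actual API): single-character tokens get an O(1) `Char`-keyed lookup, and only multi-character patterns pay for regex matching.

```scala
import scala.util.matching.Regex

// Illustrative token type, not the library's.
final case class Tok(name: String)

// Single-character tokens get a direct Char -> token lookup ...
val byLiteral: Map[Char, Tok] = Map('+' -> Tok("PLUS"), '(' -> Tok("LPAREN"))
// ... while multi-character patterns still go through regexes.
val byRegex: Seq[(Regex, String)] = Seq(("""\d+""".r, "NUM"))

def nextToken(input: String, pos: Int): Option[(Tok, Int)] =
  byLiteral.get(input(pos)) match
    case Some(tok) => Some((tok, pos + 1)) // O(1) literal lookup, no regex engine
    case None =>
      byRegex.iterator
        .flatMap((re, name) =>
          re.findPrefixOf(input.substring(pos)).map(m => (Tok(name), pos + m.length)))
        .nextOption()
```

With this shape, `nextToken("1+2", 1)` resolves `'+'` through the map, while `nextToken("12+", 0)` falls back to the `\d+` regex.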
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| test/src/alpaca/internal/parser/ParseTableTest.scala | Updated test to use character literals ('+') instead of regex strings ("\\+") |
| test/src/alpaca/internal/lexer/RegexCheckerTest.scala | Changed test assertions to match new return type (tuples instead of error strings) |
| test/src/alpaca/internal/lexer/LexerTest.scala | Replaced regex patterns with character literals and shouldBe assertions |
| test/src/alpaca/integration/MathTest.scala | Converted single-character operators to use character literal syntax |
| test/src/alpaca/integration/MainTest.scala | Applied character literal syntax for operators and parentheses |
| test/src/alpaca/integration/JsonTest.scala | Updated JSON lexer to use character literals for brackets and punctuation |
| test/src/alpaca/integration/CompilationTheoryTest.scala | Modified compilation theory test lexer with character literals |
| test/src/alpaca/ParserApiTest.scala | Updated parser test lexer to use character literals |
| test/src/alpaca/LexerApiTest.scala | Comprehensive test demonstrating mixed regex and literal patterns (contains syntax errors) |
| test/src/alpaca/CtxRemappingTest.scala | Updated context remapping tests with character literals |
| src/alpaca/lexer.scala | Added LexerDefinition, TokenDefinition, and IgnoredTokenDefinition types supporting `String \| Char` patterns |
| src/alpaca/internal/lexer/Tokenization.scala | Added byLiteral map for character-based token lookup |
| src/alpaca/internal/lexer/Token.scala | Extended TokenInfo to support `Seq[String \| Char]` patterns |
| src/alpaca/internal/lexer/RegexChecker.scala | Modified to return tuple indices instead of error messages, added checkInfos method |
| src/alpaca/internal/lexer/Lexer.scala | Implemented byLiteral map generation and updated token compilation logic |
| src/alpaca/internal/lexer/CompileNameAndPattern.scala | Added support for extracting character literals and union types |
| src/alpaca/internal/internal.scala | Added changeOwner to lambda generation and moved ToExpr/FromExpr givens |
| src/alpaca/internal/debugUtils.scala | Removed redundant string extension methods |
| src/alpaca/internal/ValidName.scala | Added ValidNameLike type alias for `(Char \| String)`-based names |
| src/alpaca/internal/UnsafeExtensions.scala | Enhanced error messages with source position information |
| src/alpaca/DebugSettings.scala | Removed unused import |
| build.mill | Updated Scala version to 3.8.0-RC3 and commented out -explain flag |
```scala
def checkInfos(infos: Seq[TokenInfo[?]])(using quotes: Quotes): Unit = for
  patterns = infos.map(_.toEscapedRegex)
  (j, i) <- checkPatterns(patterns)
do quotes.reflect.report.error(s"Pattern ${infos(j).toRegex} is shadowed by ${infos(i).toRegex}")
```
The error message uses `toRegex`, which doesn't escape regex special characters. This will produce confusing error messages showing raw characters like `+` or `(` instead of their escaped forms `\+` or `\(`. Consider using a method that preserves the original pattern representation or creates a user-friendly display format that clearly shows which patterns are being matched.
Suggested change:

```diff
- do quotes.reflect.report.error(s"Pattern ${infos(j).toRegex} is shadowed by ${infos(i).toRegex}")
+ do quotes.reflect.report.error(s"Pattern ${infos(j).toEscapedRegex} is shadowed by ${infos(i).toEscapedRegex}")
```
```scala
val all = Expr.ofSeq {
  (definedTokenVals ++ ignoredTokenVals).collect:
    case valDef if valDef.name.length == 1 =>
      Expr.ofTuple((Expr(valDef.name.head), Ref(valDef.symbol).asExprOf[DefinedToken[?, Ctx, ?]]))
```
The `byLiteral` map is being populated with both defined and ignored tokens, but at line 234 both are cast to `DefinedToken[?, Ctx, ?]`. However, `ignoredTokenVals` contains `IgnoredToken` instances, not `DefinedToken` instances. This type mismatch will cause a compilation error or a runtime class-cast exception. The type should be `Token[?, Ctx, ?]` to accommodate both token kinds.
Suggested change:

```diff
- Expr.ofTuple((Expr(valDef.name.head), Ref(valDef.symbol).asExprOf[DefinedToken[?, Ctx, ?]]))
+ Expr.ofTuple((Expr(valDef.name.head), Ref(valDef.symbol).asExprOf[Token[?, Ctx, ?]]))
```
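The fix the review suggests — widening to the common supertype — can be seen in a tiny standalone model. The hierarchy below is hypothetical, mirroring the names in the comment rather than the library's real types:

```scala
import scala.util.Try

// Hypothetical hierarchy mirroring the review comment.
sealed trait Token { def name: String }
final case class DefinedToken(name: String) extends Token
final case class IgnoredToken(name: String) extends Token

val tokens: Seq[Token] = Seq(DefinedToken("PLUS"), IgnoredToken("WS"))

// Casting every element to DefinedToken throws ClassCastException
// as soon as an IgnoredToken is encountered:
val unsafe = Try(tokens.map(_.asInstanceOf[DefinedToken]))

// Typing the sequence at the common supertype is always safe:
val safe: Seq[Token] = tokens
```

The upcast costs nothing at runtime; only the downstream code must be prepared to receive either token kind.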
```scala
// before
.map:
  case TokenInfo(_, regexGroupName, pattern) => s"(?<$regexGroupName>$pattern)"
// after
.map: tokenInfo =>
  val regex = tokenInfo.toEscapedRegex // todo: literals should be handled separately
```
This TODO comment mentions that "literals should be handled separately", but the current implementation treats them the same as regex patterns by escaping them. Consider clarifying what the intended separate handling should be - for example, should literals be matched directly as strings before attempting regex matching for better performance, or is there another reason they should be separate?
```scala
@@ -68,23 +68,18 @@ private[internal] final class CreateLambda[Q <: Quotes](using val quotes: Q) {
Lambda(
  Symbol.spliceOwner,
  MethodType(params.zipWithIndex.map((_, i) => s"$$arg$i"))(_ => params, _ => r),
```
The `changeOwner(sym)` call is added to the result of `rhsFn.unsafeApply`. While this may be necessary for correct symbol ownership in the macro-generated code, ensure this doesn't break existing functionality where the owner was previously not changed. If this fixes a bug, consider documenting why the change is needed.
Suggested change:

```scala
MethodType(params.zipWithIndex.map((_, i) => s"$$arg$i"))(_ => params, _ => r),
// Re-own the body tree under the synthetic method symbol `sym` created by `Lambda`.
// `rhsFn` typically builds its tree under `Symbol.spliceOwner`; without `changeOwner(sym)`
// the resulting lambda body would have an incorrect owner and could produce invalid trees.
```
test/src/alpaca/LexerApiTest.scala
Outdated
```scala
case x: "[a-zA-Z_][a-zA-Z0-9_]*" => Token["id"](x)
case lit: ('<' | '>' | '=' | '+' | '-' | '*' | '/' | '(' | ')' | '[' | ']' | '{' | '}' | ':' | '\\' | ',' | ';') =>
  Token[lit.type]
// }
```
The closing brace of the lexer definition is commented out. This will cause the test file to fail to compile. The comment marker should be removed to properly close the lexer definition.
Suggested change:

```diff
- // }
+ }
```
test/src/alpaca/LexerApiTest.scala
Outdated
```scala
    "print") =>
  Token[keyword.type]
case x: "[a-zA-Z_][a-zA-Z0-9_]*" => Token["id"](x)
case lit: ('<' | '>' | '=' | '+' | '-' | '*' | '/' | '(' | ')' | '[' | ']' | '{' | '}' | ':' | '\\' | ',' | ';') =>
```
The backslash character literal in this union type needs care. A single backslash character literal must be written as `'\\'` (escaped backslash) in Scala; the unescaped form `'\'` would be interpreted as the start of an escape sequence rather than a literal backslash character.
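For reference, a quick standalone check of Scala's character-escape rule (plain stdlib, independent of this codebase):

```scala
// A literal backslash Char must be escaped: '\\' denotes the single
// character U+005C, not two characters.
val backslash: Char = '\\'

// It is exactly one character long when rendered as a String.
val rendered: String = backslash.toString
```

The same escaping rule applies to the `'\\'` alternative inside a union of literal types such as `('<' | '>' | ... | '\\' | ',')`.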
```scala
    case c: Char => Regex.quote(c.toString)
  .mkString("|")

lazy val toRegex: String = patterns.mkString("|")
```
The `toRegex` method at line 41 converts patterns to a regex string by calling `mkString("|")` directly on the sequence. For `Char` elements, this includes only the character itself without regex escaping, which breaks on special regex characters such as `.`, `*`, and `+`. Consider using `toEscapedRegex` for all regex construction, or ensuring characters are properly escaped wherever they are used in a regex context.
Suggested change:

```diff
- lazy val toRegex: String = patterns.mkString("|")
+ lazy val toRegex: String = toEscapedRegex
```
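The difference matters in practice. A minimal check using only the Scala standard library (independent of this codebase) shows that joining unescaped patterns with `|` can produce a regex that fails to compile, while quoting each branch first makes special characters match literally:

```scala
import scala.util.matching.Regex
import scala.util.Try

val patterns = Seq("+", "(", "ab")

// Naive join produces "+|(|ab", which is not even a valid regex:
// '+' is a dangling quantifier and '(' opens an unclosed group.
val naive = patterns.mkString("|")
val naiveCompiles = Try(naive.r)

// Quoting each branch (Pattern.quote under the hood) wraps it in \Q...\E,
// so special characters are matched as plain text.
val escaped = patterns.map(Regex.quote).mkString("|")
```

Here `escaped.r.matches("+")` and `escaped.r.matches("ab")` both succeed, while `naiveCompiles` is a `Failure` carrying a `PatternSyntaxException`.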
# Conflicts: # build.mill # src/alpaca/DebugSettings.scala # src/alpaca/internal/DebugSettings.scala # src/alpaca/internal/Showable.scala # src/alpaca/internal/UnsafeExtensions.scala # src/alpaca/internal/ValidName.scala # src/alpaca/internal/debugUtils.scala # src/alpaca/internal/internal.scala # src/alpaca/internal/lexer/CompileNameAndPattern.scala # src/alpaca/internal/lexer/Lexer.scala # src/alpaca/internal/lexer/RegexChecker.scala # src/alpaca/internal/lexer/Token.scala # src/alpaca/internal/lexer/Tokenization.scala # src/alpaca/internal/parser/Parser.scala # src/alpaca/internal/parser/createTables.scala # src/alpaca/lexer.scala # test/src/alpaca/CtxRemappingTest.scala # test/src/alpaca/ParserApiTest.scala # test/src/alpaca/integration/CompilationTheoryTest.scala # test/src/alpaca/integration/JsonTest.scala # test/src/alpaca/integration/MainTest.scala # test/src/alpaca/integration/MathTest.scala # test/src/alpaca/internal/parser/ParseTableTest.scala
# Conflicts: # src/alpaca/internal/internal.scala # src/alpaca/internal/lexer/Lexer.scala
…ings and switch annotation.
…unused code, and improve type safety.
Force-pushed 42d3ccf to 99e8fcf
…sing `treeToStr`, improve readability, and remove redundant `.unsafeMap` calls.
# Conflicts: # src/alpaca/internal/lexer/Lexer.scala
# Conflicts: # src/alpaca/internal/lexer/Lexer.scala # src/alpaca/lexer.scala # test/src/alpaca/LexerApiTest.scala
Force-pushed fe6cb91 to b8c9aba
AI-Generated Description
Add Literals Support
This pull request introduces support for literals, alongside significant refactoring to improve code clarity, maintainability, and performance. The changes streamline token handling, simplify debug utilities, and enhance context ownership, resulting in a more robust and efficient codebase.
Key Changes
- DebugSettings: Updated `DebugSettings` to use a parameterized type for improved type safety and flexibility.
- DebugPosition: Eliminated the `DebugPosition` opaque type to simplify and streamline debug utilities.

Technical Details
- Core lexer files (`Lexer.scala`, `Token.scala`, `Tokenization.scala`) include enhanced handling of literals and improved token processing logic.
- Debug utilities were simplified, eliminating the `DebugPosition` opaque type and improving clarity in debug-related code paths.
- Tests were updated across integration tests (e.g., `CompilationTheoryTest`, `JsonTest`) and unit tests (e.g., `LexerTest`, `LexerApiTest`).
- This PR results in 491 additions and 387 deletions across 22 files, reflecting both the addition of new functionality and the removal of redundant or outdated code.