Implement parser error recovery #70

Open
msujew wants to merge 6 commits into main from msujew/error-recovery

Member

@msujew msujew commented May 16, 2026

Does what the title says. The implementation is heavily inspired by ANTLR4's error recovery and uses the same mechanism for the most part. It adds caching/precomputation steps to improve performance:

  1. On my machine, the parser-only benchmark (no error recovery) runs at 100 MB/s.
  2. With error recovery but without caching, the parser benchmark runs at ~45 MB/s.
  3. With caching, we reach 91 MB/s.

Supporting error recovery thus has a roughly 10% performance penalty on parsing (100 MB/s down to 91 MB/s), because the parser has to check the followSet before and during loops and rule calls. This is definitely within expectations and does not impede our performance goals.
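
As a quick check of the figures above, the relative slowdown can be computed directly from the reported throughputs. This is only a sketch: the helper name `slowdown` is made up for illustration, and the inputs are the numbers quoted in the description.

```go
package main

import "fmt"

// slowdown returns the relative throughput loss, in percent, of measured
// compared to baseline (both in MB/s). Hypothetical helper for illustration.
func slowdown(baseline, measured float64) float64 {
	return (baseline - measured) / baseline * 100
}

func main() {
	const baseline = 100.0 // parser-only benchmark without error recovery
	fmt.Printf("without caching: %.0f%% slower\n", slowdown(baseline, 45)) // 55%
	fmt.Printf("with caching:    %.0f%% slower\n", slowdown(baseline, 91)) // 9%
}
```

The cached variant comes out at 9%, which matches the "roughly 10%" stated above, while the uncached variant loses more than half of the throughput.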

Performs minor changes to the ATN construction to better facilitate accessing the states in the parser generator. It should have no noticeable impact on further ATN work (hopefully).

Can be tested quite well by just playing around with the language servers of the grammar and statemachine language. The PR also contains a few automated tests for the statemachine language.

@msujew msujew requested review from Lotes and spoenemann May 16, 2026 12:13
@msujew msujew force-pushed the msujew/error-recovery branch from f8388e0 to 7d499e3 May 19, 2026 09:13
Collaborator

@Lotes Lotes left a comment

Not entirely through yet, this is just part I.

Comment thread parser/parser.go
// The returned slice is indexed by TokenType.Id; out-of-range indices indicate
// "not in the follow set".
func (p *ParserState) FollowSet() []bool {
	if p.atn == nil {
Collaborator

In which scenario is the ATN nil? Why is this check needed?
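
For context on the indexing convention described in the doc comment on `FollowSet`, here is a minimal sketch of a membership test over such a slice. The name `tokenInSet` appears elsewhere in this PR, but its body is not shown in this thread, so the implementation below and the `int` id type are assumptions.

```go
package main

import "fmt"

// tokenInSet reports whether the token type id is a member of set.
// The set is a slice indexed by token type id; ids outside the slice
// (including negative ones) count as "not in the set".
// Assumed implementation, not taken from the PR.
func tokenInSet(set []bool, id int) bool {
	return id >= 0 && id < len(set) && set[id]
}

func main() {
	follow := []bool{false, true, false, true} // ids 1 and 3 are members

	fmt.Println(tokenInSet(follow, 1))  // in range, member
	fmt.Println(tokenInSet(follow, 2))  // in range, not a member
	fmt.Println(tokenInSet(follow, 10)) // out of range, not a member
	fmt.Println(tokenInSet(nil, 1))     // a nil set has no members
}
```

Under this convention a nil slice behaves like an empty set, so a caller would not need a separate nil check before testing membership.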

}
}
{
p.state.Sync(26)
Collaborator

Replace magic numbers with constants.
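
One way this suggestion might look in the generated parser is sketched below. All names are hypothetical: the thread does not say which decision state index 26 refers to, and `sync` here is only a stand-in for `ParserState.Sync`.

```go
package main

import "fmt"

// Hypothetical generated constants: one per decision state, so that call
// sites read Sync(decisionExampleLoop) instead of Sync(26).
const (
	decisionExampleLoop = 26 // value from the reviewed call; the name is illustrative
	decisionExampleRule = 27 // illustrative only
)

// sync stands in for ParserState.Sync; it only echoes the index here.
func sync(decisionStateIdx int) int {
	return decisionStateIdx
}

func main() {
	fmt.Println(sync(decisionExampleLoop))
}
```

Since the parser is generated, the generator could emit both the constant block and the call sites from the same ATN state table, which keeps the two in step.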

Comment thread parser/atn_runtime.go
func (atn *RuntimeATN) buildNextTokensCache() {
	maxId := 0
	for _, st := range atn.States {
		if st == nil {
Collaborator

Is this really possible?

Comment thread parser/error_messages.go
}

func (DefaultErrorMessageProvider) UnexpectedEndOfInput(expected *core.TokenType) string {
	return "Unexpected end of input, expected '" + expected.Name + "'."
Collaborator

Nice file for translations

Comment thread parser/error_recovery.go
// Immediately returns if the recovery fails.
// If not within in error, it tries to ensure that the upcoming token is valid for the current decision state.
// This ensures that the parser can continue to make progress and doesn't get stuck on a bad token.
func (DefaultErrorRecovery) Sync(state *ParserState, decisionStateIdx int) {
Collaborator

Question: It might be confusing to use the term "parser STATE" here, because we also use "state" for ATN, DFA, and NFA states, i.e. for the states of an automaton.

Maybe CONTEXT is an alternative (thesaurus search: environment, frame, surroundings, situation). This is more of an opinion/observation. For a moment I was wondering why all ATN states have a recovery strategy. Then I saw that it is the state of the parser.

Collaborator

Prefixing the variable could also be a solution: "parserState" instead of "state".

Comment thread parser/error_recovery.go
}
}
tok := state.LARaw(1)
valid := state.atn.NextTokensAt(decisionStateIdx)
Collaborator

"valid" looks like a single boolean. Name it validTokens

Comment thread parser/error_recovery.go
if tok == nil || tokenInSet(valid, tok.TypeId) {
	return
}
follow := state.FollowSet()
Collaborator

@Lotes Lotes May 20, 2026

Suggestion: followSetTokens? Follow sets are always about tokens (token types), aren't they?
Types are not easy to see during review, so changing the name to something that reflects the type would help.
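
To make the roles of the two sets in the reviewed snippet concrete, here is a simplified sketch of a sync step in the spirit of ANTLR4's strategy: if the lookahead is not acceptable, skip tokens until one is in either the valid set for the decision or the follow set. It is not the PR's implementation; `token`, `inSet`, and `syncSkip` are hypothetical names, and the variables use the names suggested in this review.

```go
package main

import "fmt"

// token is a minimal stand-in for the parser's token type.
type token struct {
	typeID int
	text   string
}

// inSet treats out-of-range ids as "not in the set".
func inSet(set []bool, id int) bool {
	return id >= 0 && id < len(set) && set[id]
}

// syncSkip returns the index of the first token at or after pos whose type
// is in validTokens or followSetTokens. It returns len(tokens) when the
// input is exhausted first.
func syncSkip(tokens []token, pos int, validTokens, followSetTokens []bool) int {
	for pos < len(tokens) {
		id := tokens[pos].typeID
		if inSet(validTokens, id) || inSet(followSetTokens, id) {
			return pos
		}
		pos++ // skip the offending token
	}
	return pos
}

func main() {
	tokens := []token{{5, "??"}, {5, "!!"}, {2, "state"}, {3, "}"}}
	validTokens := []bool{false, false, true}            // id 2 is valid at this decision
	followSetTokens := []bool{false, false, false, true} // id 3 may follow the current rule

	fmt.Println(syncSkip(tokens, 0, validTokens, followSetTokens)) // prints 2
}
```

Stopping on a follow-set token as well as a valid token is what lets the enclosing rule resume instead of consuming the rest of the input.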
