Implement parser error recovery#70
Conversation
f8388e0 to
7d499e3
Compare
Lotes
left a comment
There was a problem hiding this comment.
Not entirely through yet, this is just part I.
| // The returned slice is indexed by TokenType.Id; out-of-range indices indicate | ||
| // "not in the follow set". | ||
| func (p *ParserState) FollowSet() []bool { | ||
| if p.atn == nil { |
There was a problem hiding this comment.
In which scenario the ATN is nil? Why this check?
| } | ||
| } | ||
| { | ||
| p.state.Sync(26) |
There was a problem hiding this comment.
Replace magic numbers with constants.
| func (atn *RuntimeATN) buildNextTokensCache() { | ||
| maxId := 0 | ||
| for _, st := range atn.States { | ||
| if st == nil { |
| } | ||
|
|
||
| func (DefaultErrorMessageProvider) UnexpectedEndOfInput(expected *core.TokenType) string { | ||
| return "Unexpected end of input, expected '" + expected.Name + "'." |
| // Immediately returns if the recovery fails. | ||
| // If not within in error, it tries to ensure that the upcoming token is valid for the current decision state. | ||
| // This ensures that the parser can continue to make progress and doesn't get stuck on a bad token. | ||
| func (DefaultErrorRecovery) Sync(state *ParserState, decisionStateIdx int) { |
There was a problem hiding this comment.
Question: It might be confusing to use the term "parser STATE" here, because we also use it for ATN and DFA and NFA states. So something that is an automaton.
Maybe CONTEXT is an alternative (thesaurus search: environment, frame, surroundings, situation). This is more an opinion / observation. For a moment I was wondering why all ATN states have a recovery strategy. Then I saw that it is the state of the parser.
There was a problem hiding this comment.
Prefixing the variable could be also a solution: "parserState" instead of "state"
| } | ||
| } | ||
| tok := state.LARaw(1) | ||
| valid := state.atn.NextTokensAt(decisionStateIdx) |
There was a problem hiding this comment.
"valid" looks like a single boolean. Name it validTokens
| if tok == nil || tokenInSet(valid, tok.TypeId) { | ||
| return | ||
| } | ||
| follow := state.FollowSet() |
There was a problem hiding this comment.
Suggestion: followSetTokens? Follow sets are always about token (types), or?
During review it is not easy to display the types. Changing the name to something type-reflecting could be a help.
Does as the title says. The implementation is heavily inspired by ANTLR4's implementation, using the same mechanism for the most part. Implements additional caching/precomputation steps to improve performance:
Supporting error recovery thus has a 10% performance penalty on parsing due to having to check the
followSetbefore and during loops and rule calls. This is definitely within expectations and does not impede on our performance goals.Performs minor changes to the ATN construction to better facillitate accessing the states in the parser generator. It should have no noticable impact on further ATN work (hopefully).
Can be tested quite well by just playing around with the language servers of the grammar and statemachine language. The PR also contains a few automated tests for the statemachine language.