-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Description
Our logical-line error recovery doesn't work as expected for unclosed interpolated strings. I'm also not sure if it works as expected in other cases but more below.
We call re_lex_logical_token during list parsing recovery if the next token is a valid list element or list terminator of any active recovery context, and we use list parsing (or the comma-separated variant thereof) for interpolated strings, lists, dictionaries, statements, etc.
Here are a few thoughts after trying to "quickly" make it work for interpolated strings:
re_lex_logical_tokenalways decrements the nesting but this can lead to inconsistencies where the interpolated string on the top of the stack now has a deeper nesting than the Lexer's nesting. I thinkre_lex_logical_tokenshould pop the interpolated string in that case.- Similarly, an interpolated string can have a format-specifier that we track based on the nesting. Should it probably end those?
- Always decrementing
self.nestingby one feels incorrect. We should only do so if we've seen a(,[or{because it's only then that we increment thenesting. E.g. the tuple parsing shouldn't decrement if the tuple has no parentheses of its own. - The other part that feels off about always decrementing
nestingregardless of what the current parsing function (error context) is that we only want to decrementnestingif the parentheses belong to the current context. - It's very easy to forget syncing the lexer's
self.nestingwith the parser's understanding during manual error recovery: e.g., after a manualself.expect(']'). - The base promise of
re_lex_logical_tokenis that it only lexesNonLogicalNewlinetokens toNewlinebut that's not guaranteed because changing the nesting can move the lexer in/out of interpolated strings. - If the only purpose of
re_lex_logical_tokenis to changeNonLogicalNewlinetoNewline, then it should early return whenself.nesting > 0because the newline token will never be lexed differently, ifnestingisn't zero - A base assumption of the list error recovery is that
re_lex_logical_tokenwould be calledNtimes by unwinding the recovery context so thatself.nestingultimately reaches 0 and the re-lexing happens. This has two issues a) more inner context doesn't see the re-lexed token, even if it's pretty clear that we should resetnestingto0when we see adefstatement, and b) I don't think we call the recovery function for every nesting level. Using the following rather broken code:
# Regression test for https://github.com/astral-sh/ty/issues/1828
(c: int = 1,f"""{d=[
def a(
class A:
passI don't fully understand what's happening but this is what I'm seeing:
- We hit error recovery the first time after the
Lsqb. Nesting is already 3 at this point (because of the[. The lexer decrementsnestingbut it doesn't re-lex because there's no precedingNonLogicalNewline(the next token isLsqb) - The next recovery is already
Argumentsfromdef a(. There's no recovery from the unclosed(
A few things seem to go wrong here:
- The parser never pushes an error recovery context for
(because we parse parenthesized expressions manually - The lexer bumps nesting for
(,{, and[but the parser never enters the list parsing for[.
This inevitably leads the parser to call re_lex_logical_token too few times (once instead of three), so the error recovery doesn't kick in.
This makes me wonder if we need more carefully placed re_lex calls rather than a catch-all during error recovery (similar to what we do with unclosed interpolated strings).
For example, we could accept that the lexer and parser state diverge during expression parsing. Instead, we could have a manual relex call that resets the lexer into a "start of a logical line" (no interpolated strings, no nesting, no nothing) before parsing any statement or other logical line.
Syncing interpolated strings seems tricky because the parser and lexer state are so intertwined, maybe even so much, that there's an argument that the interpolated string state tracking should be within the parser.
I don't have any concrete solution here and what we have sort of works but it's also very easy to make it get out of sync, of which it sometimes never recovers from.