Why are unparenthesized tuples in generators not allowed in the expression field?

Question:

# why is the following invalid
x = (k, v for k, v in some_dict.items())

# but if we wrap the expression part in parentheses it works
x = ((k, v) for k, v in some_dict.items())

After reviewing the documentation, I couldn’t find anything about this. What confuses the parser so badly that the syntax is not permitted? It seems strange, because more complex syntax works just fine:

# k, v somehow confuses the parser but this doesn't???
x = ('%s:%s:%s' % (k, v, k) for k, v in some_dict.items())

If there really is an ambiguity, why don’t we also need to wrap '%s:%s:%s' % (k, v, k) in surrounding parentheses?

Asked By: AlanSTACK

||

Answers:

Look at x = (k, v for k, v in some_dict.items()). The first line below could be read as either the second or the third:

x = (k, v for k, v in some_dict.items())
x = ((k, v) for k, v in some_dict.items())
x = (k, (v for k, v in some_dict.items()))

Parentheses are needed to remove the ambiguity.
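
As a rough illustration of the two readings above, the sketch below runs both parenthesized forms and checks that the unparenthesized one is rejected at compile time. The sample dictionary and the outer value of k are assumptions made up for this example.

```python
some_dict = {"a": 1, "b": 2}  # sample data, assumed for illustration
k = "outer"  # only the second reading needs a k from outside the generator

# Reading 1: a generator that yields (key, value) tuples.
reading_1 = list((k, v) for k, v in some_dict.items())

# Reading 2: a 2-tuple holding the outer k and a generator of values.
first, values = (k, (v for k, v in some_dict.items()))
reading_2 = (first, list(values))

# The unparenthesized form never runs: it fails while being compiled.
try:
    compile("(k, v for k, v in some_dict.items())", "<example>", "eval")
    unparenthesized_compiles = True
except SyntaxError:
    unparenthesized_compiles = False

print(reading_1)
print(reading_2)
print(unparenthesized_compiles)
```

The two readings produce different objects, which is why Python refuses to guess.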

x = ('%s:%s:%s' % (k, v, k) for k, v in some_dict.items()) requires parentheses too. Compare these variants, of which only the last is unambiguous:

x = ('%s:%s:%s' % k, v, k for k, v in some_dict.items())
x = ('%s:%s:%s' % k, (v, k) for k, v in some_dict.items())
x = ('%s:%s:%s' % (k, v, k) for k, v in some_dict.items())

You just happened to already have enough parentheses there to resolve the ambiguity in the way you intended.
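
A small sketch of why the parentheses around (k, v, k) matter: the % operator binds more tightly than the comma, so without them the comma starts a tuple instead of supplying more format arguments. The values of k, v and some_dict are assumed sample data.

```python
k, v = "a", 1  # sample values, assumed for illustration
some_dict = {"a": 1, "b": 2}

# Without parentheses, % applies to k alone and the comma builds a tuple.
loose = "%s" % k, v

# With parentheses, all three values go to the format string.
grouped = "%s:%s:%s" % (k, v, k)

# The same parenthesized tuple is what makes the generator form unambiguous.
formatted = list("%s:%s:%s" % (k, v, k) for k, v in some_dict.items())

print(loose)
print(grouped)
print(formatted)
```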

Answered By: TigerhawkT3

Python’s parser parses this

x = (k, v for k, v in some_dict.items())

as a tuple containing k and the generator expression:

v for k, v in some_dict.items()

But this isn’t a valid generator expression. As PEP 289 puts it, a generator expression must be:

directly inside a set of parentheses and cannot have a comma on either side

What Python sees as a generator here is not directly inside a set of parentheses, and does have a comma on one side, so it is illegal.
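
The rule quoted above can be probed with the built-in compile(), which parses source text without running it. This is only a sketch: is_valid is a hypothetical helper written for this example, and d is a placeholder name that never has to exist because nothing is executed.

```python
def is_valid(source):
    """Hypothetical helper: True if source parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

results = {
    # Sole argument of a call: the call's own parentheses are enough.
    "sole_argument": is_valid("sum(n for n in range(3))"),
    # A comma next to the bare generator expression: rejected.
    "comma_after": is_valid("sum(n for n in range(3), 10)"),
    "comma_before": is_valid("(k, v for k, v in d.items())"),
    # Own parentheses, so the commas are outside them: accepted.
    "wrapped_in_call": is_valid("sum((n for n in range(3)), 10)"),
    "wrapped_in_tuple": is_valid("(k, (v for k, v in d.items()))"),
}

for name, ok in results.items():
    print(name, ok)
```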


It reads the code this way because the parser is (deliberately) very simple. In particular, it is an LL(1) parser, meaning it:

  • Scans tokens from left to right;
  • Considers the current token and the next one (one token of lookahead); and
  • Makes a decision as soon as possible about what an expression means

So, when the current token is k, it sees that the next is a comma. It decides this is a tuple and sticks with that decision. It only sees the for later (when the current token is v), so that becomes a generator expression inside the tuple. The parser doesn’t backtrack to look for a legal parse of the expression (there is one here, the one you intended, with the tuple inside the generator expression, but there might not always be); it just throws an error immediately.
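
To see the order in which the tokens arrive, the standard tokenize module can list them. This is only a sketch of the left-to-right token stream, not of CPython's parser itself (CPython 3.9 and later use a PEG parser, though the construct is still a syntax error there). The tokenizer accepts the line because it is lexically fine even though it does not parse.

```python
import io
import tokenize

source = "x = (k, v for k, v in some_dict.items())\n"

# Keep only names and operators; drop NEWLINE and ENDMARKER bookkeeping tokens.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NAME, tokenize.OP)
]

first_comma = tokens.index(",")
first_for = tokens.index("for")

print(tokens)
print(first_comma, first_for)
```

The comma right after k comes two tokens before the first for, so a parser that commits on one token of lookahead has already chosen "tuple" by the time for appears.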

Answered By: lvc