Printing without parentheses: varying error messages in Python 3

Question:

When I try to use print without parentheses on a simple name in Python 3.4, I get:

>>> print max
Traceback (most recent call last):
  ...
  File "<interactive input>", line 1
    print max
            ^
SyntaxError: Missing parentheses in call to 'print'

OK, now I get it: I just forgot to port my Python 2 code.

But now when I try to print the result of a function:

>>> print max([1,2])
Traceback (most recent call last):
    ...
    print max([1,2])
            ^
SyntaxError: invalid syntax

Or:

print max.__call__(23)
        ^
SyntaxError: invalid syntax

(Note that the cursor is pointing to the character before the first dot in that case.)

The message is different (and slightly misleading, since the marker is below the max function).

Why isn’t Python able to detect the problem earlier?

Note: This question was inspired by the confusion around this question: Pandas read.csv syntax error, where a few Python experts missed the real issue because of the misleading error message.
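To reproduce this without an interactive session, here is a small sketch that compiles each snippet from the question and reports the message and caret column. It uses only the standard library; the helper name describe_syntax_error is hypothetical. The exact messages and offsets depend on the CPython version you run it on, so no expected output is shown.

```python
def describe_syntax_error(source):
    """Compile `source` and return (message, offset) of the SyntaxError it raises.

    Returns None if the source compiles cleanly.
    """
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:
        # err.offset is the 1-based column the caret is drawn under
        return err.msg, err.offset
    return None


snippets = [
    "print max",
    "print max([1,2])",
    "print max.__call__(23)",
]

for snippet in snippets:
    print(repr(snippet), "->", describe_syntax_error(snippet))
```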

Answers:

The special exception message for print used as a statement instead of as a function is actually implemented as a special case.

Roughly speaking, when a SyntaxError is created, it calls a special function that checks for a print statement based on the line the exception refers to.

However, the first test in this function (the one responsible for the “Missing parentheses” error message) checks whether there is any opening parenthesis in the line. I copied the source code for that function (CPython 3.6.4) and marked the relevant lines with “arrows”:

static int
_report_missing_parentheses(PySyntaxErrorObject *self)
{
    Py_UCS4 left_paren = 40;
    Py_ssize_t left_paren_index;
    Py_ssize_t text_len = PyUnicode_GET_LENGTH(self->text);
    int legacy_check_result = 0;

    /* Skip entirely if there is an opening parenthesis <---------------------------- */
    left_paren_index = PyUnicode_FindChar(self->text, left_paren,
                                          0, text_len, 1);
    if (left_paren_index < -1) {
        return -1;
    }
    if (left_paren_index != -1) {
        /* Use default error message for any line with an opening parenthesis <------------ */
        return 0;
    }
    /* Handle the simple statement case */
    legacy_check_result = _check_for_legacy_statements(self, 0);
    if (legacy_check_result < 0) {
        return -1;

    }
    if (legacy_check_result == 0) {
        /* Handle the one-line complex statement case */
        Py_UCS4 colon = 58;
        Py_ssize_t colon_index;
        colon_index = PyUnicode_FindChar(self->text, colon,
                                         0, text_len, 1);
        if (colon_index < -1) {
            return -1;
        }
        if (colon_index >= 0 && colon_index < text_len) {
            /* Check again, starting from just after the colon */
            if (_check_for_legacy_statements(self, colon_index+1) < 0) {
                return -1;
            }
        }
    }
    return 0;
}

That means it won’t trigger the “Missing parentheses” message if there is any opening parenthesis in the line. You then get the generic SyntaxError message, even when the opening parenthesis is inside a comment:

>>> print 10  # what(
    print 10  # what(
           ^
SyntaxError: invalid syntax
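The decision logic of the C function above can be modelled in a few lines of Python. This is a simplified sketch for illustration only. The function names are hypothetical, the functions work on plain strings rather than on a SyntaxError object, and they return only the short message shown in the question, not the longer text that the quoted C code builds.

```python
def check_for_legacy_statements(text, start=0):
    """Rough model of _check_for_legacy_statements: look for 'print ' or
    'exec ' at `start`, after skipping leading whitespace."""
    rest = text[start:].lstrip()
    if not rest:
        return None  # empty or whitespace-only remainder
    if rest.startswith("print "):
        return "Missing parentheses in call to 'print'"
    if rest.startswith("exec "):
        return "Missing parentheses in call to 'exec'"
    return None  # fall back to the default message


def report_missing_parentheses(text):
    """Rough model of _report_missing_parentheses: return the special
    message, or None to keep the default 'invalid syntax'."""
    # Any '(' anywhere in the line disables the heuristic, even in a comment.
    if "(" in text:
        return None
    # Simple statement case: the line itself starts with print/exec.
    message = check_for_legacy_statements(text, 0)
    if message is None:
        # One-line compound statement case: check again after the first colon.
        colon = text.find(":")
        if colon >= 0:
            message = check_for_legacy_statements(text, colon + 1)
    return message
```

With this model, "print max" and "if x: print x" get the special message. "print max([1,2])" and "print 10  # what(" return None because of the parenthesis. "x = 1; print x" also returns None, which matches the semicolon limitation noted in the CPython source comment.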

Note that when two names or numbers are separated by whitespace, the cursor always points at the last character of the second one:

>>> 10 100
    10 100
         ^
SyntaxError: invalid syntax

>>> name1 name2
    name1 name2
              ^
SyntaxError: invalid syntax

>>> name1 name2([1, 2])
    name1 name2([1, 2])
              ^
SyntaxError: invalid syntax

So it is no wonder the cursor points to the x of max: it is the last character of the second name. Everything that follows the second name (like ., (, [, …) is ignored. Python has already found a SyntaxError and doesn’t need to go further, because nothing that follows could make the line valid syntax.
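The “end of the second name” rule can be approximated with the standard tokenize module. The sketch below uses a hypothetical helper name. It only mimics the caret placement in the tracebacks above for this one pattern; it is not how the parser actually computes offsets. It finds the first pair of adjacent name/number/string tokens and returns the end column of the second token. Because that column is 0-based and exclusive, it equals the 1-based column of the character the caret sits under.

```python
import io
import keyword
import tokenize

# Token types that can act as an operand in these examples.
OPERAND_TYPES = {tokenize.NAME, tokenize.NUMBER, tokenize.STRING}


def second_operand_end(source):
    """Return the end column of the second of two adjacent operand tokens,
    or None if no such pair exists.

    Keywords (if, and, in, ...) are not treated as operands, since
    'print if x else y' is valid syntax.
    """
    previous = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_operand = tok.type in OPERAND_TYPES and not keyword.iskeyword(tok.string)
        if is_operand and previous is not None:
            # tok.end is (row, col); col is 0-based and exclusive, i.e. the
            # 1-based column of the token's last character.
            return tok.end[1]
        previous = tok if is_operand else None
    return None
```

For example, second_operand_end("print max([1,2])") gives 9, the position of the x in max, and second_operand_end("10 100") gives 6.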

Answered By: MSeifert

Maybe I’m not understanding something, but I don’t see why Python should point out the error earlier. In Python 3, print is a regular function, that is, a name referencing a function object, so these are all valid statements:

print(10)
print, max, 2
str(print)
print.__doc__
[print] + ['a', 'b']
{print: 2}

As I understand it, the parser needs to read the next full token after print (max in this case) in order to determine whether there is a syntax error. It cannot just say “fail if there is no open parenthesis”, because there are a number of different tokens that may go after print depending on the current context.

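I don’t think there is a case where print may be directly followed by another identifier or a literal. So you could argue that the parser should stop as soon as a letter, a digit or a quote follows print. But keywords also start with a letter (print if x else None is valid, for instance), and telling identifiers, keywords and literals apart is the lexer’s job, so such a check would mix the parser’s and the lexer’s responsibilities.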
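The point that print is just a name can be checked with the ast module. The sketch below uses a hypothetical helper that tries to parse each snippet. It shows that the expressions listed above, plus a conditional expression where print is followed by the keyword if, are valid Python 3, while print followed directly by a name, number or string is not.

```python
import ast


def is_valid_syntax(source):
    """Return True if `source` parses as Python 3 code, False on SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


valid = [
    "print(10)",
    "print, max, 2",
    "str(print)",
    "print.__doc__",
    "[print] + ['a', 'b']",
    "{print: 2}",
    "print if True else None",  # a keyword may follow print
]
invalid = [
    "print max",  # name followed by a name
    "print 10",   # name followed by a number
    "print 'a'",  # name followed by a string
]

for source in valid + invalid:
    print(f"{source!r:30} valid={is_valid_syntax(source)}")
```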

Answered By: jdehesa

Looking at the source code for exceptions.c, right above _set_legacy_print_statement_msg there’s this nice block comment:

/* To help with migration from Python 2, SyntaxError.__init__ applies some
 * heuristics to try to report a more meaningful exception when print and
 * exec are used like statements.
 *
 * The heuristics are currently expected to detect the following cases:
 *   - top level statement
 *   - statement in a nested suite
 *   - trailing section of a one line complex statement
 *
 * They're currently known not to trigger:
 *   - after a semi-colon
 *
 * The error message can be a bit odd in cases where the "arguments" are
 * completely illegal syntactically, but that isn't worth the hassle of
 * fixing.
 *
 * We also can't do anything about cases that are legal Python 3 syntax
 * but mean something entirely different from what they did in Python 2
 * (omitting the arguments entirely, printing items preceded by a unary plus
 * or minus, using the stream redirection syntax).
 */
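The last paragraph of that comment is easy to verify. A bare print, print -1 and print >>sys.stderr, "x" are all valid Python 3 expressions, so no SyntaxError is raised and the heuristic never runs. At run time, print -1 fails with a TypeError (subtracting an int from a function) rather than a SyntaxError. The sketch below inspects only the top-level node type, because the full ast.dump output differs between versions.

```python
import ast

legacy_lines = [
    "print",                    # Python 2: print an empty line
    "print -1",                 # Python 2: print the number -1
    'print >>sys.stderr, "x"',  # Python 2: print to stderr
]

for line in legacy_lines:
    # Each line parses as a single expression statement in Python 3.
    expr = ast.parse(line).body[0].value
    print(f"{line!r:28} -> {type(expr).__name__}")
```

In Python 3 these come out as Name, BinOp (print - 1) and Tuple (a >> shift paired with a string).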

So there’s some interesting information there. In addition, in the SyntaxError_init function in the same file, we can see:

    /*
     * Issue #21669: Custom error for 'print' & 'exec' as statements
     *
     * Only applies to SyntaxError instances, not to subclasses such
     * as TabError or IndentationError (see issue #31161)
     */
    if ((PyObject*)Py_TYPE(self) == PyExc_SyntaxError &&
            self->text && PyUnicode_Check(self->text) &&
            _report_missing_parentheses(self) < 0) {
        return -1;
    }

Note also that the above references issue #21669 on the Python bug tracker, which has some discussion between the author and Guido about how to go about this. So we follow the rabbit (that is, _report_missing_parentheses), which is at the very bottom of the file, and see:

legacy_check_result = _check_for_legacy_statements(self, 0);

However, there are some cases where this is bypassed and the normal SyntaxError message is printed; see MSeifert’s answer for more about that. If we go one function up, to _check_for_legacy_statements, we finally see the actual check for legacy print statements:

/* Check for legacy print statements */
if (print_prefix == NULL) {
    print_prefix = PyUnicode_InternFromString("print ");
    if (print_prefix == NULL) {
        return -1;
    }
}
if (PyUnicode_Tailmatch(self->text, print_prefix,
                        start, text_len, -1)) {

    return _set_legacy_print_statement_msg(self, start);
}

So, to answer the question “Why isn’t Python able to detect the problem earlier?”: the missing parentheses are not what gets detected. The line is a syntax error the whole time. The check for parentheses only runs afterwards, when the SyntaxError is created, just to give an additional hint.
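Because the hint is added in SyntaxError.__init__ (see the SyntaxError_init code quoted above), you can observe it without involving the parser at all, by constructing a SyntaxError by hand. This sketch assumes a CPython version that contains that code, such as the 3.6.4 source quoted here. Newer releases may handle the print case elsewhere and leave the message untouched. Subclasses such as IndentationError are excluded by the type check quoted above (issue #31161).

```python
# Arguments mirror what the parser passes: (message, (filename, lineno, offset, text))
details = ("<demo>", 1, 9, "print max\n")

plain = SyntaxError("invalid syntax", details)
indent = IndentationError("invalid syntax", details)  # a SyntaxError subclass

# On versions whose SyntaxError.__init__ contains the heuristic, plain.msg is
# rewritten to a "Missing parentheses in call to 'print'" message; on other
# versions it stays "invalid syntax".
print(plain.msg)
# The heuristic skips subclasses, so this stays "invalid syntax".
print(indent.msg)
```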

Answered By: alkasm

In addition to those excellent answers, even without looking at the source code we could have guessed that the special print error message was a kludge.

So:

print dfjdkf
           ^
SyntaxError: Missing parentheses in call to 'print'

But:

>>> a = print
>>> a dsds
Traceback (most recent call last):
  File "<interactive input>", line 1
    a dsds
         ^
SyntaxError: invalid syntax

Even though a is print, nothing has been evaluated at that stage, so you get the generic “invalid syntax” message instead of the hacked print message. That shows the kludge is triggered by the text of the line starting with print, not by what the name refers to.
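The same comparison can be run non-interactively. In this sketch (helper name hypothetical), compile() never executes a = print, so the only difference between the two inputs is their text. Only the message prefix is checked, because some versions append a “Did you mean …” suggestion.

```python
def syntax_error_message(source):
    """Return the SyntaxError message produced by compiling `source`, or None."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


print(syntax_error_message("print dfjdkf"))       # special print message
print(syntax_error_message("a = print\na dsds"))  # generic message
```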

Another proof, if needed:

>>> print = None
>>> print a
Traceback (most recent call last):
  File "C:\Python34\lib\code.py", line 63, in runsource
    print a
          ^
SyntaxError: Missing parentheses in call to 'print'

In that case print is None, but the specific message still appears.
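The sketch below mirrors that REPL session with exec() and a shared namespace. The rebinding happens at run time in the first call. The second string must be compiled before it can run, and compilation fails with the print-specific message without ever consulting the namespace. Only the message prefix is checked, since later versions append a suggestion.

```python
namespace = {}
exec("print = None", namespace)  # run time: rebinds print in this namespace

message = None
try:
    exec("print a", namespace)  # compiled first; fails before anything runs
except SyntaxError as err:
    message = err.msg

print(namespace["print"], "|", message)
```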