How big can the input to the input() function be?

Question:

How large can the input I supply to the input() function be?

Unfortunately, there was no easy way to test it. Even after a lot of copy-pasting, I couldn’t get input() to fail on anything I supplied, and I eventually gave up.

The documentation for the input function doesn’t mention anything regarding this:

If the prompt argument is present, it is written to standard output without a trailing newline. The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that. When EOF is read, EOFError is raised.

So, I’m guessing there is no limit? Does anyone know whether there is one and, if so, what it is?

Asked By: user6774416


Answers:

Of course there is; it can’t be limitless*. The key sentence from the documentation that I believe needs highlighting is:

[…] The function then reads a line from input, converts it to a string (stripping a trailing newline) […]

(emphasis mine)

Since it converts the input you supply into a Python str object it essentially translates to: “Its size has to be less than or equal to the largest string Python can create”.

The reason why no explicit size is given is probably because this is an implementation detail. Enforcing a maximum size to all other implementations of Python wouldn’t make much sense.

*In CPython, at least, the largest size of a string is bounded by how big its index is allowed to be (see PEP 353). That is, how big the number in the brackets [] is allowed to be when you try and index it:

>>> s = ''
>>> s[2 ** 63]

IndexError                                Traceback (most recent call last)
<ipython-input-10-75e9ac36da20> in <module>()
----> 1 s[2 ** 63]

IndexError: cannot fit 'int' into an index-sized integer

(Try the same with 2 ** 63 - 1, which is the largest acceptable positive index; -2 ** 63 is the most negative one.)

For indices, Python’s arbitrary-precision integers aren’t used internally; instead, the index is a Py_ssize_t, which is a signed 32-bit integer on 32-bit machines and a signed 64-bit integer on 64-bit machines. So, from what it seems, that’s the hard limit.

(As the error message states, an int and an index-sized integer are two different things.)
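As a rough way to check this on your own interpreter, the sketch below reads the limit from sys.maxsize, which CPython documents as the largest value a Py_ssize_t can hold. The helper name probe_index is made up for illustration, and the exact error messages are those of CPython and may differ elsewhere:

```python
import sys


def probe_index(index):
    """Index an empty string and return the resulting error message."""
    try:
        ''[index]
    except IndexError as exc:
        return str(exc)
    return None


# sys.maxsize is the largest value a Py_ssize_t can hold on this build:
# 2 ** 63 - 1 on a typical 64-bit build, 2 ** 31 - 1 on a 32-bit one.
print(sys.maxsize)

# Fits in an index-sized integer, but the empty string has no such position.
print(probe_index(sys.maxsize))

# One past the limit: the number cannot be turned into an index at all.
print(probe_index(sys.maxsize + 1))
```

The first probe fails with an ordinary "out of range" error, while the second fails earlier, when the int is converted to an index.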

It also seems like input() explicitly checks if the input supplied is larger than PY_SSIZE_T_MAX (the maximum size of Py_ssize_t) before converting:

if (len > PY_SSIZE_T_MAX) {
    PyErr_SetString(PyExc_OverflowError,
                    "input: input too long");
    result = NULL;
}

Then it converts the input to a Python str with PyUnicode_Decode.
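That C check is practically impossible to trigger from the keyboard, but the same ceiling shows up at the Python level whenever a string’s requested length doesn’t fit in a Py_ssize_t. The following is only an illustrative sketch of that analogous failure (try_repeat is a hypothetical helper, not part of input()), assuming CPython:

```python
import sys


def try_repeat(count):
    """Try to build a string of `count` characters.

    Returns the length on success, or the exception's class name on failure.
    """
    try:
        return len('a' * count)
    except (OverflowError, MemoryError) as exc:
        return type(exc).__name__


print(try_repeat(10))               # a small string is built normally
print(try_repeat(sys.maxsize + 1))  # the length cannot fit in a Py_ssize_t
```

Lengths below sys.maxsize are deliberately not probed here, since they would be limited by available memory rather than by the index type.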


To put that in perspective: if the average book is 500,000 characters long and the total number of books is estimated at around 130 million, you could theoretically input around:

>>> ((2 ** 63) - 1) // (500000 * 130000000)
141898

times those characters; it would probably take you some time, though 🙂 (and you’d be limited by available memory first!)

We can find the answer experimentally quite easily. Make two files:

make_lines.py:

num_lines = 34

if __name__ == '__main__':
    for i in range(num_lines):
        print('a' * (2 ** i))

read_input.py:

from make_lines import num_lines

for i in range(num_lines):
    print(len(input()))

Then run this command in Linux or OSX (I don’t know the Windows equivalent):

python3 make_lines.py | python3 read_input.py

On my computer it manages to finish but struggles towards the end, slowing down other processes significantly. The last thing it prints is 8589934592 (2 ** 33), i.e. a single line of 8 GiB. You can find the value for yourself, according to what you consider acceptable in terms of time and memory.
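If a shell pipe isn’t available (on Windows, say), a similar experiment can be sketched inside a single process by temporarily replacing sys.stdin with an in-memory stream. This is an assumption-laden variant rather than the same test: input_length is a hypothetical helper, and with a non-terminal stdin CPython’s input() reads through sys.stdin.readline() instead of the interactive console path.

```python
import io
import sys


def input_length(num_chars):
    """Feed input() one line of num_chars characters; return len() of what it reads."""
    original_stdin = sys.stdin
    # An in-memory stream stands in for the piped output of make_lines.py.
    sys.stdin = io.StringIO('a' * num_chars + '\n')
    try:
        return len(input())
    finally:
        # Always restore the real stdin, even if input() raises.
        sys.stdin = original_stdin


if __name__ == '__main__':
    # Kept small on purpose; raise the upper bound to probe your own limits.
    for exponent in range(0, 24, 4):
        print(exponent, input_length(2 ** exponent))
```

As with the two-file version, memory and patience run out long before the theoretical limit does.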

Answered By: Alex Hall