Why is str.strip() so much faster than str.strip(' ')?

Question:

Stripping white-space can be done in two ways with str.strip. You can either call it with no arguments, str.strip(), which defaults to stripping white-space, or explicitly supply the argument yourself with str.strip(' ').

But why do these two calls perform so differently when timed?

Using a sample string with an intentional amount of white spaces:

s = " " * 100 + 'a' + " " * 100

The timings for s.strip() and s.strip(' ') are respectively:

%timeit s.strip()
The slowest run took 32.74 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 396 ns per loop

%timeit s.strip(' ')
100000 loops, best of 3: 4.5 µs per loop

strip takes 396 ns while strip(' ') takes 4.5 μs; a similar scenario is present with rstrip and lstrip under the same conditions. Also, bytes objects seem to be affected too.

The timings were performed on Python 3.5.2; on Python 2.7.1 the difference is less drastic. The docs on str.strip don’t indicate anything useful, so why does this happen?
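For readers without IPython, the comparison can be reproduced with the standard timeit module. This is only a minimal sketch: the helper name time_call and the loop count are arbitrary choices that are not part of the original measurements, and the absolute numbers will vary with the machine and the Python version.

```python
import timeit

# Same sample string as above: 100 spaces, one letter, 100 spaces.
s = " " * 100 + "a" + " " * 100


def time_call(stmt, number=20000):
    """Return the best per-call time in nanoseconds over 3 repeats.

    time_call is a hypothetical helper; number=20000 is an arbitrary choice.
    """
    best = min(timeit.repeat(stmt, globals={"s": s}, number=number, repeat=3))
    return best / number * 1e9


no_arg = time_call("s.strip()")
with_arg = time_call("s.strip(' ')")
print(f"s.strip():    {no_arg:.0f} ns per call")
print(f"s.strip(' '): {with_arg:.0f} ns per call")
```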

Answers:

In a tl;dr fashion:

This is because two functions exist for the two different cases, as can be seen in unicode_strip: do_strip and _PyUnicode_XStrip, the first executing much faster than the second.

Function do_strip is for the common case str.strip(), where no arguments are given, and do_argstrip (which wraps _PyUnicode_XStrip) is for the case where str.strip(arg) is called, i.e. arguments are provided.


do_argstrip just checks the separator: if it is None, it calls do_strip; if it is a valid string, it calls _PyUnicode_XStrip.

Both do_strip and _PyUnicode_XStrip follow the same logic: two counters are used, one equal to zero and the other equal to the length of the string.

Using two while loops, the first counter is incremented until a value not equal to the separator is reached and the second counter is decremented until the same condition is met.
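The two-counter logic described above can be sketched in Python roughly as follows. The function name strip_with and the is_sep callback are hypothetical; they only stand in for whichever separator check the C code uses, and this is not the actual CPython implementation.

```python
def strip_with(s, is_sep):
    """Strip characters from both ends of s for which is_sep(ch) is true.

    strip_with and is_sep are illustrative names, not CPython functions.
    """
    i = 0        # first counter: starts at zero and moves right
    j = len(s)   # second counter: starts at the length and moves left
    while i < j and is_sep(s[i]):
        i += 1
    while j > i and is_sep(s[j - 1]):
        j -= 1
    return s[i:j]


s = " " * 100 + "a" + " " * 100
print(repr(strip_with(s, str.isspace)))
print(repr(strip_with(s, lambda ch: ch in " ")))
```

Only the is_sep check differs between the two calls, which is where the two C functions diverge as well.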

The difference lies in how each function checks whether the current character is a separator.

For do_strip:

In the most common case, where the characters in the string to be stripped can be represented in ASCII, an additional small performance boost is present.

while (i < len) {
    Py_UCS1 ch = data[i];
    if (!_Py_ascii_whitespace[ch])
        break;
    i++;
}
  • Accessing the current character in the data is done quickly by indexing the underlying array: Py_UCS1 ch = data[i];
  • The check for whether a character is white-space is a simple index into an array called _Py_ascii_whitespace: _Py_ascii_whitespace[ch].

So, in short, it is quite efficient.
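To illustrate why a table lookup is cheap, here is a rough Python analogue of the loop above. ASCII_WHITESPACE and skip_leading_whitespace are hypothetical names; the table is built from str.isspace purely for illustration and is assumed, not guaranteed, to mirror the C array.

```python
# Hypothetical stand-in for a 128-entry whitespace lookup table.
# Built from str.isspace here for illustration only.
ASCII_WHITESPACE = [chr(code).isspace() for code in range(128)]


def skip_leading_whitespace(data):
    """Return the index of the first non-whitespace byte in ASCII-only data."""
    i = 0
    while i < len(data):
        ch = data[i]                  # plain indexing, like data[i] in C
        if not ASCII_WHITESPACE[ch]:  # one table lookup per character
            break
        i += 1
    return i


data = (" " * 100 + "a" + " " * 100).encode("ascii")
print(skip_leading_whitespace(data))
```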

If the characters are not in the ASCII range, the differences aren’t that drastic, but they do slow the overall execution down:

while (i < len) {
    Py_UCS4 ch = PyUnicode_READ(kind, data, i);
    if (!Py_UNICODE_ISSPACE(ch))
        break;
    i++;
}
  • Accessing is done with Py_UCS4 ch = PyUnicode_READ(kind, data, i);
  • Checking if the character is whitespace is done by the Py_UNICODE_ISSPACE(ch) macro (which uses the same _Py_ascii_whitespace table for code points below 128 and calls _PyUnicode_IsWhitespace otherwise)

For _PyUnicode_XStrip:

For this case, accessing the underlying data is, as in the previous case, done with PyUnicode_READ; the check to see if the character is a white-space (or really, any character we’ve provided) is, on the other hand, understandably a bit more complex.

while (i < len) {
    Py_UCS4 ch = PyUnicode_READ(kind, data, i);
    if (!BLOOM(sepmask, ch))
        break;
    if (PyUnicode_FindChar(sepobj, ch, 0, seplen, 1) < 0)
        break;
    i++;
}

PyUnicode_FindChar is used, which, although efficient, is much more complex and slow compared to an array access. For each character in the string that passes the BLOOM pre-check, it is called to see if that character is contained in the separator(s) we’ve provided. As the length of the string increases, so does the overhead introduced by calling this function continuously.

For those interested, PyUnicode_FindChar, after quite some checks, will eventually call find_char inside stringlib, which, in the case where the length of the separators is < 10, will loop until it finds the character.

Apart from this, consider the additional function calls that are needed just to get here.
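The per-character work in that loop can be sketched in Python as a bit-mask pre-check followed by a search in the separator string. Everything here is an illustrative assumption: BLOOM_WIDTH = 64, the helper names, and the use of str.find as a stand-in for PyUnicode_FindChar are not taken from the CPython source.

```python
BLOOM_WIDTH = 64  # assumed mask width, for illustration only


def make_bloom_mask(chars):
    """Set one bit per separator character (hypothetical helper)."""
    mask = 0
    for ch in chars:
        mask |= 1 << (ord(ch) & (BLOOM_WIDTH - 1))
    return mask


def bloom(mask, ch):
    """Cheap pre-check: False means ch is definitely not a separator."""
    return bool(mask & (1 << (ord(ch) & (BLOOM_WIDTH - 1))))


def xstrip_sketch(s, chars):
    """Strip any character found in chars from both ends of s."""
    sepmask = make_bloom_mask(chars)
    i, j = 0, len(s)
    # The mask can give false positives, so a real search must confirm.
    while i < j and bloom(sepmask, s[i]) and chars.find(s[i]) >= 0:
        i += 1
    while j > i and bloom(sepmask, s[j - 1]) and chars.find(s[j - 1]) >= 0:
        j -= 1
    return s[i:j]


s = " " * 100 + "a" + " " * 100
print(repr(xstrip_sketch(s, " ")))
```

In the sample string every padding character is a real separator, so the pre-check never short-circuits and the search runs once per stripped character.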


As for lstrip and rstrip, the situation is similar. Flags for which mode of stripping to perform exist, namely: RIGHTSTRIP for rstrip, LEFTSTRIP for lstrip and BOTHSTRIP for strip. The logic inside do_strip and _PyUnicode_XStrip is performed conditionally based on the flag.

For the reasons explained in @Jim's answer, the same behavior is found in bytes objects:
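A short sketch of how one routine can serve all three methods through a flag. The flag names come from the text above; the integer values, the function name strip_by_type and the is_sep parameter are assumptions made for illustration.

```python
# Flag names follow the text; the integer values are illustrative.
LEFTSTRIP, RIGHTSTRIP, BOTHSTRIP = 0, 1, 2


def strip_by_type(s, striptype, is_sep=str.isspace):
    """Strip one or both ends of s depending on striptype (hypothetical)."""
    i, j = 0, len(s)
    if striptype != RIGHTSTRIP:   # left side: lstrip and strip
        while i < j and is_sep(s[i]):
            i += 1
    if striptype != LEFTSTRIP:    # right side: rstrip and strip
        while j > i and is_sep(s[j - 1]):
            j -= 1
    return s[i:j]


text = "  padded  "
print(repr(strip_by_type(text, LEFTSTRIP)))
print(repr(strip_by_type(text, RIGHTSTRIP)))
print(repr(strip_by_type(text, BOTHSTRIP)))
```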

b = bytes(" " * 100 + "a" + " " * 100, encoding='ascii')

b.strip()      # takes 427ns
b.strip(b' ')  # takes 1.2μs

For bytearray objects this doesn’t happen; the functions performing the strip are similar for both cases.
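To check the bytes and bytearray behavior on your own interpreter, something like the following can be used. It is a minimal sketch: the loop count and variable names are arbitrary, and the figures it prints will not match the ones quoted above.

```python
import timeit

raw = b" " * 100 + b"a" + b" " * 100
samples = {"bytes": raw, "bytearray": bytearray(raw)}

NUMBER = 20000  # arbitrary loop count, chosen to keep the run short

results = {}
for name, obj in samples.items():
    for stmt in ("obj.strip()", "obj.strip(b' ')"):
        best = min(timeit.repeat(stmt, globals={"obj": obj},
                                 number=NUMBER, repeat=3))
        results[(name, stmt)] = best / NUMBER * 1e9  # ns per call
        print(f"{name:9} {stmt:16} {results[(name, stmt)]:.0f} ns per call")
```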

Additionally, in Python 2 the same applies to a smaller extent according to my timings.

Answered By: root