Why are f-strings slower than string concatenation when repeatedly adding to a string inside a loop?

Question:

I was benchmarking some code for a project with timeit (using a free Replit, so 1024 MB of memory):

code = '{"type":"body","layers":['

for x, row in enumerate(pixels):
    for y, pixel in enumerate(row):
        if pixel != (0, 0, 0, 0):
            code += f'''{{"offsetX":{-start + x * gap},"offsetY":{start - y * gap},"rot":45,"size":{size},"sides":4,"outerSides":0,"outerSize":0,"team":"{'#%02x%02x%02x' % (pixel[:3])}","hideBorder":1}},'''
    
code += '],"sides":1,"name":"Image"}}'

The loop runs for every single pixel inside a given image (not efficient of course, but I haven’t implemented anything to reduce loop times yet), so any optimization I can get in the loop is worth it.

I remembered that f-strings are faster than string concatenation as long as you’re combining three or more strings, and as shown I’m combining far more than three, so I decided to replace the += inside the loop with an f-string and measure the improvement.
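A likely source of that rule of thumb is how CPython compiles the two forms. The sketch below is CPython-specific and uses the standard dis module to list bytecode instruction names (opcode names differ between versions, which the code allows for). It shows that a + b + c is two separate additions with a temporary string in between, while an f-string formats each piece and joins them all in one BUILD_STRING step. The function names are hypothetical, chosen for illustration.

```python
import dis


def concat(a, b, c):
    # Two additions: (a + b) produces a temporary str, which is then added to c
    return a + b + c


def fstring(a, b, c):
    # Each value is formatted, then a single BUILD_STRING joins all the pieces
    return f'{a}{b}{c}'


def opnames(func):
    """List the bytecode instruction names CPython compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


concat_ops = opnames(concat)
fstring_ops = opnames(fstring)

# BINARY_OP on CPython 3.11+, BINARY_ADD on older versions
add_count = sum(op in ('BINARY_OP', 'BINARY_ADD') for op in concat_ops)
print('additions in a + b + c:', add_count)
print('f-string uses BUILD_STRING:', 'BUILD_STRING' in fstring_ops)
```

On recent CPython versions this should report two additions and True. The rule of thumb compares building one string from several pieces; it says nothing about a loop where one of the pieces is the ever-growing result itself.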

code = '{"type":"body","layers":['

for x, row in enumerate(pixels):
    for y, pixel in enumerate(row):
        if pixel != (0, 0, 0, 0):
            code = f'''{code}{{"offsetX":{-start + x * gap},"offsetY":{start - y * gap},"rot":45,"size":{size},"sides":4,"outerSides":0,"outerSize":0,"team":"{'#%02x%02x%02x' % (pixel[:3])}","hideBorder":1}},'''
    
code += '],"sides":1,"name":"Image"}}'

The results of 500 timeit iterations:

+= took 5.399778672000139 seconds
fstr took 6.91279206800027 seconds

I’ve rerun this multiple times; the above times are the best f-strings have done so far. Why are f-strings slower in this case?

PS: This is my first time posting a question here. Any suggestions on how to improve my future questions would be greatly appreciated 😀

Asked By: ChromaticPixels


Answers:

So, first off, repeated concatenation in a language with immutable strings is theoretically O(n²) in the length of the final string, because each step copies everything built so far, while efficiently implemented bulk concatenation is O(n). That means both versions of your code are, in theory, quadratic. The version that does O(n) work on every Python implementation is:

code = ['{"type":"body","layers":[']  # Use list of str, not str

for x, row in enumerate(pixels):
    for y, pixel in enumerate(row):
        if pixel != (0, 0, 0, 0):
            code.append(f'''{{"offsetX":{-start + x * gap},"offsetY":{start - y * gap},"rot":45,"size":{size},"sides":4,"outerSides":0,"outerSize":0,"team":"{'#%02x%02x%02x' % (pixel[:3])}","hideBorder":1}},''')  # Append each new string to list
    
code.append('],"sides":1,"name":"Image"}}')
code = ''.join(code)  # Efficiently join list of str back to single str
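To make the O(n²) versus O(n) claim concrete, here is a rough cost model that just counts characters copied. It assumes every s = s + chunk allocates a fresh string and copies old contents plus chunk (no in-place optimization), that join copies each chunk exactly once into a preallocated result, and that each chunk is about 150 characters, roughly the size of one pixel fragment in the question. The numbers are illustrative, not measurements.

```python
def chars_copied_by_repeated_concat(n_chunks, chunk_len):
    """Characters copied if every `s = s + chunk` allocates a brand-new string."""
    total, length = 0, 0
    for _ in range(n_chunks):
        length += chunk_len
        total += length  # the whole new string (old contents + chunk) is written
    return total


def chars_copied_by_join(n_chunks, chunk_len):
    """''.join copies each chunk once into a single preallocated result."""
    return n_chunks * chunk_len


naive = chars_copied_by_repeated_concat(1_000, 150)
joined = chars_copied_by_join(1_000, 150)
print(naive, joined)  # 75075000 150000
```

Doubling the number of chunks roughly quadruples the first figure but only doubles the second, which is the quadratic versus linear difference.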

Your += code happens to run efficiently enough because of a CPython-specific optimization that can resize a string in place when the string being appended to has no other live references, but the very first Programming Recommendation in the PEP 8 style guide specifically warns against relying on it:

… do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
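The "no other live references" condition is easy to break by accident. A minimal sketch, assuming CPython: the two hypothetical functions below build the same string, but the second keeps an extra reference to the current string on every iteration, which should stop CPython from resizing it in place and force a full copy each time. Timings depend on the machine and interpreter, so none are given here.

```python
import timeit

PIECE = 'x' * 20  # arbitrary chunk; any non-empty str works
N = 5_000


def grow_unaliased(n):
    s = ''
    for _ in range(n):
        s += PIECE  # only `s` refers to the string, so CPython may resize it in place
    return s


def grow_aliased(n):
    s = ''
    for _ in range(n):
        previous = s  # a second live reference defeats the in-place resize...
        s += PIECE    # ...so a new str is allocated and all of `s` is copied
    return s


assert grow_unaliased(N) == grow_aliased(N)
print('unaliased:', timeit.timeit(lambda: grow_unaliased(N), number=5))
print('aliased:  ', timeit.timeit(lambda: grow_aliased(N), number=5))
```

On CPython the aliased version would typically be noticeably slower, and the gap grows with N; on implementations without refcounting, neither version gets the optimization.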

Essentially, your original +=-based code benefited from the optimization and, as a result, performed fewer data copies. Your f-string-based code did the same work, but in a way that prevented the CPython optimization from applying: the f-string builds a brand-new, ever larger str on every iteration and copies all of code into it. Both approaches are poor form; one of them was just slightly less awful on CPython. If your hot code performs repeated concatenation, you’re already doing the wrong thing. Just use a list of str and ''.join at the end.
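A small, self-contained comparison of the three strategies discussed here, using synthetic fragments as a hypothetical stand-in for the per-pixel JSON in the question. The assertion confirms all three produce identical output; the printed timings are only for local comparison, and on CPython one would expect the f-string rebuild to be the slowest since it copies the whole accumulated string every time.

```python
import timeit

# Hypothetical stand-in for the per-pixel JSON fragments in the question
pieces = [f'{{"offsetX":{i},"team":"#00ff00"}},' for i in range(3_000)]


def with_plus_equals():
    code = '['
    for p in pieces:
        code += p  # CPython may resize `code` in place (no other references)
    return code + ']'


def with_fstring_rebuild():
    code = '['
    for p in pieces:
        code = f'{code}{p}'  # always builds a new str, copying all of `code`
    return code + ']'


def with_join():
    parts = ['[']
    for p in pieces:
        parts.append(p)  # amortized O(1) list append; no string copying yet
    parts.append(']')
    return ''.join(parts)  # one linear-time copy into the final str


assert with_plus_equals() == with_fstring_rebuild() == with_join()

for func in (with_plus_equals, with_fstring_rebuild, with_join):
    print(func.__name__, timeit.timeit(func, number=5))
```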

Answered By: ShadowRanger

The answer is in the comments and the links provided.

You’ll find that this implementation (of your original example) performs far better:

from PIL import Image

img = Image.open(r'F:\Project\Python\sandbox_310\test.png')
pixels = list(img.getdata())
width, height = img.size

pixels = tuple(pixels[i * width:(i + 1) * width] for i in range(height))

start = 6
gap, size = (start * 2) / (width - 1), 0.1475 * (64 / width) * (start / 6)

data = [(-start + x * gap, start - y * gap, '#%02x%02x%02x' % (pixel[:3]))
        for x, row in enumerate(pixels) for y, pixel in enumerate(row)
        if pixel != (0, 0, 0, 0)]  # skip fully transparent pixels, as the question does

template = f'''{{{{"offsetX":{{}},"offsetY":{{}},"rot":45,"size":{size},"sides":4,"outerSides":0,"outerSize":0,"team":"{{}}","hideBorder":1}}}},'''
code = '{"type":"body","layers":[' + ''.join([template.format(*t) for t in data]) + '],"sides":1,"name":"Image"}}'
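The quadruple braces in template are the least obvious part of this approach: the string goes through two rounds of brace processing, first as an f-string and then through str.format. A trimmed-down sketch with only two fields and a hypothetical fixed size value:

```python
size = 0.1475  # hypothetical value; the answer computes it from the image width

# Stage 1: the f-string turns {{ into { and }} into }, and fills in {size}.
template = f'''{{{{"offsetX":{{}},"size":{size}}}}},'''
print(template)  # {{"offsetX":{},"size":0.1475}},

# Stage 2: str.format turns the remaining {{ / }} into literal braces
# and fills each {} placeholder with the next positional argument.
print(template.format(-6.0))  # {"offsetX":-6.0,"size":0.1475},
```

Baking size in once at stage 1 means only the per-pixel values are formatted inside the comprehension.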

Edit: user @kellybundy asked how much faster it is:

from PIL import Image
from timeit import timeit

img = Image.open(r'F:\Project\Python\sandbox_310\test.png')
pixels = list(img.getdata())
width, height = img.size

pixels = tuple(pixels[i * width:(i + 1) * width] for i in range(height))
start = 6
gap, size = (start * 2) / (width - 1), 0.1475 * (64 / width) * (start / 6)


def f_sol():
    data = [(-start + x * gap, start - y * gap, '#%02x%02x%02x' % (pixel[:3]))
            for x, row in enumerate(pixels) for y, pixel in enumerate(row)
            if pixel != (0, 0, 0, 0)]  # same filter as f_op, so the assert holds for any image

    template = f'''{{{{"offsetX":{{}},"offsetY":{{}},"rot":45,"size":{size},"sides":4,"outerSides":0,"outerSize":0,"team":"{{}}","hideBorder":1}}}},'''
    code = '{"type":"body","layers":[' + ''.join([template.format(*t) for t in data]) + '],"sides":1,"name":"Image"}}'
    return code


def f_op():
    code = '{"type":"body","layers":['

    for x, row in enumerate(pixels):
        for y, pixel in enumerate(row):
            if pixel != (0, 0, 0, 0):
                code += f'''{{"offsetX":{-start + x * gap},"offsetY":{start - y * gap},"rot":45,"size":{size},"sides":4,"outerSides":0,"outerSize":0,"team":"{'#%02x%02x%02x' % (pixel[:3])}","hideBorder":1}},'''

    code += '],"sides":1,"name":"Image"}}'
    return code


assert f_sol() == f_op()
print(timeit(f_sol, number=10))
print(timeit(f_op, number=10))

Output:

1.7875813000027847
47.82409440000265

So the list-and-join version is more than 25x faster, a gap large enough that I hadn’t bothered timing it at first.

Answered By: Grismar