How does List Comprehension exactly work in Python?
Question:
I am going trough the docs of Python 3.X, I have doubt about List Comprehension speed of execution and how it exactly work.
Let’s take the following example:
Listing 1
...
L = range(0,10)
L = [x ** 2 for x in L]
...
Now in my knowledge this return a new listing, and it’s equivalent to write down:
Listing 2
...
res = []
for x in L:
res.append(x ** 2)
...
The main difference is the speed of execution if I am correct. Listing 1 is supposed to be performed at C language speed inside the interpreter, meanwhile Listing 2 is not.
But Listing 2 is what the list comprehension does internally (not sure), so why Listing 1 is executed at C Speed inside the interpreter & Listing 2 is not? Both are converted to byte code before being processed, or am I missing something?
Answers:
The answer is actually in your question.
When you run any built in python function you are running something that has been written in C and compiled into machine code.
When you write your own version of it, that code must be converted into CPython objects which are handled by the interpreter.
In consequence the built-in approach or function is always quicker (or takes less space) in Python than writing your own function.
Look at the actual bytecode that is produced. I’ve put the two fragments of code into fuctions called f1 and f2.
The comprehension does this:
3 15 LOAD_CONST 3 (<code object <listcomp> at 0x7fbf6c1b59c0, file "<stdin>", line 3>)
18 LOAD_CONST 4 ('f1.<locals>.<listcomp>')
21 MAKE_FUNCTION 0
24 LOAD_FAST 0 (L)
27 GET_ITER
28 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
31 STORE_FAST 0 (L)
Notice there is no loop in the bytecode. The loop happens in C.
Now the for loop does this:
4 21 SETUP_LOOP 31 (to 55)
24 LOAD_FAST 0 (L)
27 GET_ITER
>> 28 FOR_ITER 23 (to 54)
31 STORE_FAST 2 (x)
34 LOAD_FAST 1 (res)
37 LOAD_ATTR 1 (append)
40 LOAD_FAST 2 (x)
43 LOAD_CONST 3 (2)
46 BINARY_POWER
47 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
50 POP_TOP
51 JUMP_ABSOLUTE 28
>> 54 POP_BLOCK
In contrast to the comprehension, the loop is clearly here in the bytecode. So the loop occurs in python.
The bytecodes are different, and the first should be faster.
I am going trough the docs of Python 3.X, I have doubt about List Comprehension speed of execution and how it exactly work.
Let’s take the following example:
Listing 1
...
L = range(0,10)
L = [x ** 2 for x in L]
...
Now in my knowledge this return a new listing, and it’s equivalent to write down:
Listing 2
...
res = []
for x in L:
res.append(x ** 2)
...
The main difference is the speed of execution if I am correct. Listing 1 is supposed to be performed at C language speed inside the interpreter, meanwhile Listing 2 is not.
But Listing 2 is what the list comprehension does internally (not sure), so why Listing 1 is executed at C Speed inside the interpreter & Listing 2 is not? Both are converted to byte code before being processed, or am I missing something?
The answer is actually in your question.
When you run any built in python function you are running something that has been written in C and compiled into machine code.
When you write your own version of it, that code must be converted into CPython objects which are handled by the interpreter.
In consequence the built-in approach or function is always quicker (or takes less space) in Python than writing your own function.
Look at the actual bytecode that is produced. I’ve put the two fragments of code into fuctions called f1 and f2.
The comprehension does this:
3 15 LOAD_CONST 3 (<code object <listcomp> at 0x7fbf6c1b59c0, file "<stdin>", line 3>)
18 LOAD_CONST 4 ('f1.<locals>.<listcomp>')
21 MAKE_FUNCTION 0
24 LOAD_FAST 0 (L)
27 GET_ITER
28 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
31 STORE_FAST 0 (L)
Notice there is no loop in the bytecode. The loop happens in C.
Now the for loop does this:
4 21 SETUP_LOOP 31 (to 55)
24 LOAD_FAST 0 (L)
27 GET_ITER
>> 28 FOR_ITER 23 (to 54)
31 STORE_FAST 2 (x)
34 LOAD_FAST 1 (res)
37 LOAD_ATTR 1 (append)
40 LOAD_FAST 2 (x)
43 LOAD_CONST 3 (2)
46 BINARY_POWER
47 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
50 POP_TOP
51 JUMP_ABSOLUTE 28
>> 54 POP_BLOCK
In contrast to the comprehension, the loop is clearly here in the bytecode. So the loop occurs in python.
The bytecodes are different, and the first should be faster.