What's the benefit of a shared build python vs a static build python?
Question:
This question has been bothering me for two weeks and I’ve searched online and asked people but couldn’t get an answer.
By default, Python builds the static library libpythonMAJOR.MINOR.a
and statically links it into the interpreter. It also has an --enable-shared
flag, which builds a shared library libpythonMAJOR.MINOR.so.1.0
and dynamically links the interpreter against it.
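(As an aside, you can check which way a given interpreter was built from inside Python itself; Py_ENABLE_SHARED is a standard CPython configure-time variable exposed via sysconfig:)

```python
import sysconfig

# 1 for an --enable-shared build, 0 for the default static build.
print(sysconfig.get_config_var("Py_ENABLE_SHARED"))
```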
With my limited CS knowledge, the first thought that came to mind when I saw "shared library" was: "the shared build must save a lot of memory compared to the static build!".
Then I had this assumption:
# shared build
34K Jun 29 11:32 python3.9
21M Jun 29 11:32 libpython3.9.so.1.0
10 shared python processes, mem usage = 0.034M * 10 + 21M ≈ 21M
# static build
22M Jun 27 23:45 python3.9
10 static python processes, mem usage = 10 * 22M = 220M
Shared python wins!
Later I ran a toy test on my machine and found that’s wrong.
test.py
import time
i = 0
while i < 20:
    time.sleep(1)
    i += 1
print('done')
mem_test.sh
#!/bin/bash
for i in {1..1000}
do
    ./python3.9 test.py &
done
To run the shared python I set export LD_LIBRARY_PATH=/home/tian/py3.9.13_share/lib.
I ran mem_test.sh
separately (one at a time) with the two pythons and monitored total memory usage via htop
in another console. It turned out that both ate almost the same amount of memory.
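One caveat about this measurement: htop's RES column counts shared pages once in every process that maps them, so even if the shared build were saving memory, summing RES would hide it. On Linux, the kernel's PSS (proportional set size) in /proc/<pid>/smaps_rollup divides each shared page among its users, so summing PSS over all interpreter processes gives a truer total. A minimal parsing sketch, assuming the Linux /proc text format:

```python
def proc_mem_kib(smaps_text):
    """Parse Rss and Pss (both in KiB) out of /proc/<pid>/smaps_rollup text."""
    fields = {}
    for line in smaps_text.splitlines():
        if line.startswith(("Rss:", "Pss:")):
            key, value, _unit = line.split()
            fields[key.rstrip(":")] = int(value)
    return fields

# Usage on Linux (hypothetical PID): sum Pss over all interpreter PIDs
# instead of summing htop's RES column, e.g.
#   with open(f"/proc/{pid}/smaps_rollup") as f:
#       print(proc_mem_kib(f.read()))
```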
Later on, people pointed me to something called "demand paging":
Is an entire static program loaded into memory when launched?
So my earlier calculation of the static python's memory usage is completely wrong.
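Demand paging means the 22M static binary is only mapped, not read, at exec time; pages are faulted in as they are touched. One way to observe this on Linux is to compare a process's mapped size (VmSize) with what is actually resident (VmRSS) in /proc/<pid>/status. A small sketch, assuming the Linux /proc format:

```python
import os

def vm_kib(status_text, field):
    # Pull a "VmXxx:  <n> kB" value (in KiB) out of /proc/<pid>/status text.
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return None

if os.path.exists("/proc/self/status"):  # Linux only
    with open("/proc/self/status") as f:
        text = f.read()
    # Resident size is typically far smaller than the mapped size.
    print("mapped:", vm_kib(text, "VmSize"), "KiB,",
          "resident:", vm_kib(text, "VmRSS"), "KiB")
```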
Now I am confused: doesn't the shared-build python use less memory at runtime thanks to the shared library?
Question:
What's the benefit of a shared-build python vs a static-build python? Or does the shared build indeed save some memory through the shared-library mechanism, but my test is too trivial to reveal it?
P.S.
Checking some official Python Dockerfiles, e.g. this one, you'll see they all set --enable-shared.
There's also a related issue on pyenv, https://github.com/pyenv/pyenv/issues/2294 ; it seems they haven't figured it out either.
Answers:
Apart from the disk-usage benefit mentioned by @Ouroborus, I think there's also a "convenience of updating" benefit: suppose you have installed a version of Python that turns out to have a critical security problem. Then all software using that Python might be exposed. To fix the problem, you need to update every Python on your computer. With a shared Python, you only need to update a few files; but if a piece of software uses a statically built Python, you have to update that entire piece of software to get the Python fix.
However, this benefit can also be seen as a downside: updating Python may introduce breaking changes, certain software may break, and during the update process we have no way to detect those breakages.
It turns out that the other answers are talking about the "Embedding Python in Another Application" scenario (https://docs.python.org/3/extending/embedding.html).
In that case, "saving disk space" and the other reasons mentioned make sense: to embed Python in another application, you must either statically link libpythonMAJOR.MINOR.a
or dynamically link libpythonMAJOR.MINOR.so.1.0.
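For example, you can see which of the two artifacts an embedder would link against on a given installation (a sketch; LDLIBRARY and LIBDIR are standard CPython sysconfig variables, but their values vary by platform and build):

```python
import sysconfig

# 'libpython3.9.a' on a static build, 'libpython3.9.so' on an --enable-shared one.
ldlib = sysconfig.get_config_var("LDLIBRARY")
libdir = sysconfig.get_config_var("LIBDIR")
print(f"embedders link against: {libdir}/{ldlib}")
```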
So my current conclusion is that whether Python is built shared or static only matters for the "Embedding Python in Another Application" scenario. For normal use cases, e.g. running the python interpreter directly, it doesn't make much difference.
Update:
Disk usage comparison, see comments in the makefile: