What's the benefit of a shared build python vs a static build python?

Question:

This question has been bothering me for two weeks and I’ve searched online and asked people but couldn’t get an answer.

By default, Python builds the library libpythonMAJOR.MINOR.a and statically links it into the interpreter. It also has an --enable-shared flag, which instead builds a shared library libpythonMAJOR.MINOR.so.1.0 and dynamically links it to the interpreter.
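(As an aside, you can check from inside the interpreter which way a given build was configured. A minimal sketch using the standard sysconfig module:

```python
import sysconfig

# 1 if this interpreter was configured with --enable-shared, 0 otherwise
print(sysconfig.get_config_var("Py_ENABLE_SHARED"))

# Name of the python library this build links against,
# e.g. "libpython3.9.so.1.0" (shared) or "libpython3.9.a" (static)
print(sysconfig.get_config_var("LDLIBRARY"))
```

)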

Based on my limited CS knowledge, my first thought on seeing "shared library" was: "the shared build must save a lot of memory compared to the static build!"

Then I had this assumption:

# shared build
34K Jun 29 11:32 python3.9
21M Jun 29 11:32 libpython3.9.so.1.0

10 shared python processes, mem usage = 0.034M * 10 + 21M ≈ 21M

# static build
22M Jun 27 23:45 python3.9

10 static python processes, mem usage = 10*22M = 220M

shared python wins!

Later I ran a toy test on my machine and found that’s wrong.

test.py

import time

i = 0
while i < 20:
    time.sleep(1)
    i += 1

print('done')

mem_test.sh

#!/bin/bash
for i in {1..1000}
do
    ./python3.9 test.py &
done

For the shared python to run, I set export LD_LIBRARY_PATH=/home/tian/py3.9.13_share/lib .

I ran mem_test.sh with each of the two pythons separately (one at a time) and monitored the total memory usage via htop in another console. It turned out that both ate almost the same amount of memory.
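One caveat with that methodology: htop's RES column counts shared pages fully in every process, so summing it cannot distinguish the two builds. On Linux, PSS (proportional set size) is the better metric, since it splits each shared page evenly among the processes mapping it. A rough sketch (Linux-only; /proc/<pid>/smaps_rollup needs kernel 4.14+; the mem_stats helper is my own illustrative code, not part of the test above):

```python
import re

def mem_stats(pid="self"):
    """Parse Rss, Pss and Shared_Clean (all in KiB) out of /proc/<pid>/smaps_rollup."""
    stats = {}
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            m = re.match(r"(Rss|Pss|Shared_Clean):\s+(\d+) kB", line)
            if m:
                stats[m.group(1)] = int(m.group(2))
    return stats

s = mem_stats()  # this process; pass a pid string to inspect one of the test pythons
print(f"Rss={s['Rss']} KiB  Pss={s['Pss']} KiB  Shared_Clean={s['Shared_Clean']} KiB")
```

Summing Pss over all the python processes gives a number that is actually comparable between the two builds.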

Later on, people taught me there's something called "demand paging":

Is an entire static program loaded into memory when launched?

How does an executable get loaded into RAM, does the whole file get loaded into RAM even when the whole file won’t be needed, or does it get loaded in "chunks"?

So my previous calculation of the static python's memory usage was completely wrong: pages of the executable are only loaded into RAM as they are actually accessed.
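Demand paging is easy to demonstrate directly: mapping a file only reserves address space, and physical memory is charged only for the pages you actually touch. A toy sketch (Unix-only; on Linux ru_maxrss is reported in KiB):

```python
import mmap
import resource
import tempfile

SIZE = 64 * 1024 * 1024  # map 64 MiB

# A sparse temporary file: truncate() extends it without writing any data.
f = tempfile.TemporaryFile()
f.truncate(SIZE)
m = mmap.mmap(f.fileno(), SIZE, prot=mmap.PROT_READ)

# Peak resident set size so far; the 64 MiB mapping has cost (almost) nothing yet.
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Touch one byte per page; only now does the kernel fault the pages in.
total = 0
for off in range(0, SIZE, mmap.PAGESIZE):
    total += m[off]  # the read keeps the loop from doing nothing

after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"max RSS grew by roughly {(after - before) // 1024} MiB after touching every page")
```

The same mechanism applies to a 22M static python binary: only the pages the interpreter actually executes ever occupy RAM.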

Now I am confused. Doesn't a shared-build python use less memory by virtue of its shared runtime library?

Question:

What's the benefit of a shared-build python vs a static-build python? Or does the shared-build python indeed save some memory through the shared-library mechanism, but my test is too trivial to reveal it?

P.S.

Checking some official Python Dockerfiles, e.g. this one, you will see they all set --enable-shared.

There's also a related issue on pyenv, https://github.com/pyenv/pyenv/issues/2294 ; it seems they haven't figured it out either.

Asked By: Rick


Answers:

Apart from the disk-usage benefit mentioned by @Ouroborus, I think there's also a "convenience of updating" benefit: suppose you have installed a version of python that turns out to have a critical security problem. Then all software using python might be exposed. To fix the problem, you need to update every python on your computer. If it's a shared python, you only need to update a few files; but if a piece of software uses a statically linked python, you have to update that entire piece of software to get the python fix.

However, this benefit can also be regarded as a downside: updating python may introduce breaking changes, certain software may break, and the update process itself gives us no way to detect those breakages.

Answered By: ice1000

It turns out that the others were talking about the scenario of "Embedding Python in Another Application" (https://docs.python.org/3/extending/embedding.html).

If that's the case, then "saving disk space" and the other reasons mentioned make sense: to embed python in another application, you either statically link libpythonMAJOR.MINOR.a or dynamically link libpythonMAJOR.MINOR.so.1.0.

So my current conclusion is that whether python is built shared or static only matters in the "Embedding Python in Another Application" scenario. For normal use cases, e.g. running the python interpreter, it doesn't make much difference.

Update:

For a disk-usage comparison, see the comments in the makefile:

https://stackoverflow.com/a/73099136/5983841

Answered By: Rick