Why is Python slower inside a Docker container?

Question:

The following small code snippet times how long adding a bunch of numbers takes.

import gc
from time import process_time_ns

gc.disable() # disable garbage collection
for func in [
    process_time_ns,
]:
    pre = func()

    s = 0
    for a in range(100000):
        for b in range(100):
            s += b
    print(f"sum: {s}")

    post = func()
    delta_s = (post - pre) / 1e9 # difference in seconds
    print(f"{func}: {delta_s}")

To my surprise, this takes much longer when run inside a docker container (~1.6s) than it does when run directly on the host machine (~0.8s).
After some digging, I found that some of docker’s security features may cause slowdowns (https://betterprogramming.pub/faster-python-in-docker-d1a71a9b9917, https://pythonspeed.com/articles/docker-performance-overhead/). Indeed, adding the docker argument --privileged reduces the runtime to only ~0.9s.
However, I’m still confused by this ~0.1s gap I’m observing, which doesn’t show up in the article.
I’ve fixed my CPU frequency at 3000 MHz and pinned the Python process to core 0.
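
For reference, the pinning can also be done from inside the script instead of via taskset. This is only a sketch, assuming a Linux host; the helper name pin_to_core is made up for illustration and is not part of the original benchmark.

```python
import os


def pin_to_core(core=0):
    """Restrict the current process to a single CPU core (Linux only).

    Returns the affinity set in effect afterwards, or None when the
    platform does not expose os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # pid 0 means "this process"
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)
```

Calling pin_to_core(0) at the top of main.py should have roughly the same effect as the taskset -c 0 prefix, although I have not checked that the two give identical timings.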

Statistics of 30 measurements each (in seconds):

       local        docker --privileged   docker
avg    0.79917586   0.904496884           1.61980727
std    0.02433539   0.031948695           0.04034594
min    0.78087375   0.867265714           1.56995282
q1     0.78211388   0.880717119           1.58672566
q2     0.79006154   0.895180195           1.61322376
q3     0.80732969   0.916945585           1.64363027
max    0.89824817   1.012580084           1.72252714
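
Rows like the ones in this table can be produced with Python's statistics module. This is a sketch, not the script actually used for the numbers above; in particular, the sample standard deviation and the "inclusive" quartile method are assumptions, and other choices would give slightly different std and q1/q3 values.

```python
import statistics


def summarize(samples):
    """Return avg, std, min, q1, q2, q3 and max for a list of timings."""
    # n=4 yields the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return {
        "avg": statistics.mean(samples),
        "std": statistics.stdev(samples),  # assumption: sample std (n - 1)
        "min": min(samples),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "max": max(samples),
    }
```

summarize() takes the list of 30 delta_s values from one configuration and returns one column of the table.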

For measurements, the following commands were used:

  • local: taskset -c 0 python3 main.py
  • docker --privileged: taskset -c 0 docker run --privileged --rm -w /data -v /home/slammer/Projects/timing-python-inside-docker:/data -it python:3 python main.py
  • docker: taskset -c 0 docker run --rm -w /data -v /home/slammer/Projects/timing-python-inside-docker:/data -it python:3 python main.py
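
Repeating each of these commands 30 times can be automated along the following lines. This is a sketch under assumptions: parse_delta and collect are hypothetical helper names, and the regular expression relies on the benchmark printing a line such as "<built-in function process_time_ns>: 0.79917586".

```python
import re
import subprocess
import sys

# Matches the timing line printed by main.py.
DELTA_RE = re.compile(r"process_time_ns>:\s*([0-9.eE+-]+)")


def parse_delta(output):
    """Extract the measured seconds from the stdout of one benchmark run."""
    match = DELTA_RE.search(output)
    if match is None:
        raise ValueError("no timing line found in output")
    return float(match.group(1))


def collect(cmd, runs=30):
    """Run cmd repeatedly and return the list of measured deltas."""
    deltas = []
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        deltas.append(parse_delta(result.stdout))
    return deltas
```

For the local case this would be called as collect(["taskset", "-c", "0", "python3", "main.py"]). For the docker variants the -it flags would likely have to be dropped, since no terminal is attached when the command is started from a script.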

What causes the remaining docker overhead?
Can it be mitigated to achieve bare-metal-performance?

Edit: Measurements were taken on a Linux Mint 20.3 host (kernel: x86_64 Linux 5.4.0-117-generic); docker version: 20.10.17

Asked By: GitProphet

Answers:

The remaining slowdown seems to be caused not by docker, but by differences between the python binaries.

I copied the python build packaged in the docker image python:3 to my host machine (copying the image’s /usr/local into a docker-python folder on my host).
Then I ran the same benchmark again using this binary with the following command: LD_LIBRARY_PATH=docker-python/local/lib taskset -c 0 docker-python/local/bin/python3.10 main.py
And voilà, the measurements using this "dockerbinary" are the same (within measurement error) as those measured with "docker --privileged":

       local        dockerbinary   docker --privileged   docker
avg    0.79917586   0.89829016     0.904496884           1.61980727
std    0.02433539   0.03554546     0.031948695           0.04034594
min    0.78087375   0.86344007     0.867265714           1.56995282
q1     0.78211388   0.86950620     0.880717119           1.58672566
q2     0.79006154   0.88853465     0.895180195           1.61322376
q3     0.80732969   0.91612282     0.916945585           1.64363027
max    0.89824817   0.99477790     1.012580084           1.72252714

Mystery solved 🙂


Now, what is the difference between these binaries?
As far as I could tell, the binary shipped in the docker image is built with debug_info and not stripped, while my local binary is stripped.

$ file `which python3.10`
/usr/bin/python3.10: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=fb3f4369481251e6ba441382fd6d9ab47af0db29, for GNU/Linux 3.2.0, stripped
$ file docker-python/local/bin/python3.10
docker-python/local/bin/python3.10: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=618b23f947f202224f4ea8e16375ac7bcad13c4f, for GNU/Linux 3.2.0, with debug_info, not stripped
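
Besides file, the interpreters themselves can report how they were built. The following sketch prints a few build-time variables via sysconfig; running it once with each binary and comparing the output may reveal further differences. The selection of variables is my own assumption about what is worth comparing, not something established above.

```python
import sys
import sysconfig

# Build-time variables that commonly differ between distributions.
KEYS = ("CONFIG_ARGS", "Py_ENABLE_SHARED", "OPT", "CFLAGS", "LDFLAGS")


def build_info():
    """Collect build settings of the interpreter running this script."""
    info = {"executable": sys.executable, "version": sys.version.split()[0]}
    for key in KEYS:
        info[key] = sysconfig.get_config_var(key)  # None if not recorded
    return info


if __name__ == "__main__":
    for key, value in build_info().items():
        print(f"{key}: {value}")
```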

My guess is that compiling with debug_info introduces this ~11% performance overhead.
If this is correct, it prompts the next question: "Why does the default docker image use this binary if it causes such a significant slowdown?"
To that, I have no answer at the moment (and this guess may be entirely wrong).
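
The size of the gap can be checked against the averages in the table above. Measured against the local runtime the "dockerbinary" is about 12% slower; measured against its own runtime the gap is about 11%. Which baseline the ~11% figure uses is an assumption on my side. A small sketch of the arithmetic:

```python
# Averages in seconds, copied from the table above.
avg = {
    "local": 0.79917586,
    "dockerbinary": 0.89829016,
    "docker --privileged": 0.904496884,
    "docker": 1.61980727,
}


def overhead(slow, fast):
    """Relative slowdown of `slow` compared with the baseline `fast`."""
    return (slow - fast) / fast


for name, value in avg.items():
    print(f"{name}: {overhead(value, avg['local']):+.1%} vs. local")
```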

Crosslink: https://github.com/docker-library/python/issues/825

Answered By: GitProphet