Should I add Python's pyc files to .dockerignore?

Question:

I’ve seen several examples of .dockerignore files for Python projects where *.pyc files and/or __pycache__ folders are ignored:

**/__pycache__
*.pyc

Since these files/folders are going to be recreated in the container anyway, I wonder if it’s a good practice to do so.

Asked By: planetp


Answers:

Yes, it’s a recommended practice. There are several reasons:

Reduce the size of the resulting image

Files listed in .dockerignore are excluded from the build context, so they can’t end up in the resulting image through COPY or ADD. This may be crucial when you’re building the smallest possible image. Roughly speaking, bytecode files are about the same size as the source files they were compiled from. Bytecode files aren’t intended for distribution, which is why we usually put them in .gitignore as well.
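To check the size claim on your own code, here is a minimal sketch that compiles a small sample module and compares the total size of .py and .pyc files under a directory. The helper name footprint and the sample module are made up for illustration, and the ratio you get will depend on the code (docstrings and identifiers, for example, are stored in the bytecode too).

```python
import compileall
import tempfile
from pathlib import Path

# Hypothetical sample module used only for this measurement.
SAMPLE = '''"""A small example module."""


def add(a, b):
    """Return the sum of a and b."""
    return a + b


class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}!"
'''


def footprint(root):
    """Return (bytes in .py files, bytes in .pyc files) under root."""
    base = Path(root)
    py_bytes = sum(p.stat().st_size for p in base.rglob("*.py"))
    pyc_bytes = sum(p.stat().st_size for p in base.rglob("*.pyc"))
    return py_bytes, pyc_bytes


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example.py").write_text(SAMPLE)
    # Writes __pycache__/example.<interpreter tag>.pyc next to the source.
    compileall.compile_dir(tmp, quiet=1)
    py_bytes, pyc_bytes = footprint(tmp)

print(f".py: {py_bytes} bytes, .pyc: {pyc_bytes} bytes")
```

Pointing footprint at a real project checkout (after running it once) gives a quick idea of how much the build context would grow if bytecode were not ignored.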


Cache-related problems

In earlier versions of Python 3.x there were several cache-related issues, for example:

Python’s scheme for caching bytecode in .pyc files did not work well
in environments with multiple Python interpreters. If one interpreter
encountered a cached file created by another interpreter, it would
recompile the source and overwrite the cached file, thus losing the
benefits of caching.

Since Python 3.2, cached files are tagged with the interpreter name and version, as in mymodule.cpython-32.pyc, and stored in a __pycache__ directory. By the way, starting from Python 3.8 you can even control the directory where the cache is stored (with the PYTHONPYCACHEPREFIX environment variable or the -X pycache_prefix=PATH option). This can be useful when you restrict write access to the source directory but still want the benefits of caching.
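Both behaviors can be observed from Python itself. This is a minimal sketch assuming Python 3.8 or newer: importlib.util.cache_from_source shows where the bytecode for a source path would be cached, and a child interpreter started with the PYTHONPYCACHEPREFIX environment variable writes its cache under that prefix instead of into the source tree. The module name mymodule mirrors the example above; the temporary directories are illustrative.

```python
import importlib.util
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Default cache location for a source file: the tag identifies the
# interpreter, so different interpreters no longer overwrite each other.
cache_path = importlib.util.cache_from_source("mymodule.py")
print(sys.implementation.cache_tag)  # e.g. cpython-312
print(cache_path)                    # e.g. __pycache__/mymodule.cpython-312.pyc

# Python 3.8+: PYTHONPYCACHEPREFIX moves the cache out of the source tree.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as prefix:
    Path(src_dir, "mymodule.py").write_text("VALUE = 1\n")
    env = dict(os.environ, PYTHONPYCACHEPREFIX=prefix)
    env.pop("PYTHONDONTWRITEBYTECODE", None)  # make sure bytecode gets written
    subprocess.run(
        [sys.executable, "-c", f"import sys; sys.path.insert(0, {src_dir!r}); import mymodule"],
        env=env,
        check=True,
    )
    in_source_tree = list(Path(src_dir).rglob("*.pyc"))
    # The prefix mirrors the source's absolute path; other modules the child
    # imported may be cached there too, so look for ours by name.
    in_prefix = list(Path(prefix).rglob("mymodule*.pyc"))

print(len(in_source_tree), len(in_prefix))  # expected: 0 1
```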

Usually the cache system works perfectly, but someday something may go wrong. It’s worth noting that a cached .pyc file living in the same directory as the source (the legacy location, not __pycache__) will still be imported if the corresponding .py file is missing. In practice this is not a common occurrence, but if code you deleted still seems to be "there", removing the cache files is a good first step. It may also matter when you’re experimenting with Python’s cache system or running scripts in different environments.
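The sourceless-import behavior is easy to reproduce. In this minimal sketch the module name ghost and the helper try_import are hypothetical: it compiles a module to both the __pycache__ location and the legacy location, deletes the .py file, and then checks whether a fresh interpreter can still import it.

```python
import py_compile
import subprocess
import sys
import tempfile
from pathlib import Path


def try_import(directory: Path, module: str) -> bool:
    """Return True if a fresh interpreter can import `module` from `directory`."""
    code = f"import sys; sys.path.insert(0, {str(directory)!r}); import {module}"
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return result.returncode == 0


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "ghost.py"
    src.write_text("VALUE = 42\n")

    # Compile to the default location (__pycache__/ghost.<tag>.pyc) ...
    cached = Path(py_compile.compile(str(src), doraise=True))
    # ... and to the legacy location right next to the source (ghost.pyc).
    legacy = Path(py_compile.compile(str(src), cfile=str(root / "ghost.pyc"), doraise=True))

    src.unlink()  # simulate a deleted module whose bytecode was left behind

    with_legacy = try_import(root, "ghost")   # imported from ghost.pyc
    legacy.unlink()
    only_pycache = try_import(root, "ghost")  # the __pycache__ copy is ignored
    cache_still_exists = cached.exists()

print(with_legacy, only_pycache)  # expected: True False
```

A stray legacy *.pyc copied into an image can therefore keep a removed module importable there, which is one more reason to keep such files out of the build context.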


Security reasons

You most likely don’t even need to think about it, but cache files can contain somewhat sensitive information. With the current implementation, a .pyc file records the path of the source file it was compiled from, which is typically an absolute path. There are situations when you don’t want to share such information.
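To see what gets recorded, this sketch compiles a throwaway module (the file name secret_project.py is made up) and reads the code object back out of the .pyc with marshal. It assumes Python 3.7 or newer, where the .pyc header is 16 bytes (PEP 552). The dfile argument of py_compile.compile replaces the recorded name, which is one way to avoid embedding the build path.

```python
import marshal
import py_compile
import tempfile
from pathlib import Path


def recorded_filename(pyc_path: str) -> str:
    """Return the source path stored in the top-level code object of a .pyc file."""
    data = Path(pyc_path).read_bytes()
    # Assumes Python 3.7+: a 16-byte header (magic number, flags, and 8 bytes
    # of source mtime/size or hash) precedes the marshalled code object.
    code = marshal.loads(data[16:])
    return code.co_filename


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp).resolve() / "secret_project.py"
    src.write_text("print('hello')\n")

    # Default: the path passed to the compiler (absolute here) is recorded.
    default_name = recorded_filename(py_compile.compile(str(src), doraise=True))

    # dfile replaces the recorded name, e.g. with a relative one.
    custom_name = recorded_filename(
        py_compile.compile(
            str(src),
            cfile=str(Path(tmp) / "custom.pyc"),
            dfile="secret_project.py",
            doraise=True,
        )
    )

print(default_name)  # full path of the temporary file
print(custom_name)   # secret_project.py
```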


Working with bytecode files seems to be a fairly common need; for example, django-extensions provides the compile_pyc and clean_pyc management commands.
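Without django-extensions, a plain-Python cleanup step is short. The sketch below is a hypothetical helper (clean_bytecode is not part of any library) that removes __pycache__ directories and stray .pyc files under a directory. It deletes files, so point it only at a project tree.

```python
import shutil
from pathlib import Path


def clean_bytecode(root: str) -> int:
    """Delete __pycache__ directories and stray .pyc files under `root`.

    Returns the number of paths removed. Hypothetical helper for illustration.
    """
    base = Path(root)
    removed = 0
    # Collect matches first so the tree isn't modified while it is being walked.
    for cache_dir in list(base.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed += 1
    # Legacy .pyc files sitting next to their sources (or orphaned ones).
    for pyc in list(base.rglob("*.pyc")):
        pyc.unlink()
        removed += 1
    return removed
```

For Docker images, listing the patterns in .dockerignore is still the simpler option, since it keeps the files out of the build context without touching the working tree.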

Answered By: funnydman