Identify forks of a Python package from within

Question:

I have a Python package (a PyQt GUI app) on GitHub, published via PyPI. It has a crash reporting facility. Every once in a while, I receive crash reports that are subtly but unmistakably inconsistent with my code.

Examples: a "class X not found in module Y" error, where dependency module Y was recently updated by adding class X, and I’ve changed the dependency line in setup.py to require the latest version. Another example involves a particular exception reported as thrown on line 78, even though in my code, in that particular version, the raise is on line 80.

I have a strong suspicion those crashes originate in third-party forks. GitHub says the repository has 8 forks.

Anyway, my question is: can I somehow identify crash reports that originate from third-party forks, as opposed to my own builds?

Asked By: Seva Alekseyev


Answers:

Here’s what I’ve come up with. I’ve introduced a Python file cookie.py with a placeholder variable:

cookie=False

And placed it into source control as it is, with a False cookie value. During the build process, I place the latest commit hash into the cookie file, like this:

git log --pretty=format:%%H -n 1 >hash.txt
set /p HASH= <hash.txt
echo cookie='%HASH%' >cookie.py

After that, I build the package and revert cookie.py to its committed state, with the False value. I also save a copy of the commit hash and the current package version into a history file. This way, a package on PyPI that originates from my builds always has a cookie, and cookies are traceable to releases.
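For anyone who would rather not depend on a Windows batch file, here is a rough Python sketch of the same stamp/record/revert sequence. It is not the script actually used for the package: the history file name and its "version hash" line format are assumptions, and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path

COOKIE_FILE = Path("cookie.py")             # file name taken from the answer
HISTORY_FILE = Path("cookie_history.txt")   # hypothetical name for the history file
PLACEHOLDER = "cookie=False\n"              # the state kept in source control


def current_commit_hash() -> str:
    """Return the full hash of the latest commit, like the git log call above."""
    result = subprocess.run(
        ["git", "log", "--pretty=format:%H", "-n", "1"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def stamp_cookie(commit_hash: str, version: str,
                 cookie_file: Path = COOKIE_FILE,
                 history_file: Path = HISTORY_FILE) -> None:
    """Write the hash into cookie.py and record it next to the version."""
    cookie_file.write_text(f"cookie={commit_hash!r}\n", encoding="utf-8")
    # Assumed history format: one "version hash" pair per line.
    with history_file.open("a", encoding="utf-8") as history:
        history.write(f"{version} {commit_hash}\n")


def reset_cookie(cookie_file: Path = COOKIE_FILE) -> None:
    """Put cookie.py back into its committed state."""
    cookie_file.write_text(PLACEHOLDER, encoding="utf-8")
```

A build script would call stamp_cookie(current_commit_hash(), version), run the packaging step, and then call reset_cookie() in a finally clause so that a failed build does not leave a stamped cookie.py in the working tree.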

Finally, I make the value of the cookie a part of the crash report. If a crash comes with a blank cookie, an unknown cookie, or a cookie for the wrong version, I know it’s not from my build, or at least that it came from a user who runs the package from sources as opposed to getting it from PyPI. The latter is an edge case I’m willing to live with.
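On the receiving side, the check described above could look roughly like this. It is a sketch under assumptions: the history is assumed to be loaded into a dict mapping commit hash to released version, the hashes shown are dummy placeholders, and classify_report is a hypothetical helper, not part of any crash reporting library.

```python
from typing import Dict, Optional, Union

# Hypothetical history: commit hash -> version released from that commit.
# The hashes are dummy values for illustration only.
KNOWN_BUILDS: Dict[str, str] = {
    "a" * 40: "1.0.0",
    "b" * 40: "1.0.1",
}


def classify_report(cookie: Union[str, bool, None],
                    reported_version: str,
                    known_builds: Dict[str, str]) -> str:
    """Sort a crash report into one of the cases described above."""
    if not cookie:
        # False (the committed placeholder), empty, or missing:
        # the code was run from sources or built by someone else.
        return "no cookie"
    released: Optional[str] = known_builds.get(str(cookie))
    if released is None:
        return "unknown cookie"
    if released != reported_version:
        return "cookie for the wrong version"
    return "official build"
```

Inside the application, the crash reporter would pick up the value with something like from cookie import cookie and send it along with the package version.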

This scheme is nowhere near rock solid 🙂 In fact, it’s trivially spoofable, since commit hashes are public information, and the build script is open source too. But spoofing takes a bit of effort, and there is no payoff for the would-be spoofer, so I presume no one would bother to expend the effort.


Note: the %%H in the Git command is an artifact of Windows batch files, where a literal percent sign has to be doubled; in a *nix/macOS shell it would be a plain %H.


One highly theoretical downside of this approach is that the PyPI build happens from the set of Python files that doesn’t match, byte for byte, the committed state of the repository. I don’t see how this could come back to haunt me (specifically in a Python project), but still.

Answered By: Seva Alekseyev