How do I manage third-party Python libraries with Google App Engine? (virtualenv? pip?)

Question:

What’s the best strategy for managing third-party Python libraries with Google App Engine?

Say I want to use Flask, a webapp framework. A blog entry says to do this, which doesn’t seem right:

$ cd /tmp/
$ wget http://pypi.python.org/packages/source/F/Flask/Flask-0.6.1.tar.gz
$ tar zxf Flask-0.6.1.tar.gz
$ cp -r Flask-0.6.1/flask ~/path/to/project/
(... repeat for other packages ...)

There must be a better way to manage third-party code, especially if I want to track versions, test upgrades or if two libraries share a subdirectory. I know that Python can import modules from zipfiles and that pip can work with a wonderful REQUIREMENTS file, and I’ve seen that pip has a zip command for use with GAE.

(Note: There’s a handful of similar questions — 1, 2, 3, 4, 5 — but they’re case-specific and don’t really answer my question.)

Asked By: a paid nerd


Answers:

Here’s how I do it:

  • project
    • .Python
    • bin
    • lib
      • python2.5
        • site-packages
          • < pip install packages here >
    • include
    • src
      • app.yaml
      • index.yaml
      • main.py
      • < symlinks to the pip-installed packages in ../lib/python2.5/site-packages >

The project directory is the top level directory where the virtualenv sits. I get the virtualenv using the following commands:

cd project
virtualenv -p /usr/bin/python2.5 --no-site-packages --distribute .

The src directory is where all your code goes. When you deploy to GAE, deploy *only* the contents of the src directory and nothing else. appcfg.py will resolve the symlinks and copy the library files to GAE for you.
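The symlinking step is not spelled out above, so here is a minimal sketch of how it might be scripted. The helper name link_packages and the rule for skipping metadata directories are assumptions for illustration, not part of the original setup, and the demo runs against temporary directories that merely mimic the layout.

```python
import os
import tempfile


def link_packages(site_packages, src_dir):
    """Symlink each top-level entry of site_packages into src_dir.

    Hypothetical helper: returns the sorted names that were linked and
    leaves anything that already exists in src_dir untouched.
    """
    linked = []
    for name in sorted(os.listdir(site_packages)):
        # Skip packaging metadata and bytecode; only importable code is needed.
        if name.endswith(('.egg-info', '.dist-info', '.pyc')) or name == '__pycache__':
            continue
        target = os.path.join(src_dir, name)
        if os.path.lexists(target):
            continue
        os.symlink(os.path.abspath(os.path.join(site_packages, name)), target)
        linked.append(name)
    return linked


# Demo: temporary directories stand in for the real project layout.
root = tempfile.mkdtemp()
site_packages = os.path.join(root, 'lib', 'python2.5', 'site-packages')
src_dir = os.path.join(root, 'src')
os.makedirs(os.path.join(site_packages, 'flask'))
os.makedirs(os.path.join(site_packages, 'Flask-0.6.1.egg-info'))
os.makedirs(src_dir)

linked = link_packages(site_packages, src_dir)
print(linked)
```

Absolute link targets are used here so the links stay valid wherever the script is run from; relative links would keep the project relocatable instead.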

I don’t install my libraries as zip files mainly for convenience in case I need to read the source code, which I happen to do a lot just out of curiosity. However, if you really want to zip the libraries, put the following code snippet into your main.py

import sys
for p in ['library.zip', 'package.egg']:  # list every archive you bundle
    sys.path.insert(0, p)

After this you can import your zipped up packages as usual.
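To see the zip import mechanism in isolation, here is a small self-contained sketch. The module name hello_lib and the archive name library.zip are made up for the demo; a real project would point at the archives it actually ships.

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive containing one hypothetical module, hello_lib.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, 'library.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('hello_lib.py', 'def greet(name):\n    return "Hello, " + name\n')

# Putting the archive itself on sys.path lets Python import modules inside it.
sys.path.insert(0, archive)

import hello_lib

print(hello_lib.greet('GAE'))
```

Only pure-Python code can be imported this way; packages with compiled extensions or packages that read data files relative to __file__ generally need to stay unzipped.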

One thing to watch out for is setuptools’ pkg_resources.py. I copied that directly into my src directory so my other symlinked packages can use it. Also watch out for anything that uses entry_points. In my case I’m using Toscawidgets2 and I had to dig into the source code to manually wire up the pieces. It can become annoying if you have a lot of libraries that rely on entry_points.

Answered By: Y.H Wong

Note: this answer is specific for Flask on Google App Engine.

See the flask-appengine-template project for an example of how to get Flask extensions to work on App Engine.
https://github.com/kamalgill/flask-appengine-template

Drop the extension into the namespace package folder at src/packages/flaskext and you’re all set.
https://github.com/kamalgill/flask-appengine-template/tree/master/src/lib/flaskext

Non-Flask packages can be dropped into the src/packages folder as zip files, eggs, or unzipped packages, as the project template includes the sys.path.insert() snippet posted above.

Answered By: kamalgill

I prefer buildout.

You set up dependencies in setup.py in your project or buildout.cfg, pin the versions in buildout.cfg, and specify which packages are not available on GAE and should be included in packages.zip. rod.recipe.appengine will copy required packages into packages.zip, and as long as you insert packages.zip into the sys.path, they can be imported anywhere.

You can also use forks from GitHub if the package you need is not on PyPI:

find-links =
    https://github.com/tesdal/pusher_client_python/tarball/rewrite#egg=pusher-2.0dev2

[versions]
pusher = 2.0dev2

All of these settings and dependencies are versioned in git.

Instead of wondering which copy of Flask is currently included in your source tree and perhaps copied into your version control (or requiring new developers to manually unpack and upgrade), you simply check the version in buildout.cfg. If you want a new version, change buildout.cfg and rerun buildout.

You can also use it to insert variables into config file templates, for example setting the appspot id and version in app.yaml when you have a staging server configured through staging.cfg.
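The idea of filling a config template can be illustrated outside buildout. The sketch below uses Python’s string.Template as a stand-in; it is not how buildout or its recipes implement substitution, and the app ids, version strings and the render_app_yaml helper are hypothetical.

```python
from string import Template

# Stand-in template for the top of an app.yaml; only two keys are shown.
APP_YAML_TEMPLATE = Template(
    "application: $app_id\n"
    "version: $app_version\n"
)


def render_app_yaml(app_id, app_version):
    """Return the app.yaml header for one deployment target."""
    return APP_YAML_TEMPLATE.substitute(app_id=app_id, app_version=app_version)


# Hypothetical values for a production and a staging configuration.
production = render_app_yaml('myapp', '1')
staging = render_app_yaml('myapp-staging', 'staging-1')
print(staging)
```

Keeping the per-environment values in separate config files and the shared structure in one template means the two deployments cannot drift apart by accident.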

Answered By: tesdal

I recently created a tool for this called gaenv. It reads the requirements.txt format but doesn’t install anything itself: install your packages with pip install -r requirements.txt, then run the gaenv command-line tool.

$ pip install -r requirements.txt
$ gaenv

This creates the symlinks automatically. You can also install gaenv in your virtualenv and run the binary from there.
Here is a blog post about it:

http://blog.altlimit.com/2013/06/google-app-engine-virtualenv-tool-that.html

It is also on GitHub:

https://github.com/faisalraja/gaenv

Answered By: Faisal

What about simply:

$ pip install -r requirements.txt -t <your_app_directory>/lib

Create/edit <your_app_directory>/appengine_config.py:

"""This file is loaded when starting a new application instance."""
import sys
import os.path

# add `lib` subdirectory to `sys.path`, so our `main` module can load
# third-party libraries.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'lib'))

UPDATE:

Google has since updated their sample appengine_config.py to:

    from google.appengine.ext import vendor
    vendor.add('lib')

Note: Even though their example’s .gitignore ignores the lib/ directory, you still need to keep that directory under source control if you use the git-push deployment method.

Answered By: Wernight

Wernight’s solution is the closest to current practice in the official Flask example app, which I’ve already improved by changing the sys.path.insert() call to site.addsitedir() in order to allow for namespace packages by processing their attendant .pth files (which are important for frameworks like Pyramid).

So far so good, but that appends the directory to the path, and so loses the opportunity to override the included libraries (like WebOb and requests) with newer versions.

What is needed then in appengine_config.py (and I am trying to get this change accepted into the official repos as well) is the following:

"""This file is loaded when starting a new application instance."""
import os.path
import site
import sys

dirname = 'lib'
dirpath = os.path.join(os.path.dirname(__file__), dirname)

# split path after 1st element ('.') so local modules are always found first
sys.path, remainder = sys.path[:1], sys.path[1:]

# add `lib` subdirectory as a site directory, so our `main` module can load
# third-party libraries.
site.addsitedir(dirpath)

# append the rest of the path
sys.path.extend(remainder)

The final version of this code may end up hidden away in a vendor.py module and be called like insertsitedir(index, path) or some other variation, as you can see in the discussion attending this pull request. The logic will be more or less the same regardless: it lets a simple pip install -r requirements.txt -t lib/ work for all packages, including namespace ones, while still allowing the included libraries to be overridden with newer versions. So far I have been unable to find a simpler alternative.
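As a rough idea of what such a helper could look like, here is a sketch that wraps the same split-and-extend logic in a function. The name insert_site_dir and its signature are my own guesses for illustration, not the API that was eventually merged, and the demo uses a temporary directory in place of a real lib folder.

```python
import os
import site
import sys
import tempfile


def insert_site_dir(path, index=1):
    """Add path as a site directory starting at position index of sys.path.

    Hypothetical helper: everything from index onward is set aside,
    site.addsitedir() appends path (and processes any .pth files in it),
    and the set-aside entries are then put back after it.
    """
    remainder = sys.path[index:]
    del sys.path[index:]
    site.addsitedir(path)
    sys.path.extend(remainder)


# Demo: an empty temporary directory stands in for the lib folder.
lib_dir = tempfile.mkdtemp()
before = list(sys.path)
insert_site_dir(lib_dir)
print(sys.path.index(os.path.abspath(lib_dir)))
```

Because the new directory lands ahead of the rest of the path but behind the first entry, local modules still win while vendored packages can shadow the versions bundled with the runtime.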

Answered By: webmaven

(Jun 2021) This post is over a decade old, and so an updated answer is warranted now.

  1. Python 3: list 3P libraries in requirements.txt along with any desired version numbers; they’ll be automatically installed by Google upon deployment. (This is the same technique used if you decide to migrate your app to Google Cloud Functions or Cloud Run.)
  2. Python 2 without built-in 3P libraries (regular 3P libraries):
    • Create requirements.txt as above
    • Install/self-bundle/copy them locally, say to lib, via pip install -t lib -r requirements.txt
    • Create appengine_config.py as shown in step 5 on this page
  3. Python 2 with built-in 3P libraries (special set of 3P libraries):
    • All listed 3P libraries linked above are "built-in," meaning they’re available on App Engine servers so you don’t have to copy/self-bundle them with your app (as in #2 above)
    • It suffices to list them with an available version in the libraries: section of your app.yaml (one name and version entry per library)
    • (Don’t put built-in libraries in requirements.txt or use pip install to install them locally unless you want to self-bundle, say because you need a newer version of the built-in library.)
    • Create appengine_config.py like the above.

If you have a Python 2 app with both built-in and non-built-in 3P libraries, use the techniques in both #2 and #3 above (built-in libraries in app.yaml and non-built-in libraries in requirements.txt and run the pip install cmd above). One of the improvements in the second generation runtimes like Python 3 is that all these games with 3P libraries go away magically (see #1 above).

Example: Flask

Flask is a 3rd-party micro web framework, and it’s an interesting case for this specific question. For Python 3, all 3P libraries go into requirements.txt, so you’d just add flask to that file, and you’re done. (Just deploy from there.)

For Python 2, it’s even more interesting because it’s a built-in library. Unfortunately, the version on App Engine servers is 0.12. Who wants to use that when we’re at/beyond 2.0.3 now?!? So instead of putting it in app.yaml like other built-in libraries, you’d pretend the built-in version doesn’t exist and put it in requirements.txt then run pip2 install -t lib -r requirements.txt to bundle/vendor it with your application code. (However, the final version for Python 2 is 1.1.4, so that’s what gets installed.)

Answered By: wescpy