Scrapyd-deploy to Scrapyd doesn't install requirements specified in setup.py

Question:

I have a project written with Scrapy, and its setup.py declares a number of requirements. Here is a simple example. I run

scrapyd-deploy

and have the following output

Packing version 1506254163
Deploying to project "quotesbot" in http://localhost:6800/addversion.json
Server response (200):
......................... [TRUNCATED TRACEBACK] ...........
  File "/private/var/folders/xp/c949vlsd14q8xm__dv0dx8jh0000gn/T/quotesbot-1506254163-e50lmcfx.egg/quotesbot/spiders/toscrape-css.py", line 4, in <module>
ModuleNotFoundError: No module named 'sqlalchemy'"}

BUT

setup.py in the same directory:

# Automatically created by: scrapyd-deploy

from setuptools import setup, find_packages

setup(
    name         = 'quotesbot',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = quotesbot.settings']},
    install_requires=[
        'scrapy-splash',
        # [SOME REQUIREMENTS]
        'sqlalchemy',
    ],
)
Asked By: lovesuper


Answers:

I checked the scrapyd source code and it doesn't run your project's setup.py. It just unpacks the egg, which contains the dependency information but not the dependencies themselves. Below is the code for the addversion API:

class AddVersion(WsResource):

    def render_POST(self, txrequest):
        project = txrequest.args[b'project'][0].decode('utf-8')
        version = txrequest.args[b'version'][0].decode('utf-8')
        eggf = BytesIO(txrequest.args[b'egg'][0])
        self.root.eggstorage.put(eggf, project, version)
        spiders = get_spider_list(project, version=version)
        self.root.update_projects()
        UtilsCache.invalid_cache(project)
        return {"node_name": self.root.nodename, "status": "ok", "project": project, "version": version, 
            "spiders": len(spiders)}

After self.root.eggstorage.put(eggf, project, version), which basically just extracts the egg, it directly runs spiders = get_spider_list(project, version=version), so no setup step is ever performed.
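
To see this for yourself, you can build the same egg locally and inspect it: the install_requires entries end up as metadata in EGG-INFO/requires.txt inside the egg, but nothing on the scrapyd side ever installs them. A quick check (run from the project root, assuming unzip is available):

    python setup.py bdist_egg
    unzip -l dist/*.egg                        # only your project files and EGG-INFO, no sqlalchemy
    unzip -p dist/*.egg EGG-INFO/requires.txt  # prints the install_requires names, as metadata only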

So your egg would need to include all of its dependencies itself, which means you wouldn't build the egg using scrapyd-deploy. I couldn't find much documentation on whether that is even possible.
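
If you do manage to build a self-contained egg by other means, you can upload it to scrapyd yourself through the same addversion.json endpoint shown above, instead of going through scrapyd-deploy. A rough sketch (the egg filename is hypothetical; the project, version, and egg fields match the arguments parsed in render_POST above):

    python setup.py bdist_egg
    curl http://localhost:6800/addversion.json \
        -F project=quotesbot \
        -F version=1.0 \
        -F egg=@dist/quotesbot-1.0-py3.6.egg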

So what you are seeing is due to scrapyd lacking this functionality. You should open a bug or enhancement request at http://github.com/scrapy/scrapyd/

Answered By: Tarun Lalwani

In the scrapyd documentation, under the "Including dependencies" section, it says:

If your project has additional dependencies, you can either install them on the Scrapyd server, or you can include them in the project’s egg, in two steps:

  • Create a requirements.txt file at the root of the project
  • Use the --include-dependencies option when building or deploying your project:

    scrapyd-deploy --include-dependencies

Note that when it says the requirements.txt needs to be in the root of the project, it is referring to the scraper project root directory (the directory containing scrapy.cfg), as shown in the sketch below.
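
Putting that together for the quotesbot project from the question, it would look something like this (the requirements.txt contents simply mirror the install_requires list from setup.py):

    # requirements.txt (next to scrapy.cfg at the project root):
    scrapy-splash
    sqlalchemy

    # then, from the same directory:
    scrapyd-deploy --include-dependencies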

Answered By: Matthew Grist