scrapyd-deploy to Scrapyd doesn't install requirements specified in setup.py
Question:
I have a project written with Scrapy. The spider has many requirements listed in setup.py; here is a simple example. When I run
scrapyd-deploy
I get the following output:
Packing version 1506254163
Deploying to project "quotesbot" in http://localhost:6800/addversion.json
Server response (200):
......................... [TRACEBACK CUT] ...........
File "/private/var/folders/xp/c949vlsd14q8xm__dv0dx8jh0000gn/T/quotesbot-1506254163-e50lmcfx.egg/quotesbot/spiders/toscrape-css.py", line 4, in <module>
ModuleNotFoundError: No module named 'sqlalchemy'
However, setup.py in the same directory contains:
# Automatically created by: scrapyd-deploy
from setuptools import setup, find_packages

setup(
    name='quotesbot',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = quotesbot.settings']},
    install_requires=[
        'scrapy-splash',
        # [SOME REQUIREMENTS]
        'sqlalchemy',
    ],
)
Answers:
I checked the scrapyd source code, and it does not run your project's setup.py. It just unpacks the egg, which contains the dependency information but not the dependencies themselves. Below is the code for the addversion API:
class AddVersion(WsResource):

    def render_POST(self, txrequest):
        project = txrequest.args[b'project'][0].decode('utf-8')
        version = txrequest.args[b'version'][0].decode('utf-8')
        eggf = BytesIO(txrequest.args[b'egg'][0])
        self.root.eggstorage.put(eggf, project, version)
        spiders = get_spider_list(project, version=version)
        self.root.update_projects()
        UtilsCache.invalid_cache(project)
        return {"node_name": self.root.nodename, "status": "ok",
                "project": project, "version": version,
                "spiders": len(spiders)}
After self.root.eggstorage.put(eggf, project, version), which basically just extracts the egg, it goes straight to spiders = get_spider_list(project, version=version), so no installation step is ever run.
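You can see this for yourself without a Scrapyd server: an egg is just a zip archive whose EGG-INFO/requires.txt names the dependencies without containing their code. The sketch below builds a tiny synthetic egg (mimicking what scrapyd-deploy produces; the file names are illustrative) and checks what is actually inside it:

```python
import io
import zipfile

# Build a tiny in-memory "egg" the way scrapyd-deploy would:
# project code plus EGG-INFO metadata, but no third-party packages.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as egg:
    egg.writestr("quotesbot/spiders/toscrape_css.py", "import sqlalchemy\n")
    # requires.txt records only the *names* of the dependencies.
    egg.writestr("EGG-INFO/requires.txt", "scrapy-splash\nsqlalchemy\n")

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as egg:
    names = egg.namelist()
    requires = egg.read("EGG-INFO/requires.txt").decode().split()

print("sqlalchemy listed in requires.txt:", "sqlalchemy" in requires)   # True
print("sqlalchemy code shipped in egg:",
      any(n.startswith("sqlalchemy/") for n in names))                  # False
```

Since scrapyd only unpacks this archive, the sqlalchemy import fails at spider-load time unless the package is already installed on the server or bundled into the egg.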
One option is to build an egg that bundles all of its dependencies, which means you won't build the egg with scrapyd-deploy; I couldn't find much documentation on whether that is possible.
So what you are seeing is scrapyd lacking this feature. You should open a bug or enhancement request at http://github.com/scrapy/scrapyd/
In the scrapyd documentation, under the "Including dependencies" section, it says:
If your project has additional dependencies, you can either install them on the Scrapyd server, or you can include them in the project’s egg, in two steps:
Create a requirements.txt file at the root of the project
Use the --include-dependencies option when building or deploying your project:
scrapyd-deploy --include-dependencies
When it says the requirements.txt needs to be in the root of the project, it is referring to the scraper project root directory.
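Concretely, assuming a standard project layout with scrapy.cfg at the root, the two steps look like this (the deploy command itself needs a running Scrapyd target configured in scrapy.cfg, so it is shown as a comment):

```shell
# At the scraper project root (the directory containing scrapy.cfg),
# list every third-party package the spiders import:
printf 'scrapy-splash\nsqlalchemy\n' > requirements.txt
cat requirements.txt

# Then build and deploy with the dependencies bundled into the egg:
#   scrapyd-deploy --include-dependencies
```

The alternative, as the documentation notes, is to skip the requirements.txt entirely and install the dependencies directly on the Scrapyd server.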