Scrapy CrawlerRunner: Output missing
Question:
I have been using the method described on Stack Overflow (https://stackoverflow.com/a/43661172/5037146) to run Scrapy from a script with CrawlerRunner, which allows the process to be restarted.
However, I don't get any console logs when running the process through CrawlerRunner, whereas when I use CrawlerProcess, it outputs the status and progress.
Code is available online: https://colab.research.google.com/drive/14hKTjvWWrP--h_yRqUrtxy6aa4jG18nJ
Answers:
With CrawlerRunner you need to set up logging manually, which you can do with configure_logging().
See https://docs.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script
When you use CrawlerRunner, you have to configure logging manually. You can do it with the scrapy.utils.log.configure_logging function, for example:
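import scrapy.crawler
import scrapy.utils.log
from twisted.internet import reactor

from my_spider import MySpider

# CrawlerRunner does not install log handlers, so configure logging first.
scrapy.utils.log.configure_logging(
    {
        "LOG_FORMAT": "%(levelname)s: %(message)s",
    },
)
runner = scrapy.crawler.CrawlerRunner()
crawler = runner.create_crawler(MySpider)
d = runner.crawl(crawler)  # returns a Twisted Deferred
d.addBoth(lambda _: reactor.stop())  # stop the reactor once the crawl ends
reactor.run()  # blocks until the crawl has finished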
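To illustrate why the output goes missing, here is a minimal standard-library sketch. It does not use Scrapy and is not Scrapy's implementation; it only assumes that configure_logging() ends up installing a handler on the root logger, which is what its install_root_handler option suggests. The logger name "scrapy.demo" and the helper emit_with_root_handler are hypothetical names chosen for this example.

```python
import io
import logging

# Hypothetical logger name, used only for illustration.
log = logging.getLogger("scrapy.demo")


def emit_with_root_handler(fmt):
    """Install a root handler with the given format, log one INFO record,
    and return the text the handler wrote."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter(fmt))

    root = logging.getLogger()
    old_level = root.level
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    try:
        # The record propagates from "scrapy.demo" up to the root handler.
        log.info("Spider opened")
    finally:
        # Restore the root logger so the demo leaves no global side effects.
        root.removeHandler(handler)
        root.setLevel(old_level)
    return buffer.getvalue()


if __name__ == "__main__":
    # In a fresh interpreter the root level defaults to WARNING, so INFO
    # records are discarded before any handler sees them.
    print("INFO enabled before setup:", log.isEnabledFor(logging.INFO))
    print(emit_with_root_handler("%(levelname)s: %(message)s"), end="")
```

Until a handler is attached and the level is lowered, INFO-level progress messages have nowhere to go, which is consistent with the silent CrawlerRunner run described in the question.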