Scrapy – logging to file and stdout simultaneously, with spider names

Question:

I’ve decided to use the Python logging module because the messages Twisted generates on standard error are too long, and I want meaningful INFO-level messages, such as those generated by the StatsCollector, to be written to a separate log file while keeping the on-screen messages.

from twisted.python import log
import logging

logging.basicConfig(level=logging.INFO, filemode='w', filename='buyerlog.txt')
observer = log.PythonLoggingObserver()
observer.start()

Well, this is fine, I’ve got my messages, but the downside is that I cannot tell which spider generated each message! This is my log file, with “twisted” being displayed by %(name)s:

INFO:twisted:Log opened.
INFO:twisted:Scrapy 0.12.0.2543 started (bot: property)
INFO:twisted:scrapy.telnet.TelnetConsole starting on 6023
INFO:twisted:scrapy.webservice.WebService starting on 6080
INFO:twisted:Spider opened
INFO:twisted:Spider opened
INFO:twisted:Received SIGINT, shutting down gracefully. Send again to force unclean shutdown
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Dumping spider stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 3,
 'downloader/request_bytes': 9973,

Compare this with the messages generated by Twisted on standard error:

2011-12-16 17:34:56+0800 [expats] DEBUG: number of rules: 4
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-12-16 17:34:56+0800 [iproperty] INFO: Spider opened
2011-12-16 17:34:56+0800 [iproperty] DEBUG: Redirecting (301) to <GET http://www.iproperty.com.sg/> from <GET http://iproperty.com.sg>
2011-12-16 17:34:57+0800 [iproperty] DEBUG: Crawled (200) <

I’ve tried %(name)s and %(module)s, among others, but I don’t seem to be able to show the spider name. Does anyone know the answer?

EDIT:
The problem with using LOG_FILE and LOG_LEVEL in the settings is that the lower-level messages will not be shown on standard error.
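(For reference, the settings-based approach this edit refers to is roughly the following sketch; the file name is reused from the snippet above.)

# settings.py -- a sketch of the LOG_FILE/LOG_LEVEL approach mentioned above.
# With LOG_FILE set, Scrapy writes its log to the file instead of standard
# error, and LOG_LEVEL filters out everything below INFO.
LOG_FILE = 'buyerlog.txt'
LOG_LEVEL = 'INFO'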

Asked By: goh


Answers:

You want to use the ScrapyFileLogObserver.

import logging
from scrapy.log import ScrapyFileLogObserver

logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()
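If you want different levels routed to different files, the same observer can be started more than once; a sketch building on the snippet above (the file names are placeholders, and this applies only to the old scrapy.log API):

import logging
from scrapy.log import ScrapyFileLogObserver

# One observer for INFO and above, another capturing full DEBUG output;
# Twisted keeps writing to standard error as before.
ScrapyFileLogObserver(open('spider_info.log', 'w'), level=logging.INFO).start()
ScrapyFileLogObserver(open('spider_debug.log', 'w'), level=logging.DEBUG).start()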

I’m glad you asked this question, I’ve been wanting to do this myself.

Answered By: Acorn

It is very easy to redirect output using: scrapy some-scrapy's-args 2>&1 | tee -a logname

This way, everything Scrapy outputs to stdout and stderr is redirected to the logname file and also printed to the screen.

Answered By: Alexander Artemenko

I know this is old, but it was a really helpful post since the class still isn’t properly documented in the Scrapy docs. Also, we can skip importing logging and use Scrapy’s log module directly. Thanks all!

from scrapy import log

logfile = open('testlog.log', 'a')
log_observer = log.ScrapyFileLogObserver(logfile, level=log.DEBUG)
log_observer.start()
Answered By: IamnotBatman

For all those folks who came here before reading the current version of the documentation:

import logging
from scrapy.utils.log import configure_logging

configure_logging(install_root_handler=False)
logging.basicConfig(
    filename='log.txt',
    filemode='a',
    format='%(levelname)s: %(message)s',
    level=logging.DEBUG
)
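
Note that basicConfig with a filename sends the output only to that file. If you also want the messages on the console, as in the original question, one option is to pass both handlers explicitly; a minimal sketch, assuming Python 3.3+ where basicConfig accepts a handlers argument:

import logging
from scrapy.utils.log import configure_logging

configure_logging(install_root_handler=False)
logging.basicConfig(
    format='%(levelname)s: %(message)s',
    level=logging.DEBUG,
    handlers=[
        logging.FileHandler('log.txt', mode='a'),  # same file as above
        logging.StreamHandler(),                   # plus the console
    ],
)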
Answered By: Alex K.

As the official Scrapy documentation says:

Scrapy uses Python’s builtin logging system for event logging.

So you can configure your logger just as you would in a normal Python script.

First, you have to import the logging module:

import logging

You can add this line to your spider:

logging.getLogger().addHandler(logging.StreamHandler())

This adds a stream handler that logs to the console.

After that, you have to configure the log file path.

Add a dict named custom_settings that holds your spider-specific settings:

custom_settings = {
    'LOG_FILE': 'my_log.log',
    'LOG_LEVEL': 'INFO',
    # ... you can add more settings
}

The whole class looks like this:

import logging

import scrapy


class AbcSpider(scrapy.Spider):
    name: str = 'abc_spider'
    start_urls = ['you_url']
    custom_settings = {
        'LOG_FILE': 'my_log.log',
        'LOG_LEVEL': 'INFO',
        # ... you can add more settings
    }
    logging.getLogger().addHandler(logging.StreamHandler())

    def parse(self, response):
        pass
Answered By: Shi XiuFeng

ScrapyFileLogObserver is no longer supported. You may use the standard Python logging module.

import logging
logging.getLogger().addHandler(logging.StreamHandler())
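
That snippet only adds console output. If you also need the file from the original question, a minimal stdlib-only sketch (the log file name is a placeholder):

import logging

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(logging.StreamHandler())                      # on-screen messages
root.addHandler(logging.FileHandler('spider.log', mode='a'))  # plus a log file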
Answered By: Sameera Godakanda

As of Scrapy 2.3, none of the answers mentioned above worked for me.
In addition, the solution found in the documentation caused the log file to be overwritten on every run, which is of course not what you want in a log.
I couldn’t find a built-in setting that changed the mode to "a" (append).
I achieved logging to both file and stdout with the following configuration code:

import logging

from scrapy.utils.log import configure_logging

configure_logging(settings={
    "LOG_STDOUT": True
})
file_handler = logging.FileHandler(filename, mode="a")  # filename: path to your log file
formatter = logging.Formatter(
    fmt="%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s",
    datefmt="%H:%M:%S"
)
file_handler.setFormatter(formatter)
file_handler.setLevel("DEBUG")
logging.root.addHandler(file_handler)
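
One place this setup can live is a standalone run script executed before the crawl starts. A sketch under that assumption (the spider import, spider class, and log file path are placeholders):

import logging

from scrapy.crawler import CrawlerProcess
from scrapy.utils.log import configure_logging

from myproject.spiders.example import ExampleSpider  # hypothetical spider

configure_logging(settings={"LOG_STDOUT": True})

# Attach the appending file handler exactly as in the snippet above.
file_handler = logging.FileHandler("scrapy.log", mode="a")
file_handler.setFormatter(logging.Formatter(
    fmt="%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s",
    datefmt="%H:%M:%S",
))
file_handler.setLevel("DEBUG")
logging.root.addHandler(file_handler)

process = CrawlerProcess()
process.crawl(ExampleSpider)
process.start()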
Answered By: Royar

Another way is to disable Scrapy’s own log handling and use a custom logging configuration file.

settings.py

import logging.config
import yaml

LOG_ENABLED = False
logging.config.dictConfig(yaml.load(open("logging.yml").read(), Loader=yaml.SafeLoader))

logging.yml

version: 1
formatters:
  simple:
    format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout
  file:
    class: logging.FileHandler
    level: INFO
    formatter: simple
    filename: scrapy.log
root:
  level: INFO
  handlers: [console, file]
disable_existing_loggers: False

example_spider.py

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/"]

    def parse(self, response):
        self.logger.info("test")
        pass
Answered By: stonewell