Scrapy crawl http header data only

Question

How can I make Scrapy download only the header data of a website (e.g. for checking purposes)?

I've tried disabling some downloader middlewares, but it doesn't seem to work.


Answer 1

As @alexce said, you can issue HEAD requests instead of the default GET requests:

from scrapy import Request

Request(url, method="HEAD")

UPDATE (outdated): In older Scrapy versions, to use HEAD requests for your start_urls you had to override the make_requests_from_url method:

def make_requests_from_url(self, url):
    return Request(url, method='HEAD', dont_filter=True)

UPDATE: make_requests_from_url was removed in Scrapy 2.6; override start_requests instead.

Question: