Selenium (Python, Chrome) not downloading PDF file

Question:

I’m trying to download PDFs from a page that renders them in the Chrome PDF Viewer. The usual download preferences have not worked. The flow is:

  1. Navigate to the page.
  2. Locate the button that opens the PDF and click it.
  3. A new tab is opened (via JavaScript) where the PDF is displayed in the Chrome PDF Viewer. The URL of that tab is always the same, so it does not identify the file. Simple suggestions like "use requests to download it" will not work, because the PDF does not have a URL that points to it.
  4. A regular user can click the viewer's download button to save the PDF, but the download is not triggered automatically.

I’ve tried a combination of the following settings, but none of them work. I think what makes my use case different is that the file download is not triggered automatically; instead, Selenium just displays the PDF in a new tab and does not download it.

With the settings below, Chrome opens the new tab and immediately closes it, but it does not download the PDF.

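import os
from selenium import webdriver

# Assumed setup: the original snippet did not show how these were created.
download_files = True
chrome_options = webdriver.ChromeOptions()
prefs = {}

if download_files:
    download_dir = os.getcwd()
    print(download_dir)
    # Save downloads to the working directory without showing a prompt
    prefs['download.default_directory'] = download_dir
    prefs['download.prompt_for_download'] = False
    prefs['download.directory_upgrade'] = True
    # Ask Chrome to download PDFs instead of opening them in the built-in viewer
    prefs['plugins.always_open_pdf_externally'] = True
    prefs['safebrowsing_for_trusted_sources_enabled'] = False
    prefs['safebrowsing.enabled'] = False
    # prefs['plugins.plugins_list'] = [{"enabled": False, "name": "Chrome PDF Viewer"}]
    # prefs['download.extensions_to_open'] = "applications/pdf"
prefs['profile.default_content_settings'] = {"images": 2}
print(prefs)
chrome_options.add_experimental_option("prefs", prefs)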
Asked By: qoob

Answers:

The code above works for pages where the URL points directly to the PDF file. In this case the PDF is rendered by the website, which decides which PDF to serve based on information in the cookies. So the solution was to take the cookies from the Selenium session and pass them to a regular requests call, which returns the PDF as bytes that can then be saved or processed.

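import requests

# driver is the Selenium WebDriver after clicking through to the PDF.
# url is assumed to be the (fixed) address of the page that serves the PDF.
cookies = driver.get_cookies()
cookies_dict = {c['name']: c['value'] for c in cookies}
response = requests.get(url, cookies=cookies_dict)
with open('file.pdf', 'wb') as f:
    f.write(response.content)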
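
A slightly more defensive variant of the same idea is sketched below. It is not the exact code I ran. It sends the browser's User-Agent along with the cookies, raises on HTTP errors, and checks for the %PDF- signature before writing, since a request that did not carry the session over may return an HTML page instead of the file. The names save_pdf, cookies_to_dict and PDF_MAGIC are made up for illustration, and session is assumed to be a requests.Session (or anything with a compatible get method).

```python
from pathlib import Path

# PDF files begin with this signature; readers usually tolerate a few
# leading bytes, so the check below looks at the first 1024 bytes.
PDF_MAGIC = b"%PDF-"


def cookies_to_dict(selenium_cookies):
    """Flatten the list returned by driver.get_cookies() into {name: value}."""
    return {c["name"]: c["value"] for c in selenium_cookies}


def save_pdf(session, url, selenium_cookies, user_agent, target):
    """Fetch url with the browser's cookies and save the body if it is a PDF.

    session is assumed to expose get(url, cookies=..., headers=..., timeout=...)
    like requests.Session does.
    """
    response = session.get(
        url,
        cookies=cookies_to_dict(selenium_cookies),
        headers={"User-Agent": user_agent},
        timeout=30,
    )
    # Fail on 4xx/5xx instead of silently saving an error page.
    response.raise_for_status()
    if PDF_MAGIC not in response.content[:1024]:
        raise ValueError("response does not look like a PDF")
    path = Path(target)
    path.write_bytes(response.content)
    return path
```

With a live driver this might be called as save_pdf(requests.Session(), url, driver.get_cookies(), driver.execute_script("return navigator.userAgent"), "file.pdf"), where url is the same address used in the snippet above. Passing the session in as a parameter keeps the helper easy to try out without a browser.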
Answered By: qoob