Python Selenium Chromedriver not working with --headless option
Question:
I am running ChromeDriver to try and scrape some data off of a website. Everything works fine without the headless option. However, when I add the option the webdriver takes a very long time to load the URL, and when I try to find an element (one that is found when run without --headless), I receive an error.
Using print statements and getting the HTML after the URL "loaded", I find that there is no HTML; it's empty (see the output below).
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

class Fidelity:
    def __init__(self):
        self.url = 'https://eresearch.fidelity.com/eresearch/gotoBL/fidelityTopOrders.jhtml'
        self.options = Options()
        self.options.add_argument("--headless")
        self.options.add_argument("--window-size=1500,1000")
        self.driver = webdriver.Chrome(executable_path=r'.\dependencies\chromedriver.exe', options=self.options)
        print("init")

    def initiate_browser(self):
        self.driver.get(self.url)
        time.sleep(5)
        script = self.driver.execute_script("return document.documentElement.outerHTML")
        print(script)
        print("got url")

    def find_orders(self):
        wait = WebDriverWait(self.driver, 15)
        data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))  # ERROR ON THIS LINE
This is the entire output:
init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 102, in <module>
    orders = scrape.find_tesla_orders()
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 75, in find_tesla_orders
    tesla = self.driver.find_element_by_xpath("//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']")
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']"}
  (Session info: headless chrome=74.0.3729.169)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Windows NT 10.0.17763 x86_64)
New error with updated code:
init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 104, in <module>
    orders = scrape.find_tesla_orders()
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 76, in find_tesla_orders
    tesla = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
I have tried finding the answer to this through Google, but none of the suggestions work. Is anyone else having this issue with certain websites? Any help is appreciated.
Update
Unfortunately, this script still does not work; for some reason the webdriver is not loading the page correctly while headless, even though everything works perfectly without the headless option.
Answers:
Add an explicit wait. You should also use another locator: the current one matches 3 elements. The element has a unique id attribute.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.common.by import By
wait = WebDriverWait(self.driver, timeout)
data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
For anyone in the future who is wondering about the fix for this: some websites just don't load correctly with Chrome's headless option. I don't think there is a way to fix this. Just use a different browser (like Firefox). Thanks to user8426627 for this.
Have you tried using a User-Agent?
I was experiencing the same error. The first thing I did was dump the page's HTML source in both headless and normal mode:
html = driver.page_source
with open("foo.html", "w") as file:
    file.write(html)
The HTML source for headless mode was a short file with this line near the end: "The page cannot be displayed. Please contact the administrator for additional information."
But the normal mode was the expected HTML.
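To pin down exactly what differs between the two dumps, a stdlib-only comparison works; this sketch inlines two hypothetical snippets standing in for the saved files:

```python
import difflib

# Hypothetical stand-ins for the two saved dumps (headless vs. normal mode).
headless_html = "<html><head></head><body>The page cannot be displayed.</body></html>"
normal_html = "<html><head></head><body><table id='t_trigger_TSLA'>...</table></body></html>"

# unified_diff works on lists of lines; splitlines keeps the example self-contained.
diff = list(difflib.unified_diff(
    headless_html.splitlines(), normal_html.splitlines(),
    fromfile="headless", tofile="normal", lineterm=""))
for line in diff:
    print(line)
```

With real files, read each with `open(...).read().splitlines()` instead of the inline strings.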
I solved the issue by adding a User-Agent:
user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(executable_path="your_path", chrome_options=chrome_options)
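As a Selenium-free sanity check, you can verify what User-Agent string a server actually receives. This sketch spins up a one-shot stdlib HTTP server that records the header, then sends it a request carrying the spoofed UA from above (a real browser configured with that flag would send the same header):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = {}  # records what the server observed

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        seen["user_agent"] = self.headers.get("User-Agent")
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick any free port
threading.Thread(target=server.handle_request, daemon=True).start()

ua = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2"
req = urllib.request.Request(f"http://127.0.0.1:{server.server_port}/",
                             headers={"User-Agent": ua})
urllib.request.urlopen(req).read()
server.server_close()

print(seen["user_agent"])
```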
Try setting the window size as well as being headless. Add this:
chromeOptions.add_argument("--window-size=1920,1080")
The default size of the headless browser is tiny. If the code works when headless is not enabled, it might be because the element you want is outside the window.
some websites just don’t load correctly with the headless option of chrome.
The previous statement is actually wrong. I just ran into this problem where Chrome wasn't detecting the elements. When I saw @LuckyZakary's answer I was shocked, because someone had built a scraper for the same website with Node.js and didn't get this error.
@AtulGumar's answer helped on Windows, but on an Ubuntu server it failed, so it wasn't enough. After reading this, all the way to the bottom, what @AtulGumar missed was adding the --disable-gpu flag.
So it works for me on Windows and on an Ubuntu server with no GUI with these options:
webOptions = webdriver.ChromeOptions()
webOptions.headless = True
webOptions.add_argument("--window-size=1920,1080")
webOptions.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=webOptions)
I also installed xvfb
and other packages as suggested here:
sudo apt-get -y install xorg xvfb gtk2-engines-pixbuf
sudo apt-get -y install dbus-x11 xfonts-base xfonts-100dpi xfonts-75dpi xfonts-cyrillic xfonts-scalable
and executed:
Xvfb -ac :99 -screen 0 1280x1024x16 &
export DISPLAY=:99