Unable to call Firefox from Selenium in Python on an AWS machine

Question:

I am trying to use Selenium from Python to scrape some dynamic pages that rely on JavaScript. However, I cannot launch Firefox after following the Selenium instructions on the PyPI page (http://pypi.python.org/pypi/selenium). I installed Firefox on an AWS Ubuntu 12.04 machine. The error message I get is:

In [1]: from selenium import webdriver

In [2]: br = webdriver.Firefox()
---------------------------------------------------------------------------
WebDriverException                        Traceback (most recent call last)
/home/ubuntu/<ipython-input-2-d6a5d754ea44> in <module>()
----> 1 br = webdriver.Firefox()

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.pyc in __init__(self, firefox_profile, firefox_binary, timeout)
     49         RemoteWebDriver.__init__(self,
     50             command_executor=ExtensionConnection("127.0.0.1", self.profile,
---> 51             self.binary, timeout),
     52             desired_capabilities=DesiredCapabilities.FIREFOX)
     53

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/extension_connection.pyc in __init__(self, host, firefox_profile, firefox_binary, timeout)
     45         self.profile.add_extension()
     46
---> 47         self.binary.launch_browser(self.profile)
     48         _URL = "http://%s:%d/hub" % (HOST, PORT)
     49         RemoteConnection.__init__(

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/firefox_binary.pyc in launch_browser(self, profile)
     42
     43         self._start_from_profile_path(self.profile.path)
---> 44         self._wait_until_connectable()
     45
     46     def kill(self):

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/firefox_binary.pyc in _wait_until_connectable(self)
     79                 raise WebDriverException("The browser appears to have exited "
     80                       "before we could connect. The output was: %s" %
---> 81                       self._get_firefox_output())
     82             if count == 30:
     83                 self.kill()

WebDriverException: Message: 'The browser appears to have exited before we could connect. The output was: Error: no display specified\n'

I searched the web and found that other people have hit this problem (https://groups.google.com/forum/?fromgroups=#!topic/selenium-users/21sJrOJULZY), but I don't understand the solution, if there is one.

Can anyone help me please? Thanks!

Asked By: David


Answers:

The problem is that Firefox requires a display, and a headless AWS machine does not provide one. My example uses pyvirtualdisplay to simulate a display:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=False, size=(1024, 768))
display.start()

driver = webdriver.Firefox()
driver.get("http://www.somewebsite.com/")

# ... your scraping code goes here ...

#driver.close() # Close the current window.
driver.quit() # Quit the driver and close every associated window.
display.stop()

Please note that pyvirtualdisplay requires one of the following back-ends: Xvfb, Xephyr, Xvnc.
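
To see whether one of those back-ends is actually installed, you can look for its executable on PATH. This is only a rough sketch and not part of pyvirtualdisplay: the helper names available_backends and has_display are made up for illustration, and the executable names are assumed to match the back-end names listed above.

```python
import os
import shutil

# Executable names are an assumption; they mirror the back-end names above.
BACKENDS = ("Xvfb", "Xephyr", "Xvnc")


def available_backends(candidates=BACKENDS):
    """Return the candidate back-ends whose executable is found on PATH."""
    return [name for name in candidates if shutil.which(name)]


def has_display():
    """Return True if the DISPLAY environment variable is set and non-empty."""
    return bool(os.environ.get("DISPLAY"))


print("DISPLAY set:", has_display())
print("Back-ends found:", available_backends() or "none")
```

If no back-end is found on Ubuntu, installing the xvfb package (sudo apt-get install xvfb) provides Xvfb.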

This should resolve your issue.

Answered By: That1Guy

I faced the same problem. I was on Firefox 47 and Selenium 2.53, so I downgraded Firefox to 45, and that worked.

1) Remove Firefox 47 first:

sudo apt-get purge firefox

2) Check for available versions:

apt-cache show firefox | grep Version

It will show the available Firefox versions, for example:

Version: 47.0+build3-0ubuntu0.16.04.1

Version: 45.0.2+build1-0ubuntu1

3) Tell apt which build to install:

sudo apt-get install firefox=45.0.2+build1-0ubuntu1

4) Put the package on hold so it is not upgraded to the newer version again:

sudo apt-mark hold firefox

5) If you want to upgrade later:

sudo apt-mark unhold firefox
sudo apt-get upgrade
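
If you want to script step 2 and step 3, the version string can be picked out of the apt-cache output programmatically. The following is a minimal sketch, assuming the output has the "Version: ..." shape shown above; the function name pick_version is hypothetical and the sample text is copied from the listing in step 2.

```python
import re

# Matches lines such as "Version: 45.0.2+build1-0ubuntu1" and captures
# the full version string plus its leading major number.
VERSION_LINE = re.compile(r"^\s*Version:\s*((\d+)\.\S+)")


def pick_version(apt_output, max_major):
    """Return the first listed version whose major number is <= max_major, else None."""
    for line in apt_output.splitlines():
        match = VERSION_LINE.match(line)
        if match and int(match.group(2)) <= max_major:
            return match.group(1)
    return None


sample = """Version: 47.0+build3-0ubuntu0.16.04.1

Version: 45.0.2+build1-0ubuntu1"""

# Prints: sudo apt-get install firefox=45.0.2+build1-0ubuntu1
print("sudo apt-get install firefox=" + pick_version(sample, 45))
```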

Hope this helps.

Answered By: Amogh Joshi

This was already mentioned in a comment on the OP's question, but to lay it out as an answer: you can have Selenium run in the background without opening an actual browser window.

For example, if you use Chrome, set these options:

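from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
# set_headless() was removed in Selenium 4; the --headless argument works in old and new releases.
chrome_options.add_argument("--headless")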

Then pass these options as a parameter when you create the web driver:

browser = webdriver.Chrome(options=chrome_options)  # older Selenium releases spelled this chrome_options=
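
Since the question is about Firefox, the same idea can be sketched for Firefox as well. This is a minimal sketch rather than part of the answer above: it assumes a Firefox build that supports headless mode, a Selenium release that accepts the options= keyword, and a working geckodriver. The helper name make_headless_firefox is made up for illustration.

```python
def make_headless_firefox():
    """Start Firefox without a window and return the driver.

    Assumes Selenium, Firefox and geckodriver are installed.
    """
    # Imported lazily so this file can be loaded even where Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # Firefox's command-line flag for headless mode
    return webdriver.Firefox(options=options)
```

Call driver = make_headless_firefox(), use driver.get(...) as usual, and finish with driver.quit().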

Answered By: David Skarbrevik

For Debian 10 and Ubuntu 18.04, this is a complete running example:

  1. Download the Chrome driver into ~/Downloads (pick the release that matches your installed Chrome version):
    $ wget https://chromedriver.storage.googleapis.com/80.0.3987.16/chromedriver_linux64.zip

  2. Unpack it:
    $ unzip chromedriver_linux64.zip

  3. Move the binary to a directory that is already on your PATH:
    $ sudo mv chromedriver /usr/local/bin

Then run this code in a Jupyter notebook or within a script:

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

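chrome_options = Options()
# set_headless() was removed in Selenium 4; the --headless argument works in old and new releases.
chrome_options.add_argument("--headless")

browser = Chrome(options=chrome_options)
browser.get('http://www.linkedin.com/')
print(browser.page_source)
browser.quit()  # release the browser and driver processes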

This prints the full HTML source of the page.
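
Once you have the page source as a string, you will usually want to pull something out of it rather than print it all. As a rough illustration only, the sketch below collects link targets using the standard library's html.parser; the names LinkCollector and extract_links and the sample HTML are made up, and in practice you would pass browser.page_source instead of the sample string.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values found in the given HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Stand-in for browser.page_source
sample_html = '<html><body><a href="/jobs">Jobs</a> <a href="/about">About</a></body></html>'
print(extract_links(sample_html))  # prints ['/jobs', '/about']
```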

Answered By: f0nzie