Python Selenium site spider from page option list
Question:
I’m using Python 2.7 and Selenium, and I’m trying to use a select box’s option list as the basis for my site spider. Let’s get right to the code:
    select = self.br.find_element_by_name(field)  # get the select element
    options = select.find_elements_by_tag_name("option")  # get all the options into a list
    for option in options:  # iterate over the options
        print "starting loop on option %s" % option.text
        # now get the option with the value that is currently being iterated over
        # and select it from the original select box source
        self.br.find_element_by_xpath("//select[@name='%s']/option[@value='%s']" % (field, option.get_attribute("value"))).click()  # the click takes you to a new page
        source = self.br.page_source  # get the new page source
        # now check to see if some required data is on the navigated page, and print some stuff if so
        if "There is no summary data available." not in source:
            print "the new page is good! Here are the original args: ", option.text, option.get_attribute("value")
        # time to go back to the main page and click the next option element
        self.br.back()
        print "went backwards"  # for debugging
So, everything works until the second iteration, after self.br.back() has run and the loop starts again. I get an extremely long Selenium error stating:
    selenium.common.exceptions.StaleElementReferenceException: Message: u'Element not found in the cache - perhaps the page has changed since it was looked up' ; Stacktrace:
        at fxdriver.cache.getElementAt (resource://fxdriver/modules/web_element_cache.js:7643)
        at Utils.getElementAt (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:7232)
        at WebElement.getElementAttribute (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:10335)
        at DelayedCommand.prototype.executeInternal_/h (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:10840)
        at DelayedCommand.prototype.executeInternal_ (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:10845)
        at DelayedCommand.prototype.execute/< (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:10787)
Clearly the error says the element may no longer exist, but how is that possible when I’m just iterating over a list of objects that was retrieved during a previous page session?
In any case, how should I go about doing this? Maybe the way I’m trying isn’t the best way…
Answers:
I’m not completely familiar with Python, so you may need to rework this slightly, but I think it will at the very least get you started.
    from selenium.webdriver.support.ui import Select, WebDriverWait

    select = self.br.find_element_by_name(field)  # get the select element
    options = select.find_elements_by_tag_name("option")  # get all the options into a list
    optionsList = []
    for option in options:  # iterate over the options, place attribute value in list
        optionsList.append(option.get_attribute("value"))
    for optionValue in optionsList:
        print "starting loop on option %s" % optionValue
        select = Select(self.br.find_element_by_name(field))
        select.select_by_value(optionValue)
        source = self.br.page_source  # get the new page source
        # now check to see if some required data is on the navigated page, and print some stuff if so
        if "There is no summary data available." not in source:
            print "the new page is good! Here are the original args: ", optionValue
        # time to go back to the main page and click the next option element
        self.br.back()
        print "went backwards"  # for debugging
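The idea is to build a list of the option values in the first for loop, then iterate over those values in the second for loop to navigate to each page. Storing the values as plain strings matters because the option elements themselves are references into the page that was loaded when you looked them up. Once you navigate away and come back, those references are stale, which is what the StaleElementReferenceException is telling you. Use Selenium’s Select class to select each option by value. For the same reason, I get a new reference to the dropdown each time through the second for loop.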
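If you would rather keep the XPath click from the question, the same idea can be wrapped in a small helper. This is only a sketch: the function name find_good_options and the bad_marker parameter are made up for illustration, and it assumes that clicking an option loads the new page before page_source is read (on a slow site you may need an explicit wait). It uses the generic find_element(by, value) form with the plain locator strings "name", "tag name" and "xpath" in place of the find_element_by_* helpers.

```python
def find_good_options(driver, field,
                      bad_marker="There is no summary data available."):
    """Return the option values whose target page does not contain bad_marker.

    Hypothetical helper: `driver` is any Selenium WebDriver instance and
    `field` is the name attribute of the select box.
    """
    # Copy the values out as plain strings while the elements are still valid.
    select = driver.find_element("name", field)
    values = [opt.get_attribute("value")
              for opt in select.find_elements("tag name", "option")]

    good = []
    for value in values:
        # Look the option up again on the freshly loaded page instead of
        # reusing an element reference from before the navigation.
        # Assumes the value contains no single quote, which would break the XPath.
        xpath = "//select[@name='%s']/option[@value='%s']" % (field, value)
        driver.find_element("xpath", xpath).click()  # assumed to navigate
        if bad_marker not in driver.page_source:
            good.append(value)
        driver.back()  # return to the page with the select box
    return good
```

In the question's code this would be called as find_good_options(self.br, field). Nothing in the loop touches an element that was found before a click or a back(), so there is nothing left to go stale.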
I hope this helps.