XPath works in the Chrome console, but it does not work in Selenium
Question:
Here is the screenshot of the HTML structure for the page I am trying to scrape. You can see that there is a <table> element with class="waffle". When I use the XPath expression //table[@class='waffle'] in the Chrome console, it works as expected.
However, when I use the same XPath in Selenium, it doesn’t work.
container_xpath = "//table[@class='waffle']"
# Wait up to 30 seconds for the table to appear in the current document
try:
    wait = WebDriverWait(driver, 30)
    container = wait.until(EC.presence_of_element_located((By.XPATH, container_xpath)))
    print('container found')
except Exception as e:
    print('container not found')
    raise PageDidNotLoadError
The Python script prints "container not found".
What is wrong with Selenium?
Answers:
<iframe style="border-width: 2px; border-style: solid; border-color: red; width: 1000px; height: 200000px;" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vQT3Q9qDbZUpnP3_WH2I5qw8O-U_PqXVhhoIzH2o-tSzeDND9FTuoGKbZiNHTbrzTgKAUA2_SvXFh_2/pubhtml?gid=159569114&single=true&widget=true&headers=false&gid=0&range=A:F" width="320" height="240"></iframe>
<iframe id="pageswitcher-content" frameborder="0" marginheight="0" marginwidth="0" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vQT3Q9qDbZUpnP3_WH2I5qw8O-U_PqXVhhoIzH2o-tSzeDND9FTuoGKbZiNHTbrzTgKAUA2_SvXFh_2/pubhtml/sheet?headers=false&gid=159569114&range=A:F" style="display: block; width: 100%; height: 100%;"></iframe>
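The table is inside the second (inner) iframe shown above, which is itself nested in the first (outer) one. Selenium only searches the document of the frame it is currently in, so you need to switch to the outer iframe first and then to the inner one before locating the table:
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe#pageswitcher-content")))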
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
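To illustrate why the locator finds nothing before the frame switch, here is a toy model that uses only the standard library, not Selenium. The three strings are hypothetical, heavily simplified stand-ins for the top-level page, the outer iframe's document, and the inner iframe's document. The point it sketches is that each iframe holds a separate document, and an XPath query only sees the document it is run against.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-ins for the three separate documents.
TOP_PAGE = "<html><body><iframe src='outer'/></body></html>"
OUTER_FRAME = "<html><body><iframe id='pageswitcher-content' src='inner'/></body></html>"
INNER_FRAME = "<html><body><table class='waffle'><tr><td>cell</td></tr></table></body></html>"

# ElementTree requires a relative path, hence the leading '.'.
XPATH = ".//table[@class='waffle']"


def count_matches(document: str, xpath: str) -> int:
    """Count elements matching xpath inside this one document only."""
    return len(ET.fromstring(document).findall(xpath))


for name, doc in [("top page", TOP_PAGE),
                  ("outer frame", OUTER_FRAME),
                  ("inner frame", INNER_FRAME)]:
    print(f"{name}: {count_matches(doc, XPATH)} match(es)")
```

Only the innermost document contains the table, which is why the driver has to be focused on that frame before the XPath can match.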
It’s common for the content you want to be placed inside a nested iframe. In that case you need to switch to the outer iframe first and then to the inner frame.
The code below should work for you:
# Switch to outer iframe (the first <iframe> on the page)
oframe = driver.find_element(By.CSS_SELECTOR, 'iframe')
driver.switch_to.frame(oframe)
# Switch to nested frame
iframe = driver.find_element(By.CSS_SELECTOR, 'iframe#pageswitcher-content')
driver.switch_to.frame(iframe)
# Get the container (container_xpath is the XPath from the question)
wait = WebDriverWait(driver, 30)
container = wait.until(EC.presence_of_element_located((By.XPATH, container_xpath)))
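One thing the snippet above leaves implicit is that the driver stays inside the inner frame afterwards, so later lookups on the top-level page would fail until you call `driver.switch_to.default_content()`. A minimal sketch of one way to handle this is a small context manager; `inside_frames` is a hypothetical helper name, not part of Selenium. It uses plain `find_element` without explicit waits, so on a slow page you would likely combine it with the `WebDriverWait` approach shown earlier.

```python
from contextlib import contextmanager

# "css selector" is the string value behind selenium's By.CSS_SELECTOR;
# it is spelled out here so the sketch has no import-time dependency.
CSS_SELECTOR = "css selector"


@contextmanager
def inside_frames(driver, frame_selectors):
    """Switch through nested iframes (outermost first), then restore the top page.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver.
    """
    driver.switch_to.default_content()  # start from the top-level document
    try:
        for selector in frame_selectors:
            frame = driver.find_element(CSS_SELECTOR, selector)
            driver.switch_to.frame(frame)
        yield driver
    finally:
        driver.switch_to.default_content()  # always leave the frames again


# Usage with a real driver (not executed here):
# with inside_frames(driver, ["iframe", "iframe#pageswitcher-content"]):
#     container = driver.find_element("xpath", "//table[@class='waffle']")
```

The `finally` clause resets the focus even if the lookup inside the `with` block raises.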
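To get the same data as a table, you can load the element's HTML into pandas (read_html returns a list of DataFrames, so take the first one):
from io import StringIO
import pandas as pd
# Newer pandas versions expect a file-like object rather than a raw HTML string
table = pd.read_html(StringIO(container.get_attribute('outerHTML')))[0]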
| | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | カード名 | 仕様 | レア | 型番 | タイプ | 状態A |
| 1 | nan | nan | nan | nan | nan | nan | nan |
| 2 | 2 | nan | nan | nan | nan | nan | nan |
| 3 | 3 | 【スペシャルアート(TAG TEAM GX)】 | nan | nan | nan | nan | nan |
| 4 | 4 | フシギバナ&ツタージャGX | SA | SR | 066/064 | 草 | 3300 |
| 5 | 5 | セレビィ&フシギバナGX | SA | SR | 097/095 | 草 | 3500 |
| 6 | 6 | モクロー&アローラナッシーGX | SA | SR | 056/054 | 草 | 3300 |
| 7 | 7 | フェローチェ&マッシブーンGX | SA | SR | 056/054 | 草 | 2300 |
| 8 | 8 | レシラム&リザードンGX | SA | SR | 097/095 | 炎 | 20000 |
| 9 | 9 | リザードン&テールナーGX | SA | SR | 068/064 | 炎 | 6000 |
| 10 | 10 | カメックス&ポッチャマGX | SA | SR | 070/064 | 水 | 5000 |
| 11 | 11 | コイキング&ホエルオーGX | SA | SR | 099/095 | 水 | 5500 |
| 12 | 12 | ヤドン&コダックGX | SA | SR | 096/094 | 水 | 4000 |
| 13 | 13 | ピカチュウ&ゼクロムGX | SA | SR | 101/095 | 雷 | 30000 |
| 14 | 14 | ライチュウ&アローラライチュウGX | SA | SR | 057/054 | 雷 | 5500 |
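The output above contains rows that are entirely nan. If pandas (or one of the HTML parser backends it relies on) is not available, a rough alternative is to pull the cell text out of the element's outerHTML with the standard library and drop the empty rows yourself. This is a swapped-in technique, not what the answer uses. `TableRows`, `table_rows`, and the shortened `sample` markup are hypothetical and only loosely modeled on the sheet, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(outer_html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(outer_html)
    parser.close()
    return parser.rows


# Hypothetical, shortened sample; with Selenium you would pass
# container.get_attribute('outerHTML') instead.
sample = (
    "<table class='waffle'>"
    "<tr><th>1</th><td>カード名</td><td>状態A</td></tr>"
    "<tr><th>2</th><td></td><td></td></tr>"
    "<tr><th>4</th><td>フシギバナ&amp;ツタージャGX</td><td>3300</td></tr>"
    "</table>"
)

# Keep only rows that have at least one non-empty cell after the row number.
rows = [r for r in table_rows(sample) if any(r[1:])]
print(rows)
```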