How to delete blank values from a table using find_element of Selenium Python
Question:
I have a problem with blank elements. I used Selenium to find elements on a website and received a list of elements, but some of them (exactly half) are blank.
Website: https://www.cmegroup.com/markets/energy/refined-products/gasoil-01-rotterdam-barges-swap.quotes.html
I'm looking for the values in the column named "MONTH". The rest of the columns work correctly (I received lists without empty values).
import time

import pandas as pd
from selenium.webdriver.common.by import By

# `driver` is an already-initialised WebDriver that has loaded the page above
main = driver.find_element(By.ID, "main-content")
time.sleep(10)
# One WebElement per <tr> in the quotes table body
matches = main.find_elements(
    By.XPATH,
    '//*[@id="productTabData"]/div/div/div/div/div/div[2]/div/div/div/div/div/div[5]/div/div/div/div[1]/div/table/tbody/tr')
time.sleep(10)
dane = []
for match in matches:
    Date = match.find_element(By.XPATH, "./td[1]/div/span/b").text  # blank for half of the rows
    Price = match.find_element(By.XPATH, "./td[5]").text
    Updated = match.find_element(By.XPATH, "./td[10]").text
    print(Date)
    table = {
        "DataPL": Date,
        "GO": Price,
        "Updated": Updated
    }
    dane.append(table)
df = pd.DataFrame(dane)
To work around the problem I used the pandas .shift() method, but I'm looking for a better solution:
df["DataPL"] = df["DataPL"].shift(-18)
df = df.iloc[0:17,:2]
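The shift(-18) workaround hard-codes the number of rows. A minimal pandas sketch that computes the offset instead, assuming (as the shift does) that the rows with a blank date come first and that the dates appear later in the same order. The helper name `realign_dates` is hypothetical, not part of pandas or Selenium:

```python
import pandas as pd


def realign_dates(df: pd.DataFrame, col: str = "DataPL") -> pd.DataFrame:
    """Pair the non-blank values of `col` with the leading rows of `df`.

    Assumption: rows with a blank `col` come first, and the non-blank
    values appear later in the same order (what shift(-18) relies on).
    """
    # Treat NaN and whitespace-only strings as blank
    is_blank = df[col].fillna("").str.strip() == ""
    # Non-blank dates, in page order, re-indexed from 0
    dates = df.loc[~is_blank, col].reset_index(drop=True)

    # Keep as many leading rows as there are dates, instead of hard-coding 18
    out = df.iloc[: len(dates)].reset_index(drop=True).copy()
    out[col] = dates
    return out
```

Usage: `df = realign_dates(pd.DataFrame(dane))` replaces the shift/iloc lines. Unlike `iloc[0:17, :2]` it keeps every column, so select the columns you need afterwards.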
Answers:
To scrape the table on the webpage, accept the cookie banner, then induce WebDriverWait for visibility_of_element_located() on the <table> element and pass its outerHTML to Pandas to build a DataFrame. You can use the following Locator Strategy:
-
Code block:
driver.get('https://www.cmegroup.com/markets/energy/refined-products/gasoil-01-rotterdam-barges-swap.quotes.html')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#productTabData table"))).get_attribute("outerHTML")
df = pd.read_html(data)
print(df)
driver.quit()
-
Console Output:
[ Month Chart Last Change PriorSettle Open High Low Volume Updated
Month Chart Last Change PriorSettle Open High Low Volume Updated
0 FEB 2023AVLG3 NaN - - 795.113 - - - 0 18:06:09 CT 22 Feb 2023
1 MAR 2023AVLH3 NaN - - 772.830 - - - 0 18:04:52 CT 22 Feb 2023
2 APR 2023AVLJ3 NaN - - 767.276 - - - 0 18:05:15 CT 22 Feb 2023
3 MAY 2023AVLK3 NaN - - 761.442 - - - 0 18:06:17 CT 22 Feb 2023
4 JUN 2023AVLM3 NaN - - 755.341 - - - 0 18:05:51 CT 22 Feb 2023
5 JUL 2023AVLN3 NaN - - 753.708 - - - 0 18:05:10 CT 22 Feb 2023
6 AUG 2023AVLQ3 NaN - - 752.005 - - - 0 18:04:57 CT 22 Feb 2023
7 SEP 2023AVLU3 NaN - - 750.958 - - - 0 18:05:30 CT 22 Feb 2023
8 OCT 2023AVLV3 NaN - - 749.811 - - - 0 18:06:19 CT 22 Feb 2023
9 NOV 2023AVLX3 NaN - - 743.822 - - - 0 18:04:37 CT 22 Feb 2023
10 DEC 2023AVLZ3 NaN - - 739.558 - - - 0 18:04:11 CT 22 Feb 2023
11 JAN 2024AVLF4 NaN - - 736.617 - - - 0 18:05:42 CT 22 Feb 2023
12 FEB 2024AVLG4 NaN - - 733.250 - - - 0 18:05:40 CT 22 Feb 2023
13 MAR 2024AVLH4 NaN - - 729.158 - - - 0 18:06:23 CT 22 Feb 2023
14 APR 2024AVLJ4 NaN - - 725.386 - - - 0 18:05:50 CT 22 Feb 2023
15 MAY 2024AVLK4 NaN - - 720.620 - - - 0 18:05:49 CT 22 Feb 2023
16 JUN 2024AVLM4 NaN - - 717.788 - - - 0 18:05:46 CT 22 Feb 2023
17 JUL 2024AVLN4 NaN - - 715.783 - - - 0 18:05:27 CT 22 Feb 2023]
-
Note: You have to add the following imports:
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
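The console output above shows that pd.read_html() returns a list of DataFrames, that the header is printed twice (two identical column levels), that the Chart column is entirely NaN, and that each Month cell fuses the month with what appears to be the contract symbol (e.g. "FEB 2023AVLG3"). A minimal pandas sketch for tidying that result, assuming those three observations hold for the live page; the helper `tidy_quotes` and the `Code` column name are hypothetical:

```python
import pandas as pd


def tidy_quotes(df: pd.DataFrame) -> pd.DataFrame:
    """Flatten a duplicated header, drop empty columns, split Month."""
    out = df.copy()
    # Assumption: both header levels hold the same names, so keep the first
    if isinstance(out.columns, pd.MultiIndex):
        out.columns = out.columns.get_level_values(0)
    # Delete columns that are entirely blank (NaN), such as Chart
    out = out.dropna(axis=1, how="all")
    # Assumption: Month is "MMM YYYY" immediately followed by the symbol
    parts = out["Month"].str.extract(r"^([A-Z]{3} \d{4})\s*(\S+)$")
    out["Month"] = parts[0]
    # Hypothetical column name for the split-off symbol
    out.insert(1, "Code", parts[1])
    return out
```

Usage: `df = tidy_quotes(pd.read_html(StringIO(data))[0])` with `from io import StringIO`. Wrapping the HTML string in StringIO avoids the FutureWarning that pandas 2.1+ emits when a literal HTML string is passed to read_html.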