How to delete blank values from a table using find_element of Selenium Python

Question:

I have a problem with blank elements. I used Selenium to find elements on a website and received a list of elements, but some of them (exactly half) are blank.

Website: https://www.cmegroup.com/markets/energy/refined-products/gasoil-01-rotterdam-barges-swap.quotes.html

I’m looking for the values from the column named "MONTH". The rest of the columns work correctly (I received lists without empty values).

main = driver.find_element(By.ID, "main-content")
time.sleep(10)
matches = main.find_elements(By.XPATH,
                          '//*[@id="productTabData"]/div/div/div/div/div/div[2]/div/div/div/div/div/div[5]/div/div/div/div[1]/div/table/tbody/tr')
time.sleep(10)
dane = []
for match in matches:
    Date = match.find_element(By.XPATH, "./td[1]/div/span/b").text
    Price = match.find_element(By.XPATH, "./td[5]").text
    Updated = match.find_element(By.XPATH, "./td[10]").text
    print(Date)
    table = {
        "DataPL" : Date,
        "GO" : Price,
        "Updated" : Updated
    }

    dane.append(table)
df=pd.DataFrame(dane)

To work around the problem I used the pandas .shift method, but I’m looking for a better solution:

df["DataPL"] = df["DataPL"].shift(-18)
df = df.iloc[0:17,:2]
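
A less position-dependent variant of the same workaround can be sketched as follows. It assumes, as the shift(-18) above suggests, that the scraped rows arrive in two halves with the same ordering: one half holding the numbers with a blank month, the other holding the month with blank numbers. The sample rows and the helper name compact_columns are hypothetical and only stand in for the real scraped data.

```python
import pandas as pd

# Hypothetical stand-in for the scraped rows: the first half carries prices
# with blank dates, the second half carries dates with blank prices.
dane = [
    {"DataPL": "", "GO": "795.113", "Updated": "18:06:09 CT"},
    {"DataPL": "", "GO": "772.830", "Updated": "18:04:52 CT"},
    {"DataPL": "FEB 2023", "GO": "", "Updated": ""},
    {"DataPL": "MAR 2023", "GO": "", "Updated": ""},
]
df = pd.DataFrame(dane)


def compact_columns(frame):
    """Drop blank cells in each column independently and re-align from row 0."""
    columns = {}
    for name in frame.columns:
        values = frame[name]
        # Keep only cells that contain something other than whitespace
        non_blank = values[values.str.strip() != ""]
        # Renumber from 0 so the surviving values line up across columns
        columns[name] = non_blank.reset_index(drop=True)
    return pd.DataFrame(columns)


clean = compact_columns(df)
print(clean)
```

Unlike a fixed shift(-18), this does not hard-code the number of rows, but it still relies on both halves being in the same order.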
Asked By: Adii

||

Answers:

To scrape the table from the webpage, induce WebDriverWait for visibility_of_element_located() on the <table> element, then parse its outerHTML into a DataFrame with Pandas. You can use the following locator strategy:

  • Code block:

    driver.get('https://www.cmegroup.com/markets/energy/refined-products/gasoil-01-rotterdam-barges-swap.quotes.html')
    # Dismiss the cookie consent banner once it becomes clickable
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    # Wait until the quotes table is visible, then grab its HTML
    data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#productTabData table"))).get_attribute("outerHTML")
    # pd.read_html returns a list of DataFrames, one per <table> found
    df = pd.read_html(data)
    print(df)
    driver.quit()
    
  • Console Output:

    [            Month Chart Last Change PriorSettle Open High Low Volume                  Updated
                Month Chart Last Change PriorSettle Open High Low Volume                  Updated
    0   FEB 2023AVLG3   NaN    -      -     795.113    -    -   -      0  18:06:09 CT 22 Feb 2023
    1   MAR 2023AVLH3   NaN    -      -     772.830    -    -   -      0  18:04:52 CT 22 Feb 2023
    2   APR 2023AVLJ3   NaN    -      -     767.276    -    -   -      0  18:05:15 CT 22 Feb 2023
    3   MAY 2023AVLK3   NaN    -      -     761.442    -    -   -      0  18:06:17 CT 22 Feb 2023
    4   JUN 2023AVLM3   NaN    -      -     755.341    -    -   -      0  18:05:51 CT 22 Feb 2023
    5   JUL 2023AVLN3   NaN    -      -     753.708    -    -   -      0  18:05:10 CT 22 Feb 2023
    6   AUG 2023AVLQ3   NaN    -      -     752.005    -    -   -      0  18:04:57 CT 22 Feb 2023
    7   SEP 2023AVLU3   NaN    -      -     750.958    -    -   -      0  18:05:30 CT 22 Feb 2023
    8   OCT 2023AVLV3   NaN    -      -     749.811    -    -   -      0  18:06:19 CT 22 Feb 2023
    9   NOV 2023AVLX3   NaN    -      -     743.822    -    -   -      0  18:04:37 CT 22 Feb 2023
    10  DEC 2023AVLZ3   NaN    -      -     739.558    -    -   -      0  18:04:11 CT 22 Feb 2023
    11  JAN 2024AVLF4   NaN    -      -     736.617    -    -   -      0  18:05:42 CT 22 Feb 2023
    12  FEB 2024AVLG4   NaN    -      -     733.250    -    -   -      0  18:05:40 CT 22 Feb 2023
    13  MAR 2024AVLH4   NaN    -      -     729.158    -    -   -      0  18:06:23 CT 22 Feb 2023
    14  APR 2024AVLJ4   NaN    -      -     725.386    -    -   -      0  18:05:50 CT 22 Feb 2023
    15  MAY 2024AVLK4   NaN    -      -     720.620    -    -   -      0  18:05:49 CT 22 Feb 2023
    16  JUN 2024AVLM4   NaN    -      -     717.788    -    -   -      0  18:05:46 CT 22 Feb 2023
    17  JUL 2024AVLN4   NaN    -      -     715.783    -    -   -      0  18:05:27 CT 22 Feb 2023]
    
  • Note: You have to add the following imports:

    import pandas as pd
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
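
Since pd.read_html() returns a list, the table itself is df[0]. In the console output above the header is printed twice (a two-level column index) and the month is fused with the contract code, e.g. "FEB 2023AVLG3". A possible clean-up is sketched below on a small hand-built sample shaped like that output. The helper name tidy_quotes and the regular expression are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample shaped like the console output above: a two-level header
# with identical labels, and month text fused with the contract code.
columns = pd.MultiIndex.from_tuples(
    [("Month", "Month"), ("PriorSettle", "PriorSettle"), ("Updated", "Updated")]
)
raw = pd.DataFrame(
    [
        ["FEB 2023AVLG3", 795.113, "18:06:09 CT 22 Feb 2023"],
        ["MAR 2023AVLH3", 772.830, "18:04:52 CT 22 Feb 2023"],
    ],
    columns=columns,
)


def tidy_quotes(table):
    """Flatten the duplicated header and split 'MON YYYY' from the code."""
    out = table.copy()
    # Keep only the first header level, since both levels carry the same labels
    out.columns = out.columns.get_level_values(0)
    # Assumed pattern: three capital letters, a space, a four-digit year, then the code
    parts = out["Month"].str.extract(r"^(?P<Month>[A-Z]{3} \d{4})(?P<Code>.*)$")
    out["Month"] = parts["Month"]
    out.insert(1, "Code", parts["Code"])
    return out


tidy = tidy_quotes(raw)
print(tidy[["Month", "PriorSettle", "Updated"]])
```

With the real page you would pass df[0] instead of raw. The columns selected at the end correspond to the ones the question reads from td[1], td[5] and td[10].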
Answered By: undetected Selenium