Scraping data from KendoUI interface using selenium and python
Question:
I am trying to scrape data from this website using Selenium and Python (note the website is in German but can be translated using Chrome's translate function). Specifically, I would like to automate the process of (1) selecting "24 hours" from the "averaging" dropdown, (2) selecting "maximum" from the "Period" dropdown, and finally (3) clicking the "export" button and downloading the associated Excel file.
I have limited scraping experience, but when I have done it in the past, I have found and clicked on items using their XPath (i.e., using driver.find_element('xpath', ...).click()). However, although I'm able to find what seem to be the correct XPaths here, when I try to interact with them, Selenium raises an ElementNotInteractableException. I would really appreciate any guidance on how to scrape this site. I am open to solutions that do not use Selenium.
Answers:
You can download the JSON data directly without Selenium, for example into a Pandas DataFrame, and then save it as CSV/XLS:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.laerm-monitoring.de/mittelung?mp=14'
api_url = 'https://www.laerm-monitoring.de/Mittelung/Chartdata_Read'
headers = {'X-Requested-With': 'XMLHttpRequest'}

with requests.Session() as s:
    # load the page once to pick up the anti-forgery token
    soup = BeautifulSoup(s.get(url).content, 'html.parser')
    token = soup.select_one('[name="__RequestVerificationToken"]')['value']
    payload = {
        "sort": "",
        "group": "",
        "filter": "",
        "__RequestVerificationToken": token,
        "avgtype": "0",
        "mpnumber": "14",
        "daytime": "2",
        "numMonths": "0",
        "endDate": "0",
    }
    df = pd.DataFrame(s.post(api_url, data=payload, headers=headers).json())

print(df.head(20))
df.to_csv('data.csv', index=False)  # save it as CSV
Prints:
Date Lr CountnoGZ CountGZ
0 2019-09-20T00:00:00 66.6 16 19
1 2019-09-21T00:00:00 67.8 133 26
2 2019-09-22T00:00:00 69.1 154 25
3 2019-09-23T00:00:00 70.6 163 56
4 2019-09-24T00:00:00 72.3 160 71
5 2019-09-25T00:00:00 71.0 163 67
6 2019-09-26T00:00:00 72.0 154 76
7 2019-09-27T00:00:00 71.2 157 58
8 2019-09-28T00:00:00 68.8 140 33
9 2019-09-29T00:00:00 68.9 155 24
10 2019-09-30T00:00:00 71.9 158 64
11 2019-10-01T00:00:00 72.4 162 64
12 2019-10-02T00:00:00 71.9 150 64
13 2019-10-03T00:00:00 70.5 155 46
14 2019-10-04T00:00:00 70.9 148 58
15 2019-10-05T00:00:00 68.3 132 27
16 2019-10-06T00:00:00 67.2 152 26
17 2019-10-07T00:00:00 71.4 166 63
18 2019-10-08T00:00:00 71.3 167 71
19 2019-10-09T00:00:00 72.1 161 72
and saves data.csv.
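For reference, the token extraction that BeautifulSoup handles above can also be sketched with only the standard library; the HTML snippet here is invented to mirror the hidden anti-forgery field on the page:

```python
from html.parser import HTMLParser

# invented snippet mirroring the hidden anti-forgery input on the page
SAMPLE = '<form><input name="__RequestVerificationToken" value="abc123"></form>'

class TokenParser(HTMLParser):
    """Grab the value attribute of the __RequestVerificationToken input."""
    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("name") == "__RequestVerificationToken":
            self.token = a.get("value")

parser = TokenParser()
parser.feed(SAMPLE)
print(parser.token)  # → abc123
```

This is just to show what the `select_one('[name="__RequestVerificationToken"]')` call is doing; in practice BeautifulSoup is the more convenient tool.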
To solve this problem, you can try using the ActionChains class from the selenium.webdriver.common.action_chains module to perform the mouse movements and clicks needed to select the options and press the export button.
Here's some sample code that should do what you want:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome()
driver.get('https://www.laerm-monitoring.de/mittelung?mp=14')

# Select the "24 hours" option from the "averaging" dropdown
averaging_dropdown = driver.find_element(By.XPATH, '//*[@id="averaging"]')
averaging_option = driver.find_element(By.XPATH, '//*[@id="averaging"]/option[2]')
ActionChains(driver).move_to_element(averaging_dropdown).click(averaging_option).perform()

# Select the "maximum" option from the "Period" dropdown
period_dropdown = driver.find_element(By.XPATH, '//*[@id="period"]')
period_option = driver.find_element(By.XPATH, '//*[@id="period"]/option[3]')
ActionChains(driver).move_to_element(period_dropdown).click(period_option).perform()

# Click the "export" button
export_button = driver.find_element(By.XPATH, '//*[@id="toolbar"]/button[1]')
ActionChains(driver).move_to_element(export_button).click().perform()

# Give the download time to complete (note that implicitly_wait only
# affects element lookups, not file downloads)
time.sleep(30)
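A fixed sleep is crude; a more reliable (if still simple) approach is to poll the browser's download directory until the exported file appears. This is a sketch with an invented helper and a hypothetical directory; Chrome writes in-progress downloads with a .crdownload suffix, which the filter skips:

```python
import os
import time
import tempfile

def wait_for_download(directory, suffix=".xlsx", timeout=30):
    """Poll `directory` until a finished file ending in `suffix` appears."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        done = [f for f in os.listdir(directory)
                if f.endswith(suffix) and not f.endswith(".crdownload")]
        if done:
            return os.path.join(directory, done[0])
        time.sleep(0.5)
    raise TimeoutError(f"no {suffix} file appeared in {directory}")

# usage sketch: in real use, `directory` would be the browser's download folder
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "export.xlsx"), "w").close()  # stand-in for the real download
    print(wait_for_download(d))
```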
This code uses the ActionChains class to perform a series of actions in sequence:
1. Move the mouse cursor to the "averaging" dropdown element and click it to open the dropdown menu.
2. Move the mouse cursor to the "24 hours" option and click it to select it.
3. Repeat steps 1 and 2 for the "Period" dropdown and the "maximum" option.
4. Move the mouse cursor to the "export" button and click it to initiate the download.
Note that you may need to modify the XPath expressions to match the actual element paths on the website. You can use the developer tools in your web browser to inspect the elements and find their XPaths.
I hope this helps.
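Whichever route you take, once the data is on disk you can sanity-check it with pandas. A small sketch using a few rows copied from the printed output above:

```python
import io
import pandas as pd

# a few rows in the same shape as the exported data.csv
sample = io.StringIO(
    "Date,Lr,CountnoGZ,CountGZ\n"
    "2019-09-20T00:00:00,66.6,16,19\n"
    "2019-09-21T00:00:00,67.8,133,26\n"
    "2019-09-22T00:00:00,69.1,154,25\n"
)
df = pd.read_csv(sample, parse_dates=["Date"])
print(df["Lr"].max())  # → 69.1
```

In real use you would pass the path of the downloaded file to pd.read_csv instead of the in-memory sample.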