Scraping data from KendoUI interface using selenium and python
Question:
I am trying to scrape data from this website using Selenium and Python (note the website is in German but can be translated using Chrome's translate function). Specifically, I would like to automate the process of (1) selecting "24 hours" from the "averaging" dropdown, (2) selecting "maximum" from the "Period" dropdown, and finally (3) clicking the "export" button and downloading the associated Excel file.
I have limited scraping experience, but when I have done it in the past, I have found and clicked on items using their XPath (i.e., using driver.find_element('xpath', ...).click()). However, although I'm able to find what seem to be the correct XPaths here, when I try to interact with them, Selenium raises an ElementNotInteractableException. I would really appreciate any guidance on how to scrape this site. I am open to solutions that do not use Selenium.
Answers:
You can download the JSON data directly without Selenium, for example into a Pandas DataFrame, and then save it as CSV/XLS:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.laerm-monitoring.de/mittelung?mp=14'
api_url = 'https://www.laerm-monitoring.de/Mittelung/Chartdata_Read'
headers = {'X-Requested-With': 'XMLHttpRequest'}

with requests.Session() as s:
    # load the page once to pick up the anti-forgery token
    soup = BeautifulSoup(s.get(url).content, 'html.parser')
    token = soup.select_one('[name="__RequestVerificationToken"]')['value']
    payload = {
        "sort": "",
        "group": "",
        "filter": "",
        "__RequestVerificationToken": token,
        "avgtype": "0",
        "mpnumber": "14",
        "daytime": "2",
        "numMonths": "0",
        "endDate": "0",
    }
    df = pd.DataFrame(s.post(api_url, data=payload, headers=headers).json())

print(df.head(20))
df.to_csv('data.csv', index=False)  # save it as CSV
Prints:
Date Lr CountnoGZ CountGZ
0 2019-09-20T00:00:00 66.6 16 19
1 2019-09-21T00:00:00 67.8 133 26
2 2019-09-22T00:00:00 69.1 154 25
3 2019-09-23T00:00:00 70.6 163 56
4 2019-09-24T00:00:00 72.3 160 71
5 2019-09-25T00:00:00 71.0 163 67
6 2019-09-26T00:00:00 72.0 154 76
7 2019-09-27T00:00:00 71.2 157 58
8 2019-09-28T00:00:00 68.8 140 33
9 2019-09-29T00:00:00 68.9 155 24
10 2019-09-30T00:00:00 71.9 158 64
11 2019-10-01T00:00:00 72.4 162 64
12 2019-10-02T00:00:00 71.9 150 64
13 2019-10-03T00:00:00 70.5 155 46
14 2019-10-04T00:00:00 70.9 148 58
15 2019-10-05T00:00:00 68.3 132 27
16 2019-10-06T00:00:00 67.2 152 26
17 2019-10-07T00:00:00 71.4 166 63
18 2019-10-08T00:00:00 71.3 167 71
19 2019-10-09T00:00:00 72.1 161 72
and saves data.csv.
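For reference, the token extraction that BeautifulSoup handles above can also be sketched with only the standard library; the HTML snippet here is invented to mirror the hidden anti-forgery field on the page:

```python
from html.parser import HTMLParser

# invented snippet mirroring the hidden anti-forgery input on the page
SAMPLE = '<form><input name="__RequestVerificationToken" value="abc123"></form>'

class TokenParser(HTMLParser):
    """Grab the value attribute of the __RequestVerificationToken input."""
    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("name") == "__RequestVerificationToken":
            self.token = a.get("value")

parser = TokenParser()
parser.feed(SAMPLE)
print(parser.token)  # → abc123
```

This is just to show what the `select_one('[name="__RequestVerificationToken"]')` call is doing; in practice BeautifulSoup is the more convenient tool.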
To solve this problem, you can try using the ActionChains class from the selenium.webdriver.common.action_chains module to perform the mouse movements and clicks needed to select the options and press the export button.
Here's some sample code that should do what you want:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome()
driver.get('https://www.laerm-monitoring.de/mittelung?mp=14')

# Select the "24 hours" option from the "averaging" dropdown
averaging_dropdown = driver.find_element(By.XPATH, '//*[@id="averaging"]')
averaging_option = driver.find_element(By.XPATH, '//*[@id="averaging"]/option[2]')
ActionChains(driver).move_to_element(averaging_dropdown).click(averaging_option).perform()

# Select the "maximum" option from the "Period" dropdown
period_dropdown = driver.find_element(By.XPATH, '//*[@id="period"]')
period_option = driver.find_element(By.XPATH, '//*[@id="period"]/option[3]')
ActionChains(driver).move_to_element(period_dropdown).click(period_option).perform()

# Click the "export" button
export_button = driver.find_element(By.XPATH, '//*[@id="toolbar"]/button[1]')
ActionChains(driver).move_to_element(export_button).click().perform()

# Give the download time to complete (note that implicitly_wait only
# affects element lookups, not file downloads)
time.sleep(30)
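A fixed sleep is crude; a more reliable (if still simple) approach is to poll the browser's download directory until the exported file appears. This is a sketch with an invented helper and a hypothetical directory; Chrome writes in-progress downloads with a .crdownload suffix, which the filter skips:

```python
import os
import time
import tempfile

def wait_for_download(directory, suffix=".xlsx", timeout=30):
    """Poll `directory` until a finished file ending in `suffix` appears."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        done = [f for f in os.listdir(directory)
                if f.endswith(suffix) and not f.endswith(".crdownload")]
        if done:
            return os.path.join(directory, done[0])
        time.sleep(0.5)
    raise TimeoutError(f"no {suffix} file appeared in {directory}")

# usage sketch: in real use, `directory` would be the browser's download folder
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "export.xlsx"), "w").close()  # stand-in for the real download
    print(wait_for_download(d))
```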
This code uses the ActionChains class to perform a series of actions in sequence:
1. Move the mouse cursor to the "averaging" dropdown element and click it to open the dropdown menu.
2. Move the mouse cursor to the "24 hours" option and click it to select it.
3. Repeat steps 1 and 2 for the "Period" dropdown and the "maximum" option.
4. Move the mouse cursor to the "export" button and click it to initiate the download.
Note that you may need to modify the XPath expressions to match the actual element paths on the website. You can use the developer tools in your web browser to inspect the elements and find their XPaths.
I hope this helps.
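Whichever route you take, once the data is on disk you can sanity-check it with pandas. A small sketch using a few rows copied from the printed output above:

```python
import io
import pandas as pd

# a few rows in the same shape as the exported data.csv
sample = io.StringIO(
    "Date,Lr,CountnoGZ,CountGZ\n"
    "2019-09-20T00:00:00,66.6,16,19\n"
    "2019-09-21T00:00:00,67.8,133,26\n"
    "2019-09-22T00:00:00,69.1,154,25\n"
)
df = pd.read_csv(sample, parse_dates=["Date"])
print(df["Lr"].max())  # → 69.1
```

In real use you would pass the path of the downloaded file to pd.read_csv instead of the in-memory sample.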