How to correctly reference the submit button when scraping an ASPX page with requests
Question:
I’m trying to extract data from an ASPX site. I need the name of the submit button to include in the POST request, but it doesn’t appear when I inspect the page.
```python
import csv
import requests

form_data = {
    'CalendarioDesde': '01/11/2022',
    'CalendarioHasta': '28/12/2022',
    'IdEstacion': 'Aeropuerto San Luis (REM)',
    '': 'Consultar',  # trying to get the button name
}
response = requests.post('http://clima.edu.ar/InformePorPeriodo.aspx', data=form_data)
reader = csv.reader(response.text.splitlines())
```
Based on the posts I’ve seen, Beautiful Soup is the usual tool for scraping ASPX pages, but I couldn’t find a tutorial covering the complex parameters (e.g. __VIEWSTATE), so I’m starting with this basic requests version.
If this isn’t the right approach, please suggest a post or resource I can use as a template for extracting the data.
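As background to the __VIEWSTATE question above: ASP.NET WebForms pages usually keep their state in hidden `<input>` fields such as `__VIEWSTATE`, `__VIEWSTATEGENERATOR` and `__EVENTVALIDATION`, and a postback sends those back along with the `name=value` pair of the button that was clicked. Browsers only submit inputs that have a `name` attribute, so if the button's markup has none, there may simply be no button field to copy. The sketch below lists the named inputs on a page using the standard library's `html.parser` (swapped in for Beautiful Soup to avoid a dependency). The sample HTML, including the button name `btnConsultar` and the hidden values, is hypothetical and not taken from clima.edu.ar.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs of <input> elements, keeping submit buttons separate."""

    def __init__(self):
        super().__init__()
        self.fields = {}          # name -> value for non-submit inputs (incl. hidden ones)
        self.submit_buttons = {}  # name -> value for named submit inputs

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if not name:
            # Inputs without a name are never part of the submitted form data.
            return
        value = attrs.get("value") or ""
        if (attrs.get("type") or "text").lower() == "submit":
            self.submit_buttons[name] = value
        else:
            self.fields[name] = value


def collect_inputs(html):
    parser = InputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields, parser.submit_buttons


# Hypothetical page fragment; a real page would come from requests.get(...).text
SAMPLE_HTML = """
<form method="post" action="./InformePorPeriodo.aspx">
  <input type="hidden" name="__VIEWSTATE" value="EXAMPLE_VIEWSTATE">
  <input type="hidden" name="__EVENTVALIDATION" value="EXAMPLE_EVENTVALIDATION">
  <input type="text" name="CalendarioDesde" value="01/11/2022">
  <input type="submit" name="btnConsultar" value="Consultar">
  <input type="submit" value="Sin nombre">
</form>
"""

fields, buttons = collect_inputs(SAMPLE_HTML)
print(fields)
print(buttons)
```

For a real postback you would fetch the page with a `requests.Session`, run `collect_inputs` on `response.text`, overwrite the date and station fields with your own values, add the clicked button's name and value, and POST the dictionary back to the same URL; whether this particular site accepts such a postback is an assumption.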
Answers:
Yesterday I answered a question about getting dynamically loaded data; see:
Scrape table from JSP website using Python
To get the CSV data, you can request the ObtenerCsv.aspx endpoint directly:
```python
import requests
import csv

params = {
    'tipo': 'Periodo',
    'Estacion': '42',
    'fechaDesde': '20221101',  # YYYYMMDD, i.e. 2022-11-01
    'fechahasta': '20221229',  # YYYYMMDD, i.e. 2022-12-29
}
response = requests.get('http://clima.edu.ar/ObtenerCsv.aspx', params=params)
reader = csv.reader(response.text.splitlines())
for row in reader:
    print(row)
```
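The two date parameters above are plain YYYYMMDD strings. If you would rather drive the request from `datetime.date` objects, a small helper can do the formatting. `build_csv_params` is a hypothetical name, the parameter names are copied from the answer's code rather than from any documentation of the site, and `urlencode` is used only to preview the query string offline (requests should build the same string from `params` for these simple values).

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://clima.edu.ar/ObtenerCsv.aspx"


def build_csv_params(station_id, start, end):
    """Hypothetical helper: build the ObtenerCsv.aspx query parameters used above."""
    return {
        "tipo": "Periodo",
        "Estacion": str(station_id),
        "fechaDesde": start.strftime("%Y%m%d"),  # date(2022, 11, 1) -> "20221101"
        "fechahasta": end.strftime("%Y%m%d"),    # lowercase "h", as in the answer's code
    }


params = build_csv_params(42, date(2022, 11, 1), date(2022, 12, 29))
url = BASE_URL + "?" + urlencode(params)
print(url)
```

The resulting `params` dictionary can be passed straight to `requests.get(BASE_URL, params=params)`, as in the answer.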
Prints (truncated):
```
['Fecha/Hora;"Precipitacion (mm)";"Temperatura (ºC)";"Humedad (%)";"Dir. Del Viento (º)";"Int. del Viento (m/s)";"Radiación (w/m2)"']
['29/12/2022 20:00:00;0', '00;28', '20;16', '80;128', '60;0', '20;11', '70']
['29/12/2022 19:00:00;0', '00;32', '40;11', '10;194', '20;0', '40;181', '50']
['29/12/2022 18:00:00;0', '00;33', '00;10', '70;228', '00;1', '40;413', '60']
['29/12/2022 17:00:00;0', '00;33', '40;10', '50;223', '70;2', '20;647', '60']
['29/12/2022 16:00:00;0', '00;33', '30;10', '90;235', '40;2', '70;841', '00']
```
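The printed rows show that the file is semicolon-separated and uses a comma as the decimal separator: with csv's default comma delimiter, a value such as 28,20 is split across two fields ('00;28', '20;16', ...). Passing `delimiter=';'` to `csv.reader` keeps each value intact, and the commas can then be swapped for dots before converting to `float`. The sample text below is reassembled from the first two printed rows; in practice you would pass `response.text` instead. `parse_rows` and `parse_decimal` are hypothetical helper names.

```python
import csv

# Reassembled from the first two rows printed above; normally use response.text.
SAMPLE = (
    'Fecha/Hora;"Precipitacion (mm)";"Temperatura (ºC)";"Humedad (%)";'
    '"Dir. Del Viento (º)";"Int. del Viento (m/s)";"Radiación (w/m2)"\n'
    "29/12/2022 20:00:00;0,00;28,20;16,80;128,60;0,20;11,70\n"
)


def parse_decimal(text):
    """Convert a decimal-comma string such as '28,20' to a float."""
    return float(text.replace(",", "."))


def parse_rows(csv_text):
    """Split on ';', keep the timestamp as text and convert the measurements to floats."""
    reader = csv.reader(csv_text.splitlines(), delimiter=";")
    header = next(reader)
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        timestamp, *values = record
        rows.append([timestamp] + [parse_decimal(v) for v in values])
    return header, rows


header, rows = parse_rows(SAMPLE)
print(header)
print(rows[0])
```

If you use pandas instead, `pandas.read_csv` accepts `sep=';'` and `decimal=','` for the same purpose.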
To find requests like this one yourself, press F12 to open your browser’s developer tools and watch the Network tab while the page loads the data.