Extract site id and value from dropdown on webpage
Question:
I’m trying to get the names that appear in the dropdown and the associated value (as it appears in the html code).
Here’s what I have so far. The result is empty so I know I’m not searching for the section properly.
from bs4 import BeautifulSoup
from requests import get
url = 'http://clima.edu.ar/InformePorPeriodo.aspx'
page = get(url)
soup = BeautifulSoup(page.text)
test = soup.find_all('p', {'class': 'CaptionForm'})
Answers:
The dropdown values are under select name="IdEstacion". So, use find_all('select', {"name": "IdEstacion"}).
To get all the option tags, including their value attributes, see the example below.
I’m using a regex to collapse the many extra spaces in the option text.
import re

import requests
from bs4 import BeautifulSoup

fmt_string = "{}\t\t{}"  # value and name separated by tabs
url = "http://clima.edu.ar/InformePorPeriodo.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Every <select> named "IdEstacion" (the station dropdown)
dropdown_values = soup.find_all("select", {"name": "IdEstacion"})
for option in dropdown_values:
    options = option.find_all("option")
    for option_ in options:
        # Collapse runs of spaces in the option text before printing
        print(fmt_string.format(option_["value"], re.sub(" +", " ", option_.text)))
Prints:
58 Aeropuerto San Luis (REM )
59 Aeropuerto Valle del Conlara (REM )
88 AgroZAL (REM )
1 Alto Pelado (REM )
2 Anchorena (REM )
3 Bajada Nueva (REM )
74 Balde de Quines (TEST1 )
4 Baldecito (REM )
5 Batavia (REM )
6 Beazley (REM )
7 Buena Esperanza (REM )
8 Concarán (REM )
49 Coronel Alzogaray (REM )
9 Desaguadero (REM )
77 Dique La Huertita (SLA )
78 Dique Las Palmeras (SLA )
76 Dique Luján (SLA )
85 Donovan (REM )
10 El Amago (REM )
65 El Arenal (SLA )
11 El Durazno (REM )
47 El Trapiche (REM )
133 Est Nueva Version (TEST )
55 Estancia Grande (REM )
68 Estancia Samay-Huasi (SLA )
12 Fraga (REM )
67 Frías (SLA )
13 Justo Daract (REM )
14 La Angelina (REM )
50 La Botija (REM )
72 La Brea (TEST )
15 La Calera (REM )
84 La Candelaria (PRONO )
71 La Cañada (SLA )
83 La Carolina (PRONO )
16 La Cumbre (REM )
17 La Esquina (REM )
66 La Estancia (SLA )
87 La Florida - Dique (REM )
18 La Florida (REM1 )
70 La Porota (SLA )
19 La Punilla (REM )
20 La Punta (REM )
21 La Toma (REM )
22 La Tranca (REM )
23 Lafinur (REM )
61 Laguna Larga (SLA )
56 Las Chacras (REM )
62 Las Chacras(San Martín) (SLA )
54 Los Coros (REM )
73 Los Ruartes (TEST )
81 Luján (PRONO )
25 Luján (REM )
26 Martín de Loyola (REM )
27 Merlo (REM )
48 Merlo Alto (REM )
69 Mesilla del Cura (SLA )
28 Naschel (REM )
44 Navia (REM )
29 Nogolí (REM )
30 Nueva Galia (REM )
31 Paso Grande (REM )
51 Potrero de los Funes (REM )
75 Pozo La Porota (SLA )
53 Quebrada de las Higueritas (REM )
82 Quines (PRONO )
79 Río San Francisco (SLA )
32 San Francisco (REM )
46 San Luis Rural (REM )
34 San Martín (REM )
64 San Martín (TEST )
36 San Miguel (REM )
35 Santa Rosa (REM )
52 Soven (REM )
63 Tala Verde (SLA )
37 Tilisarao (REM )
38 Unión (REM )
45 Valle de Pancanta (REM )
86 Varela (REM )
39 Villa de Praga (REM )
40 Villa Gral. Roca (REM )
41 Villa Larca (REM )
80 Villa Mercedes - INTA (TEST )
42 Villa Mercedes (REM )
60 Villa Reynolds (REM )
43 Zanjitas (REM )
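If you would rather keep the pairs in a dict than print them, the same lookup can be sketched with a CSS selector. This is only a sketch: SAMPLE_HTML is hypothetical markup that imitates the dropdown (the real page may be structured differently), and extract_options is a helper name made up for illustration. It swaps the regex for " ".join(text.split()), which also collapses tabs and newlines, not just spaces.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the page's dropdown; the real page may differ.
SAMPLE_HTML = """
<select name="IdEstacion">
  <option value="58">Aeropuerto San Luis   (REM  )</option>
  <option value="1">Alto Pelado
      (REM )</option>
</select>
"""


def extract_options(html, select_name):
    """Return {value: label} for every <option> under the named <select>."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # CSS attribute selector: <option> tags nested in <select name="...">
    for opt in soup.select(f'select[name="{select_name}"] option'):
        value = opt.get("value")
        if value is None:
            continue  # skip options that carry no value attribute
        # split() with no arguments splits on any run of whitespace
        result[value] = " ".join(opt.get_text().split())
    return result


stations = extract_options(SAMPLE_HTML, "IdEstacion")
print(stations)
```

To run it against the live page, pass requests.get(url).content in place of SAMPLE_HTML.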