Extract site id and value from dropdown on webpage

Question:

I’m trying to get the names that appear in the dropdown, along with the value associated with each one (as it appears in the HTML).

Here’s what I have so far. The result is empty so I know I’m not searching for the section properly.

from bs4 import BeautifulSoup
from requests import get

url = 'http://clima.edu.ar/InformePorPeriodo.aspx'

page = get(url)
soup = BeautifulSoup(page.text)

test = soup.find_all('p', {'class': 'CaptionForm'})

The correct section starts here: [image not preserved]

Asked By: matsuo_basho


Answers:

The dropdown options are inside the select element whose name attribute is "IdEstacion", not inside a p tag with class CaptionForm.

So, use find_all('select', {"name": "IdEstacion"}) to locate it.
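As a small offline illustration of that lookup, here is a sketch run against a made-up HTML fragment. The `SAMPLE_HTML` markup is an assumption for demonstration only, not the real page source. It shows that matching on the `name` attribute returns the `select` tag, and that its `option` children carry the `value` attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the structure matters.
SAMPLE_HTML = """
<form>
  <select name="IdEstacion">
    <option value="58">Aeropuerto San Luis</option>
    <option value="1">Alto Pelado</option>
  </select>
  <select name="Other">
    <option value="x">Ignored</option>
  </select>
</form>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Match <select> tags by their name attribute (passed in the attrs dict,
# because the keyword "name" in find_all refers to the tag name).
selects = soup.find_all("select", {"name": "IdEstacion"})

# Collect (value, text) pairs from the <option> children of each match.
pairs = [
    (opt["value"], opt.get_text(strip=True))
    for sel in selects
    for opt in sel.find_all("option")
]
print(pairs)
```

The second `select` in the sample is there only to show that the attribute filter skips unrelated dropdowns.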

To get all the option tags together with their value attributes, see the example below.

I’m using a regex because the option text contains a lot of extra whitespace.

import re
import requests
from bs4 import BeautifulSoup


fmt_string = "{}\t\t{}"
url = "http://clima.edu.ar/InformePorPeriodo.aspx"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

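# find_all returns a list of the matching <select> tags.
selects = soup.find_all("select", {"name": "IdEstacion"})
for select in selects:
    # Each <option> holds the site id in "value" and the site name as its text.
    for option in select.find_all("option"):
        # Collapse runs of spaces in the name before printing.
        print(fmt_string.format(option["value"], re.sub(" +", " ", option.text)))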

Prints:

58       Aeropuerto San Luis (REM )
59       Aeropuerto Valle del Conlara (REM )
88       AgroZAL (REM )
1        Alto Pelado (REM )
2        Anchorena (REM )
3        Bajada Nueva (REM )
74       Balde de Quines (TEST1 )
4        Baldecito (REM )
5        Batavia (REM )
6        Beazley (REM )
7        Buena Esperanza (REM )
8        Concarán (REM )
49       Coronel Alzogaray (REM )
9        Desaguadero (REM )
77       Dique La Huertita (SLA )
78       Dique Las Palmeras (SLA )
76       Dique Luján (SLA )
85       Donovan (REM )
10       El Amago (REM )
65       El Arenal (SLA )
11       El Durazno (REM )
47       El Trapiche (REM )
133          Est Nueva Version (TEST )
55       Estancia Grande (REM )
68       Estancia Samay-Huasi (SLA )
12       Fraga (REM )
67       Frías (SLA )
13       Justo Daract (REM )
14       La Angelina (REM )
50       La Botija (REM )
72       La Brea (TEST )
15       La Calera (REM )
84       La Candelaria (PRONO )
71       La Cañada (SLA )
83       La Carolina (PRONO )
16       La Cumbre (REM )
17       La Esquina (REM )
66       La Estancia (SLA )
87       La Florida - Dique (REM )
18       La Florida (REM1 )
70       La Porota (SLA )
19       La Punilla (REM )
20       La Punta (REM )
21       La Toma (REM )
22       La Tranca (REM )
23       Lafinur (REM )
61       Laguna Larga (SLA )
56       Las Chacras (REM )
62       Las Chacras(San Martín) (SLA )
54       Los Coros (REM )
73       Los Ruartes (TEST )
81       Luján (PRONO )
25       Luján (REM )
26       Martín de Loyola (REM )
27       Merlo (REM )
48       Merlo Alto (REM )
69       Mesilla del Cura (SLA )
28       Naschel (REM )
44       Navia (REM )
29       Nogolí (REM )
30       Nueva Galia (REM )
31       Paso Grande (REM )
51       Potrero de los Funes (REM )
75       Pozo La Porota (SLA )
53       Quebrada de las Higueritas (REM )
82       Quines (PRONO )
79       Río San Francisco (SLA )
32       San Francisco (REM )
46       San Luis Rural (REM )
34       San Martín (REM )
64       San Martín (TEST )
36       San Miguel (REM )
35       Santa Rosa (REM )
52       Soven (REM )
63       Tala Verde (SLA )
37       Tilisarao (REM )
38       Unión (REM )
45       Valle de Pancanta (REM )
86       Varela (REM )
39       Villa de Praga (REM )
40       Villa Gral. Roca (REM )
41       Villa Larca (REM )
80       Villa Mercedes - INTA (TEST )
42       Villa Mercedes (REM )
60       Villa Reynolds (REM )
43       Zanjitas (REM )
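
The printed names still carry a stray space before the closing parenthesis, because `re.sub(" +", " ", ...)` only collapses each run of spaces into a single space. If cleaner labels are wanted, a sketch like the following could be applied to each option's text. The helper name `clean_label` and the sample string are hypothetical, and it assumes the raw text is a site name followed by a padded parenthesised suffix.

```python
import re


def clean_label(raw: str) -> str:
    """Collapse whitespace runs and drop padding just inside parentheses."""
    # split() with no arguments splits on any whitespace (spaces, tabs,
    # newlines) and discards leading and trailing whitespace.
    text = " ".join(raw.split())
    # Remove whitespace directly after "(" and directly before ")".
    text = re.sub(r"\(\s+", "(", text)
    text = re.sub(r"\s+\)", ")", text)
    return text


# Hypothetical raw option text with the kind of padding described above.
print(clean_label("  Aeropuerto San Luis     (REM   )\n"))
```

Using `split()` and `join` instead of a space-only pattern also handles tabs and newlines, which may or may not occur in the real page.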
Answered By: MendelG