Extract site id and value from dropdown on webpage

Question

I’m trying to get the names that appear in the dropdown and their associated values (as they appear in the HTML code).

Here’s what I have so far. The result is empty so I know I’m not searching for the section properly.

from bs4 import BeautifulSoup
from requests import get

url = 'http://clima.edu.ar/InformePorPeriodo.aspx'

page = get(url)
soup = BeautifulSoup(page.text)

test = soup.find_all('p', {'class': 'CaptionForm'})

The correct section starts here:

Asked By: matsuo_basho


Answer 1

The dropdown options are inside the <select> element with name="IdEstacion".

So, use find_all('select', {"name": "IdEstacion"}).

To get every <option> tag along with its value attribute, see the example below.

I’m using a regex to collapse the runs of extra spaces in the option text.

import re
import requests
from bs4 import BeautifulSoup


fmt_string = "{} \t\t {}"
url = "http://clima.edu.ar/InformePorPeriodo.aspx"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

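# find_all returns a list of matching <select> elements (one is expected here)
dropdowns = soup.find_all("select", {"name": "IdEstacion"})
for dropdown in dropdowns:
    # Print each option's value attribute next to its label,
    # collapsing runs of spaces in the label text
    for option in dropdown.find_all("option"):
        print(fmt_string.format(option["value"], re.sub(" +", " ", option.text)))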

Prints:

58       Aeropuerto San Luis (REM )
59       Aeropuerto Valle del Conlara (REM )
88       AgroZAL (REM )
1        Alto Pelado (REM )
2        Anchorena (REM )
3        Bajada Nueva (REM )
74       Balde de Quines (TEST1 )
4        Baldecito (REM )
5        Batavia (REM )
6        Beazley (REM )
7        Buena Esperanza (REM )
8        Concarán (REM )
49       Coronel Alzogaray (REM )
9        Desaguadero (REM )
77       Dique La Huertita (SLA )
78       Dique Las Palmeras (SLA )
76       Dique Luján (SLA )
85       Donovan (REM )
10       El Amago (REM )
65       El Arenal (SLA )
11       El Durazno (REM )
47       El Trapiche (REM )
133          Est Nueva Version (TEST )
55       Estancia Grande (REM )
68       Estancia Samay-Huasi (SLA )
12       Fraga (REM )
67       Frías (SLA )
13       Justo Daract (REM )
14       La Angelina (REM )
50       La Botija (REM )
72       La Brea (TEST )
15       La Calera (REM )
84       La Candelaria (PRONO )
71       La Cañada (SLA )
83       La Carolina (PRONO )
16       La Cumbre (REM )
17       La Esquina (REM )
66       La Estancia (SLA )
87       La Florida - Dique (REM )
18       La Florida (REM1 )
70       La Porota (SLA )
19       La Punilla (REM )
20       La Punta (REM )
21       La Toma (REM )
22       La Tranca (REM )
23       Lafinur (REM )
61       Laguna Larga (SLA )
56       Las Chacras (REM )
62       Las Chacras(San Martín) (SLA )
54       Los Coros (REM )
73       Los Ruartes (TEST )
81       Luján (PRONO )
25       Luján (REM )
26       Martín de Loyola (REM )
27       Merlo (REM )
48       Merlo Alto (REM )
69       Mesilla del Cura (SLA )
28       Naschel (REM )
44       Navia (REM )
29       Nogolí (REM )
30       Nueva Galia (REM )
31       Paso Grande (REM )
51       Potrero de los Funes (REM )
75       Pozo La Porota (SLA )
53       Quebrada de las Higueritas (REM )
82       Quines (PRONO )
79       Río San Francisco (SLA )
32       San Francisco (REM )
46       San Luis Rural (REM )
34       San Martín (REM )
64       San Martín (TEST )
36       San Miguel (REM )
35       Santa Rosa (REM )
52       Soven (REM )
63       Tala Verde (SLA )
37       Tilisarao (REM )
38       Unión (REM )
45       Valle de Pancanta (REM )
86       Varela (REM )
39       Villa de Praga (REM )
40       Villa Gral. Roca (REM )
41       Villa Larca (REM )
80       Villa Mercedes - INTA (TEST )
42       Villa Mercedes (REM )
60       Villa Reynolds (REM )
43       Zanjitas (REM )
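
If you want the pairs as data rather than printed lines, here is a rough sketch of the same idea. It swaps BeautifulSoup for html.parser.HTMLParser from the standard library, so it runs without extra packages. The names SelectOptionParser, extract_options and sample_html are made up for illustration, and the sample markup is an assumption modelled on the output above, not a copy of the real page. It also normalises whitespace with split()/join, which collapses tabs and newlines as well as spaces, whereas the " +" regex only touches spaces.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect (value, text) pairs for the <option> tags of one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_select = False   # currently inside the target <select>?
        self.reading = False     # currently inside one of its <option> tags?
        self.value = None        # value attribute of the current <option>
        self.chunks = []         # text pieces of the current <option>
        self.options = []        # collected (value, text) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self._flush()  # tolerate <option> tags that were never closed
            self.reading = True
            self.value = attrs.get("value")
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush()
        elif tag == "select" and self.in_select:
            self._flush()
            self.in_select = False

    def handle_data(self, data):
        if self.reading:
            self.chunks.append(data)

    def _flush(self):
        if self.reading:
            # split() with no argument splits on any run of whitespace
            text = " ".join("".join(self.chunks).split())
            self.options.append((self.value, text))
            self.reading = False


def extract_options(html, select_name):
    """Return a list of (value, text) tuples for the named <select>."""
    parser = SelectOptionParser(select_name)
    parser.feed(html)
    return parser.options


# Hypothetical markup, only loosely modelled on the page in the question
sample_html = """
<select name="Other"><option value="x">ignored</option></select>
<select name="IdEstacion">
  <option value="58">Aeropuerto San Luis     (REM )</option>
  <option value="1">Alto Pelado
      (REM )</option>
</select>
"""

# Key the dict by id: names such as "San Martín" repeat across station types
stations = dict(extract_options(sample_html, "IdEstacion"))
print(stations)
```

For the real page you would pass requests.get(url).text to extract_options instead of sample_html, assuming the dropdown is present in the static HTML as the answer above suggests.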

Answered By: MendelG
