Extract elements from bs4.element.ResultSet

Question:

I'm looking to extract the two numeric values from each element of this bs4 ResultSet.

forecast = [<div class="cell "><span>1.2</span><span class="m-unit"></span> - <span>2.0</span><span class="m-unit"></span></div>,
 <div class="cell "><span>1.5</span><span class="m-unit"></span> - <span>2.6</span><span class="m-unit"></span></div>,

Do you know how to load them directly into a DataFrame? So far I have tried:

forecast[1].contents[3]

But this is not a robust way to extract all the numeric values from the forecast bs4 elements.

Asked By: Pbcacao


Answers:

If the pattern is always the same and there are no deviations, you can split each element's text on the hyphen:

pd.DataFrame([e.text.split('-') for e in forecast])

Note: For reliable results, more detailed information is needed in the question.

Example

from bs4 import BeautifulSoup
import pandas as pd

html = '''<div class="cell "><span>1.2</span><span class="m-unit"></span> - <span>2.0</span><span class="m-unit"></span></div>
<div class="cell "><span>1.5</span><span class="m-unit"></span> - <span>2.6</span><span class="m-unit"></span></div>'''

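# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

# select() returns a ResultSet of all matching <div> elements
forecast = soup.select('div')

pd.DataFrame([e.text.split('-') for e in forecast])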

Output

     0    1
0  1.2  2.0
1  1.5  2.6
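Splitting on the hyphen leaves the values as strings with stray whitespace. If numeric columns are wanted, one possible variation is to pull the numbers out of each element's text with a regular expression and convert them to floats. This is only a sketch: `parse_range` and `NUMBER` are hypothetical names, the strings in `texts` stand in for what `[e.get_text() for e in forecast]` would return, and the pattern assumes plain unsigned decimals.

```python
import re

# Matches integers or decimals such as "2" or "1.5".
# Assumption: no signs, exponents, or thousands separators in the data.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def parse_range(text):
    """Return (low, high) as floats from a string like '1.2 - 2.0'."""
    values = [float(m) for m in NUMBER.findall(text)]
    if len(values) != 2:
        # Fail loudly instead of silently producing a ragged row
        raise ValueError(f"expected two numbers, got {values!r} in {text!r}")
    return values[0], values[1]

# Stand-ins for [e.get_text() for e in forecast]
texts = ["1.2 - 2.0", "1.5 - 2.6"]
rows = [parse_range(t) for t in texts]
print(rows)  # [(1.2, 2.0), (1.5, 2.6)]
```

The resulting rows can then be passed to pandas, for example `pd.DataFrame(rows, columns=['low', 'high'])`, where the column names are illustrative only.
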
Answered By: HedgeHog