Beautifulsoup – how to get text from <span>'s

Question

I’m trying to scrape a website. All is going fine, but I want to get the text inside the <span> tags. I can retrieve the first one, but I can’t get to the next ones.
This is the HTML excerpt:

<ul class="product-small-specs" data-test="product-specs">
    <li>
    <span>Engels</span>
    </li>
    <li>
    <span>Hardcover</span>
    </li>
    <li>
    <span>9780141395838</span>
    </li>
    <li>
    <span>Druk: New ed</span>
    </li>
    <li>
    <span>oktober 2014</span>
    </li>
    <li>
    <span>352 pagina's</span>
    </li>
    </ul>

When I try this:

xxx.span.text

I get 'Engels' (which is ok).

But how do I get the text of the following <span> elements?

xxx.span.next_sibling

gives '\n' (a newline).

Any help would be highly appreciated.

Edit:
The URL is this, and this is my code:

rec_all = soup.find_all("ul", class_="product-small-specs")
rec = soup.find("ul", class_="product-small-specs")

for iets in rec_all:
    for a in iets:
        print(a.span.text)
        print(a.span.next_sibling)

Asked By: Aurora Borealis

Answer 1

You can use find_all("span") to get a list of all <span> elements and then use a for-loop to get the text from every item in the list:

from bs4 import BeautifulSoup as BS

text = '''<ul class="product-small-specs" data-test="product-specs">
    <li>
    <span>Engels</span>
    </li>
    <li>
    <span>Hardcover</span>
    </li>
    <li>
    <span>9780141395838</span>
    </li>
    <li>
    <span>Druk: New ed</span>
    </li>
    <li>
    <span>oktober 2014</span>
    </li>
    <li>
    <span>352 pagina's</span>
    </li>
</ul>'''


soup = BS(text, 'html.parser')

all_items = soup.find_all('span')

for item in all_items:
    print(item.text)

Result:

Engels
Hardcover
9780141395838
Druk: New ed
oktober 2014
352 pagina's
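Why the approach in the question stops after the first item: `xxx.span` is shorthand for `xxx.find('span')`, so it only ever returns the first match. `next_sibling` gives '\n' because the node right after each <span> is the whitespace text inside the same <li>; the next <span> sits in a different <li>, so it is not a sibling at all. A minimal sketch, assuming the same sample HTML (shortened here), that walks forward in document order with bs4's `find_next('span')` instead:

```python
from bs4 import BeautifulSoup as BS

# Shortened version of the HTML from the question
text = '''<ul class="product-small-specs" data-test="product-specs">
    <li>
    <span>Engels</span>
    </li>
    <li>
    <span>Hardcover</span>
    </li>
    <li>
    <span>9780141395838</span>
    </li>
</ul>'''

soup = BS(text, 'html.parser')

span = soup.span                 # same as soup.find('span'): the first <span> only
print(repr(span.next_sibling))   # a whitespace text node, not the next <span>

texts = []
while span is not None:
    texts.append(span.text)
    # find_next() searches everything after this tag in document order,
    # so it can leave the current <li> and reach the next <span>
    span = span.find_next('span')

print(texts)  # ['Engels', 'Hardcover', '9780141395838']
```

On a full page, `find_next` keeps walking past the end of the <ul>, so you would have to stop it yourself; the scoped `ul.find_all('span')` approach avoids that.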

EDIT:

If you need only the <span> elements inside a selected <ul>, you can use:

ul = soup.find('ul', class_="product-small-specs")

all_items = ul.find_all('span') # search only inside `ul`

for item in all_items:
    print(item.text)
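The same scoped search can also be written as a CSS selector with `select()`, which in bs4 4.7+ is backed by the soupsieve package. A sketch, assuming the class name from the question; the extra <span> outside the list is made up here only to show that it is excluded:

```python
from bs4 import BeautifulSoup as BS

text = '''<ul class="product-small-specs" data-test="product-specs">
    <li><span>Engels</span></li>
    <li><span>Hardcover</span></li>
</ul>
<span>outside the list</span>'''

soup = BS(text, 'html.parser')

# "ul.product-small-specs span" matches every <span> nested anywhere
# inside a <ul> with that class, and nothing outside it
specs = [span.get_text(strip=True) for span in soup.select('ul.product-small-specs span')]

print(specs)  # ['Engels', 'Hardcover']
```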

EDIT:

If there are several <ul> elements, or several <span> elements in each <li>, you can use nested for-loops:

soup = BS(text, 'html.parser')

for ul in soup.find_all("ul", class_="product-small-specs"):
    print('--- ul ---')
    for li in ul.find_all('li'):
        print('  --- li ---')
        for span in li.find_all('span'):
            print('    span:', span.text)

Result:

--- ul ---
  --- li ---
    span: Engels
  --- li ---
    span: Hardcover
  --- li ---
    span: 9780141395838
  --- li ---
    span: Druk: New ed
  --- li ---
    span: oktober 2014
  --- li ---
    span: 352 pagina's
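The loop in the question iterates over `iets` directly, which yields every child of the <ul>, including the whitespace strings between the <li> tags; those strings have no `.span`, so the loop fails on them. A minimal sketch that keeps the question's structure but iterates only over <li> tags and collects each list's values into a Python list (the name `specs_per_ul` is just illustrative):

```python
from bs4 import BeautifulSoup as BS

text = '''<ul class="product-small-specs" data-test="product-specs">
    <li>
    <span>Engels</span>
    </li>
    <li>
    <span>Hardcover</span>
    </li>
</ul>'''

soup = BS(text, 'html.parser')

specs_per_ul = []
for iets in soup.find_all("ul", class_="product-small-specs"):
    # find_all('li') returns only <li> tags, skipping the whitespace strings
    # that plain iteration over `iets` would also produce;
    # the `if li.span` guard skips any <li> without a <span>
    values = [li.span.get_text(strip=True) for li in iets.find_all('li') if li.span]
    specs_per_ul.append(values)

print(specs_per_ul)  # [['Engels', 'Hardcover']]
```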

Answered By: furas
