BeautifulSoup: select an element based on the second-to-last child

Question:

I'm trying to select the second-to-last child in the breadcrumbs section.

<div class="breadcrumbs">
    <span><a href="/">Home</a></span>
    <i class="arrow"></i>
    <span><a href="/list1/">List Name 1</a></span>
    <i class="arrow"></i>
    <span><a href="/list2/">List Name 2</a></span>
    <i class="arrow"></i>
    <span>List Name 3</span>
</div>

I wrote this Python code with BS4 to print the text of the second-to-last breadcrumb (List Name 2):

import requests
from bs4 import BeautifulSoup

# link holds the page URL (defined elsewhere)
r = requests.get(link)
soup = BeautifulSoup(r.content, 'lxml')

listname = soup.select_one('.breadcrumbs span:nth-last-child(2) a').text

print(listname)

But it raises an error:

AttributeError: 'NoneType' object has no attribute 'text'

Sometimes the page has 2 breadcrumbs and sometimes 3, which is why I need the second-to-last name rather than one at a fixed position.

Asked By: nasir

Answers:

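You can select all <a> elements inside the breadcrumbs and take the last one with the [-1] index. This works because the final breadcrumb (List Name 3) is plain text, not a link: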

from bs4 import BeautifulSoup


html_code = """
<div class="breadcrumbs">
    <span><a href="/">Home</a></span>
    <i class="arrow"></i>
    <span><a href="/list1/">List Name 1</a></span>
    <i class="arrow"></i>
    <span><a href="/list2/">List Name 2</a></span>
    <i class="arrow"></i>
    <span>List Name 3</span>
</div>"""

soup = BeautifulSoup(html_code, "html.parser")

print(soup.select(".breadcrumbs a")[-1].text)

Prints:

List Name 2
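
Since the question mentions pages with only 2 breadcrumbs, the same idea can be wrapped in a small helper that also copes with a page that has no linked breadcrumb at all. This is only a sketch: the function name last_linked_breadcrumb and the two-level sample markup are assumptions made for illustration, not taken from the real page.

```python
from bs4 import BeautifulSoup


def last_linked_breadcrumb(html):
    """Return the text of the last <a> inside .breadcrumbs, or None if there is none.

    Hypothetical helper, named here for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select(".breadcrumbs a")
    # select() returns an empty list when nothing matches, so check before indexing.
    return links[-1].get_text(strip=True) if links else None


# Assumed shape of a page with only two breadcrumbs.
two_level = """
<div class="breadcrumbs">
    <span><a href="/">Home</a></span>
    <i class="arrow"></i>
    <span>List Name 1</span>
</div>"""

print(last_linked_breadcrumb(two_level))      # Home
print(last_linked_breadcrumb("<div></div>"))  # None
```

Checking the list before indexing avoids an IndexError, in the same way that checking the result of select_one() for None avoids the AttributeError from the question.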
Answered By: Andrej Kesely

The reason :nth-last-child(2) did not work is that the span you wanted is not the second-to-last child of the div but the third-to-last. The <i class="arrow"> elements count as children too, so the second-to-last child is an <i>, and span:nth-last-child(2) matches nothing. To get the second-to-last span, restrict the :nth-last-child check to spans only, using the "of" syntax:

from bs4 import BeautifulSoup

TEXT = """
<div class="breadcrumbs">
    <span><a href="/">Home</a></span>
    <i class="arrow"></i>
    <span><a href="/list1/">List Name 1</a></span>
    <i class="arrow"></i>
    <span><a href="/list2/">List Name 2</a></span>
    <i class="arrow"></i>
    <span>List Name 3</span>
</div>
"""

soup = BeautifulSoup(TEXT, "html.parser")
listname = soup.select_one('.breadcrumbs > :nth-last-child(2 of span) a').text
print(listname)

Prints:

List Name 2
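
A possible variation, sketched under the assumption that every breadcrumb is a <span>: the :nth-last-of-type pseudo-class counts only siblings with the same tag name, so it skips the <i> arrows as well. The snippet also shows that the selector from the question matches nothing, and it guards against None before reading the text. The variable names failed and link are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<div class="breadcrumbs">
    <span><a href="/">Home</a></span>
    <i class="arrow"></i>
    <span><a href="/list1/">List Name 1</a></span>
    <i class="arrow"></i>
    <span><a href="/list2/">List Name 2</a></span>
    <i class="arrow"></i>
    <span>List Name 3</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The selector from the question: the second-to-last child is an <i>, not a <span>,
# so nothing matches and select_one() returns None.
failed = soup.select_one(".breadcrumbs span:nth-last-child(2) a")

# :nth-last-of-type(2) counts backwards among <span> siblings only.
link = soup.select_one(".breadcrumbs > span:nth-last-of-type(2) a")
listname = link.get_text(strip=True) if link is not None else None

print(failed)    # None
print(listname)  # List Name 2
```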
Answered By: facelessuser