BeautifulSoup: select an element based on the second-to-last child
Question:
I'm trying to select the second-to-last child from the breadcrumbs section.
<div class="breadcrumbs">
<span><a href="/">Home</a></span>
<i class="arrow"></i>
<span><a href="/list1/">List Name 1</a></span>
<i class="arrow"></i>
<span><a href="/list2/">List Name 2</a></span>
<i class="arrow"></i>
<span>List Name 3</span>
</div>
I wrote this BS4 (Python) code to print the second-to-last breadcrumb, which should be "List Name 2":
import requests
from bs4 import BeautifulSoup

r = requests.get(link)  # link is the URL of the page being scraped
soup = BeautifulSoup(r.content, 'lxml')
listname = soup.select_one('.breadcrumbs span:nth-last-child(2) a').text
print(listname)
But it raises this error:
AttributeError: 'NoneType' object has no attribute 'text'
Sometimes the page has 2 breadcrumbs and sometimes 3, which is why I only need the second-to-last name.
Answers:
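You can select all <a> elements inside .breadcrumbs and take the last one with index [-1]. Because the final breadcrumb (List Name 3) is plain text with no link, the last <a> is the second-to-last breadcrumb: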
from bs4 import BeautifulSoup
html_code = """
<div class="breadcrumbs">
<span><a href="/">Home</a></span>
<i class="arrow"></i>
<span><a href="/list1/">List Name 1</a></span>
<i class="arrow"></i>
<span><a href="/list2/">List Name 2</a></span>
<i class="arrow"></i>
<span>List Name 3</span>
</div>"""
soup = BeautifulSoup(html_code, "html.parser")
print(soup.select(".breadcrumbs a")[-1].text)
Prints:
List Name 2
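A minimal sketch of the same idea wrapped in a helper, assuming (as in the example markup) that the final breadcrumb is always an unlinked <span>. The function name second_last_breadcrumb and the two sample pages are hypothetical. The helper returns None instead of raising when there are no links, and it is run on both a 3-breadcrumb and a 2-breadcrumb page, since the question says both occur:

```python
from bs4 import BeautifulSoup


def second_last_breadcrumb(html):
    """Return the text of the second-to-last breadcrumb, or None.

    Assumption: the final breadcrumb is an unlinked <span>, so the last
    <a> inside .breadcrumbs is the second-to-last breadcrumb.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select(".breadcrumbs a")
    if not links:
        # No linked breadcrumbs: return None instead of raising IndexError.
        return None
    return links[-1].get_text(strip=True)


# Hypothetical page with three list breadcrumbs after Home.
THREE = """
<div class="breadcrumbs">
<span><a href="/">Home</a></span>
<i class="arrow"></i>
<span><a href="/list1/">List Name 1</a></span>
<i class="arrow"></i>
<span><a href="/list2/">List Name 2</a></span>
<i class="arrow"></i>
<span>List Name 3</span>
</div>"""

# Hypothetical page with only two list breadcrumbs after Home.
TWO = """
<div class="breadcrumbs">
<span><a href="/">Home</a></span>
<i class="arrow"></i>
<span><a href="/list1/">List Name 1</a></span>
<i class="arrow"></i>
<span>List Name 2</span>
</div>"""

print(second_last_breadcrumb(THREE))  # List Name 2
print(second_last_breadcrumb(TWO))    # List Name 1
```

If the last breadcrumb can sometimes be a link too, this assumption breaks and a position-based selector is the safer choice.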
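It should be noted that :nth-last-child(2) did not work because :nth-last-child counts all sibling elements, including the <i class="arrow"></i> separators. The span you wanted was not the second-to-last child but the third-to-last; the second-to-last child is an <i>, so span:nth-last-child(2) matches nothing and select_one returns None. To get the second-to-last span, you have to restrict the :nth-last-child check to spans only, using the "of S" selector syntax: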
from bs4 import BeautifulSoup
TEXT = """
<div class="breadcrumbs">
<span><a href="/">Home</a></span>
<i class="arrow"></i>
<span><a href="/list1/">List Name 1</a></span>
<i class="arrow"></i>
<span><a href="/list2/">List Name 2</a></span>
<i class="arrow"></i>
<span>List Name 3</span>
</div>
"""
soup = BeautifulSoup(TEXT, "html.parser")
listname = soup.select_one('.breadcrumbs > :nth-last-child(2 of span) a').text
print(listname)
Prints:
List Name 2
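To make the counting concrete, the sketch below (reusing the question's markup unchanged) numbers the element children of .breadcrumbs from the end, the way :nth-last-child counts them. Position 2 is an <i>, which is why the original selector returns None. It also tries span:nth-last-of-type(2), a standard CSS pseudo-class that counts only sibling elements of the same tag name (here, the <span>s); this is an alternative not mentioned in the answers above, and it gives the same result for this markup:

```python
from bs4 import BeautifulSoup

TEXT = """
<div class="breadcrumbs">
<span><a href="/">Home</a></span>
<i class="arrow"></i>
<span><a href="/list1/">List Name 1</a></span>
<i class="arrow"></i>
<span><a href="/list2/">List Name 2</a></span>
<i class="arrow"></i>
<span>List Name 3</span>
</div>
"""

soup = BeautifulSoup(TEXT, "html.parser")
crumbs = soup.select_one(".breadcrumbs")

# Direct element children only: find_all(True, recursive=False) skips
# the whitespace text nodes between tags.
children = crumbs.find_all(True, recursive=False)

# Number children from the end, as :nth-last-child does.
positions = [(pos, child.name) for pos, child in enumerate(reversed(children), start=1)]
for pos, name in positions:
    print(pos, name)

# Original selector: the 2nd-from-last child is an <i>, so nothing matches.
print(soup.select_one(".breadcrumbs span:nth-last-child(2) a"))  # None

# Counting only <span> siblings finds the intended breadcrumb.
print(soup.select_one(".breadcrumbs > span:nth-last-of-type(2) a").text)  # List Name 2
```

Note that :nth-last-of-type filters by tag name only, while the "of S" form accepts any selector (for example a class), so the latter is more flexible if the separators were also <span> elements.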