BeautifulSoup parent tag
Question:
I have some html that I want to extract text from. Here’s an example of the html:
<p>TEXT I WANT <i> – </i></p>
Now, there are, obviously, lots of <p> tags in this document. So, find('p') is not a good way to get at the text I want to extract. However, that <i> tag is the only one in the document. So, I thought I could just find the <i> and then go to the parent.
I’ve tried:
up = soup.select('p i').parent
and
up = soup.select('i')
print(up.parent)
and I’ve tried it with .parents, I’ve tried find_all('i'), find('i') … But I always get:
'list' object has no attribute "parent"
What am I doing wrong?
Answers:
find_all() returns a list. find('i') returns the first matching element, or None.
The same applies to select() (returns a list) and select_one() (first match or None).
Thus, use:
try:
    up = soup.find('i').parent
except AttributeError:
    # no <i> element: find() returned None, which has no .parent
    up = None
Demo:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<p>TEXT I WANT <i> – </i></p>')
>>> soup.find('i').parent
<p>TEXT I WANT <i> – </i></p>
>>> soup.find('i').parent.text
u'TEXT I WANT \u2013 '
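To make the difference concrete, here is a small sketch comparing the four lookups side by side. The extra <p> in the sample HTML and the explicit 'html.parser' argument are assumptions added for illustration; they are not part of the original question.

```python
from bs4 import BeautifulSoup

# Sample document (assumed): several <p> tags, only one <i>.
html = "<div><p>first</p><p>TEXT I WANT <i> – </i></p></div>"
soup = BeautifulSoup(html, "html.parser")

many = soup.find_all("i")         # list-like result, even for a single match
one = soup.find("i")              # a single Tag, or None if nothing matches
css_many = soup.select("p i")     # list-like result
css_one = soup.select_one("p i")  # a single Tag, or None

# The list-like results have no .parent; the single Tag does.
has_parent_attr = hasattr(many, "parent")
parent_name = one.parent.name
```

Only the objects returned by find() and select_one() can be followed with .parent directly; the list-like results need indexing or a loop first.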
Both select() and find_all() return a list of elements, so loop over the results:
for el in soup.select('i'):
    print(el.parent.text)
This works:
i_tag = soup.find('i')
my_text = str(i_tag.previous_sibling).strip()
output:
'TEXT I WANT'
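The sibling approach and the parent approach give slightly different text, which a short sketch can show. It assumes the sample HTML from the question and Python's built-in 'html.parser'.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>TEXT I WANT <i> – </i></p>", "html.parser")
i_tag = soup.find("i")

# Text of the whole <p>, including the dash inside <i>.
whole = i_tag.parent.get_text()

# Only the text node that sits directly before <i>.
before = str(i_tag.previous_sibling).strip()
```

Going through the parent keeps the contents of the <i> tag in the result, while the preceding sibling holds just the text in front of it.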
As mentioned in other answers, find_all() returns a list, whereas find() returns the first match or None.
If you are unsure about the presence of an <i> tag, you could simply use a try/except block.
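As an alternative to try/except, an explicit None check may read more clearly. The sketch below uses a hypothetical helper, parent_of_first, which is not part of BeautifulSoup; it only wraps select_one().

```python
from bs4 import BeautifulSoup

def parent_of_first(soup, selector):
    """Hypothetical helper: parent of the first match, or None if no match."""
    match = soup.select_one(selector)
    return match.parent if match is not None else None

soup = BeautifulSoup("<p>TEXT I WANT <i> – </i></p>", "html.parser")

found = parent_of_first(soup, "i")    # the enclosing <p> tag
missing = parent_of_first(soup, "b")  # no <b> in the document, so None
```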
soup.select() returns a Python list, so you have to ‘unlist’ the variable, e.g.:
>>> [up] = soup.select('i')
>>> print(up.parent)
or
>>> up = soup.select('i')
>>> print(up[0].parent)
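One difference between the two forms is worth knowing: the unpacking form only succeeds when there is exactly one match, whereas indexing with [0] takes the first of any number of matches. A minimal sketch, assuming a document with two <i> tags (unlike the one in the question):

```python
from bs4 import BeautifulSoup

# Assumed sample with two <i> tags, to show where the two forms differ.
soup = BeautifulSoup("<p>a <i>x</i></p><p>b <i>y</i></p>", "html.parser")

try:
    [up] = soup.select("i")  # requires exactly one element in the list
    outcome = "exactly one match"
except ValueError:
    outcome = "zero or several matches"

# Indexing simply picks the first match (IndexError if there are none).
first_parent_text = soup.select("i")[0].parent.get_text()
```

In the question's document, where <i> occurs once, both forms give the same result.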
I think you are actually looking at a group of these tags. The select function returns a list of the matching tags, so when you ask for the parent, it doesn’t know which member of the list you mean. Try:
up = soup.select('p i')[0].parent
print(up)
The [0] says that you want the parent of the first element in the list. I haven’t tested this, so just try it out.