Test if a child tag exists in BeautifulSoup

Question:

I have XML files with a defined structure but a varying number of tags, like:

file1.xml:

<document>
  <subDoc>
    <id>1</id>
    <myId>1</myId>
  </subDoc>
</document>

file2.xml:

<document>
  <subDoc>
    <id>2</id>
  </subDoc>
</document>

Now I'd like to check whether the tag myId exists. So I did the following:

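from bs4 import BeautifulSoup

with open("file1.xml", 'r') as f:
    data = f.read()
# No parser given: bs4 picks an HTML parser, which lowercases tag names
xml = BeautifulSoup(data)

hasAttrBs = xml.document.subdoc.has_attr('myID')
hasAttrPy = hasattr(xml.document.subdoc, 'myID')
hasType = type(xml.document.subdoc.myid)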

The results are:

file1.xml:

hasAttrBs -> False
hasAttrPy -> True
hasType ->   <class 'bs4.element.Tag'>

file2.xml:

hasAttrBs -> False
hasAttrPy -> True
hasType -> <type 'NoneType'>

Okay, <myId> is not an attribute of <subdoc>.
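The three results above can be reproduced without any files. This is a minimal sketch assuming Beautiful Soup 4 with Python's built-in "html.parser"; check_subdoc is a hypothetical helper, and the XML is inlined from file1.xml and file2.xml. It suggests why only the last check is useful: has_attr() inspects XML attributes, while hasattr() always succeeds because bs4 turns unknown attribute names into a find() call that returns None on a miss.

```python
from bs4 import BeautifulSoup

# Inlined copies of file1.xml and file2.xml from the question
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def check_subdoc(source):
    """Return (has_attr, hasattr, child_found) for the <subdoc> tag."""
    # "html.parser" lowercases tag names: subDoc -> subdoc, myId -> myid
    subdoc = BeautifulSoup(source, "html.parser").document.subdoc
    return (
        # has_attr() checks XML attributes such as <subDoc myId="1">, not children
        subdoc.has_attr("myid"),
        # hasattr() is True either way: unknown names fall back to find()
        hasattr(subdoc, "myid"),
        # so the meaningful test is whether that lookup returned None
        subdoc.myid is not None,
    )


print(check_subdoc(FILE1))  # (False, True, True)
print(check_subdoc(FILE2))  # (False, True, False)
```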

But how can I test whether a sub-tag exists?

Edit: By the way, I don't want to iterate through the whole subdoc, because that would be very slow. I'd like to find a way to address/query that element directly.

Asked By: The Bndr


Answers:

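You can handle it by looping over the direct children of subdoc:

def has_my_id(xml):
    # Loop over the direct children of <subdoc> (tags and text nodes)
    for child in xml.document.subdoc.children:
        # HTML parsers lowercase tag names, so compare against 'myid'
        if child.name == 'myid':
            return True
    return False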
Answered By: chyoo CHENG

The simplest way to check whether a child tag exists is to use find(), which returns None when there is no match:

childTag = xml.find('childTag')
if childTag:
    pass  # the tag exists; do stuff with childTag here

More specifically to OP’s question:

If you don’t know the structure of the XML doc, you can use the .find() method of the soup. Something like this:

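from bs4 import BeautifulSoup

with open("file1.xml",'r') as data, open("file2.xml",'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")

    # HTML parsers lowercase tag names, so search for "myid", not "myId"
    hasAttrBs = xml.find("myid")
    hasAttrBs2 = xml2.find("myid")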

If you do know the structure, you can reach the element by accessing tag names as attributes, e.g. xml.document.subdoc.myid, which returns None when the tag is missing. So the whole thing would look something like this:

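from bs4 import BeautifulSoup

with open("file1.xml",'r') as data, open("file2.xml",'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")

    # Attribute access returns the first matching tag, or None if it is missing
    hasAttrBs = xml.document.subdoc.myid
    hasAttrBs2 = xml2.document.subdoc.myid
    print(hasAttrBs)
    print(hasAttrBs2)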

Prints

<myid>1</myid>
None
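The same check can be wrapped in a small helper. This is a sketch assuming Beautiful Soup 4 with Python's built-in "html.parser" (the lowercasing behaviour that makes the attribute path above use subdoc and myid); has_child is a hypothetical helper name, not part of bs4. It also shows the case pitfall: searching for the original camelCase name finds nothing.

```python
from bs4 import BeautifulSoup

FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"


def has_child(soup, name):
    """Return True if a tag called `name` exists anywhere below `soup`."""
    # find() returns the first matching Tag, or None when there is no match
    return soup.find(name) is not None


soup = BeautifulSoup(FILE1, "html.parser")
# "html.parser" stores tag names in lowercase, so search for "myid" ...
print(has_child(soup, "myid"))  # True
# ... because the original camelCase spelling no longer matches
print(has_child(soup, "myId"))  # False
```

If you need to keep the original case, parsing with BeautifulSoup's "xml" feature (which requires lxml) preserves tag names as written.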
Answered By: wpercy
if tag.find('child_tag_name'):
    pass  # the child tag exists
Answered By: ahuigo

Here's an example that checks whether an h2 tag exists on an Instagram page. Hope you find it useful:

import requests
from bs4 import BeautifulSoup

instagram_url = 'https://www.instagram.com/p/BHijrYFgX2v/?taken-by=findingmero'
html_source = requests.get(instagram_url).text
soup = BeautifulSoup(html_source, "lxml")

if not soup.find('h2'):
    print("didn't find h2")
Answered By: Mona Jalal

You can do it with if tag.myid: (the HTML parsers lowercase tag names, so myID has to be written as myid).

If you want to check that myid is a direct child rather than a grandchild, use if tag.find("myid", recursive=False):

If you want to check that tag has no child tags at all, use if not tag.find(True):
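The three idioms above can be tried on a small inline document. This is a minimal sketch assuming Beautiful Soup 4 and "html.parser"; the HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up snippet: <span> is a grandchild of <div>, <section> is empty
HTML = "<div><p><span>nested</span></p></div><section></section>"
soup = BeautifulSoup(HTML, "html.parser")
div, section = soup.div, soup.section

# Attribute access searches all descendants, like find()
print(div.span is not None)                           # True
# recursive=False only looks at direct children: <span> is a grandchild
print(div.find("span", recursive=False) is not None)  # False
print(div.find("p", recursive=False) is not None)     # True
# find(True) matches any tag, so "not find(True)" means "no child tags"
print(not section.find(True))                         # True
print(not div.find(True))                             # False
```

Note that a found Tag is always truthy, even when it is empty, so these if checks do not depend on the tag having text inside it.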

Answered By: LF00
import requests
from bs4 import BeautifulSoup
from bs4.element import Tag

page = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
soup = BeautifulSoup(page.content, 'html.parser')
testNode = list(soup.children)[1]

def hasChild(node):
    # Text nodes (NavigableString) have no children; only a Tag can,
    # and only if its contents list is non-empty
    return isinstance(node, Tag) and len(node.contents) > 0

if hasChild(testNode):
    firstChild = list(testNode.children)[0]
    if hasChild(firstChild):
        print('I found a grandchild')
Answered By: user2458922

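If you are using a CSS selector:

def select_or_none(soup_elm):
    content = soup_elm.select('.css_selector')
    # select() returns an empty list when nothing matches
    if len(content) == 0:
        return None
    return content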
Answered By: X.Creates

You could also try it this way:

import requests
from bs4 import BeautifulSoup

response = requests.get("Your URL here")
soup = BeautifulSoup(response.text, 'lxml')
RESULT = soup.select_one('CSS_SELECTOR_HERE')  # first match only, or None
print(RESULT)

Note that CSS selector support in bs4 (provided by the Soup Sieve library) differs a little from other selector implementations. See the Beautiful Soup documentation for how to use CSS selectors.

soup.select returns a list of all matching elements (empty if there are none), and it supports attribute selectors as well.
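Applied to the question's files, a CSS child combinator can test for myId directly under subDoc. This is a sketch assuming Beautiful Soup 4 with "html.parser" (hence the lowercase names in the selector); my_id_matches is a hypothetical helper, and the XML is inlined from the question.

```python
from bs4 import BeautifulSoup

FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"

# ">" is the CSS child combinator: <myid> must be a direct child of <subdoc>.
# Names are lowercase because "html.parser" lowercases tag names.
SELECTOR = "subdoc > myid"


def my_id_matches(source):
    """Return (all matches, first match) for SELECTOR in `source`."""
    soup = BeautifulSoup(source, "html.parser")
    # select() always returns a list, which is empty when nothing matches;
    # select_one() returns the first match, or None
    return soup.select(SELECTOR), soup.select_one(SELECTOR)


for source in (FILE1, FILE2):
    matches, first = my_id_matches(source)
    print(len(matches), first)  # "1 <myid>1</myid>", then "0 None"
```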

Answered By: Infallible wisdoms