Parsing XML with Python Error: argument of type 'NoneType' is not iterable

Question:

The logic here is: If the page-element does not contain "| kasteeltype" then remove page-element, otherwise keep the page-element.

#Import ElementTree
import defusedxml.ElementTree as ET

#Set Tree & Root
tree = ET.parse("nlwiki-20221020-pages-meta-current1.xml-p1p134538")
root = tree.getroot()

#Namespaces
NSPage = "{http://www.mediawiki.org/xml/export-0.10/}page"
NSRevision = "{http://www.mediawiki.org/xml/export-0.10/}revision"
NSText = "{http://www.mediawiki.org/xml/export-0.10/}text"

#Modify XML
for page in root.findall(NSPage):
  for revision in page.findall(NSRevision):
    text = revision.find(NSText)
    kasteeltype = "| kasteeltype"
    if kasteeltype not in text.text:
      root.remove(page)

#Output
tree.write("output.xml")

This code results in the following error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [8], line 18
     16 text = revision.find(NSText)
     17 kasteeltype = "| kasteeltype"
---> 18 if kasteeltype not in text.text:
     19   root.remove(page)
     20 else:

TypeError: argument of type 'NoneType' is not iterable

I’m a bit clueless now about how to proceed.

The XML-file can be found here: https://dumps.wikimedia.org/nlwiki/20221020/nlwiki-20221020-pages-meta-current1.xml-p1p134538.bz2

It is quite a large file, since it is a Wikipedia dump.

The expected result should be that all page elements that do not contain the string "| kasteeltype" in the text element under the parent revision should be removed.

Asked By: Nielsi_


Answers:

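Verify that the element was found, and that it has text, before testing its content. revision.find(NSText) returns None when a revision has no text element, and text.text is None when the element is empty. The second case is what raises the TypeError in the traceback: the "in" test is being applied to None.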

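kasteeltype = "| kasteeltype"
# findall() returns a list (possibly empty), never None, so no None check
# is needed on it, and removing pages from root inside the loop is safe.
for page in root.findall(NSPage):
  for revision in page.findall(NSRevision):
    text = revision.find(NSText)
    # Only test pages that have a non-empty text element; pages with a
    # missing or empty text element are left in place by this check.
    if text is not None and text.text is not None:
      if kasteeltype not in text.text:
        root.remove(page)
        break  # page is gone; do not try to remove it a second time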
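
A possible variant, in case pages with a missing or empty text element should also be removed (they do not contain "| kasteeltype" either, which seems to match the expected result in the question). This is only a sketch under some assumptions: it uses the standard library's xml.etree.ElementTree so that it runs without extra packages, the SAMPLE document is a made-up miniature of the dump, and the helper names page_has_needle and filter_pages are hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for defusedxml.ElementTree

NS = "{http://www.mediawiki.org/xml/export-0.10/}"
NEEDLE = "| kasteeltype"

# Hypothetical miniature dump, used only for illustration.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>A</title><revision><text>{{Infobox
| kasteeltype = burcht
}}</text></revision></page>
  <page><title>B</title><revision><text>no infobox here</text></revision></page>
  <page><title>C</title><revision><text/></revision></page>
  <page><title>D</title><revision></revision></page>
</mediawiki>"""


def page_has_needle(page, needle=NEEDLE):
    """Return True if any revision text of the page contains needle."""
    for text in page.iterfind(f"{NS}revision/{NS}text"):
        # text.text is None for an empty <text/> element; treat it as "".
        if needle in (text.text or ""):
            return True
    return False


def filter_pages(root, needle=NEEDLE):
    """Remove every direct <page> child of root that lacks needle."""
    # findall() returns a list, so removing from root during the loop is safe.
    for page in root.findall(f"{NS}page"):
        if not page_has_needle(page, needle):
            root.remove(page)
    return root


root = filter_pages(ET.fromstring(SAMPLE))
remaining = [page.findtext(f"{NS}title") for page in root.findall(f"{NS}page")]
print(remaining)
```

With the real dump, the same filter_pages(root) call could be applied to the root obtained from tree.getroot() before tree.write("output.xml"). Page C shows the case from the traceback (an empty text element), and page D shows a revision with no text element at all; in this variant both are removed along with page B.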
  
Answered By: balderman