How to best iterate (breadth-first) over an lxml etree using Python
Question:
I’m trying to wrap my head around lxml (new to this) and how I can use it to do what I want to do. I’ve got a well-formed and valid XML file
<root>
    <a>
        <b>Text</b>
        <c>More text</c>
    </a>
    <!-- some comment -->
    <a>
        <d id="10" />
    </a>
</root>
something like this. Now I’d like to visit the children breadth-first, and the best I can come up with is something like this:
for e in xml.getroot()[0].itersiblings():
    print(e.tag, e.attrib)
and then take it from there. However, this gives me all elements including comments
a {}
<built-in function Comment> {}
a {}
How do I skip over comments? Is there a better way to iterate over the direct children of a node?
In general, what are the recommendations for parsing an XML tree vs. event-driven pull-parsing using, say, iterparse()?
Answers:
This works for your case. The "*" tag filter matches elements only, so comments are skipped:
for child in doc.getroot().iterchildren("*"):
    print(child.tag, child.attrib)
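If the tree is not an lxml one, a similar filter can be written by hand. The sketch below is only an illustration and assumes the standard library's xml.etree.ElementTree on Python 3.8 or later. It relies on comments having a non-string tag (as the `<built-in function Comment>` output in the question suggests). The names SAMPLE and element_children are made up for this example, and the parser is configured to keep comments only so there is something to skip.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (SAMPLE is a name made up for this sketch).
SAMPLE = """<root>
  <a>
    <b>Text</b>
    <c>More text</c>
  </a>
  <!-- some comment -->
  <a>
    <d id="10" />
  </a>
</root>"""

# The default ElementTree parser drops comments; keep them here so that
# there is something to filter out (insert_comments needs Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(SAMPLE, parser=parser)


def element_children(node):
    """Return the direct children of node that are real elements."""
    # Comments and processing instructions carry a function as their tag,
    # so a string check keeps elements only.
    return [child for child in node if isinstance(child.tag, str)]


for child in element_children(root):
    print(child.tag, child.attrib)
```

Iterating over root directly would also yield the comment node; the isinstance check is what drops it.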
This question was asked over 9 years ago, but I just ran into this issue myself, and I solved it with the following
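import xml.etree.ElementTree as ET

xmlfile = ET.parse("file.xml")
root = xmlfile.getroot()
visit = [root]  # nodes waiting to be visited, oldest first
while visit:
    curr = visit.pop(0)  # take the oldest node (front of the list)
    print(curr.tag, curr.attrib, curr.text)
    visit += list(curr)  # append its direct children at the back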
list(node) will give a list of all the immediate children of that node. So by adding those children to the back of a queue and repeating that process with whatever is at the front of the queue (popping it off at the same time), we end up with a standard breadth-first search.
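For larger trees, popping from the front of a list gets slow, because every pop shifts the remaining items. A possible variant of the same traversal using collections.deque is sketched below. The function name breadth_first and the inline SAMPLE string are assumptions made for this example. The string check on the tag is only there so the function would also skip comments if the tree contained any.

```python
from collections import deque
import xml.etree.ElementTree as ET

# Sample document from the question (SAMPLE is a name made up for this sketch).
SAMPLE = """<root>
  <a>
    <b>Text</b>
    <c>More text</c>
  </a>
  <!-- some comment -->
  <a>
    <d id="10" />
  </a>
</root>"""


def breadth_first(root):
    """Yield root and its descendants level by level."""
    queue = deque([root])
    while queue:
        node = queue.popleft()  # O(1), unlike list.pop(0)
        # Comments and processing instructions have a non-string tag.
        if not isinstance(node.tag, str):
            continue
        yield node
        queue.extend(node)  # enqueue the direct children at the back


root = ET.fromstring(SAMPLE)
for node in breadth_first(root):
    print(node.tag, node.attrib)
```

With the sample document this visits root first, then both a elements, then b, c and d.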