iterparse

Python iterparse large XML while filtering with elements and children

Question: I am attempting to parse product data from Icecat. The data comes in large XML files (3-7 GB). To reduce the amount of product data I bring in, I need to filter this list before moving on to the next step. Particularly I …

Total answers: 2
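A minimal sketch of the streaming-filter idea. It uses the standard library's `xml.etree.ElementTree.iterparse` instead of lxml so it runs without extra dependencies. The `<product>` layout, the `id` attribute and the `<category>`/`<name>` children are hypothetical stand-ins, not the real Icecat schema. It also assumes each `<product>` is a direct child of the root element, because `root.clear()` only discards the root's direct children.

```python
import io
import xml.etree.ElementTree as ET

def filter_products(source, wanted_category):
    """Stream `source` and yield (id, name) for each hypothetical <product>
    whose <category> child text equals `wanted_category`."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's "start"
    for event, elem in context:
        if event != "end" or elem.tag != "product":
            continue
        # On "end", the product and all of its children are fully parsed.
        if elem.findtext("category") == wanted_category:
            yield elem.get("id"), elem.findtext("name")
        # Discard products already handled so the tree does not keep growing.
        root.clear()

sample = b"""<products>
  <product id="1"><category>laptops</category><name>A</name></product>
  <product id="2"><category>phones</category><name>B</name></product>
  <product id="3"><category>laptops</category><name>C</name></product>
</products>"""

print(list(filter_products(io.BytesIO(sample), "laptops")))
# [('1', 'A'), ('3', 'C')]
```

The filter runs on the product's `end` event rather than `start`, since only then are its child elements guaranteed to be available.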

Why is lxml.etree.iterparse() eating up all my memory?

Question: This eventually consumes all my available memory and then the process is killed. I’ve tried changing the tag from schedule to ‘smaller’ tags, but that didn’t make a difference. What am I doing wrong, and how can I process this large file with iterparse()? import lxml.etree …

Total answers: 3
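`iterparse()` still builds the whole tree as it reads. In lxml, the `tag` argument only restricts which events are reported, which fits the observation that switching to smaller tags changed nothing. The usual remedy is to delete elements once they have been processed. The sketch below does this with the standard library's `ElementTree.iterparse` by clearing the root after each record. With lxml, the commonly recommended equivalent is `elem.clear()` followed by deleting already-processed preceding siblings via `elem.getprevious()` and `elem.getparent()`. The `<schedule>` tag comes from the question. The `handle` callback, the `id` attribute and the synthetic document are hypothetical, and the sketch assumes `<schedule>` elements are direct children of the root.

```python
import io
import xml.etree.ElementTree as ET

def process_schedules(source, handle, tag="schedule"):
    """Call `handle(elem)` for every <tag> element, then free it.

    Assumes the <tag> elements are direct children of the document root.
    Returns the number of elements handled.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # keep a reference to the root so it can be pruned
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            handle(elem)  # elem and its subtree are complete here
            count += 1
            # Without this, every processed element stays attached to the
            # root and memory grows with the size of the file.
            root.clear()
    return count

# Build a synthetic document with 10,000 <schedule> records.
doc = b"<schedules>" + b"".join(
    b'<schedule id="%d"><item/></schedule>' % i for i in range(10_000)
) + b"</schedules>"

seen = []
n = process_schedules(io.BytesIO(doc), lambda e: seen.append(e.get("id")))
print(n, seen[:3])  # 10000 ['0', '1', '2']
```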

Iteratively parsing HTML (with lxml?)

Question: I’m currently trying to iteratively parse a very large HTML document (I know... yuck) using lxml.etree.iterparse: “Incremental parser. Parses XML into a tree and generates tuples (event, element) in a SAX-like fashion”. I am using an incremental/iterative/SAX approach to reduce the amount of memory used (I don’t want to …

Total answers: 5
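lxml's `iterparse()` accepts an `html=True` keyword for HTML input, but it still builds a tree. As a dependency-free alternative, the sketch below swaps in the standard library's `html.parser.HTMLParser`, which is SAX-like. It is fed the document in chunks and calls a handler method per tag without building a tree, so memory use is dominated by the chunk size and whatever the handlers keep. Collecting `<a href>` values is a hypothetical task chosen only for illustration.

```python
import io
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical handler: records the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def collect_links(stream, chunk_size=64 * 1024):
    """Feed a text stream to the parser chunk by chunk; no tree is built."""
    parser = LinkCollector()
    while chunk := stream.read(chunk_size):
        parser.feed(chunk)  # a tag cut off at a chunk boundary is buffered
    parser.close()
    return parser.links

page = '<html><body><p>Hi <A HREF="/one">1</A> and <a href="/two">2</a></p><a name="top">x</a></body></html>'
print(collect_links(io.StringIO(page), chunk_size=7))
# ['/one', '/two']
```

The tiny `chunk_size` in the usage line deliberately splits tags across `feed()` calls to show that the parser reassembles them. For a real file, open it in text mode and keep a larger chunk size.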