Parsing XML with a namespace from a request with lxml in Python

Question:

I am trying to get some text out of a table in an online XML file. I can find the tables:

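from lxml import etree
import requests

main_file = requests.get('https://training.gov.au/TrainingComponentFiles/CUA/CUAWRT601_R1.xml')
# Only affects main_file.text; main_file.content (used below) is raw bytes
main_file.encoding = 'utf-8-sig'
# Parse the raw bytes so lxml takes the encoding from the XML itself
root = etree.fromstring(main_file.content)
# The elements live in the Author-it namespace, so bind it to the prefix "foo"
tables = root.xpath('//foo:table', namespaces={"foo": "http://www.authorit.com/xml/authorit"})

print(tables)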
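A minimal sketch, not part of the original question: instead of hard-coding the namespace URI, it can be read back from the parsed root through lxml's nsmap, which stores a default namespace under the key None. XPath cannot use a None prefix, so the URI still has to be bound to a name such as foo. The inline SAMPLE document is hypothetical and stands in for the downloaded file; it assumes the default namespace is declared on the root element, which may not hold for every file.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded file. Assumption: the default
# namespace is declared on the root element.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<topic xmlns="http://www.authorit.com/xml/authorit">
  <table><tr><td width="2700"><p id="4">Prepare to write scripts</p></td></tr></table>
</topic>"""

root = etree.fromstring(SAMPLE)

# lxml reports a default namespace under the key None in nsmap.
default_ns = root.nsmap[None]

# XPath has no syntax for "the default namespace", so bind it to a prefix.
ns = {"foo": default_ns}
tables = root.xpath("//foo:table", namespaces=ns)

print(default_ns)   # http://www.authorit.com/xml/authorit
print(len(tables))  # 1
```

If the namespace is declared further down the tree, root.nsmap[None] raises KeyError, and hard-coding the URI as in the question is the simpler option.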

But I can’t get any further than that. The text that I am looking for is:

  1. Prepare to write scripts
  2. Write draft scripts
  3. Produce final scripts

When I paste the XML into http://xpather.com/, I can get it using the following expression:
//table[1]/tr/td[@width="2700"]/p[@id="4"][not(*)]/text()

but the same expression returns nothing with lxml, and I’m out of ideas. How can I get that text?
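To see why the expression comes back empty, here is a small sketch using a hypothetical SAMPLE document that only loosely mirrors the structure the XPath expects (the real file's contents are not reproduced). Every element inherits the default namespace declared on the root, and in XPath 1.0 an unprefixed name test matches only elements in no namespace.

```python
from lxml import etree

# Hypothetical sample: every element inherits the default namespace on <topic>.
SAMPLE = b"""<topic xmlns="http://www.authorit.com/xml/authorit">
  <table>
    <tr><td width="2700"><p id="4">Prepare to write scripts</p></td></tr>
  </table>
</topic>"""

root = etree.fromstring(SAMPLE)

# Unprefixed names only match elements in no namespace, so nothing is selected.
expr = '//table[1]/tr/td[@width="2700"]/p[@id="4"][not(*)]/text()'
print(root.xpath(expr))  # []

# The qualified tag shows the namespace the element actually belongs to.
print(root[0].tag)  # {http://www.authorit.com/xml/authorit}table
```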

Asked By: lucas


Answers:

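Use the namespace prefix you declared (with namespaces={"foo": "http://www.authorit.com/xml/authorit"}) on every element name in the path. The elements in this file are in that namespace, and in XPath 1.0 an unprefixed name such as table only matches elements in no namespace. Attributes such as width are not in the default namespace, so they stay unprefixed.

For example, instead of

//table[1]/tr/td[@width="2700"]/p[@id="4"][not(*)]/text()

use

//foo:table[1]/foo:tr/foo:td[@width="2700"]/foo:p[@id="4"][not(*)]/text()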
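A self-contained sketch of the answer, run against a hypothetical SAMPLE that mimics the table layout the XPath expects (three rows whose 2700-wide cell holds the wanted text). The real file is not fetched here, so its exact structure is an assumption; only the expression itself comes from the answer.

```python
from lxml import etree

NS = {"foo": "http://www.authorit.com/xml/authorit"}

# Hypothetical sample: the first cell of each row holds one step name.
SAMPLE = b"""<topic xmlns="http://www.authorit.com/xml/authorit">
  <table>
    <tr><td width="2700"><p id="4">Prepare to write scripts</p></td><td width="6300"><p id="5">Details</p></td></tr>
    <tr><td width="2700"><p id="4">Write draft scripts</p></td><td width="6300"><p id="5">Details</p></td></tr>
    <tr><td width="2700"><p id="4">Produce final scripts</p></td><td width="6300"><p id="5">Details</p></td></tr>
  </table>
</topic>"""

root = etree.fromstring(SAMPLE)

# Same path as on xpather.com, but every element step carries the foo: prefix;
# the attributes (@width, @id) are in no namespace and stay unprefixed.
expr = '//foo:table[1]/foo:tr/foo:td[@width="2700"]/foo:p[@id="4"][not(*)]/text()'
steps = [s.strip() for s in root.xpath(expr, namespaces=NS)]
print(steps)  # ['Prepare to write scripts', 'Write draft scripts', 'Produce final scripts']
```

A namespace-agnostic alternative (not what the answer suggests) is to match on local names, e.g. //*[local-name()='table'], which avoids the prefix map at the cost of longer expressions.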

Answered By: Martin Honnen