Really simple way to deal with XML in Python?

Question:

Musing over a recently asked question, I started to wonder if there is a really simple way to deal with XML documents in Python. A pythonic way, if you will.

Perhaps I can explain best with an example. Let's say the following, which I think is a good example of how XML is (mis)used in web services, is the response I get from an HTTP request to http://www.google.com/ig/api?weather=94043:

<xml_api_reply version="1">
  <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" >
    <forecast_information>
      <city data="Mountain View, CA"/>
      <postal_code data="94043"/>
      <latitude_e6 data=""/>
      <longitude_e6 data=""/>
      <forecast_date data="2010-06-23"/>
      <current_date_time data="2010-06-24 00:02:54 +0000"/>
      <unit_system data="US"/>
    </forecast_information>
    <current_conditions>
      <condition data="Sunny"/>
      <temp_f data="68"/>
      <temp_c data="20"/>
      <humidity data="Humidity: 61%"/>
      <icon data="/ig/images/weather/sunny.gif"/>
      <wind_condition data="Wind: NW at 19 mph"/>
    </current_conditions>
    ...
    <forecast_conditions>
      <day_of_week data="Sat"/>
      <low data="59"/>
      <high data="75"/>
      <icon data="/ig/images/weather/partly_cloudy.gif"/>
      <condition data="Partly Cloudy"/>
    </forecast_conditions>
  </weather>
</xml_api_reply>

After loading/parsing such a document, I would like to be able to access the information as simply as, say,

>>> xml['xml_api_reply']['weather']['forecast_information']['city'].data
'Mountain View, CA'

or

>>> xml.xml_api_reply.weather.current_conditions.temp_f['data']
'68'

From what I have seen so far, ElementTree seems to be the closest to what I dream of. But it's not quite there; there is still some fumbling to do when consuming XML. OTOH, what I am thinking of is not that complicated, probably just a thin veneer on top of a parser, and yet it could reduce the annoyance of dealing with XML. Is there such magic? (And if not, why not?)

PS. Note that I have already tried BeautifulSoup, and while I like its approach, it has real issues with empty <element/>s (see the comments below for examples).

Asked By: Nas Banov


Answers:

If you don’t mind using a 3rd party library, then BeautifulSoup will do almost exactly what you ask for:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup('''<snip>''')
>>> soup.xml_api_reply.weather.current_conditions.temp_f['data']
u'68'
Answered By: Mike Boers

If you haven't already, I'd suggest looking into the DOM API for Python (xml.dom.minidom in the standard library). DOM is a widely used, standardized XML API, so it should be pretty robust.

It’s probably a little more complicated than what you describe, but that comes from its attempts to preserve all the information implicit in XML markup rather than from bad design.
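As a rough illustration of what the DOM route looks like with the standard library's xml.dom.minidom, here is a minimal sketch. The helper name `data_of` and the truncated sample document are assumptions made for this example; note that the helper takes the first element with a given tag name anywhere in the document, not a full path.

```python
from xml.dom import minidom

# Truncated copy of the weather response from the question (illustrative only).
SAMPLE = """<xml_api_reply version="1">
  <weather>
    <forecast_information>
      <city data="Mountain View, CA"/>
    </forecast_information>
    <current_conditions>
      <temp_f data="68"/>
    </current_conditions>
  </weather>
</xml_api_reply>"""

def data_of(doc, tag):
    """Hypothetical helper: return the 'data' attribute of the first <tag> element."""
    nodes = doc.getElementsByTagName(tag)
    if not nodes:
        raise KeyError(tag)
    return nodes[0].getAttribute("data")

doc = minidom.parseString(SAMPLE)
print(data_of(doc, "city"))    # Mountain View, CA
print(data_of(doc, "temp_f"))  # 68
```

getAttribute() returns an empty string when the attribute is missing rather than raising, which is one of the ways DOM is more verbose than the attribute-style access asked for in the question.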

Answered By: tlayton

Take a look at Amara 2, particularly the Bindery part of this tutorial.

It works in a way pretty similar to what you describe.

On the other hand, ElementTree's find*() methods can give you 90% of that and come packaged with Python.
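For comparison, here is roughly how far ElementTree's find() and findall() get you with only the standard library. This is a sketch; the sample document is a truncated copy of the response from the question.

```python
import xml.etree.ElementTree as ET

# Truncated copy of the response from the question.
SAMPLE = """<xml_api_reply version="1">
  <weather>
    <forecast_information>
      <city data="Mountain View, CA"/>
    </forecast_information>
    <current_conditions>
      <temp_f data="68"/>
    </current_conditions>
    <forecast_conditions>
      <day_of_week data="Sat"/>
      <low data="59"/>
      <high data="75"/>
    </forecast_conditions>
  </weather>
</xml_api_reply>"""

root = ET.fromstring(SAMPLE)  # root is the <xml_api_reply> element itself

# find() takes a path relative to the current element and returns the first match.
city = root.find("weather/forecast_information/city").get("data")
temp_f = root.find("weather/current_conditions/temp_f").get("data")

# findall() returns every match, which covers repeated elements.
forecasts = [
    (fc.find("day_of_week").get("data"),
     fc.find("low").get("data"),
     fc.find("high").get("data"))
    for fc in root.findall("weather/forecast_conditions")
]

print(city)       # Mountain View, CA
print(temp_f)     # 68
print(forecasts)  # [('Sat', '59', '75')]
```

The paths are relative to the root element, so they do not repeat `xml_api_reply`, and because the values live in `data` attributes rather than in element text, each lookup still needs a `.get("data")`. That remaining fumbling is what the wrapper-style answers try to hide.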

Answered By: Walter Mundt

I believe the xml package built into Python's standard library will do the trick. Look at xml.parsers.expat.

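Note that expat is a low-level, event-driven parser: it calls your handlers as it sees each tag and does not build a tree, so you have to assemble any structure yourself. A minimal sketch under that model, assuming the values of interest are the `data` attributes and that a slash-joined path of tag names is an acceptable key (both are assumptions for illustration; `collect_data_attributes` is a hypothetical helper):

```python
import xml.parsers.expat

SAMPLE = """<xml_api_reply version="1">
  <weather>
    <forecast_information>
      <city data="Mountain View, CA"/>
    </forecast_information>
    <current_conditions>
      <temp_f data="68"/>
    </current_conditions>
  </weather>
</xml_api_reply>"""

def collect_data_attributes(text):
    """Map 'a/b/c' tag paths to the value of each element's 'data' attribute."""
    result = {}
    stack = []  # names of the currently open elements

    def start_element(name, attrs):
        stack.append(name)
        if "data" in attrs:
            # A later element with the same path overwrites an earlier one.
            result["/".join(stack)] = attrs["data"]

    def end_element(name):
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.Parse(text, True)  # True: this is the final chunk of input
    return result

values = collect_data_attributes(SAMPLE)
print(values["xml_api_reply/weather/current_conditions/temp_f"])  # 68
```

Repeated elements such as the several forecast_conditions blocks would collide under this keying scheme, which is one reason a tree-building API is usually more convenient for this kind of task.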

Answered By: iform

I highly recommend lxml.etree and xpath to parse and analyse your data. Here is a complete example. I have truncated the xml to make it easier to read.

import lxml.etree

s = """<?xml version="1.0" encoding="utf-8"?>
<xml_api_reply version="1">
  <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" >
    <forecast_information>
      <city data="Mountain View, CA"/>
      <forecast_date data="2010-06-23"/>
    </forecast_information>
    <forecast_conditions>
      <day_of_week data="Sat"/>
      <low data="59"/>
      <high data="75"/>
      <icon data="/ig/images/weather/partly_cloudy.gif"/>
      <condition data="Partly Cloudy"/>
    </forecast_conditions>
  </weather>
</xml_api_reply>"""

# lxml rejects str input that carries an encoding declaration, so parse bytes.
tree = lxml.etree.fromstring(s.encode("utf-8"))
for weather in tree.xpath('/xml_api_reply/weather'):
    # find() does not understand attribute steps like @data; xpath() does,
    # and returns a list of the matching attribute values.
    print(weather.xpath('forecast_information/city/@data')[0])
    print(weather.xpath('forecast_information/forecast_date/@data')[0])
    print(weather.xpath('forecast_conditions/low/@data')[0])
    print(weather.xpath('forecast_conditions/high/@data')[0])
Answered By: Jerub

You want a thin veneer? That’s easy to cook up. Try the following trivial wrapper around ElementTree as a start:

# geetree.py
import xml.etree.ElementTree as ET

class GeeElem(object):
    """Wrapper around an ElementTree element. a['foo'] gets the
       attribute foo, a.foo gets the first subelement foo."""
    def __init__(self, elem):
        self.etElem = elem

    def __getitem__(self, name):
        res = self._getattr(name)
        if res is None:
            # Item access should raise KeyError, not AttributeError.
            raise KeyError("No attribute named '%s'" % name)
        return res

    def __getattr__(self, name):
        res = self._getelem(name)
        if res is None:
            # Attribute access must raise AttributeError so that hasattr()
            # and getattr(obj, name, default) keep working.
            raise AttributeError("No element named '%s'" % name)
        return res

    def _getelem(self, name):
        res = self.etElem.find(name)
        if res is None:
            return None
        return GeeElem(res)

    def _getattr(self, name):
        return self.etElem.get(name)

class GeeTree(object):
    "Wrapper around an ElementTree."
    def __init__(self, fname):
        # fname may be a file name or an open file object.
        self.doc = ET.parse(fname)

    def __getattr__(self, name):
        if self.doc.getroot().tag != name:
            raise AttributeError("No element named '%s'" % name)
        return GeeElem(self.doc.getroot())

    def getroot(self):
        return self.doc.getroot()

You invoke it so:

>>> import geetree
>>> t = geetree.GeeTree('foo.xml')
>>> t.xml_api_reply.weather.forecast_information.city['data']
'Mountain View, CA'
>>> t.xml_api_reply.weather.current_conditions.temp_f['data']
'68'
Answered By: Owen S.

lxml has been mentioned. You might also check out lxml.objectify for some really simple manipulation.

>>> from lxml import objectify
>>> tree = objectify.fromstring(your_xml)
>>> tree.weather.attrib["module_id"]
'0'
>>> tree.weather.forecast_information.city.attrib["data"]
'Mountain View, CA'
>>> tree.weather.forecast_information.postal_code.attrib["data"]
'94043'
Answered By: Ryan Ginstrom

The suds project provides a web services client library that works almost exactly as you describe: give it a WSDL and then use its factory methods to create the defined types (and process the responses, too!).

Answered By: David Harks

I found the python-simplexml module, which, in its author's attempt to get something close to PHP's SimpleXML, is indeed a small wrapper around ElementTree. It's under 100 lines but seems to do what was requested:

>>> import urllib
>>> import SimpleXml
>>> x = SimpleXml.parse(urllib.urlopen('http://www.google.com/ig/api?weather=94043'))
>>> print x.weather.current_conditions.temp_f['data']
58
Answered By: Nas Banov