How can I convert XML into a Python object?
Question:
I need to load an XML file and convert the contents into an object-oriented Python structure. I want to take this:
<main>
<object1 attr="name">content</object>
</main>
And turn it into something like this:
main
main.object1 = "content"
main.object1.attr = "name"
The XML data will have a more complicated structure than that and I can’t hard code the element names. The attribute names need to be collected when parsing and used as the object properties.
How can I convert XML data into a Python object?
Answers:
How about this
If googling around for a code-generator doesn’t work, you could write your own that uses XML as input and outputs objects in your language of choice.
It’s not terribly difficult, however the three step process of Parse XML, Generate Code, Compile/Execute Script does making debugging a bit harder.
There are three common XML parsers for python: xml.dom.minidom, elementree, and BeautifulSoup.
IMO, BeautifulSoup is by far the best.
I’ve been recommending this more than once today, but try Beautiful Soup (easy_install BeautifulSoup).
from BeautifulSoup import BeautifulSoup
xml = """
<main>
<object attr="name">content</object>
</main>
"""
soup = BeautifulSoup(xml)
# look in the main node for object's with attr=name, optionally look up attrs with regex
my_objects = soup.main.findAll("object", attrs={'attr':'name'})
for my_object in my_objects:
# this will print a list of the contents of the tag
print my_object.contents
# if only text is inside the tag you can use this
# print tag.string
#@Stephen:
#"can't hardcode the element names, so I need to collect them
#at parse and use them somehow as the object names."
#I don't think thats possible. Instead you can do this.
#this will help you getting any object with a required name.
import BeautifulSoup
class Coll(object):
"""A class which can hold your Foo clas objects
and retrieve them easily when you want
abstracting the storage and retrieval logic
"""
def __init__(self):
self.foos={}
def add(self, fooobj):
self.foos[fooobj.name]=fooobj
def get(self, name):
return self.foos[name]
class Foo(object):
"""The required class
"""
def __init__(self, name, attr1=None, attr2=None):
self.name=name
self.attr1=attr1
self.attr2=attr2
s="""<main>
<object name="somename">
<attr name="attr1">value1</attr>
<attr name="attr2">value2</attr>
</object>
<object name="someothername">
<attr name="attr1">value3</attr>
<attr name="attr2">value4</attr>
</object>
</main>
"""
#
soup=BeautifulSoup.BeautifulSoup(s)
bars=Coll()
for each in soup.findAll('object'):
bar=Foo(each['name'])
attrs=each.findAll('attr')
for attr in attrs:
setattr(bar, attr['name'], attr.renderContents())
bars.add(bar)
#retrieve objects by name
print bars.get('somename').__dict__
print 'nn', bars.get('someothername').__dict__
output
{'attr2': 'value2', 'name': u'somename', 'attr1': 'value1'}
{'attr2': 'value4', 'name': u'someothername', 'attr1': 'value3'}
David Mertz’s gnosis.xml.objectify would seem to do this for you. Documentation’s a bit hard to come by, but there are a few IBM articles on it, including this one (text only version).
from gnosis.xml import objectify
xml = "<root><nodes><node>node 1</node><node>node 2</node></nodes></root>"
root = objectify.make_instance(xml)
print root.nodes.node[0].PCDATA # node 1
print root.nodes.node[1].PCDATA # node 2
Creating xml from objects in this way is a different matter, though.
It’s worth looking at lxml.objectify
.
xml = """<main>
<object1 attr="name">content</object1>
<object1 attr="foo">contenbar</object1>
<test>me</test>
</main>"""
from lxml import objectify
main = objectify.fromstring(xml)
main.object1[0] # content
main.object1[1] # contenbar
main.object1[0].get("attr") # name
main.test # me
Or the other way around to build xml structures:
item = objectify.Element("item")
item.title = "Best of python"
item.price = 17.98
item.price.set("currency", "EUR")
order = objectify.Element("order")
order.append(item)
order.item.quantity = 3
order.price = sum(item.price * item.quantity for item in order.item)
import lxml.etree
print(lxml.etree.tostring(order, pretty_print=True))
Output:
<order>
<item>
<title>Best of python</title>
<price currency="EUR">17.98</price>
<quantity>3</quantity>
</item>
<price>53.94</price>
</order>
I need to load an XML file and convert the contents into an object-oriented Python structure. I want to take this:
<main>
<object1 attr="name">content</object>
</main>
And turn it into something like this:
main
main.object1 = "content"
main.object1.attr = "name"
The XML data will have a more complicated structure than that and I can’t hard code the element names. The attribute names need to be collected when parsing and used as the object properties.
How can I convert XML data into a Python object?
How about this
If googling around for a code-generator doesn’t work, you could write your own that uses XML as input and outputs objects in your language of choice.
It’s not terribly difficult, however the three step process of Parse XML, Generate Code, Compile/Execute Script does making debugging a bit harder.
There are three common XML parsers for python: xml.dom.minidom, elementree, and BeautifulSoup.
IMO, BeautifulSoup is by far the best.
I’ve been recommending this more than once today, but try Beautiful Soup (easy_install BeautifulSoup).
from BeautifulSoup import BeautifulSoup
xml = """
<main>
<object attr="name">content</object>
</main>
"""
soup = BeautifulSoup(xml)
# look in the main node for object's with attr=name, optionally look up attrs with regex
my_objects = soup.main.findAll("object", attrs={'attr':'name'})
for my_object in my_objects:
# this will print a list of the contents of the tag
print my_object.contents
# if only text is inside the tag you can use this
# print tag.string
#@Stephen:
#"can't hardcode the element names, so I need to collect them
#at parse and use them somehow as the object names."
#I don't think thats possible. Instead you can do this.
#this will help you getting any object with a required name.
import BeautifulSoup
class Coll(object):
"""A class which can hold your Foo clas objects
and retrieve them easily when you want
abstracting the storage and retrieval logic
"""
def __init__(self):
self.foos={}
def add(self, fooobj):
self.foos[fooobj.name]=fooobj
def get(self, name):
return self.foos[name]
class Foo(object):
"""The required class
"""
def __init__(self, name, attr1=None, attr2=None):
self.name=name
self.attr1=attr1
self.attr2=attr2
s="""<main>
<object name="somename">
<attr name="attr1">value1</attr>
<attr name="attr2">value2</attr>
</object>
<object name="someothername">
<attr name="attr1">value3</attr>
<attr name="attr2">value4</attr>
</object>
</main>
"""
#
soup=BeautifulSoup.BeautifulSoup(s)
bars=Coll()
for each in soup.findAll('object'):
bar=Foo(each['name'])
attrs=each.findAll('attr')
for attr in attrs:
setattr(bar, attr['name'], attr.renderContents())
bars.add(bar)
#retrieve objects by name
print bars.get('somename').__dict__
print 'nn', bars.get('someothername').__dict__
output
{'attr2': 'value2', 'name': u'somename', 'attr1': 'value1'}
{'attr2': 'value4', 'name': u'someothername', 'attr1': 'value3'}
David Mertz’s gnosis.xml.objectify would seem to do this for you. Documentation’s a bit hard to come by, but there are a few IBM articles on it, including this one (text only version).
from gnosis.xml import objectify
xml = "<root><nodes><node>node 1</node><node>node 2</node></nodes></root>"
root = objectify.make_instance(xml)
print root.nodes.node[0].PCDATA # node 1
print root.nodes.node[1].PCDATA # node 2
Creating xml from objects in this way is a different matter, though.
It’s worth looking at lxml.objectify
.
xml = """<main>
<object1 attr="name">content</object1>
<object1 attr="foo">contenbar</object1>
<test>me</test>
</main>"""
from lxml import objectify
main = objectify.fromstring(xml)
main.object1[0] # content
main.object1[1] # contenbar
main.object1[0].get("attr") # name
main.test # me
Or the other way around to build xml structures:
item = objectify.Element("item")
item.title = "Best of python"
item.price = 17.98
item.price.set("currency", "EUR")
order = objectify.Element("order")
order.append(item)
order.item.quantity = 3
order.price = sum(item.price * item.quantity for item in order.item)
import lxml.etree
print(lxml.etree.tostring(order, pretty_print=True))
Output:
<order>
<item>
<title>Best of python</title>
<price currency="EUR">17.98</price>
<quantity>3</quantity>
</item>
<price>53.94</price>
</order>