Using XPath in ElementTree
Question:
My XML file looks like the following:
<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
<Items>
<Item>
<ItemAttributes>
<ListPrice>
<Amount>2260</Amount>
</ListPrice>
</ItemAttributes>
<Offers>
<Offer>
<OfferListing>
<Price>
<Amount>1853</Amount>
</Price>
</OfferListing>
</Offer>
</Offers>
</Item>
</Items>
</ItemSearchResponse>
All I want to do is extract the ListPrice.
This is the code I am using:
from elementtree import ElementTree as ET
fp = open("output.xml","r")
element = ET.parse(fp).getroot()
e = element.findall('ItemSearchResponse/Items/Item/ItemAttributes/ListPrice/Amount')
for i in e:
    print i.text

e
Absolutely no output. I also tried
e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
No difference.
What am I doing wrong?
Answers:
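There are two problems.
1) element is the root <ItemSearchResponse> element, and findall() evaluates paths relative to the element it is called on. A path that starts with ItemSearchResponse/ therefore looks for a child named ItemSearchResponse, which does not exist. (Calling findall() on the ElementTree returned by ET.parse() behaves the same way: its paths are also relative to the root.)
2) Your search string needs to include the namespace if you keep the xmlns declaration in the XML.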
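To fix problem #1:
Drop the leading ItemSearchResponse/ from the path. You can keep calling findall() on the root element, or work with the tree directly by changing:
element = ET.parse(fp).getroot()
to:
element = ET.parse(fp)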
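A minimal sketch of the path behaviour, assuming Python 3 and an inline, namespace-free copy of the sample (parsed with ET.fromstring instead of reading output.xml). It shows that findall() on the root element and on an ElementTree wrapping it evaluate the same relative path, and that prefixing the root's own tag finds nothing:

```python
from xml.etree import ElementTree as ET

# Namespace-free copy of the sample document (assumed for illustration).
XML = """<ItemSearchResponse>
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
    <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
  </Item></Items>
</ItemSearchResponse>"""

root = ET.fromstring(XML)        # an Element: <ItemSearchResponse>
tree = ET.ElementTree(root)      # the same document wrapped in an ElementTree

path = "Items/Item/ItemAttributes/ListPrice/Amount"

# Paths are relative to the element findall() is called on; for an
# ElementTree they are relative to its root, so both calls agree.
from_root = [e.text for e in root.findall(path)]
from_tree = [e.text for e in tree.findall(path)]

# Starting with the root's own tag looks for a *child* called
# ItemSearchResponse, which does not exist.
with_root_tag = root.findall("ItemSearchResponse/" + path)

print(from_root, from_tree, with_root_tag)  # ['2260'] ['2260'] []
```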
To fix problem #2:
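You can remove the xmlns attribute from the root element of output.xml so that the document looks like this:
<?xml version="1.0"?>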
<ItemSearchResponse>
<Items>
<Item>
<ItemAttributes>
<ListPrice>
<Amount>2260</Amount>
</ListPrice>
</ItemAttributes>
<Offers>
<Offer>
<OfferListing>
<Price>
<Amount>1853</Amount>
</Price>
</OfferListing>
</Offer>
</Offers>
</Item>
</Items>
</ItemSearchResponse>
With this document you can use the following search string:
e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
The full code:
from xml.etree import ElementTree as ET
fp = open("output.xml","r")
element = ET.parse(fp)
e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
for i in e:
    print(i.text)
Alternate fix to problem #2:
Otherwise you need to include the namespace in the search string:
from xml.etree import ElementTree as ET
fp = open("output.xml","r")
element = ET.parse(fp)
namespace = "{http://webservices.amazon.com/AWSECommerceService/2008-08-19}"
e = element.findall('{0}Items/{0}Item/{0}ItemAttributes/{0}ListPrice/{0}Amount'.format(namespace))
for i in e:
    print(i.text)
Both print:
2260
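If you are on Python 3.8 or later, find()/findall() also accept "{*}" as a wildcard namespace, which avoids repeating the URI. A minimal sketch, assuming an inline copy of the namespaced sample instead of output.xml:

```python
from xml.etree import ElementTree as ET

NS = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

# Inline copy of the sample document (assumed), so no output.xml is needed.
XML = """<ItemSearchResponse xmlns="%s">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
    <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
  </Item></Items>
</ItemSearchResponse>""" % NS

root = ET.fromstring(XML)

# Every tag is stored as "{namespace}localname".
print(root.tag)

# "{*}" matches the local name in any namespace (Python 3.8+).
amounts = [e.text for e in root.findall(
    "{*}Items/{*}Item/{*}ItemAttributes/{*}ListPrice/{*}Amount")]
print(amounts)  # ['2260']
```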
ElementTree expands namespaces, so every element in your XML has a name like
{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items
So make the search include the namespace, e.g.
search = '{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Item/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ItemAttributes/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ListPrice/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount'
element.findall(search)
returns a list containing the Amount element whose text is 2260.
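Writing the full URI into every step is easy to get wrong. findall() also takes a prefix-to-URI mapping as its second argument; the prefix name ("aws" below) is an arbitrary choice for illustration and does not need to appear in the XML. A sketch assuming Python 3 and an inline copy of the sample:

```python
from xml.etree import ElementTree as ET

NS = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

# Inline copy of the sample document (assumed), so no output.xml is needed.
XML = """<ItemSearchResponse xmlns="%s">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
    <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
  </Item></Items>
</ItemSearchResponse>""" % NS

root = ET.fromstring(XML)

# Map a short prefix of your choosing to the namespace URI.
ns = {"aws": NS}
path = "aws:Items/aws:Item/aws:ItemAttributes/aws:ListPrice/aws:Amount"

amounts = [e.text for e in root.findall(path, ns)]
print(amounts)  # ['2260']
```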
from xml.etree import ElementTree as ET
tree = ET.parse("output.xml")
namespace = tree.getroot().tag[1:].split("}")[0]
amount = tree.find(".//{%s}Amount" % namespace).text
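The slicing one-liner assumes the root tag carries a namespace; on a namespace-free root such as ItemSearchResponse, tag[1:] just chops off the first letter. A hedged sketch of a guarded version, where namespace_of and find_amount are hypothetical helper names and urn:example is a placeholder URI:

```python
from xml.etree import ElementTree as ET

def namespace_of(element):
    """Return the namespace URI of an element's tag, or "" if it has none."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""

def find_amount(root):
    """Find the first Amount descendant, qualifying it only if needed."""
    ns = namespace_of(root)
    tag = "{%s}Amount" % ns if ns else "Amount"
    return root.find(".//" + tag).text

with_ns = ET.fromstring('<Root xmlns="urn:example"><Amount>1</Amount></Root>')
without_ns = ET.fromstring('<Root><Amount>1</Amount></Root>')

print(repr(namespace_of(with_ns)))     # 'urn:example'
print(repr(namespace_of(without_ns)))  # ''
print(find_amount(with_ns), find_amount(without_ns))  # 1 1
```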
Also, consider using lxml, which is generally faster than the standard-library ElementTree. Its ElementTree-compatible module is called etree:
from lxml import etree as ET
I ended up stripping the xmlns declaration out of the raw XML before parsing:
import re
xml_string = re.sub(' xmlns="[^"]+"', '', xml_string)
Obviously be very careful with this, but it worked well for me.
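A sketch of the text-stripping idea, assuming Python 3 and an inline copy of the sample. It limits the substitution to the first declaration (count=1). For comparison it also shows a different technique, not the one described above: parsing normally and then rewriting each tag to drop its "{uri}" part, which avoids editing the raw text. strip_namespaces is a hypothetical helper name.

```python
import re
from xml.etree import ElementTree as ET

NS = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

# Inline copy of the sample document (assumed), so no output.xml is needed.
xml_string = """<ItemSearchResponse xmlns="%s">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
    <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
  </Item></Items>
</ItemSearchResponse>""" % NS

PATH = "Items/Item/ItemAttributes/ListPrice/Amount"

# 1) Text edit: remove the first default-namespace declaration only.
#    Blunt: it ignores prefixed declarations (xmlns:foo="...") and could
#    also hit identical text inside comments or character data.
stripped = re.sub(r'\sxmlns="[^"]+"', "", xml_string, count=1)
via_regex = [e.text for e in ET.fromstring(stripped).findall(PATH)]

# 2) Alternative: parse as-is, then drop "{uri}" from every element tag.
def strip_namespaces(root):
    for el in root.iter():
        # Comments/PIs have non-string tags; skip anything that is not a str.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root

via_tags = [e.text for e in strip_namespaces(ET.fromstring(xml_string)).findall(PATH)]

print(via_regex, via_tags)  # ['2260'] ['2260']
```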
One of the most straightforward approaches, which works in Python 3 as well as in earlier versions, is shown below.
It takes the root and searches all of its descendants (.//) for the first Amount tag in the namespace:
from xml.etree import ElementTree as ET
tree = ET.parse('output.xml')
root = tree.getroot()
e = root.find(".//{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount")
print(e.text)
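Note that find() returns the first match in document order. In this sample that happens to be the ListPrice amount, but the offer price is also an Amount element. A sketch, assuming Python 3 and an inline copy of the sample, contrasting the first match, all matches, and a path anchored on ListPrice:

```python
from xml.etree import ElementTree as ET

NS = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

# Inline copy of the sample document (assumed), so no output.xml is needed.
XML = """<ItemSearchResponse xmlns="%s">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
    <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
  </Item></Items>
</ItemSearchResponse>""" % NS

root = ET.fromstring(XML)
q = "{%s}" % NS

# First Amount anywhere below the root (document order).
first = root.find(".//%sAmount" % q).text

# Every Amount below the root.
every = [e.text for e in root.iterfind(".//%sAmount" % q)]

# Anchoring on the parent states the intent explicitly.
list_price = root.find(".//%sListPrice/%sAmount" % (q, q)).text

print(first, every, list_price)  # 2260 ['2260', '1853'] 2260
```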