How to parse SOAP XML with Python?
Question:
Goal:
Get the values inside the <Name> tags and print them out. Simplified XML below.
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <GetStartEndPointResult>
        <Code>0</Code>
        <Message />
        <StartPoints>
          <Point>
            <Id>545</Id>
            <Name>Get Me</Name>
            <Type>sometype</Type>
            <X>333</X>
            <Y>222</Y>
          </Point>
          <Point>
            <Id>634</Id>
            <Name>Get me too</Name>
            <Type>sometype</Type>
            <X>555</X>
            <Y>777</Y>
          </Point>
        </StartPoints>
      </GetStartEndPointResult>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>
Attempt:
import requests
from xml.etree import ElementTree
response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')
# XML parsing here
dom = ElementTree.fromstring(response.text)
names = dom.findall('*/Name')
for name in names:
    print(name.text)
I have read other people recommending zeep to parse SOAP XML, but I found it hard to get my head around.
Answers:
The issue here is dealing with the XML namespaces:
import requests
from xml.etree import ElementTree
response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')
# define namespace mappings to use as shorthand below
namespaces = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'a': 'http://www.etis.fskab.se/v1.0/ETISws',
}
dom = ElementTree.fromstring(response.content)
# reference the namespace mappings here by `<name>:`
names = dom.findall(
    './soap:Body'
    '/a:GetStartEndPointResponse'
    '/a:GetStartEndPointResult'
    '/a:StartPoints'
    '/a:Point'
    '/a:Name',
    namespaces,
)
for name in names:
    print(name.text)
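The same lookup can be tried offline against the simplified sample from the question. This is a sketch, not the answerer's code: it assumes the sample declares xmlns:soap on the Envelope and a default xmlns on GetStartEndPointResponse, using the URIs from the mapping above, trims each Point to Id and Name, and uses the illustrative name SAMPLE. It also uses findtext, which returns the text of the first matching child, to read Id and Name together for each Point.

```python
from xml.etree import ElementTree

# Trimmed sample response; namespace declarations assumed from the mapping above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <GetStartEndPointResult>
        <Code>0</Code>
        <Message />
        <StartPoints>
          <Point><Id>545</Id><Name>Get Me</Name></Point>
          <Point><Id>634</Id><Name>Get me too</Name></Point>
        </StartPoints>
      </GetStartEndPointResult>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>"""

namespaces = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'a': 'http://www.etis.fskab.se/v1.0/ETISws',
}

dom = ElementTree.fromstring(SAMPLE)

# Stop at Point so that several children can be read per point.
points = dom.findall(
    './soap:Body/a:GetStartEndPointResponse/a:GetStartEndPointResult'
    '/a:StartPoints/a:Point',
    namespaces,
)

# findtext() returns the text of the first matching child (or None).
pairs = [
    (p.findtext('a:Id', namespaces=namespaces),
     p.findtext('a:Name', namespaces=namespaces))
    for p in points
]
for point_id, name in pairs:
    print(point_id, name)
```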
The namespaces come from the xmlns:soap and xmlns attributes on the Envelope and GetStartEndPointResponse nodes, respectively.
Keep in mind that a default namespace (declared with a plain xmlns="..." attribute) is inherited by all descendant elements, even though their tags are not written as <namespace:tag>. That is why Name has to be addressed as a:Name in the path.
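A quick way to see this inheritance in ElementTree, using a throwaway document whose namespace URI (urn:example) is purely illustrative: the child elements carry the parent's namespace in their tag, so a bare Name search finds nothing, while a fully qualified name, or the {*} wildcard available from Python 3.8, matches.

```python
from xml.etree import ElementTree

# Throwaway document: only the root declares the (illustrative) default namespace.
doc = ElementTree.fromstring(
    '<Response xmlns="urn:example"><Point><Name>Get Me</Name></Point></Response>'
)

# The child never repeats the declaration, yet its tag carries the URI
# in ElementTree's {uri}local form.
point = doc[0]
print(point.tag)  # {urn:example}Point

# A bare local name therefore matches nothing ...
bare = doc.findall('.//Name')
# ... while a fully qualified name, or the {*} wildcard (Python 3.8+), does.
qualified = [n.text for n in doc.findall('.//{urn:example}Name')]
wildcard = [n.text for n in doc.findall('.//{*}Name')]
print(bare, qualified, wildcard)  # [] ['Get Me'] ['Get Me']
```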
Note: I had to use response.content (the raw bytes) rather than response.text.
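One plausible reason .content behaves better: response.content is the raw body as bytes, while response.text is a str that requests has already decoded with the encoding it took from the HTTP headers or guessed. Given bytes, the XML parser applies the encoding declared in the document itself. A minimal sketch with a made-up ISO-8859-1 payload (not taken from the service):

```python
from xml.etree import ElementTree

# Illustrative payload: bytes encoded as ISO-8859-1, as its declaration says.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><Name>Malmö</Name>'.encode('iso-8859-1')

# Parsing bytes lets the parser honour the encoding in the XML declaration.
parsed = ElementTree.fromstring(raw).text
print(parsed)  # Malmö

# Decoding the bytes with the wrong codec first damages the text
# before the parser ever sees it.
wrong = raw.decode('utf-8', errors='replace')
print('Malmö' in wrong)  # False
```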
An old question, but another option for this task is worth mentioning.
I like to use xmltodict (GitHub), a lightweight converter from XML to a Python dictionary.
Store your SOAP response in a variable named stack and parse it with xmltodict.parse:
In [48]: stack_d = xmltodict.parse(stack)
Check the result:
In [49]: stack_d
Out[49]:
OrderedDict([('soap:Envelope',
              OrderedDict([('@ ...
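As a sketch of how the parsed result can be used to get the names (not the answerer's own code): index down through the keys, which keep their prefixes exactly as written because xmltodict does not process namespaces by default. It assumes xmltodict is installed (pip install xmltodict) and uses a trimmed copy of the sample with namespace URIs taken from the first answer's mapping; force_list keeps Point a list even if the service returns a single point. Depending on the xmltodict version, parse returns an OrderedDict or a plain dict; indexing works the same either way.

```python
import xmltodict  # third-party: pip install xmltodict

# Trimmed sample response standing in for the real SOAP reply.
stack = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <GetStartEndPointResult>
        <Code>0</Code>
        <Message />
        <StartPoints>
          <Point><Id>545</Id><Name>Get Me</Name></Point>
          <Point><Id>634</Id><Name>Get me too</Name></Point>
        </StartPoints>
      </GetStartEndPointResult>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>"""

# force_list makes 'Point' a list even when only one point is returned.
stack_d = xmltodict.parse(stack, force_list=('Point',))

# Keys keep their prefixes as written ('soap:Envelope', 'soap:Body', ...).
result = stack_d['soap:Envelope']['soap:Body'][
    'GetStartEndPointResponse']['GetStartEndPointResult']
names = [point['Name'] for point in result['StartPoints']['Point']]
for name in names:
    print(name)
```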
from bs4 import BeautifulSoup
xml = BeautifulSoup(xml_string, 'xml')  # xml_string holds the SOAP response; the 'xml' parser needs lxml
xml.find('soap:Body')  # to get the soap:Body tag
xml.find('X')  # for the X tag
Just replace the 'soap:' prefix, and any other namespace prefixes such as 'a:', with '' (that is, remove them to get plain XML without namespace prefixes):
new_response = response.text.replace('soap:', '').replace('a:', '')
Then you can proceed as usual, parsing new_response instead of the original text.
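Editing the raw text this way is blunt: it also rewrites any a: or soap: that happens to occur in element text, and if the response uses a default xmlns="..." declaration (as the namespace mapping in the first answer implies), the unprefixed elements stay namespaced after the replace. A sketch of the same idea done after parsing instead, using only the standard library: drop the {uri} part from every tag, after which a plain './/Name' path works. strip_namespaces and SAMPLE are illustrative names, and the sample's namespace declarations are assumed from the first answer's mapping.

```python
from xml.etree import ElementTree

# Trimmed sample response; namespace declarations assumed.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <GetStartEndPointResult>
        <Code>0</Code>
        <Message />
        <StartPoints>
          <Point><Id>545</Id><Name>Get Me</Name></Point>
          <Point><Id>634</Id><Name>Get me too</Name></Point>
        </StartPoints>
      </GetStartEndPointResult>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>"""


def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag, in place."""
    for el in root.iter():
        # Comments/processing instructions have non-string tags; leave them alone.
        if isinstance(el.tag, str) and el.tag.startswith('{'):
            el.tag = el.tag.split('}', 1)[1]
    return root


dom = strip_namespaces(ElementTree.fromstring(SAMPLE))

# Plain, unqualified paths now work.
names = [name.text for name in dom.findall('.//Name')]
for name in names:
    print(name)
```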
Try it like this:
import requests
from bs4 import BeautifulSoup
response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')
xml = BeautifulSoup(response.text, 'xml')
xml.find('soap:Body')  # to get the soap:Body tag
xml.find('X')  # for the X tag
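To pull out the Name values themselves with BeautifulSoup, find_all can collect every matching tag. A sketch, assuming beautifulsoup4 and lxml are installed (the 'xml' feature relies on lxml) and using an illustrative inline copy of the trimmed sample, with namespace URIs from the first answer's mapping, in place of the HTTP response: find_all('Name') matches on the tag name, so the default namespace does not get in the way.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4 lxml

# Illustrative stand-in for response.content.
xml_data = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <GetStartEndPointResult>
        <StartPoints>
          <Point><Id>545</Id><Name>Get Me</Name></Point>
          <Point><Id>634</Id><Name>Get me too</Name></Point>
        </StartPoints>
      </GetStartEndPointResult>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>"""

# The 'xml' feature selects lxml's XML parser.
xml = BeautifulSoup(xml_data, 'xml')

# find_all() matches on the tag name; get_text() returns the element's text.
names = [tag.get_text() for tag in xml.find_all('Name')]
for name in names:
    print(name)
```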