How to parse SOAP XML with Python?

Question

Goal:
Get the values inside <Name> tags and print them out. Simplified XML below.

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
         <GetStartEndPointResult>
            <Code>0</Code>
            <Message />
            <StartPoints>
               <Point>
                  <Id>545</Id>
                  <Name>Get Me</Name>
                  <Type>sometype</Type>
                  <X>333</X>
                  <Y>222</Y>
               </Point>
               <Point>
                  <Id>634</Id>
                  <Name>Get me too</Name>
                  <Type>sometype</Type>
                  <X>555</X>
                  <Y>777</Y>
               </Point>
            </StartPoints>
         </GetStartEndPointResult>
      </GetStartEndPointResponse>
   </soap:Body>
</soap:Envelope>

Attempt:

import requests
from xml.etree import ElementTree

response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')

# XML parsing here
dom = ElementTree.fromstring(response.text)
names = dom.findall('*/Name')
for name in names:
    print(name.text)

I have seen other people recommend zeep for parsing SOAP XML, but I found it hard to get my head around.

Asked By: Clone

Answer 1

The issue here is dealing with the XML namespaces:

import requests
from xml.etree import ElementTree

response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')

# define namespace mappings to use as shorthand below
namespaces = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'a': 'http://www.etis.fskab.se/v1.0/ETISws',
}
dom = ElementTree.fromstring(response.content)

# reference the namespace mappings here by `<name>:`
names = dom.findall(
    './soap:Body'
    '/a:GetStartEndPointResponse'
    '/a:GetStartEndPointResult'
    '/a:StartPoints'
    '/a:Point'
    '/a:Name',
    namespaces,
)
for name in names:
    print(name.text)

The namespaces come from the xmlns:soap and xmlns attributes on the Envelope and GetStartEndPointResponse nodes respectively.

Keep in mind that a default namespace (one declared with a bare xmlns attribute) is inherited by all child nodes of the element that declares it, even if the child's tag carries no explicit <namespace:tag> prefix.

Note: I had to use response.content (the raw bytes of the reply) rather than response.body, which requests responses do not provide.
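
To illustrate the inheritance point, here is a minimal sketch that skips the prefix mapping and writes the namespace URI directly into the tag name as {uri}tag. The SAMPLE string is a trimmed copy of the XML from the question, and its xmlns declarations are assumed from the mappings used in this answer; ETIS_NS is just a name chosen for the example.

```python
from xml.etree import ElementTree

# Trimmed copy of the response from the question. The xmlns declarations
# are an assumption based on the namespace mappings used in this answer.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <GetStartEndPointResult>
        <StartPoints>
          <Point><Id>545</Id><Name>Get Me</Name></Point>
          <Point><Id>634</Id><Name>Get me too</Name></Point>
        </StartPoints>
      </GetStartEndPointResult>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>"""

ETIS_NS = "http://www.etis.fskab.se/v1.0/ETISws"

# Parse from bytes, mirroring the use of response.content above.
root = ElementTree.fromstring(SAMPLE.encode("utf-8"))

# <Name> has no prefix in the document, yet its parsed tag carries the
# namespace inherited from GetStartEndPointResponse.
first_name = root.find(f".//{{{ETIS_NS}}}Name")
print(first_name.tag)

# iter() walks every descendant, so the full path is not needed.
names = [el.text for el in root.iter(f"{{{ETIS_NS}}}Name")]
print(names)
```

Spelling out the full path, as in the answer above, is stricter about where the Name elements may appear; iter() trades that strictness for brevity.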

Answered By: Daniel Corin

Answer 2

An old question, but it is worth mentioning another option for this task.

I like to use xmltodict (GitHub), a lightweight converter from XML to a Python dictionary.

Put your SOAP response in a variable named stack.

Parse it with xmltodict.parse:

In [48]: stack_d = xmltodict.parse(stack)

Check the result:

In [49]: stack_d
Out[49]:
OrderedDict([('soap:Envelope',
              ...

from bs4 import BeautifulSoup
xml = BeautifulSoup(xml_string, 'xml')
xml.find('soap:Body')  # to get the soap:Body tag
xml.find('X')  # for the X tag

Answered By: Samir Sadek

Answer 3

Just replace all the 'soap:' and other namespace prefixes such as 'a:' with '' (that is, remove them and make it a non-SOAP XML file):

new_response = response.text.replace('soap:', '').replace('a:', '')

Then you can just proceed normally.
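
One caveat with plain string replacement is that it also rewrites any element text that happens to contain 'a:', and it does not remove a default namespace declared with a bare xmlns attribute. As an alternative, here is a minimal sketch that strips the namespace part from each tag after parsing. The helper name strip_namespaces and the SAMPLE string (a trimmed copy of the question's XML with assumed xmlns declarations) are made up for this example.

```python
from xml.etree import ElementTree

# Trimmed copy of the response from the question; the xmlns declarations
# are assumed, not taken from a live reply.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <GetStartEndPointResult>
        <StartPoints>
          <Point><Id>545</Id><Name>Get Me</Name></Point>
          <Point><Id>634</Id><Name>Get me too</Name></Point>
        </StartPoints>
      </GetStartEndPointResult>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>"""


def strip_namespaces(root):
    """Drop the leading '{uri}' from every element tag, in place."""
    for el in root.iter():
        # Comments and processing instructions have non-string tags.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ElementTree.fromstring(SAMPLE.encode("utf-8")))

# With the namespaces gone, plain tag names work in paths.
names = [el.text for el in root.findall(".//Name")]
print(names)
```

This leaves attribute names untouched and discards the namespace information, so it suits quick extraction better than round-tripping the document.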

Answered By: ZekeC

Answer 4

Try it like this:

import requests
from bs4 import BeautifulSoup

response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')

xml = BeautifulSoup(response.text, 'xml')
xml.find('soap:Body')  # to get the soap:Body tag
xml.find('X')  # for the X tag

Answered By: Santiago Trujillo Terán
