Python XML/Pandas: How to merge nested XML?

Question:

How can I join two different pieces of information together from this XML file?

# data
xml1 = ('''<?xml version="1.0" encoding="utf-8"?>
<TopologyDefinition xmlns_xsd="http://www.w3.org/2001/XMLSchema" xmlns_xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RSkus>
    <RSku ID="V1" Deprecated="true" Owner="Unknown" Generation="1">
      <Devices>
        <Device ID="1" SkuID="Switch" Role="xD" />
      </Devices>
      <Blades>
        <Blade ID="{1-20}" SkuID="SBlade" />
      </Blades>
      <Interfaces>
        <Interface ID="COM" HardwareID="NS1" SlotID="COM1" Type="serial" />
        <Interface ID="LINK" HardwareID="TS1" SlotID="UPLINK_1" Type="serial" />
      </Interfaces>
      <Wires>
        <WireGroup Type="network">
          <Wire LocationA="NS1" SlotA="{1-20}" LocationB="{1-20}" SlotB="NIC1" />
        </WireGroup>
        <WireGroup Type="serial">
          <Wire LocationA="TS1" SlotA="{7001-7020}" LocationB="{1-20}" SlotB="COM1" />
        </WireGroup>
      </Wires>
    </RSku>
  </RSkus>
</TopologyDefinition>
''')

The example above is a single, trivial case. When I run the commands below on the full file, the resulting frames have different shapes and cannot simply be joined.

How can I extract the XML so that every row holds all of the RSku attributes plus the attributes of one of its Blades? Neither XPath result contains a key that would let me join it to the other.

# how to have them joined?
import pandas as pd

pd.read_xml(xml1, xpath = ".//RSku")
pd.read_xml(xml1, xpath = ".//Blade")

# expected
pd.concat([pd.read_xml(xml1, xpath = ".//RSku"), pd.read_xml(xml1, xpath = ".//Blade")], axis=1)
Asked By: John Stud


Answers:

Consider flattening the XML with XSLT so that the output holds only the information you need. Specifically, select the Blade elements with the descendant::* axis and pull in the attributes of the enclosing RSku with the ancestor::* axis. Python's lxml (the default parser of pandas.read_xml) can run XSLT 1.0 scripts.
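To see what the ancestor/descendant pairing amounts to before involving XSLT, here is a minimal sketch in plain Python. It swaps XSLT for a nested loop with the standard library's xml.etree.ElementTree, so it is an illustration of the idea rather than the approach proposed in this answer. The two-RSku `sample` document and the `rows` list are hypothetical names made up for the example; the RSku_ and Blade_ prefixes are chosen only to keep the shared ID attribute apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, trimmed from the question's structure, with two RSkus
# that own a different number of Blades.
sample = """<TopologyDefinition>
  <RSkus>
    <RSku ID="V1" Owner="Unknown">
      <Blades>
        <Blade ID="{1-20}" SkuID="SBlade" />
        <Blade ID="{21-40}" SkuID="SBlade" />
      </Blades>
    </RSku>
    <RSku ID="V2" Owner="Unknown">
      <Blades>
        <Blade ID="{1-10}" SkuID="TBlade" />
      </Blades>
    </RSku>
  </RSkus>
</TopologyDefinition>"""

root = ET.fromstring(sample)

rows = []
for rsku in root.iter("RSku"):
    # Attributes of the enclosing RSku, prefixed to avoid clashing with Blade's ID.
    rsku_attrs = {f"RSku_{name}": value for name, value in rsku.attrib.items()}
    for blade in rsku.iter("Blade"):
        # One output row per Blade, carrying its parent RSku's attributes along.
        blade_attrs = {f"Blade_{name}": value for name, value in blade.attrib.items()}
        rows.append({**rsku_attrs, **blade_attrs})

for row in rows:
    print(row)
```

Because each row is built while walking from an RSku down to its own Blades, no join key is needed, and `pd.DataFrame(rows)` would turn the list into a frame with one row per Blade.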

In the XSLT below, <xsl:for-each> is used to prefix attribute names with RSku_ and Blade_, since the two elements share attribute names such as ID. Otherwise the template would be much less wordy.

import pandas as pd

xml1 = ...

xsl = ('''<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet ...

Online XSLT Demo
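The stylesheet above is cut off, so the following is only a sketch of the approach described in this answer, not necessarily the original script. It assumes pandas with lxml installed, and it uses a trimmed, hypothetical two-RSku document (`xml_sample`) in place of the question's xml1. The stylesheet text (`xsl_sketch`) and the output element names are my own assumptions; `pd.read_xml` with its `xpath` and `stylesheet` arguments is the real pandas call. The inputs are wrapped in StringIO because recent pandas versions deprecate passing literal XML strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the question's xml1, trimmed to two RSkus.
xml_sample = """<TopologyDefinition>
  <RSkus>
    <RSku ID="V1" Owner="Unknown">
      <Blades>
        <Blade ID="{1-20}" SkuID="SBlade" />
        <Blade ID="{21-40}" SkuID="SBlade" />
      </Blades>
    </RSku>
    <RSku ID="V2" Owner="Unknown">
      <Blades>
        <Blade ID="{1-10}" SkuID="TBlade" />
      </Blades>
    </RSku>
  </RSkus>
</TopologyDefinition>"""

# Assumed XSLT 1.0 stylesheet: emits one flat <Blade> per source Blade,
# copying the ancestor RSku's attributes and the Blade's own attributes
# under RSku_ and Blade_ prefixes.
xsl_sketch = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <Blades>
      <xsl:apply-templates select="descendant::Blade"/>
    </Blades>
  </xsl:template>

  <xsl:template match="Blade">
    <Blade>
      <xsl:for-each select="ancestor::RSku/@*">
        <xsl:attribute name="RSku_{name()}"><xsl:value-of select="."/></xsl:attribute>
      </xsl:for-each>
      <xsl:for-each select="@*">
        <xsl:attribute name="Blade_{name()}"><xsl:value-of select="."/></xsl:attribute>
      </xsl:for-each>
    </Blade>
  </xsl:template>
</xsl:stylesheet>"""

# pandas applies the stylesheet first, then evaluates xpath on the flattened result.
blades_df = pd.read_xml(
    StringIO(xml_sample),
    xpath=".//Blade",
    stylesheet=StringIO(xsl_sketch),
)
print(blades_df)
```

Each row of `blades_df` then holds one Blade together with the attributes of the RSku that contains it, which is the joined shape the question asks for. The same pattern should extend to Device or Interface by adding a matching template.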

Answered By: Parfait