Apply outer tag attribute to inner tag – Flattening XML in Python

Question:

I have an XML file like:

<plays format="tokens">
    <period number="1">
      <play/>
      <play/>
      <play/>
    </period>
    <period number="2">
      <play/>
      <play/>
      <play/>
    </period>
</plays>

Each play tag contains a bunch of attributes, but I would also like to add the period number to each play. My goal is to produce a table with each play and its attributes, plus a column that says which period that play occurred in (1 or 2).

My current code to flatten the plays out is:

d = []
for play in root.iter('play'):
    d.append(play.attrib)
    
df = pd.DataFrame(d)

This gives me every play and its attributes in the table df, but the period is not currently included in this table. Any direction would help, thank you!

Asked By: bblackburn

||

Answers:

You can do this with ElementTree by looping over each period and then over the plays inside it:

plays.xml

<plays format="tokens">
    <period number="1">
      <play attr1="abc" attr2="ddd"/>
      <play attr1="cbc" attr2="ddd"/>
      <play attr1="dbc" attr2="ddd"/>
    </period>
    <period number="2">
      <play attr1="abc" attr2="ddd"/>
      <play attr1="dbc" attr2="ddd"/>
      <play attr1="kbc" attr2="ddd" />
    </period>
</plays>

main.py

import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('plays.xml')
root = tree.getroot()

# loop over each period, then over the play elements nested inside it
periods = []
for period in root.iter('period'):
    number = period.attrib['number']
    for play in period.iter('play'):
        # merge the period's number with the play's own attributes (attr1, attr2);
        # for the dict-unpacking idiom see https://stackoverflow.com/a/62820532/1138192
        periods.append({"number": number, **play.attrib})

df = pd.DataFrame(periods)
print(df)

Output:

  number attr1 attr2
0      1   abc   ddd
1      1   cbc   ddd
2      1   dbc   ddd
3      2   abc   ddd
4      2   dbc   ddd
5      2   kbc   ddd
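
If you would rather keep the original loop over root.iter('play'), a variation is to look up each play's parent. ElementTree elements do not store a reference to their parent, so the sketch below first builds a child-to-parent map. It assumes every play is a direct child of a period. The column name "period" and the inline sample XML are illustrative choices, not part of the original data. Note that if a play had its own attribute called "period", that value would overwrite the one set here.

```python
import xml.etree.ElementTree as ET

# Small inline sample (illustrative) so the sketch runs without a file
XML = """<plays format="tokens">
    <period number="1">
      <play attr1="abc" attr2="ddd"/>
      <play attr1="cbc" attr2="ddd"/>
    </period>
    <period number="2">
      <play attr1="kbc" attr2="ddd"/>
    </period>
</plays>"""

root = ET.fromstring(XML)

# Build a child -> parent lookup, since elements have no parent pointer
parent_of = {child: parent for parent in root.iter() for child in parent}

rows = []
for play in root.iter("play"):
    period = parent_of[play]  # assumption: the direct parent is a <period>
    # Store the period as an int under a separate "period" key (hypothetical name),
    # followed by the play's own attributes
    rows.append({"period": int(period.attrib["number"]), **play.attrib})

print(rows)
```

Passing rows to pd.DataFrame(rows), as in main.py above, should give the same kind of table, with an integer period column in place of the string number column.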
Answered By: Always Sunny