How to extract specific values from an XML file using Python's xml.etree.ElementTree, iterating until an Id is found inside a nested child node?

Question:

I need to iterate over the ObjectHeader tags. When ObjectType/Id is equal to 1424, I need to extract all the values inside the tags ObjectVariant/ObjectValue/Characteristic/Name and ObjectVariant/ObjectValue/PropertyValue/Value and put them in a dictionary. The expected output looks like this:
{"Var1": 10.4,
"Var2": 15.6}

Here is a snippet from the XML that I’m working with which has 30k lines (Hint: Id 1424 only appears once in the whole XML file).

<ObjectContext>
    <ObjectHeader>
        <ObjectType>
            <Id>1278</Id>
            <Name>ID_NAME</Name>
        </ObjectType>
        <ObjectVariant>
            <ObjectValue>
                <Characteristic>
                    <Name>Var1</Name>
                    <Description>Something about the name</Description>
                </Characteristic>
                <PropertyValue>
                    <Value>10.6</Value>
                    <Description>Something about the value</Description>
                </PropertyValue>
            </ObjectValue>
        </ObjectVariant>
    </ObjectHeader>
    <ObjectHeader>
        <ObjectType>
            <Id>1424</Id>
            <Name>ID_NAME</Name>
        </ObjectType>
        <ObjectVariant>
            <ObjectValue>
                <Characteristic>
                    <Name>Var1</Name>
                    <Description>Something about the name</Description>
                </Characteristic>
                <PropertyValue>
                    <Value>10.4</Value>
                    <Description>Something about the value</Description>
                </PropertyValue>
            </ObjectValue>
            <ObjectValue>
                <Characteristic>
                    <Name>Var2</Name>
                    <CharacteristicType>Something about the name</CharacteristicType>
                </Characteristic>
                <PropertyValue>
                    <Value>15.6</Value>
                    <Description>Something about the value</Description>
                </PropertyValue>
            </ObjectValue>
        </ObjectVariant>
    </ObjectHeader>
</ObjectContext> 

Asked By: Ranieri Bubans


Answers:

Here is one possibility: write everything to a pandas DataFrame and then filter for the interesting values:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse("xml_to_dict.xml")
root = tree.getroot()

columns = ["id", "name", "value"]
row_list = []
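for obj_head in root.findall('.//ObjectHeader'):
    # iter() walks the header in document order, so the most recent Id and
    # Name seen are the ones that belong to the next Value.
    for elem in obj_head.iter():
        if elem.tag == 'Id':
            obj_id = elem.text
        elif elem.tag == 'Name':
            name = elem.text
        elif elem.tag == 'Value':
            row_list.append((obj_id, name, elem.text))

# Ids are read as strings, so compare against "1424", not the integer 1424.
df = pd.DataFrame(row_list, columns=columns)
dff = df.query('id == "1424"')

print("Dictionary:", dict(zip(dff['name'], dff['value'])))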

Output:

Dictionary: {'Var1': '10.4', 'Var2': '15.6'}
Answered By: Hermann12

I modified the solution proposed by @Hermann12 to get a pandas-free version: instead of building a DataFrame, I iterate over the collected tuples and keep the ones whose id is 1424.

import xml.etree.ElementTree as ET

tree = ET.parse("xml_to_dict.xml")
root = tree.getroot()

row_list = []
for obj_head in root.findall('.//ObjectHeader'):
    for elem in obj_head.iter():
        if elem.tag == 'Id':
            obj_id = elem.text
        elif elem.tag == 'Name':
            name = elem.text
        elif elem.tag == 'Value':
            # Each row is an (id, name, value) tuple.
            row_list.append((obj_id, name, elem.text))

# Keep only the rows whose id is "1424" (ids are strings here).
dictionary_values = {}
for row_id, name, value in row_list:
    if row_id == '1424':
        dictionary_values[name] = value

print(dictionary_values)

Output

{'Var1': '10.4', 'Var2': '15.6'}
Answered By: Ranieri Bubans
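
Both answers above return the values as strings, while the question asks for numbers (10.4, 15.6). The following is a minimal sketch of a more direct variant, not taken from either answer: it looks up ObjectType/Id per ObjectHeader with findtext, stops at the first match (the question states that Id 1424 appears only once), and converts each value with float. The function name extract_values and the SAMPLE_XML string (a trimmed copy of the snippet in the question) are made up for illustration, and the sketch assumes every Value holds a valid number.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the snippet from the question (Description tags omitted).
SAMPLE_XML = """
<ObjectContext>
    <ObjectHeader>
        <ObjectType><Id>1278</Id><Name>ID_NAME</Name></ObjectType>
        <ObjectVariant>
            <ObjectValue>
                <Characteristic><Name>Var1</Name></Characteristic>
                <PropertyValue><Value>10.6</Value></PropertyValue>
            </ObjectValue>
        </ObjectVariant>
    </ObjectHeader>
    <ObjectHeader>
        <ObjectType><Id>1424</Id><Name>ID_NAME</Name></ObjectType>
        <ObjectVariant>
            <ObjectValue>
                <Characteristic><Name>Var1</Name></Characteristic>
                <PropertyValue><Value>10.4</Value></PropertyValue>
            </ObjectValue>
            <ObjectValue>
                <Characteristic><Name>Var2</Name></Characteristic>
                <PropertyValue><Value>15.6</Value></PropertyValue>
            </ObjectValue>
        </ObjectVariant>
    </ObjectHeader>
</ObjectContext>
"""


def extract_values(root, target_id):
    """Return {name: float(value)} for the first ObjectHeader whose
    ObjectType/Id equals target_id, or an empty dict if none matches."""
    for header in root.iter("ObjectHeader"):
        # findtext returns None when the path does not exist.
        header_id = (header.findtext("ObjectType/Id") or "").strip()
        if header_id != target_id:
            continue
        result = {}
        for obj_value in header.iterfind("ObjectVariant/ObjectValue"):
            name = obj_value.findtext("Characteristic/Name")
            value = obj_value.findtext("PropertyValue/Value")
            if name is not None and value is not None:
                # Assumption: every Value is a valid decimal number.
                result[name] = float(value)
        return result  # stop at the first matching header
    return {}


root = ET.fromstring(SAMPLE_XML)
values = extract_values(root, "1424")
print(values)
```

For a file on disk, root would come from ET.parse("xml_to_dict.xml").getroot() instead of ET.fromstring. Because the name and value are read per ObjectValue, the Name inside ObjectType (ID_NAME) is never confused with a Characteristic name.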