XML ElementTree Python: Find all the relations of a node

Question:

Suppose we have the following XML file:

<XML Data>
    <Record>
        <Service>
            <Product id="A"></Product>
            <Product id="B"></Product>
            <Product id="C"></Product>
        </Service>
    </Record>
    <Record>
        <Service>
            <Product id="A"></Product>
            <Product id="B"></Product>
            <Product id="Y"></Product>
        </Service>
    </Record>
    <Record>
        <Service>
            <Product id="U"></Product>
        </Service>
    </Record>
</XML Data>

As you can see, each record represents a single client, but without a unique identifier. Each service has multiple products.

I want to get all products that have been sold with product A. Therefore, I am trying to get a list like this:

ServiceID
B
C
Y

I’ve been using:

import xml.etree.ElementTree as ET

Asked By: Jon Ander Díez


Answers:

You can select elements by attribute value with the [@attrib='value'] predicate, as described in the official documentation. When testing this, I replaced your <XML Data> and </XML Data> tags with <Data> and </Data>, because an XML tag name cannot contain a space. Example code:

from xml.etree import ElementTree as ET

data = ET.parse(r"/path/to/your/input.xml")
root = data.getroot()
for product in root.findall("./Record/Service/Product[@id='A']"):
    print(product.attrib["id"])
    print(product.text)

Edit

After reading your question again, I noticed that you first want to check whether a product with id A exists within a Service, and only then collect the IDs of the other products in that Service (unique and sorted), so I adapted the code:

from xml.etree import ElementTree as ET

data = ET.parse(r"/path/to/your/input.xml")
root = data.getroot()
product_ids = set()
for service in root.findall("./Record/Service"):
    list_contains_a = False

    # iterate once to identify if list contains product with ID = 'A'
    for product in service.findall("./Product"):
        if product.attrib["id"] == "A":
            list_contains_a = True

    # if list contains product with ID = 'A', iterate second time and fetch IDs
    if list_contains_a:
        for product in service.findall("./Product"):
            if product.attrib["id"] == "A":
                continue

            # add to set to prevent duplicates
            product_ids.add(product.attrib["id"])

ret_list = ["ServiceID"] + sorted(product_ids)
print(ret_list)
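
The two-pass loop above can also be written more compactly by letting the [@id='...'] predicate do the existence check. The following is only a sketch: the function name co_sold_products and the inline SAMPLE string are made up for illustration, and the root tag is assumed to have been renamed to <Data> as described earlier.

```python
from xml.etree import ElementTree as ET

# Inline copy of the question's data so the example runs on its own.
# Assumption: the root tag is <Data>, since "XML Data" is not a valid tag name.
SAMPLE = """<Data>
    <Record><Service>
        <Product id="A"></Product>
        <Product id="B"></Product>
        <Product id="C"></Product>
    </Service></Record>
    <Record><Service>
        <Product id="A"></Product>
        <Product id="B"></Product>
        <Product id="Y"></Product>
    </Service></Record>
    <Record><Service>
        <Product id="U"></Product>
    </Service></Record>
</Data>"""


def co_sold_products(root, target_id):
    """Return the sorted, unique IDs of products that share a Service with target_id.

    Hypothetical helper; target_id is assumed not to contain quote characters,
    because it is interpolated into the path expression.
    """
    found = set()
    for service in root.iter("Service"):
        # find() returns None when no Product child matches the predicate.
        if service.find(f"Product[@id='{target_id}']") is None:
            continue
        for product in service.findall("Product"):
            product_id = product.get("id")  # None if the attribute is missing
            if product_id is not None and product_id != target_id:
                found.add(product_id)
    return sorted(found)


root = ET.fromstring(SAMPLE)
print(co_sold_products(root, "A"))  # ['B', 'C', 'Y']
```

Using product.get("id") instead of product.attrib["id"] avoids a KeyError if some Product element has no id attribute.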

Answered By: mnikley