Extracting contents from an XML file with BeautifulSoup in Python

Question:

I have an XML structure like this:

<Trainers>
 <Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
 <Trainer name="VisitorNumber" value="BR-76594823-009922"/>
 <Trainer name="ServerIndex" value="213122"/>
 <Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>

I want to extract the values based on the Trainer names. So basically something like this:

NPRoiuKL213kiolkm2231 from VisitorID, BR-76594823-009922 from VisitorNumber, and so on.

I would also like to extract the <Trainers> element itself, if that is possible.

I can do this in pandas with read_xml and get a table, but I want to get these values individually so I can validate the table that pandas creates.

Here is what I tried:

soup = BeautifulSoup(Trainee.xml, 'xml')
soup.find_all({"Trainer name": "VisitorID"})
soup.find_all({"Trainer name": "VisitorNumber"})
soup.find_all({"Trainer name": "ServerIndex"})
soup.find_all({"Trainer name": "VisitorPolicyID"})

I expected this to work, but each call returns an empty list [].
Is there something I am missing here? When I parse the file with pandas read_xml I get a proper table, but when I look the values up individually I get an empty list.

Any help would be appreciated!

Thank you so much!

Asked By: Hamza Ahmed

Answers:

If I understand you correctly, you want to select each <Trainers> element and, from it, every name/value pair:

from bs4 import BeautifulSoup

xml_doc = """
<Trainers>
 <Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
 <Trainer name="VisitorNumber" value="BR-76594823-009922"/>
 <Trainer name="ServerIndex" value="213122"/>
 <Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>"""

soup = BeautifulSoup(xml_doc, "xml")

for item in soup.select("Trainers"):
    for trainer in item.select("Trainer"):
        print(trainer["name"], trainer["value"])

Prints:

VisitorID  NPRoiuKL213kiolkm2231
VisitorNumber BR-76594823-009922
ServerIndex 213122
VisitorPolicyID ETR1234123
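
To look up a single value by Trainer name, as the question asks, one option is to filter on the attribute through the attrs argument. In the question's attempt the dictionary is the first positional argument of find_all, which is the tag-name filter, so it is not treated as an attribute filter. The sketch below is only one way to do it; trainer_value is a hypothetical helper name, and the strip() call is an assumption that the leading space in the VisitorID value is unwanted.

```python
from bs4 import BeautifulSoup

xml_doc = """
<Trainers>
 <Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
 <Trainer name="VisitorNumber" value="BR-76594823-009922"/>
 <Trainer name="ServerIndex" value="213122"/>
 <Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>"""

soup = BeautifulSoup(xml_doc, "xml")


def trainer_value(soup, trainer_name):
    """Hypothetical helper: value of the <Trainer> with this name, or None."""
    # `name` is already find()'s tag-name parameter, so the XML attribute
    # called "name" has to be passed through the attrs dictionary.
    tag = soup.find("Trainer", attrs={"name": trainer_name})
    if tag is None:
        return None
    # Assumption: surrounding whitespace (as in the VisitorID value) is unwanted.
    return tag["value"].strip()


print(trainer_value(soup, "VisitorID"))
print(trainer_value(soup, "VisitorNumber"))

# All pairs at once, which is convenient for checking the pandas table
values = {t["name"]: t["value"].strip() for t in soup.find_all("Trainer")}
print(values["ServerIndex"])
```

The <Trainers> element itself is available as soup.find("Trainers"), so the same lookup can be run on that element instead of the whole document if the file contains more than one such block.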

If you want to construct a DataFrame from the data, you can use this example:

import pandas as pd

# One row per <Trainers> element, one column per Trainer name
df = pd.DataFrame(
    [
        {t["name"]: t["value"] for t in item.select("Trainer")}
        for item in soup.select("Trainers")
    ]
)
print(df)

Prints:

                VisitorID       VisitorNumber ServerIndex VisitorPolicyID
0   NPRoiuKL213kiolkm2231  BR-76594823-009922      213122      ETR1234123

Answered By: Andrej Kesely