Extracting contents from an XML file with BeautifulSoup on Python
Question:
I have an XML structure like this:
<Trainers>
<Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
<Trainer name="VisitorNumber" value="BR-76594823-009922"/>
<Trainer name="ServerIndex" value="213122"/>
<Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>
I want to extract the values based on the Trainer names. So basically something like this: NPRoiuKL213kiolkm2231 from VisitorID, BR-76594823-009922 from VisitorNumber, and so on.
I also want to see if I can extract <Trainers>, if that is possible.
I can do this in pandas with read_xml and get a table, but I want to get these values individually so I can validate the table created by pandas.
Here is what I tried:
soup = BeautifulSoup(Trainee.xml, 'xml')
soup.find_all({"Trainer name": "VisitorID"})
soup.find_all({"Trainer name": "VisitorNumber"})
soup.find_all({"Trainer name": "ServerIndex"})
soup.find_all({"Trainer name": "VisitorPolicyID"})
I expected this to work, but each call gives me an empty list, [].
Is there something I am missing here? When I parse the file through pandas with read_xml I get a proper table, but when I look the values up individually I get an empty list.
Any help would be appreciated!
Thank you so much!
Answers:
If I understand you correctly, you want to get every <Trainers> element and, from each one, all name/value pairs:
from bs4 import BeautifulSoup

xml_doc = """
<Trainers>
<Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
<Trainer name="VisitorNumber" value="BR-76594823-009922"/>
<Trainer name="ServerIndex" value="213122"/>
<Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>"""

# The "xml" parser needs lxml to be installed.
soup = BeautifulSoup(xml_doc, "xml")

# Walk each <Trainers> element, then each <Trainer> inside it.
for item in soup.select("Trainers"):
    for trainer in item.select("Trainer"):
        # Attributes are read with dict-style access on the tag.
        print(trainer["name"], trainer["value"])
Prints:
VisitorID NPRoiuKL213kiolkm2231
VisitorNumber BR-76594823-009922
ServerIndex 213122
VisitorPolicyID ETR1234123
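To pull out a single value by Trainer name, which is what the question originally attempted, a minimal sketch follows. The helper name trainer_value is made up for illustration, and stripping whitespace from the value is an assumption (the sample VisitorID value starts with a space). As far as I can tell, the original calls returned [] because a dict passed as the first argument to find_all is treated as a filter on the tag name, and no tag is named "Trainer name". The attribute filter has to go through attrs here, because name is also the name of find's first parameter.

```python
from bs4 import BeautifulSoup

xml_doc = """<Trainers>
<Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
<Trainer name="VisitorNumber" value="BR-76594823-009922"/>
<Trainer name="ServerIndex" value="213122"/>
<Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>"""

# The "xml" parser needs lxml to be installed.
soup = BeautifulSoup(xml_doc, "xml")


def trainer_value(soup, name):
    """Hypothetical helper: value of the <Trainer> whose name attribute
    equals `name`, or None if there is no such element."""
    # First argument filters on the tag name, attrs on the attributes.
    tag = soup.find("Trainer", attrs={"name": name})
    if tag is None:
        return None
    # Assumption: surrounding whitespace in the value is not significant.
    return tag["value"].strip()


visitor_id = trainer_value(soup, "VisitorID")
visitor_number = trainer_value(soup, "VisitorNumber")
print(visitor_id)
print(visitor_number)

# The enclosing <Trainers> element can be fetched the same way.
trainers = soup.find("Trainers")
print(trainers.name, len(trainers.find_all("Trainer")))
```

When reading from disk, BeautifulSoup expects the markup itself or an open file handle as its first argument, for example the result of open() on the XML file, rather than a bare file name.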
If you want to construct a DataFrame from the data, you can use this example:
import pandas as pd

# One row per <Trainers> element, one column per Trainer name.
df = pd.DataFrame(
    [
        {t["name"]: t["value"] for t in item.select("Trainer")}
        for item in soup.select("Trainers")
    ]
)
print(df)
Prints:
               VisitorID       VisitorNumber  ServerIndex  VisitorPolicyID
0  NPRoiuKL213kiolkm2231  BR-76594823-009922       213122       ETR1234123
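Since the goal in the question was to validate a table against individually extracted values, one possible check is sketched here. It assumes the one-row DataFrame layout from the answer (one column per Trainer name); the names expected and mismatches are made up for illustration, and a table produced another way (for example by read_xml) may have a different shape and need a different comparison.

```python
from bs4 import BeautifulSoup
import pandas as pd

xml_doc = """<Trainers>
<Trainer name="VisitorID" value=" NPRoiuKL213kiolkm2231"/>
<Trainer name="VisitorNumber" value="BR-76594823-009922"/>
<Trainer name="ServerIndex" value="213122"/>
<Trainer name="VisitorPolicyID" value="ETR1234123"/>
</Trainers>"""

# The "xml" parser needs lxml to be installed.
soup = BeautifulSoup(xml_doc, "xml")

# Table under test: one row per <Trainers>, one column per Trainer name.
df = pd.DataFrame(
    [
        {t["name"]: t["value"] for t in item.select("Trainer")}
        for item in soup.select("Trainers")
    ]
)

# Values looked up one at a time, straight from the parsed XML.
expected = {}
for name in ["VisitorID", "VisitorNumber", "ServerIndex", "VisitorPolicyID"]:
    tag = soup.find("Trainer", attrs={"name": name})
    expected[name] = tag["value"]

# Collect every column whose cell differs from the individual lookup.
mismatches = {
    name: (df.loc[0, name], value)
    for name, value in expected.items()
    if df.loc[0, name] != value
}
print(mismatches or "table matches the individually extracted values")
```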