How to convert an XML file to an Excel file?
Question:
I have a directory which contains multiple XML files, lets say it contains the following 2:
<Record>
<RecordID>Madird01</RecordID>
<Location>Madird</Location>
<Date>07-09-2020</Date>
<Time>07u43m55s</Time>
<Version>2.0.1</Version>
<Version_2>v1.9</Version_2>
<Max_30e>
<I_25Hz_1s>56.40</I_25Hz_1s>
<I_25Hz_2s>7.44</I_25Hz_2s>
</Max_30e>
<Max_30e>
<I_75Hz_1s>1.56</I_75Hz_1s>
<I_75Hz_2s>0.36</I_75Hz_2s>
</Max_30e>
</Record>
And:
<Record>
<RecordID>London01</RecordID>
<Location>London</Location>
<Date>07-09-2020</Date>
<Time>08u53m45s</Time>
<Version>2.0.1</Version>
<Version_2>v1.9</Version_2>
<Max_30e>
<I_25Hz_1s>56.40</I_25Hz_1s>
<I_25Hz_2s>7.44</I_25Hz_2s>
</Max_30e>
<Max_30e>
<I_75Hz_1s>1.56</I_75Hz_1s>
<I_75Hz_2s>0.36</I_75Hz_2s>
</Max_30e>
</Record>
Now I want to convert this to an excel file which shows every XML file in horizontal order like this:
I tried to convert the XML to csv string first and then to Excel but I got stuck, there should be easier ways.
This is my current code:
import xml.etree.ElementTree as ET
import os
xml_root = r'c:dataDesktopBlueXML-files'
for file in os.listdir(xml_root):
xml_file_path = os.path.join(xml_root, file)
tree = ET.parse(xml_file_path)
root = tree.getroot()
tree = ET.ElementTree(root)
for child in root:
mainlevel = child.tag
xmltocsv = ''
for elem in root.iter():
if elem.tag == root.tag:
continue
if elem.tag == mainlevel:
xmltocsv = xmltocsv + 'n'
xmltocsv = xmltocsv + str(elem.tag).rstrip() + str(elem.attrib).strip() + ';' + str(elem.text).rstrip() + ';'
Answers:
create a csv file which is Excel friendly format.
import xml.etree.ElementTree as ET
from os import listdir
xml_lst = [f for f in listdir() if f.startswith('xml')]
fields = ['RecordID','I_25Hz_1s','I_75Hz_2s'] # TODO - add rest of the fields
with open('out.csv','w') as f:
f.write(','.join(fields) + 'n')
for xml in xml_lst:
root = ET.parse(xml)
values = [root.find(f'.//{f}').text for f in fields]
f.write(','.join(values) + 'n')
output
RecordID,I_25Hz_1s,I_75Hz_2s
Madird01,56.40,0.36
London01,56.40,0.36
When you need to iterate over files in folder with similar names one of the ways could be make a pattern and use glob
. To make sure that returned path is file you can use isfile()
.
Regarding XML, I see that basically you need to write values of every terminal tag in column with name of this tag. As you have various files you can create tag-value dictionaries from each file and store them into ChainMap
. After all files processed you can use DictWriter
to write all data into final csv file.
This method is much more safe and flexible then use static column names. Firstly program will collect all possible tag(column) names from all files, so in case if XML doesn’t have such a tag or have some extra tags it won’t throw an exception and all data will be saved.
Code:
import xml.etree.ElementTree as ET
from glob import iglob
from os.path import isfile, join
from csv import DictWriter
from collections import ChainMap
xml_root = r"C:dataDesktopBlueXML-files"
pattern = "xmlfile_*"
data = ChainMap()
for filename in iglob(join(xml_root, pattern)):
if isfile(filename):
tree = ET.parse(filename)
root = tree.getroot()
temp = {node.tag: node.text for node in root.iter() if not node}
data = data.new_child(temp)
with open(join(xml_root, "data.csv"), "w", newline="") as f:
writer = DictWriter(f, data)
writer.writeheader()
writer.writerows(data.maps[:-1]) # last is empty dict
Upd. If you want to use xlsx
format instead of csv
you have to use third-party library (e.g. openpyxl
). Example of usage:
from openpyxl import Workbook
...
wb = Workbook(write_only=True)
ws = wb.create_sheet()
ws.append(list(data)) # write header
for row in data.maps[:-1]:
ws.append([row.get(key, "") for key in data])
wb.save(join(xml_root, "data.xlsx"))
I have a directory which contains multiple XML files, lets say it contains the following 2:
<Record>
<RecordID>Madird01</RecordID>
<Location>Madird</Location>
<Date>07-09-2020</Date>
<Time>07u43m55s</Time>
<Version>2.0.1</Version>
<Version_2>v1.9</Version_2>
<Max_30e>
<I_25Hz_1s>56.40</I_25Hz_1s>
<I_25Hz_2s>7.44</I_25Hz_2s>
</Max_30e>
<Max_30e>
<I_75Hz_1s>1.56</I_75Hz_1s>
<I_75Hz_2s>0.36</I_75Hz_2s>
</Max_30e>
</Record>
And:
<Record>
<RecordID>London01</RecordID>
<Location>London</Location>
<Date>07-09-2020</Date>
<Time>08u53m45s</Time>
<Version>2.0.1</Version>
<Version_2>v1.9</Version_2>
<Max_30e>
<I_25Hz_1s>56.40</I_25Hz_1s>
<I_25Hz_2s>7.44</I_25Hz_2s>
</Max_30e>
<Max_30e>
<I_75Hz_1s>1.56</I_75Hz_1s>
<I_75Hz_2s>0.36</I_75Hz_2s>
</Max_30e>
</Record>
Now I want to convert this to an excel file which shows every XML file in horizontal order like this:
I tried to convert the XML to csv string first and then to Excel but I got stuck, there should be easier ways.
This is my current code:
import xml.etree.ElementTree as ET
import os
xml_root = r'c:dataDesktopBlueXML-files'
for file in os.listdir(xml_root):
xml_file_path = os.path.join(xml_root, file)
tree = ET.parse(xml_file_path)
root = tree.getroot()
tree = ET.ElementTree(root)
for child in root:
mainlevel = child.tag
xmltocsv = ''
for elem in root.iter():
if elem.tag == root.tag:
continue
if elem.tag == mainlevel:
xmltocsv = xmltocsv + 'n'
xmltocsv = xmltocsv + str(elem.tag).rstrip() + str(elem.attrib).strip() + ';' + str(elem.text).rstrip() + ';'
create a csv file which is Excel friendly format.
import xml.etree.ElementTree as ET
from os import listdir
xml_lst = [f for f in listdir() if f.startswith('xml')]
fields = ['RecordID','I_25Hz_1s','I_75Hz_2s'] # TODO - add rest of the fields
with open('out.csv','w') as f:
f.write(','.join(fields) + 'n')
for xml in xml_lst:
root = ET.parse(xml)
values = [root.find(f'.//{f}').text for f in fields]
f.write(','.join(values) + 'n')
output
RecordID,I_25Hz_1s,I_75Hz_2s
Madird01,56.40,0.36
London01,56.40,0.36
When you need to iterate over files in folder with similar names one of the ways could be make a pattern and use glob
. To make sure that returned path is file you can use isfile()
.
Regarding XML, I see that basically you need to write values of every terminal tag in column with name of this tag. As you have various files you can create tag-value dictionaries from each file and store them into ChainMap
. After all files processed you can use DictWriter
to write all data into final csv file.
This method is much more safe and flexible then use static column names. Firstly program will collect all possible tag(column) names from all files, so in case if XML doesn’t have such a tag or have some extra tags it won’t throw an exception and all data will be saved.
Code:
import xml.etree.ElementTree as ET
from glob import iglob
from os.path import isfile, join
from csv import DictWriter
from collections import ChainMap
xml_root = r"C:dataDesktopBlueXML-files"
pattern = "xmlfile_*"
data = ChainMap()
for filename in iglob(join(xml_root, pattern)):
if isfile(filename):
tree = ET.parse(filename)
root = tree.getroot()
temp = {node.tag: node.text for node in root.iter() if not node}
data = data.new_child(temp)
with open(join(xml_root, "data.csv"), "w", newline="") as f:
writer = DictWriter(f, data)
writer.writeheader()
writer.writerows(data.maps[:-1]) # last is empty dict
Upd. If you want to use xlsx
format instead of csv
you have to use third-party library (e.g. openpyxl
). Example of usage:
from openpyxl import Workbook
...
wb = Workbook(write_only=True)
ws = wb.create_sheet()
ws.append(list(data)) # write header
for row in data.maps[:-1]:
ws.append([row.get(key, "") for key in data])
wb.save(join(xml_root, "data.xlsx"))