How to convert a .txt file to .xml in Python

Question:

The problem I'm currently facing is converting a text file into an XML file.
The text file is in this format:

Serial Number:      Operator ID:  test  Time:  00:03:47 Test Step 2      TP1:  17.25    TP2:  2.46
Serial Number:      Operator ID:  test  Time:  00:03:47 Test Step 2      TP1:  17.25    TP2:  2.46

I want to convert it into an XML file with this format:

<?xml version="1.0" encoding="utf-8"?>
<root>
 <filedata>
 <serialnumber/>
 <operatorid>test</operatorid>
 <time>00:00:42 Test Step 2</time>
 <tp1>17.25</tp1>
 <tp2>2.46</tp2>
 </filedata>
...
</root>

I was using code like this to convert my previous text file to XML, but right now I'm having trouble splitting the lines.

import xml.etree.ElementTree as ET
import fileinput
import os
import itertools as it

root = ET.Element('root')
with open('text.txt') as f:
    lines = f.read().splitlines()
celldata = ET.SubElement(root, 'filedata')
for line in it.groupby(lines):
    line=line[0]
    if not line:
        celldata = ET.SubElement(root, 'filedata')
    else:
        tag = line.split(":")
        el=ET.SubElement(celldata,tag[0].replace(" ",""))
        tag=' '.join(tag[1:]).strip()
        if 'File Name' in line:
            tag = line.split("\\")[-1].strip()
        elif 'File Size' in line:
            splist = list(filter(None, line.split(" ")))  # list() so .index() works on Python 3
            tag = splist[splist.index('Low:')+1]
            #splist[splist.index('High:')+1]
        el.text = tag
import xml.dom.minidom as minidom
formatedXML = minidom.parseString(
                          ET.tostring(
                                      root)).toprettyxml(indent=" ",encoding='utf-8').strip()

with open("test.xml","wb") as f:
    f.write(formatedXML)

I saw a similar question on Stack Overflow,
"Python text file to xml",
but I can't change the file into .csv format because it is generated by a machine.
If anyone knows how to solve this, please help.
Thank you.

Asked By: user12288933

||

Answers:

Here is a better method of splitting the lines.

Note that the text variable stands in for the contents of your .txt file. I deliberately modified the sample values so that the output is easier to follow.

from collections import OrderedDict
from pprint import pprint

# Text would be our loaded .txt file.
text = """Serial Number:  test    Operator ID:  test1  Time:  00:03:47 Test Step 1      TP1:  17.25    TP2:  2.46
Serial Number:      Operator ID:  test2  Time:  00:03:48 Test Step 2      TP1:  17.24    TP2:  2.47"""

# Headers of the intended break-points in the text files.
headers = ["Serial Number:", "Operator ID:", "Time:", "TP1:", "TP2:"]

information = []

# Split our text by lines.
for line in text.split("\n"):

    # Split our text up so we only have the information per header.
    default_header = headers[0]
    for header in headers[1:]:
        line = line.replace(header, default_header)
    info = [i.strip() for i in line.split(default_header)][1:]

    # Pair each header with its value in an OrderedDict.
    compiled_information = OrderedDict()
    for header, value in zip(headers, info):
        compiled_information[header] = value

    # Append to our overall information list.
    information.append(compiled_information)

# Pretty print the information (not needed, only for better display of data.)
pprint(information)

Outputs:

[OrderedDict([('Serial Number:', 'test'),
              ('Operator ID:', 'test1'),
              ('Time:', '00:03:47 Test Step 1'),
              ('TP1:', '17.25'),
              ('TP2:', '2.46')]),
 OrderedDict([('Serial Number:', ''),
              ('Operator ID:', 'test2'),
              ('Time:', '00:03:48 Test Step 2'),
              ('TP1:', '17.24'),
              ('TP2:', '2.47')])]

This method should generalize better than what you currently have. The idea comes from code I had saved from another project. I recommend going through it until you understand its logic.

From here you should be able to loop through the information list and create your custom .xml file. I would also recommend checking out dicttoxml, as it might make the final step much easier.
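
As a rough sketch of that final step using only the standard library: the snippet below assumes a list shaped like the information list above (the sample data is repeated here so the snippet runs on its own). The helper names header_to_tag and build_xml are hypothetical, and the lowercase tag names are taken from the format shown in the question.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Sample data shaped like the `information` list built by the parsing code.
information = [
    {"Serial Number:": "test", "Operator ID:": "test1",
     "Time:": "00:03:47 Test Step 1", "TP1:": "17.25", "TP2:": "2.46"},
    {"Serial Number:": "", "Operator ID:": "test2",
     "Time:": "00:03:48 Test Step 2", "TP1:": "17.24", "TP2:": "2.47"},
]


def header_to_tag(header):
    """Turn a header such as 'Serial Number:' into the tag name 'serialnumber'."""
    return header.rstrip(":").replace(" ", "").lower()


def build_xml(records):
    """Create <root> with one <filedata> element per parsed line."""
    root = ET.Element("root")
    for record in records:
        filedata = ET.SubElement(root, "filedata")
        for header, value in record.items():
            child = ET.SubElement(filedata, header_to_tag(header))
            # An empty value is left as None so the tag is written as empty.
            child.text = value or None
    return root


root = build_xml(information)

# Pretty-print the same way the question does; the result is bytes.
pretty_xml = minidom.parseString(ET.tostring(root)).toprettyxml(
    indent=" ", encoding="utf-8")
print(pretty_xml.decode("utf-8"))
```

Writing pretty_xml to a file opened in "wb" mode, as the code in the question already does, completes the conversion.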

Regarding your code, remember: breaking a job into fundamental tasks is easier than trying to do them all at once. By creating the XML file while you split your .txt file, you've created a monster that is hard to tackle once it starts throwing bugs at you. Instead, take it one step at a time: create "checkpoints" that you are 100% certain work, and then move on to the next task.
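
One way to picture such a checkpoint, as a minimal sketch: the splitting logic is wrapped in a function (parse_line is a hypothetical name) and checked against a line whose expected result is known before any XML is built. The sample line is the one from the question.

```python
HEADERS = ["Serial Number:", "Operator ID:", "Time:", "TP1:", "TP2:"]


def parse_line(line, headers=HEADERS):
    """Step 1 only: split one line of the machine output into {header: value}."""
    marker = headers[0]
    # Replace every header with the first one so a single split() is enough.
    for header in headers[1:]:
        line = line.replace(header, marker)
    values = [part.strip() for part in line.split(marker)][1:]
    return dict(zip(headers, values))


# Checkpoint: confirm the parser on a known line before moving on to the XML step.
sample = ("Serial Number:      Operator ID:  test  Time:  00:03:47 Test Step 2"
          "      TP1:  17.25    TP2:  2.46")
parsed = parse_line(sample)
assert parsed == {
    "Serial Number:": "",
    "Operator ID:": "test",
    "Time:": "00:03:47 Test Step 2",
    "TP1:": "17.25",
    "TP2:": "2.46",
}
```

If the assertion fails, the bug is in the splitting step, and there is no XML code yet to confuse the picture.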

Answered By: Felipe

I tried the same, but the only output I get is OrderedDict() written 7 times. I have attached a screenshot of the code.

Answered By: Vishwanath B