Read a file, modify it, write it out, and then read it again – better options?

Question:

I have a fixed-width text file that I am trying to read in using pandas.read_fwf. As noted here, this method removes leading and trailing whitespace. To get around that, I'd like to replace every whitespace character with some filler character, read the file in as a DataFrame, do my manipulation and editing, restore each filler character to whitespace, and write the file back out as a text file.

At first I manually replaced the whitespace with the tilde character (~) and then manually removed it at the end using Notepad's find/replace, but this is slow and definitely something Python should be able to do for me.

My current method is convoluted, but it does work. I read the file in, make the whitespace replacements, write the result out to a temp file, then read that back into pandas as a fixed-width file. I do the same thing in the opposite direction at the end of my program.

Reading stage (replacing whitespace with ~):

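import pandas

# Read the raw text and replace every space with the filler character.
with open("input.txt") as inFile:
    txt1 = inFile.read().replace(" ", "~")

# Write the filled text to a temporary file.
with open("input_temp.txt", 'w') as outFile:
    outFile.write(txt1)

# Parse the temporary file as fixed-width columns.
with open("input_temp.txt") as inFile:
    df = pandas.read_fwf(inFile, widths=[8, 8, 8])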

Writing stage (replacing ~ with whitespace):

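import numpy as np

# Dump the DataFrame values to a temporary file with no delimiter.
with open("output_temp.txt", 'w') as outFile:
    np.savetxt(outFile, df.values, fmt='%s', delimiter='')

# Read the temporary file back and restore the spaces.
with open("output_temp.txt") as inFile:
    txt2 = inFile.read().replace("~", " ")

# Write the final output.
with open("output.txt", 'w') as outFile:
    outFile.write(txt2)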

Efficiency/memory isn’t a huge concern, but I would still like a better way of doing this.

Asked By: kierabeth


Answers:

You can use io.StringIO as an in-memory, file-like object to read from, so no temporary file is needed:

import io
import pandas

# Replace spaces with the filler and wrap the text in an in-memory file.
with open("input.txt") as inFile:
    txt1 = io.StringIO(inFile.read().replace(" ", "~"))

df = pandas.read_fwf(txt1, widths=[8, 8, 8])

and as one to write to:

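import io
import numpy as np

# Write the values to an in-memory buffer instead of a temporary file.
out_text = io.StringIO()
np.savetxt(out_text, df.values, fmt='%s', delimiter='')

# Restore the spaces, then write the final file.
txt2 = out_text.getvalue().replace("~", " ")

with open("output.txt", 'w') as outFile:
    outFile.write(txt2)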
Answered By: chepner
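
One thing the filler approach quietly relies on is that the filler character never appears in the real data. If it does, restoring the spaces at the end would corrupt those values. Below is a minimal sketch of the same in-memory idea with that check added. The helper names filled_buffer and restore_spaces, the FILLER constant, and the sample row are hypothetical, made up for illustration and not part of the question or the answer above. The sketch uses only the standard library, so the pandas and NumPy steps are left out.

```python
import io

FILLER = "~"  # assumption: this character never occurs in the real data


def filled_buffer(text, filler=FILLER):
    """Return an in-memory file in which every space is replaced by the filler."""
    if filler in text:
        # Restoring spaces later would also alter these genuine characters.
        raise ValueError(f"filler {filler!r} already occurs in the input")
    return io.StringIO(text.replace(" ", filler))


def restore_spaces(text, filler=FILLER):
    """Undo filled_buffer: turn every filler character back into a space."""
    return text.replace(filler, " ")


# Round trip on one made-up row of three 8-character columns.
row = "  alpha " + "      12" + "  beta  "
sample = row + "\n"

buf = filled_buffer(sample)       # file-like object, usable wherever a file is expected
filled = buf.read()               # the text a fixed-width parser would see
restored = restore_spaces(filled)
```

In the workflow above, the object returned by filled_buffer would be passed to pandas.read_fwf in place of txt1, and restore_spaces would be applied to out_text.getvalue() before writing output.txt.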