Need to exchange two sets of tabular data in a TXT file using Python
Question:
I have a set of coordinate data (saved in a txt file), like this:
Number N coordinates E coordinates Height Code
1 5111945.980 444258.900 131.300 C
2 5265566.655 443665.554 110.311 BR
...
It goes on for several dozen rows like that.
What I need is to swap the N coordinates column with the E coordinates column (including the header titles, of course) while everything else stays the same. For example, this would be the desired outcome in an output txt file:
Number E coordinates N coordinates Height Code
1 444258.900 5111945.980 131.300 C
2 443665.554 5265566.655 110.311 BR
For now, I only have the bit for printing out the table data as it shows in txt:
# open the source file
myfile = open("sometext.txt", "r")
# print out the data, to check if output working
for line in myfile:
    print(line)
The data is tab-separated, as far as I can see.
I would like to solve this without external libraries, only default Python and its standard library.
Answers:
One way to accomplish this is with the pandas library and StringIO from the standard io module. Note that pandas is a third-party library that you need to install with pip install pandas.
As you stated in the comments, your txt file is tab-delimited, which means \t is what separates your columns. Here is one way to do it:
Import libraries and load your file to a variable:
import pandas as pd
from io import StringIO
# dir_path is assumed to already hold the folder that contains the file
txt_path = dir_path + "/sample_txt_file.txt"
string = ""
with open(txt_path, 'r') as file:
    string = file.read()
With your dummy data, the variable string will look like this:
'Number\tN coordinates\tE coordinates\tHeight\tCode\n1 \t5111945.980\t444258.900\t131.300\tC\n2 \t5265566.655\t443665.554\t110.311\tBR'
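Read it as a dataframe. Passing dtype=str keeps every value as text, so trailing zeros such as 444258.900 are not lost when the data is written back:
df = pd.read_csv(StringIO(string), sep='\t', dtype=str)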
Reorder the columns by listing them in the order you want:
df = df[['Number', 'E coordinates', 'N coordinates', 'Height', 'Code']]
Note that you can reorder as many of them as you want.
And save your data to a new txt file:
df.to_csv(new_txt_path, index=False, sep='\t')
The resulting data will look like this:
Number E coordinates N coordinates Height Code
1 444258.900 5111945.980 131.300 C
2 443665.554 5265566.655 110.311 BR
Solution with no third-party libraries
# titles of the two columns to exchange
rows_to_be_switched = ['N coordinates', 'E coordinates']
def flip(row_to_flip, ind1, ind2):
    # swap the items at positions ind1 and ind2 and return the list
    tmp = row_to_flip[ind1]
    row_to_flip[ind1] = row_to_flip[ind2]
    row_to_flip[ind2] = tmp
    return row_to_flip
with open('file1.txt', 'r') as file1, open('file2.txt', 'w') as file2:
    # strip the line break so the last title can be matched by .index()
    data_rows = [row.rstrip('\n') for row in file1.readlines()]
    titles_indices = [data_rows[0].split('\t').index(title) for title in rows_to_be_switched]
    for row in data_rows:
        # join with tabs, then add the line break back
        file2.write('\t'.join(flip(row.split('\t'), *titles_indices)) + '\n')
Explanation:
- rows_to_be_switched is a list of the two column titles that we flip between.
- The flip function swaps the items at two indices of a list and returns that list.
- titles_indices = [data_rows[0].split('\t').index(title) for title in rows_to_be_switched] finds the indices of the two titles in the header row, so we know which two items to flip in every row.
- We then write each row by joining the items with a tab, since that was the delimiter.
- Edit thanks to @tripleee: we need to strip the line break because it is part of the string, and .index() would otherwise fail to find the last column. Then we add the \n back to the end of the line.
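The index lookup and swap described in the explanation can be tried on an in-memory list of lines before touching any files. This is only a minimal sketch: the function name swap_by_title and the sample rows are made up for illustration, and it uses Python's tuple assignment in place of the temporary variable.

```python
def swap_by_title(lines, title_a, title_b, sep="\t"):
    """Return new lines with the two named columns exchanged.

    `lines` must not contain trailing line breaks; the first
    line is treated as the header (assumption for this sketch).
    """
    header = lines[0].split(sep)
    i, j = header.index(title_a), header.index(title_b)
    swapped = []
    for line in lines:
        fields = line.split(sep)
        # tuple assignment swaps the two items without a temp variable
        fields[i], fields[j] = fields[j], fields[i]
        swapped.append(sep.join(fields))
    return swapped


rows = [
    "Number\tN coordinates\tE coordinates\tHeight\tCode",
    "1\t5111945.980\t444258.900\t131.300\tC",
]
result = swap_by_title(rows, "N coordinates", "E coordinates")
print(result[1])
```

Because the header is processed by the same loop as the data rows, the titles are exchanged along with the values.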
Input:
file1.txt
Number N coordinates E coordinates Height Code
1 5111945.980 444258.900 131.300 C
2 5265566.655 443665.554 110.311 BR
Output:
file2.txt
Number E coordinates N coordinates Height Code
1 444258.900 5111945.980 131.300 C
2 443665.554 5265566.655 110.311 BR
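Another standard-library option, not taken from the answers above, is the csv module, which can read and write tab-delimited text directly. The sketch below assumes the data really is tab-separated and uses io.StringIO objects in place of real files so it can be run as-is; swap_columns is a hypothetical helper name.

```python
import csv
import io


def swap_columns(src, dst, title_a, title_b):
    """Copy tab-separated rows from src to dst with two columns exchanged."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader)
    i, j = header.index(title_a), header.index(title_b)
    header[i], header[j] = header[j], header[i]
    writer.writerow(header)
    for row in reader:
        if not row:  # skip blank lines, which csv.reader yields as []
            continue
        row[i], row[j] = row[j], row[i]
        writer.writerow(row)


# sample data standing in for file1.txt
sample = io.StringIO(
    "Number\tN coordinates\tE coordinates\tHeight\tCode\n"
    "1\t5111945.980\t444258.900\t131.300\tC\n"
    "2\t5265566.655\t443665.554\t110.311\tBR\n"
)
out = io.StringIO()
swap_columns(sample, out, "N coordinates", "E coordinates")
print(out.getvalue())
```

With real files, the same function should work with objects returned by open(); the csv documentation recommends opening them with newline="".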
You can also use f-string formatting to print the swapped columns at fixed widths (myfile is the open file from the question):
l = 0
for line in myfile:
    sl = line.split()  # split on any whitespace (tabs or spaces)
    if l == 0:
        print(line, end='')  # prints first line without a swap
    else:
        print(f'{sl[0]:8}{sl[2]:16}{sl[1]:14}{sl[3]:10}{sl[4]:4}')
    l += 1
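If the header should be swapped and aligned as well, the same fixed-width idea can be wrapped in a small function. This is a sketch under assumptions: format_row and the column widths are invented for illustration, and each line is expected to have exactly five tab-separated fields.

```python
def format_row(fields, widths=(8, 16, 16, 10, 4)):
    """Return the five fields as one left-aligned line, with N and E exchanged."""
    number, north, east, height, code = fields
    swapped = (number, east, north, height, code)
    # nested replacement field: pad each value to its column width
    line = "".join(f"{value:<{width}}" for value, width in zip(swapped, widths))
    return line.rstrip()


lines = [
    "Number\tN coordinates\tE coordinates\tHeight\tCode\n",
    "1\t5111945.980\t444258.900\t131.300\tC\n",
    "2\t5265566.655\t443665.554\t110.311\tBR\n",
]
for line in lines:
    print(format_row(line.rstrip("\n").split("\t")))
```

Note that this produces space-padded columns for display, not a tab-separated file.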