How do I write a python pandas.DataFrame to a file using aligned space chacarcters?
Question:
I want to store a pandas.DataFrame to a text file that has the columns aligned using whitespace characters. If this is my sample DataFrame:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.linspace(0,1,9).reshape(3,3))
In [4]: df
Out[4]:
0 1 2
0 0.000 0.125 0.250
1 0.375 0.500 0.625
2 0.750 0.875 1.000
[3 rows x 3 columns]
I want to do something like this:
In [5]: df.to_csv('test.txt', sep='?')
to get this:
In [6]: more test.txt
0 1 2
0 0.0 0.125 0.25
1 0.375 0.5 0.625
2 0.75 0.875 1.0
What separator should I use? I want to know if there is a way to do this without using the t
character. It looks nice
0 1 2
0 0.0 0.125 0.25
1 0.375 0.5 0.625
2 0.75 0.875 1.0
but then my text files have tab characters which create other problems.
If I use sep=' '
I get this which is obviously wrong.
0 1 2
0 0.0 0.125 0.25
1 0.375 0.5 0.625
2 0.75 0.875 1.0
I know pandas can read in files like this so I figure there is a way to write out files like this.
Answers:
How about this
import numpy as np
import pandas as pd
import csv
df = pd.DataFrame(np.linspace(0,1,9).reshape(3,3))
df.to_csv('test.txt',
float_format='%10.3f', sep=" ",
quoting=csv.QUOTE_NONE, escapechar=" ")
It produces:
0 1 2
0 0.000 0.125 0.250
1 0.375 0.500 0.625
2 0.750 0.875 1.000
Number of spaces can be ofc customized by the number of digits of the ‘longest’ number.
Try using hollerith. pypi github. It’s a (very) new python library that deals with fixed width formatting – and open to contribution. Unfortunately trying to use built-in python string formatting (or the C printf for that matter!) for fixed width doesn’t work well for some widths when there are large integers and certain double precision floats.
One way is to use the tabulate package.
% ipython
Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.linspace(0,1,9).reshape(3,3))
df
Out[1]:
0 1 2
0 0.000 0.125 0.250
1 0.375 0.500 0.625
2 0.750 0.875 1.000
In [2]:
from tabulate import tabulate
content = tabulate(
df.values.tolist(), list(df.columns), showindex=list(df.index), tablefmt="plain"
)
In [3]:
print(content)
0 1 2
0 0 0.125 0.25
1 0.375 0.5 0.625
2 0.75 0.875 1
If you do not want to print the row index, do not pass the showindex option.
In [4]:
content2 = tabulate(
df.values.tolist(), list(df.columns), tablefmt="plain"
)
print(content2)
0 1 2
0 0.125 0.25
0.375 0.5 0.625
0.75 0.875 1
I want to store a pandas.DataFrame to a text file that has the columns aligned using whitespace characters. If this is my sample DataFrame:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.linspace(0,1,9).reshape(3,3))
In [4]: df
Out[4]:
0 1 2
0 0.000 0.125 0.250
1 0.375 0.500 0.625
2 0.750 0.875 1.000
[3 rows x 3 columns]
I want to do something like this:
In [5]: df.to_csv('test.txt', sep='?')
to get this:
In [6]: more test.txt
0 1 2
0 0.0 0.125 0.25
1 0.375 0.5 0.625
2 0.75 0.875 1.0
What separator should I use? I want to know if there is a way to do this without using the t
character. It looks nice
0 1 2
0 0.0 0.125 0.25
1 0.375 0.5 0.625
2 0.75 0.875 1.0
but then my text files have tab characters which create other problems.
If I use sep=' '
I get this which is obviously wrong.
0 1 2
0 0.0 0.125 0.25
1 0.375 0.5 0.625
2 0.75 0.875 1.0
I know pandas can read in files like this so I figure there is a way to write out files like this.
How about this
import numpy as np
import pandas as pd
import csv
df = pd.DataFrame(np.linspace(0,1,9).reshape(3,3))
df.to_csv('test.txt',
float_format='%10.3f', sep=" ",
quoting=csv.QUOTE_NONE, escapechar=" ")
It produces:
0 1 2
0 0.000 0.125 0.250
1 0.375 0.500 0.625
2 0.750 0.875 1.000
Number of spaces can be ofc customized by the number of digits of the ‘longest’ number.
Try using hollerith. pypi github. It’s a (very) new python library that deals with fixed width formatting – and open to contribution. Unfortunately trying to use built-in python string formatting (or the C printf for that matter!) for fixed width doesn’t work well for some widths when there are large integers and certain double precision floats.
One way is to use the tabulate package.
% ipython
Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.linspace(0,1,9).reshape(3,3))
df
Out[1]:
0 1 2
0 0.000 0.125 0.250
1 0.375 0.500 0.625
2 0.750 0.875 1.000
In [2]:
from tabulate import tabulate
content = tabulate(
df.values.tolist(), list(df.columns), showindex=list(df.index), tablefmt="plain"
)
In [3]:
print(content)
0 1 2
0 0 0.125 0.25
1 0.375 0.5 0.625
2 0.75 0.875 1
If you do not want to print the row index, do not pass the showindex option.
In [4]:
content2 = tabulate(
df.values.tolist(), list(df.columns), tablefmt="plain"
)
print(content2)
0 1 2
0 0.125 0.25
0.375 0.5 0.625
0.75 0.875 1