Creating a .csv from a combination of strings and pandas dfs
Question:
My .csv with multiple blocks needs to follow this format (1 block sample):
So I'm trying to build it in pandas and then write it to csv. The problem is the comments above each of the two sections (outside the dataframes). Here is sample code:
import numpy as np
import pandas as pd
h_comment = pd.DataFrame(['#(H) Header'], columns=['name'])
df1 = pd.DataFrame({'name': 'Donald Trump',
                    'state': 'FL',
                    'value': '0'},
                   index=[0])
data_comment = pd.DataFrame(['#(S) Schedule'], columns=['A'])
df2 = pd.DataFrame(np.random.rand(3, 4),
                   columns=list('ABCD'))
to_csv1 = pd.concat([h_comment,df1])
to_csv2 = pd.concat([data_comment,df2])
The issue is that those "comments" end up inside my df columns, for example:
to_csv2
Out[116]:
A B C D
0 #(S) Schedule NaN NaN NaN
0 0.521739 0.622079 0.322372 0.687531
1 0.991336 0.297848 0.635697 0.025620
2 0.068900 0.898806 0.562971 0.567817
The solution of first creating a .csv with the comments and then appending dfs to it is not great, since there are many blocks like the one above and that hurts performance, so I’d rather write to csv once at the end.
Answers:
The image you shared looks more like an Excel spreadsheet than a csv file.
To make a csv that matches the shape you described, one option is to use open with to_csv:
N = 2  # number of empty lines between both dfs
with open("output.csv", mode="w", newline="") as file:
    file.write("#(H) Header\n")    # comment line above the first block
    df1.to_csv(file, index=False)
    file.write("\n" * N)           # blank lines separating the blocks
    file.write("#(S) Schedule\n")  # comment line above the second block
    df2.to_csv(file, index=False)
Output (.csv in Excel) :
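Since the question mentions many blocks, the same idea can be wrapped in a loop so the file is opened only once and every block is written in a single pass. This is only a minimal sketch: the helper name write_blocks, its parameters blocks and gap, the file name blocks.csv, and the small sample frames are all made up for illustration and are not part of the original code.

```python
import pandas as pd


def write_blocks(path, blocks, gap=2):
    """Write (comment, DataFrame) pairs to one csv file, opening it only once.

    Hypothetical helper: `blocks` is a list of (comment, DataFrame) tuples and
    `gap` is the number of empty lines placed between consecutive blocks.
    """
    with open(path, mode="w", newline="") as file:
        for i, (comment, df) in enumerate(blocks):
            if i > 0:
                file.write("\n" * gap)     # empty lines between blocks
            file.write(f"{comment}\n")     # comment line above the block
            df.to_csv(file, index=False)   # header row plus data rows


# Small fixed sample frames (assumed values, used only for this example)
df1 = pd.DataFrame({"name": ["Donald Trump"], "state": ["FL"], "value": ["0"]})
df2 = pd.DataFrame({"A": [0.1, 0.2], "B": [0.3, 0.4]})

write_blocks("blocks.csv", [("#(H) Header", df1), ("#(S) Schedule", df2)])
```

The comments never enter a dataframe, so no NaN padding shows up in the columns, and adding more blocks only means adding more tuples to the list.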
If needed, you can use ExcelWriter to make a spreadsheet that can handle sheet/cell formatting:
with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
    worksheet = writer.book.add_worksheet()
    header_format = writer.book.add_format({"border": None})
    title_format = writer.book.add_format({"bold": True,
                                           "italic": True,
                                           "font_size": 11})
    worksheet.write(0, 0, "#(H) Header", title_format)
    df1.to_excel(writer, index=False, startrow=1)
    worksheet.write(len(df1)+2, 0, "")
    worksheet.write(len(df1)+3, 0, "")
    worksheet.write(len(df1)+4, 0, "#(S) Schedule", title_format)
    df2.to_excel(writer, index=False, startrow=len(df1)+5)
    for col_num, value in enumerate(df2.columns):
        worksheet.write(len(df1)+5, col_num, value, header_format)
    for col_num, value in enumerate(df1.columns):
        worksheet.write(1, col_num, value, header_format)
    worksheet.autofit()
Output (.xlsx in Excel) :