Is there a better way of displaying a csv file that was made from a Multi-index dataframe?
Question:
I was working on a project that measures the speed of different algorithms on different data sets, and I wanted to save the results in CSV format and eventually view them in Excel.
I did this with the following lines of code:
df=DataFrame(dictionary_container,index=indexer)
df=df.transpose()
df.to_csv("sorted_results.csv")
print(df)
Now, the problem is that my DataFrame has a multi-index, and when it is written to CSV the multi-index doesn't keep the layout it has in the DataFrame.
When printed as a DataFrame in my PyCharm console, it looks something like this:
Selection Sort Bubble Sort Insertion Sort Shell Sort Merge Sort Quick Sort
Data Comparisons Data Swaps Time Data Comparisons Data Swaps Time Data Comparisons Data Swaps Time Data Comparisons Data Swaps Time Data Comparisons Data Swaps Time Data Comparisons Data Swaps Time
Ascending_Sorted_250 31125.0 0.0 0.008000 61752.0 0.0 0.008000 249.0 0.0 0.000000 1506.0 0.0 0.000000 102985.0 0.0 0.000000 32220.0 16110.0 0.007996
Ascending_Sorted_500 124750.0 0.0 0.008002 248502.0 0.0 0.038870 499.0 0.0 0.002037 3506.0 0.0 0.000000 466830.0 0.0 0.008044 130682.0 65341.0 0.012074
Ascending_Sorted_1000 499500.0 0.0 0.048405 997002.0 0.0 0.124397 999.0 0.0 0.000000 8006.0 0.0 0.009063 2091964.0 0.0 0.031324 542432.0 271216.0 0.029222
Descending_Sorted_250 31125.0 144.0 0.002518 61752.0 0.0 0.007534 249.0 31005.0 0.000000 1506.0 867.0 0.008008 98029.0 0.0 0.000000 62250.0 31125.0 0.008001
Descending_Sorted_500 124750.0 304.0 0.016446 248502.0 0.0 0.058094 499.0 124499.0 0.021138 3506.0 1988.0 0.007011 450069.0 0.0 0.017184 249500.0 124750.0 0.008066
Descending_Sorted_1000 499500.0 627.0 0.057602 997002.0 0.0 0.211443 999.0 498961.0 0.121921 8006.0 4480.0 0.009598 2038910.0 0.0 0.036352 999000.0 499500.0 0.051927
Unordered_Sorted_250 31125.0 247.0 0.000000 61752.0 0.0 0.008011 249.0 16176.0 0.008005 1506.0 1190.0 0.000000 119646.0 0.0 0.008008 102818.0 51409.0 0.000000
Unordered_Sorted_500 124750.0 497.0 0.016108 248502.0 0.0 0.032092 499.0 62517.0 0.014741 3506.0 3038.0 0.002054 546612.0 0.0 0.010065 433998.0 216999.0 0.000000
Unordered_Sorted_1000 499500.0 991.0 0.050410 997002.0 0.0 0.148675 999.0 238976.0 0.062542 8006.0 6602.0 0.000000 2631656.0 0.0 0.032139 2001812.0 1000906.0 0.008013
Ascending_Sorted_2000 49995000.0 0.0 5.434769 99970002.0 0.0 13.766283 9999.0 0.0 0.022327 120005.0 0.0 0.057448 314528524.0 0.0 0.392050 65294480.0 32647240.0 0.897377
Descending_Sorted_2000 1999000.0 1249.0 0.216450 3994002.0 0.0 0.796005 1999.0 1997998.0 0.476391 18006.0 9947.0 0.016075 9127864.0 0.0 0.054320 3998000.0 1999000.0 0.181211
Unordered_Sorted_2000 1999000.0 1997.0 0.270270 3994002.0 0.0 0.633384 1999.0 975989.0 0.272254 18006.0 17558.0 0.007661 11645237.0 0.0 0.072448 8575300.0 4287650.0 0.007999
But when displayed as CSV (sorted_results.csv), it looks like this, which is very off-putting:
But I want it to look something like this, which is more formal and much better:
The first level of the multi-index is repeated, and I do not want that.
I tried replacing the second and third repetitions with pd.NA, numpy.NaN, or even None, but even then it actually displays nan or None in the index, and I don't want that; I want those cells to be empty.
I did search for solutions on Stack Overflow and stumbled upon something, but those solutions/Python scripts relied on using os to modify the already created CSV file.
I also tried to use style.format, but apparently it only helps with CSS styles such as colors. I also don't know how much it would help with a .csv file.
Answers:
You can't achieve this with a CSV, since it is a plain-text file format with no concept of merged or formatted cells. You'd have to create an Excel file using to_excel() in pandas, then use an Excel-manipulation library like openpyxl.
df=DataFrame(dictionary_container,index=indexer)
df=df.transpose()
df.to_excel("sorted_results.xlsx")
Then we can use openpyxl to get the desired results:
from openpyxl import load_workbook
from openpyxl.styles import Font
from openpyxl.styles.alignment import Alignment
# Read the excel file and get worksheet
wb = load_workbook(filename="sorted_results.xlsx", data_only=True)
ws = wb.worksheets[0]
max_cols = ws.max_column
# Cell Formatting
font = Font(name='Arial', b=True)
center = Alignment(horizontal="center")
# Insert the sorting results header
ws.insert_rows(1)
ws.merge_cells(start_row=1, start_column=2, end_row=1, end_column=max_cols)
header = ws.cell(row=1, column=2)
header.value = "Sorting Results"
header.font = font
header.alignment = center
# Make all values in column a bold
for cell in ws['A']:
    cell.font = font
for col in range(2, ws.max_column + 1):
    # Merge the second header cells
    if (col - 2) % 3 == 0:
        ws.merge_cells(start_row=2, start_column=col, end_row=2, end_column=col + 2)
        cell = ws.cell(row=2, column=col)
        cell.font = font
        cell.alignment = center
    # Bold cells in 3rd row
    cell = ws.cell(row=3, column=col)
    cell.font = font
wb.save('sorted_results.xlsx')
This gives me the following results:
Note: data is just a placeholder I added for example's sake.
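If a plain CSV is still preferred, the repeated first-level labels can at least be left blank by writing the two header rows by hand instead of going through to_csv. The following is a minimal sketch using only the standard csv module; the helper name write_grouped_csv and the sample values are illustrative assumptions, and the cells are only left empty, not merged, since CSV cannot express merging.

```python
import csv
import io


def write_grouped_csv(handle, top_labels, sub_labels, rows):
    """Write a two-row header, blanking consecutive repeats of the top label.

    Hypothetical helper, not part of pandas. `rows` is an iterable of
    (row_name, values) pairs.
    """
    writer = csv.writer(handle)
    blanked, previous = [], None
    for label in top_labels:
        # Keep a label only where a new group starts.
        blanked.append("" if label == previous else label)
        previous = label
    writer.writerow([""] + blanked)           # first header row
    writer.writerow([""] + list(sub_labels))  # second header row
    for name, values in rows:
        writer.writerow([name] + list(values))


# Sample data shaped like the question's table (two algorithms only).
top = ["Selection Sort"] * 3 + ["Bubble Sort"] * 3
sub = ["Data Comparisons", "Data Swaps", "Time"] * 2
rows = [("Ascending_Sorted_250", [31125.0, 0.0, 0.008, 61752.0, 0.0, 0.008])]

buffer = io.StringIO()
write_grouped_csv(buffer, top, sub, rows)
print(buffer.getvalue())
```

With the DataFrame from the question, one would presumably pass df.columns.get_level_values(0), df.columns.get_level_values(1), and ((name, row.tolist()) for name, row in df.iterrows()), and write to a file opened with newline="" instead of a StringIO buffer.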