Python pandas – writing groupby output to file
Question:
I used the following to get proportion information on my data:
>>> testfile = pd.read_csv('CCCC_output_all_FINAL.txt', delimiter="\t", header=0)
>>> testdf = pd.DataFrame({'Proportion': testfile.groupby(['Name','Chr','Position','State']).size() / 39})
>>> testdf.head(5)
                                  Proportion
Name    Chr Position State
S-3AAAA 16  27557749 4              0.025641
                     5              0.076923
                     6              0.025641
S-3AAAC 15  35061490 2              0.076923
                     4              0.025641
>>> testdf.to_csv('CCCC_output_summary.txt', sep='\t', header=True, index=False)
The output file only has the column Proportion. I’d like the following table output:
Name Chr Position State Proportion
S-3AAAA 16 27557749 4 0.025641
S-3AAAA 16 27557749 5 0.076923
S-3AAAA 16 27557749 6 0.025641
S-3AAAC 15 35061490 2 0.076923
S-3AAAC 15 35061490 4 0.025641
Is it possible/easy to write the pandas output to a file like this?
Answers:
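Use reset_index(). The group keys live in the index, so index=False drops them; reset_index() turns them back into ordinary columns before writing:
testdf.reset_index().to_csv('CCCC_output_summary.txt', sep='\t', header=True, index=False)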
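To see why reset_index() helps, here is a minimal, self-contained sketch. The tiny DataFrame is made-up illustrative data (not the asker's file), and it writes to an in-memory buffer instead of CCCC_output_summary.txt so it can be run anywhere.

```python
import io
import pandas as pd

# Hypothetical miniature of the question's data (values are illustrative only).
raw = pd.DataFrame({
    "Name": ["S-3AAAA", "S-3AAAA", "S-3AAAA", "S-3AAAC"],
    "Chr": [16, 16, 16, 15],
    "Position": [27557749, 27557749, 27557749, 35061490],
    "State": [4, 5, 5, 2],
})

# size() returns a Series whose index holds the four group keys.
counts = raw.groupby(["Name", "Chr", "Position", "State"]).size()
summary = pd.DataFrame({"Proportion": counts / len(raw)})

# reset_index() moves the index levels back into ordinary columns,
# so index=False no longer discards them.
flat = summary.reset_index()

buffer = io.StringIO()
flat.to_csv(buffer, sep="\t", index=False)
print(buffer.getvalue())
```

Writing with index=True (the default) and no reset_index() would also keep the key columns, since to_csv then writes the index levels as leading columns.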
I had the same problem. reset_index() as explained above did not work for me. I used an answer from another Stack Overflow question and it worked wonderfully. Details are below.
Input csv has data under following two columns:
Item Code, Quantity
Output needed:
Average quantity grouped by item and both columns to be part of csv.
Initial code:
import os
import pandas as pd
data_directory = r"D:\data"
df = pd.read_csv(os.path.join(data_directory, "input_file.csv"))
df_avg = df.groupby("Item Code")["Quantity"].mean()
df_avg.reset_index().to_csv(os.path.join(data_directory, 'output_file.csv'), sep='\t', header=True, index=False)
Output received:
Only the average quantity was written to output file
The following code solved the problem:
import os
import pandas as pd
data_directory = r"D:\data"
df = pd.read_csv(os.path.join(data_directory, "input_file.csv"))
df.groupby("Item Code")["Quantity"].mean().reset_index()[["Item Code", "Quantity"]].to_csv(os.path.join(data_directory, 'output_file.csv'))
With the above code, I got an output file with two columns, Item Code and Quantity, where the second column contains the average quantity for each item code.
Other Stack Overflow reference: Pandas groupby to to_csv
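As an alternative that this answer does not use, passing as_index=False to groupby() keeps the grouping column as a regular column, so no reset_index() is needed. A minimal sketch, with a made-up DataFrame standing in for input_file.csv and an in-memory buffer standing in for output_file.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for input_file.csv.
df = pd.DataFrame({
    "Item Code": ["A1", "A1", "B2"],
    "Quantity": [10, 20, 5],
})

# as_index=False keeps "Item Code" as a regular column in the result.
df_avg = df.groupby("Item Code", as_index=False)["Quantity"].mean()

buffer = io.StringIO()
df_avg.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Here index=False only drops the default 0, 1, 2, ... row labels; both Item Code and Quantity are written.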
Recently, I had to work with an Excel file that has 2 columns, with headers ‘Dog Breed’ and ‘Dog Name’. I came up with the following code (tested with Python 3.11.0) that uses groupby() and prints the grouped data into a .csv file.
from pathlib import Path
import pandas as pd
p = Path(__file__).with_name('data.xlsx')
q = Path(__file__).with_name('data-grouped.csv')
df = pd.read_excel(p)
groups = df.groupby('Dog Breed', sort=False)
with q.open('w') as foutput:
    for g in groups:  # For each (breed, dataframe) group
        foutput.write(f"{g[0]}, {len(g[1])}")  # Record the number of dogs in each group
        for e, (index, row) in enumerate(g[1].iterrows()):  # Iterating over the group's dataframe
            name = str(row['Dog Name'])
            if e == 0:
                mystr = f",{name}\n"  # First name goes on the same line as the breed and count
            else:
                mystr = f",,{name}\n"  # Later names leave the breed and count cells empty
            foutput.write(mystr)
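The same layout can be sketched with the standard csv module, which takes care of quoting if a name contains a comma. This is a variation under stated assumptions, not the answer's own code: the DataFrame below is made-up sample data in place of data.xlsx, and the output goes to an in-memory buffer rather than data-grouped.csv.

```python
import csv
import io
import pandas as pd

# Hypothetical stand-in for data.xlsx.
df = pd.DataFrame({
    "Dog Breed": ["Beagle", "Poodle", "Beagle"],
    "Dog Name": ["Rex", "Fifi", "Max"],
})

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

# sort=False keeps breeds in order of first appearance.
for breed, group in df.groupby("Dog Breed", sort=False):
    names = group["Dog Name"].astype(str).tolist()
    # First row carries the breed and its count; later rows leave those cells blank.
    writer.writerow([breed, len(names), names[0]])
    for name in names[1:]:
        writer.writerow(["", "", name])

print(buffer.getvalue())
```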