Sum last n rows of df.count() results into one row

Question

I am looking for a way to generate nice summary statistics of a dataframe. Consider the following example:

>> df = pd.DataFrame({"category":['u','v','w','u','y','z','y','z','x','x','y','z','x','z','x']})

>> df['category'].value_counts()
z    4
x    4
y    3
u    2
v    1
w    1

>> ??
           count  pct
z              4  27%
x              4  27%
y              3  20%
Other (3)      4  27%

The result sums the value counts of the n=3 last rows up, deletes them and then adds them as one row to the original value counts. Also it would be nice to have everything as percents. Any ideas how to implement this? Cheers!

Asked By: Simon1994

||

Source

Answer 1

For DataFrame with percentages use Series.iloc with indexing, crate DataFrame by Series.to_frame, add new row and new column filled by percentages:

s = df['category'].value_counts()

n= 3
out = s.iloc[:-n].to_frame('count')
out.loc['Other ({n})'] = s.iloc[-n:].sum()
out['pct'] = out['count'].div(out['count'].sum()).apply(lambda x: f"{x:.0%}")
print (out)
           count  pct
z              4  27%
x              4  27%
y              3  20%
Other (3)      4  27%

Answered By: jezrael

Answer 2

I would use tail(-3) to get the last values except for the first 3:

counts = df['category'].value_counts()
others = counts.tail(-3)

counts[f'Others ({len(others)})'] = others.sum()
counts.drop(others.index, inplace=True)

counts.to_frame(name='count').assign(pct=lambda d: d['count'].div(d['count'].sum()).mul(100).round())

Output:

            count   pct
z               4  27.0
x               4  27.0
y               3  20.0
Others (3)      4  27.0

Answered By: mozway

Answer 3

This snippet

df = pd.DataFrame({"category":['u','v','w','u','y','z','y','z','x','x','y','z','x','z','x']})
cutoff_index = 3
categegory_counts = pd.DataFrame([df['category'].value_counts(),df['category'].value_counts(normalize=True)],index=["Count","Percent"]).T.reset_index()
other_rows = categegory_counts[cutoff_index:].set_index("index")
categegory_counts = categegory_counts[:cutoff_index].set_index("index")
summary_table = pd.concat([categegory_counts,pd.DataFrame(other_rows.sum(),columns=[f"Other ({len(other_rows)})"]).T])
summary_table = summary_table.astype({'Count':'int'})
summary_table['Percent'] = summary_table['Percent'].apply(lambda x: "{0:.2f}%".format(x*100))
print(summary_table)

will give you what you need. Also in a nice format;)

           Count Percent
z              4  26.67%
x              4  26.67%
y              3  20.00%
Other (3)      4  26.67%

Answered By: Viktor

Sum last n rows of df.count() results into one row

Question:

Answers: