Sum last n rows of df.count() results into one row

Question:

I am looking for a way to generate nice summary statistics of a dataframe. Consider the following example:

>> df = pd.DataFrame({"category":['u','v','w','u','y','z','y','z','x','x','y','z','x','z','x']})

>> df['category'].value_counts()
z    4
x    4
y    3
u    2
v    1
w    1

>> ??
           count  pct
z              4  27%
x              4  27%
y              3  20%
Other (3)      4  27%

The result sums the value counts of the n=3 last rows up, deletes them and then adds them as one row to the original value counts. Also it would be nice to have everything as percents. Any ideas how to implement this? Cheers!

Asked By: Simon1994

||

Answers:

For DataFrame with percentages use Series.iloc with indexing, crate DataFrame by Series.to_frame, add new row and new column filled by percentages:

s = df['category'].value_counts()

n= 3
out = s.iloc[:-n].to_frame('count')
out.loc['Other ({n})'] = s.iloc[-n:].sum()
out['pct'] = out['count'].div(out['count'].sum()).apply(lambda x: f"{x:.0%}")
print (out)
           count  pct
z              4  27%
x              4  27%
y              3  20%
Other (3)      4  27%
Answered By: jezrael

I would use tail(-3) to get the last values except for the first 3:

counts = df['category'].value_counts()
others = counts.tail(-3)

counts[f'Others ({len(others)})'] = others.sum()
counts.drop(others.index, inplace=True)

counts.to_frame(name='count').assign(pct=lambda d: d['count'].div(d['count'].sum()).mul(100).round())

Output:

            count   pct
z               4  27.0
x               4  27.0
y               3  20.0
Others (3)      4  27.0
Answered By: mozway

This snippet

df = pd.DataFrame({"category":['u','v','w','u','y','z','y','z','x','x','y','z','x','z','x']})
cutoff_index = 3
categegory_counts = pd.DataFrame([df['category'].value_counts(),df['category'].value_counts(normalize=True)],index=["Count","Percent"]).T.reset_index()
other_rows = categegory_counts[cutoff_index:].set_index("index")
categegory_counts = categegory_counts[:cutoff_index].set_index("index")
summary_table = pd.concat([categegory_counts,pd.DataFrame(other_rows.sum(),columns=[f"Other ({len(other_rows)})"]).T])
summary_table = summary_table.astype({'Count':'int'})
summary_table['Percent'] = summary_table['Percent'].apply(lambda x: "{0:.2f}%".format(x*100))
print(summary_table)

will give you what you need. Also in a nice format;)

           Count Percent
z              4  26.67%
x              4  26.67%
y              3  20.00%
Other (3)      4  26.67%

Answered By: Viktor
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.