pandas convert columns to percentages of the totals
Question:
I have a dataframe with four columns: an ID and three categories that results fell into:
    <80%  80-90  >90
id
1      2      4    4
2      3      6    1
3      7      0    3
I would like to convert each row to percentages of its row total, i.e.:
    <80%  80-90  >90
id
1    20%    40%  40%
2    30%    60%  10%
3    70%     0%  30%
This seems like it should be within pandas’ capabilities, but I just can’t figure it out.
Thanks in advance!
Answers:
You can do this using the basic pandas methods .div and .sum, using the axis argument to make sure the calculations happen the way you want:
cols = ['<80%', '80-90', '>90']
df[cols] = df[cols].div(df[cols].sum(axis=1), axis=0).multiply(100)
- Calculate the sum of each row (df[cols].sum(axis=1)). axis=1 adds up the values across the columns within each row, giving one total per id, rather than summing down each column.
- Divide the dataframe by the resulting series (df[cols].div(df[cols].sum(axis=1), axis=0)). axis=0 aligns the series with the row index, so each row is divided by its own total.
- To finish, multiply the results by 100 so they are percentages between 0 and 100 instead of proportions between 0 and 1 (or you can skip this step and store them as proportions).
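The steps above can be sketched on the data from the question. This is only an illustration: it assumes id is the index (if it is a regular column, selecting cols keeps it out of the arithmetic anyway), and the names row_totals and pct are made up for the example.

```python
import pandas as pd

# Data from the question; id is assumed to be the index here.
df = pd.DataFrame(
    {"<80%": [2, 3, 7], "80-90": [4, 6, 0], ">90": [4, 1, 3]},
    index=pd.Index([1, 2, 3], name="id"),
)
cols = ["<80%", "80-90", ">90"]

# Step 1: one total per row (per id).
row_totals = df[cols].sum(axis=1)

# Steps 2 and 3: divide each row by its own total, then scale to 0-100.
pct = df[cols].div(row_totals, axis=0).multiply(100)
print(pct.round(1))
```

Each row of pct should add up to 100, which is a quick way to check that the axis arguments are the right way round.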
df/df.sum()
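This divides each column by its column total. If you want to divide each row by its row total instead, transpose the dataframe first (and transpose the result back afterwards).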
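A rough sketch of the difference between the two directions, again using the question's data with id assumed to be the index (otherwise the id values would be divided as well). The names col_share and row_share are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"<80%": [2, 3, 7], "80-90": [4, 6, 0], ">90": [4, 1, 3]},
    index=pd.Index([1, 2, 3], name="id"),
)

# df.sum() returns one total per column, so every column ends up summing to 1.
col_share = df / df.sum()

# Transposing turns rows into columns; dividing and transposing back
# gives shares of each row total instead.
row_share = (df.T / df.T.sum()).T

print(col_share.round(3))
print(row_share.round(3))
```

For the row-wise case, df.div(df.sum(axis=1), axis=0) should give the same result without the double transpose.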
Tim Tian’s answer pretty much worked for me, but maybe this helps if you have a df with several columns and want the percentages column-wise.
df_pct = df/df[df.columns].sum()*100
I was having trouble because I wanted the result of a pd.pivot_table expressed as percentages, but couldn’t get it to work inside the pivot call. So I just used that code on the resulting table itself and it worked.
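For what it’s worth, here is roughly what that looks like on a pivot table. The long-format data, the column names (band, count) and the variable names are all made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical long-format data, invented for this example only.
raw = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "band": ["<80%", ">90", "<80%", ">90", "<80%", ">90"],
    "count": [2, 8, 3, 7, 5, 5],
})

# Build the pivot table first...
table = pd.pivot_table(raw, index="id", columns="band", values="count", aggfunc="sum")

# ...then express each cell as a percentage of its column total.
table_pct = table / table.sum() * 100
print(table_pct.round(1))
```

Every column of table_pct should sum to 100. If the pivot contains missing cells (NaN), .sum() skips them by default and those cells stay NaN in the result.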
You could use the .apply() method:
df = df.apply(lambda x: x/sum(x)*100, axis=1)
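A small sketch of the same idea that also produces the "20%" style strings shown in the question. It assumes id is the index, so that apply only ever sees the three numeric columns; the names pct and labels are just for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"<80%": [2, 3, 7], "80-90": [4, 6, 0], ">90": [4, 1, 3]},
    index=pd.Index([1, 2, 3], name="id"),
)

# axis=1 passes each row to the lambda as a Series.
pct = df.apply(lambda row: row / row.sum() * 100, axis=1)

# Optional: turn the numbers into display strings such as "20%".
labels = pct.round(0).astype(int).astype(str) + "%"
print(labels)
```

Note that labels holds strings, so it is only good for display; keep pct around if you need to do further arithmetic. On large frames, .apply with axis=1 tends to be slower than the vectorised .div approach.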