Pandas: How to calculate the percentage of one column against another?

Question:

I am just trying to calculate the percentage of one column against another’s total, but I am unsure how to do this in Pandas so the calculation gets added into a new column.

Let’s say, for argument’s sake, my data frame has two attributes:

  • Number of Green Marbles
  • Total Number of Marbles

Now, how would I calculate the percentage of the Number of Green Marbles out of the Total Number of Marbles in Pandas?

Obviously, I know that the calculation will be something like this:

  • (Number of Green Marbles / Total Number of Marbles) * 100

Thanks – any help is much appreciated!

Asked By: user13984013

Answers:

df['percentage columns'] = df['Number of Green Marbles'] / df['Total Number of Marbles'] * 100
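
As a self-contained sketch of the line above: the sample values and the names of the two new columns are made up for illustration, and only the two input column names come from the question. The second new column covers the other possible reading of "another's total", namely the share of the whole column's sum.

```python
import pandas as pd

# Hypothetical sample data; the two input column names follow the question.
df = pd.DataFrame({
    "Number of Green Marbles": [3, 5, 10, 12],
    "Total Number of Marbles": [8, 8, 20, 20],
})

# Row-wise percentage: each row's green count over that row's total.
df["Percentage Green"] = (
    df["Number of Green Marbles"] / df["Total Number of Marbles"] * 100
)

# Alternative reading: each row's green count over the sum of the total column.
df["Percentage of Grand Total"] = (
    df["Number of Green Marbles"] / df["Total Number of Marbles"].sum() * 100
)

print(df)
```

Either way the result is an ordinary float column, so it can be rounded with .round(2) if needed.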

Answered By: Janneman

By default, arithmetic operations between pandas DataFrame columns are element-wise, so this is as simple as it can be:

>>> import pandas as pd

>>> d = pd.DataFrame()
>>> d['green'] = [3,5,10,12]
>>> d['total'] = [8,8,20,20]
>>> d
   green  total
0      3      8
1      5      8
2     10     20
3     12     20
>>> d['percent_green'] = d['green'] / d['total'] * 100
>>> d
   green  total  percent_green
0      3      8           37.5
1      5      8           62.5
2     10     20           50.0
3     12     20           60.0
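
One edge case the example above does not hit is a total of zero. The sketch below assumes such rows can occur and that a missing value is the desired result for them; the data and the name safe_total are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has a total of zero.
d = pd.DataFrame({"green": [3, 5, 0], "total": [8, 8, 0]})

# Plain division gives NaN for 0/0 and inf for a nonzero count over 0.
# Replacing zero totals with NaN first makes every such row come out as NaN.
safe_total = d["total"].replace(0, np.nan)
d["percent_green"] = (d["green"] / safe_total * 100).round(1)

print(d)
```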

Answered By: Stef

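Here is my timing comparison of the plain arithmetic operators versus the .div() method. Both are vectorized, and the two timings are within each other's error margins: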

%timeit us_consum['Commercial_%ofUS'] = us_consum['Commercial_MWhrs']*100/us_consum['Total US consumption (MWhr)']

351 µs ± 22.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


%timeit us_consum['Commercial_%ofUS'] = (us_consum['Commercial_MWhrs'].div(us_consum['Total US consumption (MWhr)']))*100

337 µs ± 60.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
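
To check that the two spellings compute the same thing, here is a minimal sketch. The us_consum frame below is a hypothetical stand-in, since the real data is not shown in the answer; only the column names are taken from the timing lines.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for us_consum; values are made up for illustration.
us_consum = pd.DataFrame({
    "Commercial_MWhrs": [120.0, 150.0, 90.0],
    "Total US consumption (MWhr)": [400.0, 600.0, 450.0],
})

# Operator form: multiply first, then divide.
with_operator = (
    us_consum["Commercial_MWhrs"] * 100 / us_consum["Total US consumption (MWhr)"]
)

# Method form: Series.div, then multiply.
with_div = (
    us_consum["Commercial_MWhrs"].div(us_consum["Total US consumption (MWhr)"]) * 100
)

# The results agree up to floating-point rounding, so compare with a tolerance.
print(np.allclose(with_operator, with_div))
```

The comparison uses a tolerance because the two forms apply the multiplication and division in a different order, which can change the last bits of a float.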

Answered By: Mainland