Pandas: How to calculate the percentage of one column against another?
Question:
I am just trying to calculate the percentage of one column against another column's total, but I am unsure how to do this in Pandas so that the result is stored in a new column.
Let’s say, for argument’s sake, my data frame has two attributes:
- Number of Green Marbles
- Total Number of Marbles
Now, how would I calculate the percentage of the Number of Green Marbles out of the Total Number of Marbles in Pandas?
Obviously, I know that the calculation will be something like this:
- (Number of Green Marbles / Total Number of Marbles) * 100
Thanks – any help is much appreciated!
Answers:
df['percentage columns'] = df['Number of Green Marbles'] / df['Total Number of Marbles'] * 100
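Here is a self-contained version of the one-liner above, using the question's column names with made-up sample values. It also assumes, hypothetically, that some rows may have a total of zero. pandas returns inf (or NaN for 0/0) on division by zero instead of raising, so this sketch replaces zero totals with NaN first to get NaN in those rows.

```python
import numpy as np
import pandas as pd

# Made-up sample data using the question's column names
df = pd.DataFrame({
    'Number of Green Marbles': [3, 5, 0],
    'Total Number of Marbles': [8, 8, 0],
})

# Treat a zero total as missing so the percentage becomes NaN rather than inf
total = df['Total Number of Marbles'].replace(0, np.nan)

# Element-wise: each row's green count divided by that row's total, times 100
df['percentage columns'] = df['Number of Green Marbles'] / total * 100
print(df)
```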
By default, arithmetic between pandas columns (Series) is element-wise, with rows aligned on the index, so this is as simple as it gets:
>>> import pandas as pd
>>> d = pd.DataFrame()
>>> d['green'] = [3,5,10,12]
>>> d['total'] = [8,8,20,20]
>>> d
   green  total
0      3      8
1      5      8
2     10     20
3     12     20
>>> d['percent_green'] = d['green'] / d['total'] * 100
>>> d
   green  total  percent_green
0      3      8           37.5
1      5      8           62.5
2     10     20           50.0
3     12     20           60.0
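The same column can also be written with the .div() and .mul() methods, which are equivalent to the / and * operators. The optional percent_label column (a made-up name for illustration) turns the numbers into strings like "37.5%", which is useful only for display because a string column can no longer be used in arithmetic.

```python
import pandas as pd

d = pd.DataFrame({'green': [3, 5, 10, 12], 'total': [8, 8, 20, 20]})

# Method-chained equivalent of d['green'] / d['total'] * 100
d['percent_green'] = d['green'].div(d['total']).mul(100)

# Display-only string version, formatted to one decimal place
d['percent_label'] = d['percent_green'].map('{:.1f}%'.format)
print(d)
```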
References:
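Here is my timing comparison of the arithmetic operators (* and /) against the equivalent .div() method. Both approaches are vectorized, and the two timings fall within each other's standard deviation, so in practice they perform the same: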
%timeit us_consum['Commercial_%ofUS'] = us_consum['Commercial_MWhrs']*100/us_consum['Total US consumption (MWhr)']
351 µs ± 22.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit us_consum['Commercial_%ofUS'] = (us_consum['Commercial_MWhrs'].div(us_consum['Total US consumption (MWhr)']))*100
337 µs ± 60.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
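To rerun a comparison like this outside IPython, where the %timeit magic is unavailable, here is a rough sketch using the standard-library timeit module. The us_consum frame below is hypothetical random data because the original dataset isn't shown. The sketch times only the computation, not the column assignment, and absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the us_consum frame (the original data is not shown)
rng = np.random.default_rng(0)
us_consum = pd.DataFrame({
    'Commercial_MWhrs': rng.uniform(1, 100, size=1_000),
    'Total US consumption (MWhr)': rng.uniform(100, 1_000, size=1_000),
})
part = us_consum['Commercial_MWhrs']
total = us_consum['Total US consumption (MWhr)']

def with_operators():
    # Same expression as the first %timeit line
    return part * 100 / total

def with_div_method():
    # Same expression as the second %timeit line
    return part.div(total) * 100

# Both give the same values, up to floating-point rounding
pd.testing.assert_series_equal(with_operators(), with_div_method())

for fn in (with_operators, with_div_method):
    # Best of 7 repeats, reported as time per single call
    best = min(timeit.repeat(fn, number=200, repeat=7)) / 200
    print(f'{fn.__name__}: {best * 1e6:.1f} µs per call')
```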