Python – Sum and count specific values with pandas pivot_table
Question:
I have a pandas dataframe like
ACCOUNT AMOUNT STATUS
1 -2 1
2 2 0
2 -1 0
1 2 1
1 2 1
I would like to convert it into a dataframe like
ACCOUNT STATUS COUNT>0 COUNT<0 AMOUNT>0 AMOUNT<0
1 1 2 1 4 2
2 0 1 1 2 1
So basically, split the rows on whether AMOUNT is greater or less than 0, then count and sum each group. I currently have the following, but can’t get the AMOUNT split right.
Data = pd.pivot_table(trans, values =['Status', 'AMOUNT'], index = ['ACCOUNT'], aggfunc = {'Status':np.mean, 'AMOUNT': [np.sum, 'count'] } )
Answers:
Using np.sign
np.sign returns an array of -1/0/1 depending on the signs of the values, which gives me a convenient way of identifying values less than, equal to, or greater than zero. I use it as a key in the groupby call, then use agg with count to count the values and sum to produce the totals. After grouping by three vectors, I end up with a three-level MultiIndex. I unstack to take the last level, the sign level, and pivot it into the columns.
df.groupby(
['ACCOUNT', 'STATUS', np.sign(df.AMOUNT)]
).AMOUNT.agg(['count', 'sum']).unstack()
               count    sum
AMOUNT            -1  1  -1  1
ACCOUNT STATUS
1       1          1  2  -2  4
2       0          1  1  -1  2
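To see what the sign key looks like before it goes into the groupby, here is a minimal sketch. It rebuilds the question's data (the frame construction and the names signs and by_sign are my own, not from the answer) and groups by the sign alone.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "ACCOUNT": [1, 2, 2, 1, 1],
        "AMOUNT": [-2, 2, -1, 2, 2],
        "STATUS": [1, 0, 0, 1, 1],
    }
)

# np.sign maps each amount to -1, 0, or 1 and returns a Series
# that keeps the original index and the name "AMOUNT".
signs = np.sign(df["AMOUNT"])
print(signs.tolist())  # [-1, 1, -1, 1, 1]

# Grouping by the sign Series alone shows the split
# before ACCOUNT and STATUS are added as further keys.
by_sign = df.groupby(signs)["AMOUNT"].agg(["count", "sum"])
print(by_sign)
```

Because the sample data has no zero amounts, no 0 group appears; a zero amount would get its own group rather than being folded into either side.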
Extra effort to mimic OP’s expected output:
Here I do the same thing, but add steps that rename the columns, combine the two column levels into one, and take absolute values.
df.groupby(
    ['ACCOUNT', 'STATUS', np.sign(df.AMOUNT).map({-1: '<0', 0: '=0', 1: '>0'})]
).AMOUNT.agg(['count', 'sum']).rename(
    columns=dict(count='COUNT', sum='AMOUNT')
).unstack().abs().pipe(
    # join the two column levels, e.g. ('COUNT', '<0') -> 'COUNT<0'
    lambda d: d.set_axis(d.columns.map('{0[0]}{0[1]}'.format), axis=1)
)
                COUNT<0  COUNT>0  AMOUNT<0  AMOUNT>0
ACCOUNT STATUS
1       1             1        2         2         4
2       0             1        1         1         2
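The column-flattening step is the least obvious part of that chain, so here it is in isolation. This is a small sketch on a hand-built frame (wide and flat are hypothetical names, and the values are copied from the output above) that uses a plain list comprehension in place of the format-string map.

```python
import pandas as pd

# A two-level column index shaped like the unstacked result above.
columns = pd.MultiIndex.from_product([["COUNT", "AMOUNT"], ["<0", ">0"]])
wide = pd.DataFrame([[1, 2, 2, 4], [1, 1, 1, 2]], columns=columns)

# Iterating over a MultiIndex yields tuples such as ("COUNT", "<0");
# joining each tuple gives one flat label per column.
flat = wide.set_axis(["".join(pair) for pair in wide.columns], axis=1)
print(flat.columns.tolist())
```

set_axis returns a new frame here, so wide keeps its two-level columns.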
You can do this better with groupby and unstack. I have also created two extra columns to make things clearer.
import pandas as pd

data = pd.DataFrame(
    [[1, -2, 1],
     [2, 2, 0],
     [2, -1, 0],
     [1, 2, 1],
     [1, 2, 1]],
    columns=['ACCOUNT', 'AMOUNT', 'STATUS']
)
# flag positive amounts and keep magnitudes so the sums come out unsigned
data['AMOUNT_POSITIVE'] = data['AMOUNT'] > 0
data['AMOUNT_ABSOLUTE'] = data['AMOUNT'].abs()
result = (
    data
    .groupby(["ACCOUNT", "STATUS", "AMOUNT_POSITIVE"])['AMOUNT_ABSOLUTE']
    .agg(['count', 'sum'])
    .unstack("AMOUNT_POSITIVE")
)
print(result)
And you get your table:
                count        sum
AMOUNT_POSITIVE False  True False  True
ACCOUNT STATUS
1       1           1     2     2     4
2       0           1     1     1     2
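One thing the sample data does not exercise is an account with amounts on only one side of zero. The sketch below adds a hypothetical sixth row (account 3, not part of the question) to show that unstack then needs fill_value=0 to avoid NaN in the missing cell.

```python
import pandas as pd

data = pd.DataFrame(
    [
        [1, -2, 1],
        [2, 2, 0],
        [2, -1, 0],
        [1, 2, 1],
        [1, 2, 1],
        [3, 5, 1],  # hypothetical extra row: account 3 has no negative amounts
    ],
    columns=["ACCOUNT", "AMOUNT", "STATUS"],
)
data["AMOUNT_POSITIVE"] = data["AMOUNT"] > 0
data["AMOUNT_ABSOLUTE"] = data["AMOUNT"].abs()

result = (
    data.groupby(["ACCOUNT", "STATUS", "AMOUNT_POSITIVE"])["AMOUNT_ABSOLUTE"]
    .agg(["count", "sum"])
    # fill combinations that never occurred with 0 instead of NaN
    .unstack("AMOUNT_POSITIVE", fill_value=0)
)
print(result)
```

Note also that AMOUNT > 0 puts an amount of exactly 0 in the False group, so if zeros can occur and should be kept apart, a three-way key such as the sign is the safer choice.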
This is an attempt to fix your pivot_table call:
pd.pivot_table(df.assign(new=df.AMOUNT.gt(0)), values=['AMOUNT'], index=['ACCOUNT', 'STATUS'],
               columns='new', aggfunc={'AMOUNT': ['sum', 'count']}).abs()
Out[431]:
               AMOUNT
                count        sum
new             False  True False  True
ACCOUNT STATUS
1       1           1     2     2     4
2       0           1     1     1     2
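If the goal is the exact flat column layout from the question, another route that skips the reshape is to mask the amounts first and then use named aggregation in a single groupby. This is a sketch of that alternative technique, not one of the answers above; the helper column names (pos_flag, neg_flag, pos_amt, neg_amt) are made up for the illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "ACCOUNT": [1, 2, 2, 1, 1],
        "AMOUNT": [-2, 2, -1, 2, 2],
        "STATUS": [1, 0, 0, 1, 1],
    }
)

pos = df["AMOUNT"] > 0
neg = df["AMOUNT"] < 0

out = (
    df.assign(
        pos_flag=pos.astype(int),                  # 1 where the amount is positive
        neg_flag=neg.astype(int),                  # 1 where the amount is negative
        pos_amt=df["AMOUNT"].where(pos, 0),        # positive amounts, else 0
        neg_amt=df["AMOUNT"].where(neg, 0).abs(),  # magnitude of negative amounts, else 0
    )
    .groupby(["ACCOUNT", "STATUS"], as_index=False)
    # the dict is unpacked because "COUNT>0" is not a valid keyword name
    .agg(
        **{
            "COUNT>0": ("pos_flag", "sum"),
            "COUNT<0": ("neg_flag", "sum"),
            "AMOUNT>0": ("pos_amt", "sum"),
            "AMOUNT<0": ("neg_amt", "sum"),
        }
    )
)
print(out)
```

Summing 0/1 flags gives the counts, so every output column comes from one sum and absent groups are 0 rather than NaN.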
A separate example, on a different df with columns A, C, D, and E, aggregates by taking the mean across multiple columns.
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                       aggfunc={'D': np.mean,
                                'E': np.mean})
table
                  D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333
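The snippet above does not show how df is built. As a minimal runnable sketch, assume a frame like the one below: the values are my own choice, picked so that the group means match the printed output, and are not taken from the source. The string "mean" stands in for np.mean.

```python
import pandas as pd

# Assumed data: chosen so the group means reproduce the table above.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
        "C": ["small", "large", "large", "small", "small",
              "large", "small", "small", "large"],
        "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
        "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
    }
)

# One aggregation per value column, keyed by column name.
table = pd.pivot_table(
    df,
    values=["D", "E"],
    index=["A", "C"],
    aggfunc={"D": "mean", "E": "mean"},
)
print(table)
```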