Python – Sum and count specific values with pandas pivot_table

Question:

I have a pandas dataframe like

ACCOUNT AMOUNT STATUS 
1         -2      1
2         2       0
2         -1      0
1         2       1 
1         2       1

I would like to convert this into a dataframe like

ACCOUNT  STATUS COUNT>0 COUNT<0 AMOUNT>0 AMOUNT<0 
1          1      2        1        4         2
2          0      1        1        2         1

So basically, split on whether AMOUNT is greater or less than 0, then count and sum each group. I currently have the following, but can't get the AMOUNT split right.

Data = pd.pivot_table(trans, values =['Status', 'AMOUNT'], index = ['ACCOUNT'], aggfunc = {'Status':np.mean, 'AMOUNT': [np.sum, 'count'] } )
Asked By: Slangers


Answers:

Using np.sign
This function returns an array of -1/0/1 depending on the sign of each value, which gives a convenient way of identifying values less than, equal to, or greater than zero. I use it as a grouping key in the groupby statement, then use agg to count the values and sum them to produce the totals. After grouping by 3 keys, I end up with a 3-level MultiIndex. I unstack to take the last level, which is the sign level, and pivot it into the columns.

df.groupby(
    ['ACCOUNT', 'STATUS', np.sign(df.AMOUNT)]
).AMOUNT.agg(['count', 'sum']).unstack()

               count    sum   
AMOUNT            -1  1  -1  1
ACCOUNT STATUS                
1       1          1  2  -2  4
2       0          1  1  -1  2
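
To make the grouping key concrete, here is a minimal sketch that rebuilds the question's sample data (the construction of df is an assumption, since the answer does not show it) and looks at what np.sign returns before it is used in the groupby:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; building it this way is an assumption.
df = pd.DataFrame({
    "ACCOUNT": [1, 2, 2, 1, 1],
    "AMOUNT": [-2, 2, -1, 2, 2],
    "STATUS": [1, 0, 0, 1, 1],
})

# np.sign keeps the Series index and name, so it can be passed
# straight to groupby next to the column names.
signs = np.sign(df["AMOUNT"])
print(signs.tolist())

# Number of rows per (ACCOUNT, STATUS, sign) group.
sizes = df.groupby(["ACCOUNT", "STATUS", signs]).size()
print(sizes)
```

Each row gets its own sign label, so the third index level of the grouped result is exactly the -1/1 layer that unstack moves into the columns.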

Extra effort to mimic OP’s expected output:
Here, I do the same thing but add several steps that rename the columns, combine the column levels, and take absolute values.

df.groupby(
    ['ACCOUNT', 'STATUS', np.sign(df.AMOUNT).map({-1: '<0', 0: '=0', 1: '>0'})]
).AMOUNT.agg(['count', 'sum']).rename(
    columns=dict(count='COUNT', sum='AMOUNT')
).unstack().abs().pipe(
    lambda d: d.set_axis(d.columns.map('{0[0]}{0[1]}'.format), axis=1)
)

                COUNT<0  COUNT>0  AMOUNT<0  AMOUNT>0
ACCOUNT STATUS                                      
1       1             1        2         2         4
2       0             1        1         1         2
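
For readers who want to run the steps above end to end, the following is a self-contained sketch, not the answer's exact code. It assumes the question's sample data, introduces a hypothetical key name SIGN for the sign labels, flattens the column levels with a list comprehension instead of set_axis, and finishes with reset_index so that ACCOUNT and STATUS become ordinary columns as in the question's expected layout.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; building it this way is an assumption.
df = pd.DataFrame({
    "ACCOUNT": [1, 2, 2, 1, 1],
    "AMOUNT": [-2, 2, -1, 2, 2],
    "STATUS": [1, 0, 0, 1, 1],
})

# Label each row by the sign of AMOUNT; "SIGN" is a hypothetical name.
sign = np.sign(df["AMOUNT"]).map({-1: "<0", 0: "=0", 1: ">0"}).rename("SIGN")

wide = (
    df.assign(AMOUNT=df["AMOUNT"].abs())      # sum absolute values per group
      .groupby(["ACCOUNT", "STATUS", sign])["AMOUNT"]
      .agg(["count", "sum"])
      .rename(columns={"count": "COUNT", "sum": "AMOUNT"})
      .unstack("SIGN")                        # move the sign labels into the columns
)

# Join the two column levels into single labels such as "COUNT>0".
wide.columns = [f"{stat}{label}" for stat, label in wide.columns]
flat = wide.reset_index()
print(flat)
```

Taking the absolute value before aggregating, rather than calling abs() on the final table, is a design choice made here so that only the amounts are affected.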
Answered By: piRSquared

You can do this better with groupby and unstack. I have also created a few extra columns to make things clearer.

data = pd.DataFrame(
    [[1, -2, 1],
     [2, 2, 0],
     [2, -1, 0],
     [1,  2, 1],
     [1,  2, 1] 
    ],
    columns = ['ACCOUNT', 'AMOUNT', 'STATUS']
)

data['AMOUNT_POSITIVE'] = data['AMOUNT'] > 0
data['AMOUNT_ABSOLUTE'] = data['AMOUNT'].abs()

result = (data
          .groupby(["ACCOUNT", "STATUS", "AMOUNT_POSITIVE"])['AMOUNT_ABSOLUTE']
          .agg(['count', 'sum'])
          .unstack("AMOUNT_POSITIVE")
         )

print(result)

And you get your table:

                count         sum      
AMOUNT_POSITIVE False True  False True 
ACCOUNT STATUS                         
1       1           1     2     2     4
2       0           1     1     1     2
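
One caveat of a boolean split: a row whose AMOUNT is exactly 0 lands in the False group together with the negative values. The sample data has no zeros, so the result above is unaffected, but the small sketch below (using made-up amounts, not the question's data) shows the difference between a boolean key and a sign key:

```python
import numpy as np
import pandas as pd

# Hypothetical amounts that include a zero, for illustration only.
amounts = pd.Series([-2, 0, 3], name="AMOUNT")

# Boolean key: the zero is counted together with the negative value.
by_bool = amounts.groupby(amounts > 0).count()

# Sign key: negative, zero and positive values each get their own group.
by_sign = amounts.groupby(np.sign(amounts)).count()

print(by_bool)
print(by_sign)
```

If zero amounts can occur and should not be counted as negative, either filter them out first or group on the sign instead.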
Answered By: Ken Syme

This is an attempt to fix your pivot_table call:

pd.pivot_table(
    df.assign(new=df.AMOUNT.gt(0)),
    values=['AMOUNT'],
    index=['ACCOUNT', 'STATUS'],
    columns='new',
    aggfunc={'AMOUNT': ['sum', 'count']},
).abs()
Out[431]: 
               AMOUNT                  
                count         sum      
new             False True  False True 
ACCOUNT STATUS                         
1       1           1     2     2     4
2       0           1     1     1     2
Answered By: BENY

The following example aggregates by taking the mean across multiple columns. It uses a different DataFrame, df, with columns A, C, D and E:

table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                    aggfunc={'D': np.mean,
                             'E': np.mean})
table
                D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333
Answered By: Ilias N