How to make pivoted columns in pandas with month-by-month total sum

Question:

We have raw banking data for making credit scoring risk audit later.
In simplified form it’s looks like this:

import pandas as pd

d = {'contract_id':['1175082589', '1175082589', '1175082589','1175082589','1575082194','1575082194','1575082194','1575082194'],
    'date_of_contract_signature':['2019-05-01','2019-05-01','2019-05-01','2019-05-01','2019-06-01','2019-06-01','2019-06-01','2019-06-01'], 
'date_of_report': ['2019-05-01', '2019-06-01', '2019-07-01','2019-08-01','2019-06-01', '2019-07-01', '2019-08-01','2019-09-01'], 
'debt_90_plus':[0, 0, 34800,34800,0,0,56500,56500]}
df=pd.DataFrame(data=d)
print(df)

  contract_id date_of_contract_signature date_of_report  debt_90_plus
0  1175082589                 2019-05-01     2019-05-01             0
1  1175082589                 2019-05-01     2019-06-01             0
2  1175082589                 2019-05-01     2019-07-01         34800
3  1175082589                 2019-05-01     2019-08-01         34800
4  1575082194                 2019-06-01     2019-06-01             0
5  1575082194                 2019-06-01     2019-07-01             0
6  1575082194                 2019-06-01     2019-08-01         56500
7  1575082194                 2019-06-01     2019-09-01         56500


The result should be month-by-month change in debt_90_plus:

d1={'date_of_contract_signature': ['2019-05-01', '2019-06-01'], 1:[0, 0],2:[0,0],3:[34800,56500],4:[34800,56500]}
df1=pd.DataFrame(data=d1)
print(df1)

  date_of_contract_signature  1  2      3      4
0                 2019-05-01  0  0  34800  34800
1                 2019-06-01  0  0  56500  56500

I am trying:

print(pd.pivot_table(df,index=['date_of_contract_signature','date_of_report']))

Which returns very wrong result:

                                           debt_90_plus
date_of_contract_signature date_of_report              
2019-05-01                 2019-05-01                 0
                           2019-06-01                 0
                           2019-07-01             34800
                           2019-08-01             34800
2019-06-01                 2019-06-01                 0
                           2019-07-01                 0
                           2019-08-01             56500
                           2019-09-01             56500

What can help solving the issue?

Answers:

Example

I don’t think long column names are needed in Q&A. It’s just inconvenient in code or when creating output. so i make column name simple

d = {'id':['1175082589', '1175082589', '1175082589','1175082589','1575082194','1575082194','1575082194','1575082194'],
    'date_A':['2019-05-01','2019-05-01','2019-05-01','2019-05-01','2019-06-01','2019-06-01','2019-06-01','2019-06-01'], 
'date_B': ['2019-05-01', '2019-06-01', '2019-07-01','2019-08-01','2019-06-01', '2019-07-01', '2019-08-01','2019-09-01'], 
'debt':[0, 0, 34800,34800,0,0,56500,56500]}
df=pd.DataFrame(data=d)

df

    id          date_A      date_B      debt
0   1175082589  2019-05-01  2019-05-01  0
1   1175082589  2019-05-01  2019-06-01  0
2   1175082589  2019-05-01  2019-07-01  34800
3   1175082589  2019-05-01  2019-08-01  34800
4   1575082194  2019-06-01  2019-06-01  0
5   1575082194  2019-06-01  2019-07-01  0
6   1575082194  2019-06-01  2019-08-01  56500
7   1575082194  2019-06-01  2019-09-01  56500

Code

make order series in same date

s1 = df.groupby('date_A').cumcount().add(1)

s1

0    1
1    2
2    3
3    4
4    1
5    2
6    3
7    4
dtype: int64

make pivot_table with s1

out = df.pivot_table('debt', index='date_A', columns=s1).reset_index()

out

    date_A      1   2   3       4
0   2019-05-01  0   0   34800   34800
1   2019-06-01  0   0   56500   56500

Other way

If you are well with pandas, you can also use following 1-line code

out = (df.groupby('date_A')['debt'].apply(lambda x: pd.Series(list(x)))
       .unstack().rename(columns=lambda x: x + 1).reset_index())

same result

Answered By: Panda Kim
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.