Transition Matrix from Pandas Dataframe That Has Two Periods in Python
Question:
I have a dataset that shows customers' payment information for their debt.
Each customer has two periods of data.
CUST_ID  PERIOD  Delinquency Value
100729   1       1
100729   2       3
100888   1       2
100888   2       1
137300   1       0
137300   2       1
I need to compute the transition ratios between delinquency values from period 1 to period 2 and create a table that stores this matrix.
Expected output is:
   0  1  2  3
0  0  1  0  0
1  0  0  0  1
2  0  1  0  0
3  0  0  0  0
Answers:
You can use pivot and crosstab:
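import pandas as pd

# One row per customer, one column per period
tmp = df.pivot(index='CUST_ID', columns='PERIOD', values='Delinquency Value')
# Number of delinquency states (0..max observed value)
M = df['Delinquency Value'].max() + 1
# Count period 1 -> period 2 transitions; states never observed get 0
out = (pd.crosstab(tmp[1], tmp[2])
         .reindex(index=range(M), columns=range(M), fill_value=0)
      )
print(out.rename_axis(index=None, columns=None))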
Output:
   0  1  2  3
0  0  1  0  0
1  0  0  0  1
2  0  1  0  0
3  0  0  0  0
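The question asks for transition ratios, while the table above holds raw counts. Below is a minimal sketch of one way to get ratios, assuming each row should be divided by its own total and that a state with no customers in period 1 should stay all zeros. The DataFrame is rebuilt from the sample rows in the question so the snippet runs on its own. The names counts, row_totals and ratios are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "CUST_ID": [100729, 100729, 100888, 100888, 137300, 137300],
    "PERIOD": [1, 2, 1, 2, 1, 2],
    "Delinquency Value": [1, 3, 2, 1, 0, 1],
})

# Same pivot + crosstab steps as in the answer
tmp = df.pivot(index="CUST_ID", columns="PERIOD", values="Delinquency Value")
M = df["Delinquency Value"].max() + 1
counts = (
    pd.crosstab(tmp[1], tmp[2])
    .reindex(index=range(M), columns=range(M), fill_value=0)
)

# Assumption: a ratio is the count divided by the row total.
# Rows with a zero total are masked to NaN before dividing, then set to 0.
row_totals = counts.sum(axis=1)
ratios = counts.div(row_totals.where(row_totals > 0), axis=0).fillna(0.0)

print(ratios.rename_axis(index=None, columns=None))
```

With this sample every observed starting state has a single customer, so the ratios are numerically the same as the counts. On larger data, each non-empty row of ratios sums to 1.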