How to group by the value of the dataframe?

Question:

I have two DataFrames with the same layout. In df1 the values are the amount each customer paid, and in df2 they are the customer's status for that period (columns 1, 2, 3, 4 are periods):

df1:

customer|1|2|3|4
x       |2|5|5|5
y       | |5|5|5
z       |5|5|5|

df2:

customer|1|2|3|4
x       |N|E|E|E
y       | |N|E|E
z       |N|E|C|-

I want to group by the status, i.e. the values of df2, to get:

Status|1|2 |3 |4
N     |7|5 |  |
E     | |10|10|10
C     | |  |5 |

I used to count the statuses per period with

df2.apply(pd.value_counts).fillna(0)

but now, instead of counting the values, I want to sum the corresponding values from df1.

Asked By: Ricardo Fernandes


Answers:

As is often the case, this only seems difficult because your DataFrames are in an awkward (wide) shape. If you melt them first, it becomes easy: merge them, group by the quantities of interest, and sum (then pivot again if you want to display the result in that format):

df1m = df1.melt(id_vars='customer', var_name='period', value_name='amount')
df2m = df2.melt(id_vars='customer', var_name='period', value_name='status')
dfm = df1m.merge(df2m)
res = dfm.groupby(['status', 'period'])['amount'].sum().reset_index()
res.pivot_table(index='status', columns='period', values='amount')

#period      1     2     3     4
#status                         
#C         NaN   NaN   5.0   NaN
#E         NaN  10.0  10.0  10.0
#N         7.0   5.0   NaN   NaN
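
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same melt, merge, and sum approach. It makes a few assumptions that are not in the question: the blank cells are NaN, the period column labels are strings, and rows without a payment amount should be left out of the result (so the '-' status, which has no amount, does not appear). It also lets pivot_table do the summing directly instead of a separate groupby step.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames. Assumption: blank cells are NaN
# and the period labels are the strings "1".."4".
periods = ["1", "2", "3", "4"]
df1 = pd.DataFrame(
    [["x", 2, 5, 5, 5], ["y", np.nan, 5, 5, 5], ["z", 5, 5, 5, np.nan]],
    columns=["customer"] + periods,
)
df2 = pd.DataFrame(
    [["x", "N", "E", "E", "E"], ["y", np.nan, "N", "E", "E"], ["z", "N", "E", "C", "-"]],
    columns=["customer"] + periods,
)

# Wide -> long: one row per (customer, period).
df1m = df1.melt(id_vars="customer", var_name="period", value_name="amount")
df2m = df2.melt(id_vars="customer", var_name="period", value_name="status")

# Merge on explicit keys, then drop rows that have no payment amount
# (assumption: those should not show up in the summary).
dfm = df1m.merge(df2m, on=["customer", "period"]).dropna(subset=["amount"])

# Sum the amounts per status and period, back in wide form.
result = dfm.pivot_table(
    index="status", columns="period", values="amount", aggfunc="sum"
)
print(result)
```

Spelling out on=["customer", "period"] is optional here, since merge would pick the shared columns anyway, but it guards against a surprise if the two frames ever gain another common column.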

To show what melt does: it unpivots the DataFrame, so you get one row per observation (customer, period) holding the amount or status:

df1m
#    customer period  amount
#0   x             1     2.0
#1   y             1     NaN
#2   z             1     5.0
#3   x             2     5.0
#4   y             2     5.0
#5   z             2     5.0
#6   x             3     5.0
#7   y             3     5.0
#8   z             3     5.0
#9   x             4     5.0
#10  y             4     5.0
#11  z             4     NaN
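
As a smaller illustration of the same idea, the sketch below melts a tiny made-up frame (the values are hypothetical, not the question's data) and then pivots it back, to show that melt and pivot are inverse reshapes as long as each (customer, period) pair occurs once.

```python
import pandas as pd

# Hypothetical wide frame: one row per customer, one column per period.
wide = pd.DataFrame({"customer": ["x", "y"], "1": [2, 3], "2": [5, 7]})

# Wide -> long: 2 customers x 2 periods gives 4 rows.
long = wide.melt(id_vars="customer", var_name="period", value_name="amount")
print(long)

# Long -> wide again; this works because each (customer, period) pair is unique.
restored = long.pivot(index="customer", columns="period", values="amount")
print(restored)
```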
Answered By: Jondiedoop